Re: [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode

2024-06-11 Thread Stefan Hajnoczi
On Tue, Jun 11, 2024 at 07:19:08AM +0200, Christoph Hellwig wrote:
> virtblk_update_cache_mode boils down to a single call to
> blk_queue_write_cache.  Remove it in preparation for moving the cache
> control flags into the queue_limits.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/virtio_blk.c | 13 +++--
>  1 file changed, 3 insertions(+), 10 deletions(-)

Reviewed-by: Stefan Hajnoczi 




[PULL 5/6] Replace "iothread lock" with "BQL" in comments

2024-01-08 Thread Stefan Hajnoczi
The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL)
in their names. Update the code comments to use "BQL" instead of
"iothread lock".

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Paul Durrant 
Reviewed-by: Akihiko Odaki 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Harsh Prateek Bora 
Message-id: 20240102153529.486531-5-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 docs/devel/reset.rst |  2 +-
 hw/display/qxl.h |  2 +-
 include/exec/cpu-common.h|  2 +-
 include/exec/memory.h|  4 ++--
 include/exec/ramblock.h  |  2 +-
 include/migration/register.h |  8 
 target/arm/internals.h   |  4 ++--
 accel/tcg/cputlb.c   |  4 ++--
 accel/tcg/tcg-accel-ops-icount.c |  2 +-
 hw/remote/mpqemu-link.c  |  2 +-
 migration/block-dirty-bitmap.c   | 10 +-
 migration/block.c| 22 +++---
 migration/colo.c |  2 +-
 migration/migration.c|  2 +-
 migration/ram.c  |  4 ++--
 system/physmem.c |  6 +++---
 target/arm/helper.c  |  2 +-
 ui/spice-core.c  |  2 +-
 util/rcu.c   |  2 +-
 audio/coreaudio.m|  4 ++--
 ui/cocoa.m   |  6 +++---
 21 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/docs/devel/reset.rst b/docs/devel/reset.rst
index 38ed1790f7..d4e79718ba 100644
--- a/docs/devel/reset.rst
+++ b/docs/devel/reset.rst
@@ -19,7 +19,7 @@ Triggering reset
 
 This section documents the APIs which "users" of a resettable object should use
 to control it. All resettable control functions must be called while holding
-the iothread lock.
+the BQL.
 
 You can apply a reset to an object using ``resettable_assert_reset()``. You need
 to call ``resettable_release_reset()`` to release the object from reset. To
diff --git a/hw/display/qxl.h b/hw/display/qxl.h
index fdac14edad..e0a85a5ca4 100644
--- a/hw/display/qxl.h
+++ b/hw/display/qxl.h
@@ -159,7 +159,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(PCIQXLDevice, PCI_QXL)
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 41115d8919..fef3138d29 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -92,7 +92,7 @@ RAMBlock *qemu_ram_block_by_name(const char *name);
  *
  * By the time this function returns, the returned pointer is not protected
  * by RCU anymore.  If the caller is not within an RCU critical section and
- * does not hold the iothread lock, it must have other means of protecting the
+ * does not hold the BQL, it must have other means of protecting the
  * pointer, such as a reference to the memory region that owns the RAMBlock.
  */
 RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 48c11ca743..177be23db7 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1982,7 +1982,7 @@ int memory_region_get_fd(MemoryRegion *mr);
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
@@ -1999,7 +1999,7 @@ MemoryRegion *memory_region_from_host(void *ptr, ram_addr_t *offset);
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 69c6a53902..3eb79723c6 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -34,7 +34,7 @@ struct RAMBlock {
 ram_addr_t max_length;
 void (*resized)(const char*, uint64_t length, void *host);
 uint32_t flags;
-/* Protected by iothread lock.  */
+/* Protected by the BQL.  */
 char idstr[256];
 /* RCU-enabled, writes protected by the ramlist lock */
 QLIST_ENTRY(RAMBlock) next;
diff --git a/include/migration/register.h b/inclu

[PULL 6/6] Rename "QEMU global mutex" to "BQL" in comments and docs

2024-01-08 Thread Stefan Hajnoczi
The term "QEMU global mutex" refers to the same lock as the more widely
used Big QEMU Lock ("BQL"). Update the code comments and documentation to use
"BQL" instead of "QEMU global mutex".

Signed-off-by: Stefan Hajnoczi 
Acked-by: Markus Armbruster 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Paul Durrant 
Reviewed-by: Akihiko Odaki 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Harsh Prateek Bora 
Message-id: 20240102153529.486531-6-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 docs/devel/multi-thread-tcg.rst   |  7 +++
 docs/devel/qapi-code-gen.rst  |  2 +-
 docs/devel/replay.rst |  2 +-
 docs/devel/multiple-iothreads.txt | 14 +++---
 include/block/blockjob.h  |  6 +++---
 include/io/task.h |  2 +-
 include/qemu/coroutine-core.h |  2 +-
 include/qemu/coroutine.h  |  2 +-
 hw/block/dataplane/virtio-blk.c   |  8 
 hw/block/virtio-blk.c |  2 +-
 hw/scsi/virtio-scsi-dataplane.c   |  6 +++---
 net/tap.c |  2 +-
 12 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.rst b/docs/devel/multi-thread-tcg.rst
index c9541a7b20..7302c3bf53 100644
--- a/docs/devel/multi-thread-tcg.rst
+++ b/docs/devel/multi-thread-tcg.rst
@@ -226,10 +226,9 @@ instruction. This could be a future optimisation.
 Emulated hardware state
 -----------------------
 
-Currently thanks to KVM work any access to IO memory is automatically
-protected by the global iothread mutex, also known as the BQL (Big
-QEMU Lock). Any IO region that doesn't use global mutex is expected to
-do its own locking.
+Currently thanks to KVM work any access to IO memory is automatically protected
+by the BQL (Big QEMU Lock). Any IO region that doesn't use the BQL is expected
+to do its own locking.
 
 However IO memory isn't the only way emulated hardware state can be
 modified. Some architectures have model specific registers that
diff --git a/docs/devel/qapi-code-gen.rst b/docs/devel/qapi-code-gen.rst
index 7f78183cd4..ea8228518c 100644
--- a/docs/devel/qapi-code-gen.rst
+++ b/docs/devel/qapi-code-gen.rst
@@ -594,7 +594,7 @@ blocking the guest and other background operations.
 Coroutine safety can be hard to prove, similar to thread safety.  Common
 pitfalls are:
 
-- The global mutex isn't held across ``qemu_coroutine_yield()``, so
+- The BQL isn't held across ``qemu_coroutine_yield()``, so
   operations that used to assume that they execute atomically may have
   to be more careful to protect against changes in the global state.
 
diff --git a/docs/devel/replay.rst b/docs/devel/replay.rst
index 0244be8b9c..effd856f0c 100644
--- a/docs/devel/replay.rst
+++ b/docs/devel/replay.rst
@@ -184,7 +184,7 @@ modes.
 Reading and writing requests are created by CPU thread of QEMU. Later these
 requests proceed to block layer which creates "bottom halves". Bottom
 halves consist of callback and its parameters. They are processed when
-main loop locks the global mutex. These locks are not synchronized with
+main loop locks the BQL. These locks are not synchronized with
 replaying process because main loop also processes the events that do not
 affect the virtual machine state (like user interaction with monitor).
 
diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
index 4865196bde..de85767b12 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -5,7 +5,7 @@ the COPYING file in the top-level directory.
 
 
 This document explains the IOThread feature and how to write code that runs
-outside the QEMU global mutex.
+outside the BQL.
 
 The main loop and IOThreads
 ---------------------------
@@ -29,13 +29,13 @@ scalability bottleneck on hosts with many CPUs.  Work can be spread across
 several IOThreads instead of just one main loop.  When set up correctly this
 can improve I/O latency and reduce jitter seen by the guest.
 
-The main loop is also deeply associated with the QEMU global mutex, which is a
-scalability bottleneck in itself.  vCPU threads and the main loop use the QEMU
-global mutex to serialize execution of QEMU code.  This mutex is necessary
-because a lot of QEMU's code historically was not thread-safe.
+The main loop is also deeply associated with the BQL, which is a
+scalability bottleneck in itself.  vCPU threads and the main loop use the BQL
+to serialize execution of QEMU code.  This mutex is necessary because a lot of
+QEMU's code historically was not thread-safe.
 
 The fact that all I/O processing is done in a single main loop and that the
-QEMU global mutex is contended by all vCPU threads and the main loop explain
+BQL is contended by all vCPU threads and the main loop explain
 why it is desirable to place work into IOThreads.
 
 The experimental virtio-blk data-plane implementation has been benchmarked and
@@ -66,7 +66,7 @@ There are several old A

[PULL 3/6] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to BQL_LOCK_GUARD

2024-01-08 Thread Stefan Hajnoczi
The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
instead; it is already widely used and unambiguous.
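For readers unfamiliar with the guard being renamed: BQL_LOCK_GUARD() takes the lock only if the current thread does not already hold it, and releases it automatically at scope exit. The same pattern can be sketched outside QEMU with a plain pthread mutex and the GCC/Clang cleanup attribute standing in for the BQL and GLib's g_autoptr(); all demo_* names below are hypothetical, not QEMU APIs:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static bool demo_locked;               /* stand-in for bql_locked() */

static void demo_lock_acquire(void)
{
    pthread_mutex_lock(&demo_lock);
    demo_locked = true;
}

static void demo_lock_release(void)
{
    demo_locked = false;
    pthread_mutex_unlock(&demo_lock);
}

typedef struct DemoLockAuto DemoLockAuto;

/* Lock only if not already held; a non-NULL return value tells the
 * cleanup function that it is responsible for unlocking. */
static inline DemoLockAuto *demo_auto_lock(void)
{
    if (demo_locked) {
        return NULL;
    }
    demo_lock_acquire();
    return (DemoLockAuto *)(uintptr_t)1;
}

static void demo_auto_unlock(DemoLockAuto **l)
{
    if (*l) {
        demo_lock_release();
    }
}

/* The cleanup attribute plays the role of g_autoptr() in QEMU's macro. */
#define DEMO_LOCK_GUARD() \
    DemoLockAuto *_demo_lock_auto \
        __attribute__((cleanup(demo_auto_unlock))) __attribute__((unused)) \
        = demo_auto_lock()

static bool guarded_work(void)
{
    DEMO_LOCK_GUARD();         /* takes the lock: it was not held */
    {
        DEMO_LOCK_GUARD();     /* no-op guard: lock is already held */
    }
    return demo_locked;        /* still held here; released at scope exit */
}
```

Because the guard is conditional, a function using it can be called both with and without the lock held, which is why QEMU's version checks bql_locked() before locking.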

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Paul Durrant 
Acked-by: David Woodhouse 
Reviewed-by: Cédric Le Goater 
Acked-by: Ilya Leoshkevich 
Reviewed-by: Harsh Prateek Bora 
Reviewed-by: Akihiko Odaki 
Message-id: 20240102153529.486531-3-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/main-loop.h  | 19 +--
 hw/i386/kvm/xen_evtchn.c  | 14 +++---
 hw/i386/kvm/xen_gnttab.c  |  2 +-
 hw/mips/mips_int.c|  2 +-
 hw/ppc/ppc.c  |  2 +-
 target/i386/kvm/xen-emu.c |  2 +-
 target/ppc/excp_helper.c  |  2 +-
 target/ppc/helper_regs.c  |  2 +-
 target/riscv/cpu_helper.c |  4 ++--
 9 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 72ebc0cb3a..c26ad2a029 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -343,33 +343,32 @@ void bql_lock_impl(const char *file, int line);
 void bql_unlock(void);
 
 /**
- * QEMU_IOTHREAD_LOCK_GUARD
+ * BQL_LOCK_GUARD
  *
  * Wrap a block of code in a conditional bql_{lock,unlock}.
  */
-typedef struct IOThreadLockAuto IOThreadLockAuto;
+typedef struct BQLLockAuto BQLLockAuto;
 
-static inline IOThreadLockAuto *qemu_iothread_auto_lock(const char *file,
-int line)
+static inline BQLLockAuto *bql_auto_lock(const char *file, int line)
 {
 if (bql_locked()) {
 return NULL;
 }
 bql_lock_impl(file, line);
 /* Anything non-NULL causes the cleanup function to be called */
-return (IOThreadLockAuto *)(uintptr_t)1;
+return (BQLLockAuto *)(uintptr_t)1;
 }
 
-static inline void qemu_iothread_auto_unlock(IOThreadLockAuto *l)
+static inline void bql_auto_unlock(BQLLockAuto *l)
 {
 bql_unlock();
 }
 
-G_DEFINE_AUTOPTR_CLEANUP_FUNC(IOThreadLockAuto, qemu_iothread_auto_unlock)
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(BQLLockAuto, bql_auto_unlock)
 
-#define QEMU_IOTHREAD_LOCK_GUARD() \
-g_autoptr(IOThreadLockAuto) _iothread_lock_auto __attribute__((unused)) \
-= qemu_iothread_auto_lock(__FILE__, __LINE__)
+#define BQL_LOCK_GUARD() \
+g_autoptr(BQLLockAuto) _bql_lock_auto __attribute__((unused)) \
+= bql_auto_lock(__FILE__, __LINE__)
 
 /*
  * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 4a835a1010..0171ef6d59 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1127,7 +1127,7 @@ int xen_evtchn_reset_op(struct evtchn_reset *reset)
 return -ESRCH;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 return xen_evtchn_soft_reset();
 }
 
@@ -1145,7 +1145,7 @@ int xen_evtchn_close_op(struct evtchn_close *close)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 qemu_mutex_lock(&s->port_lock);
 
 ret = close_port(s, close->port, &flush_kvm_routes);
@@ -1272,7 +1272,7 @@ int xen_evtchn_bind_pirq_op(struct evtchn_bind_pirq *pirq)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 
 if (s->pirq[pirq->pirq].port) {
 return -EBUSY;
@@ -1824,7 +1824,7 @@ int xen_physdev_map_pirq(struct physdev_map_pirq *map)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (map->domid != DOMID_SELF && map->domid != xen_domid) {
@@ -1884,7 +1884,7 @@ int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 qemu_mutex_lock(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
@@ -1924,7 +1924,7 @@ int xen_physdev_eoi_pirq(struct physdev_eoi *eoi)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
@@ -1956,7 +1956,7 @@ int xen_physdev_query_pirq(struct physdev_irq_status_query *query)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
index a0cc30f619..245e4b15db 100644
--- a/hw/i386/kvm/xen_gnttab.c
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -176,7 +176,7 @@ int xen_gnttab_map_page(uint64_t idx, uint64_t gfn)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->gnt_lock);
 
 xen_overlay_do_map_page(&s->gnt_aliases[idx], gpa);
diff --git a/hw/mips/mips_int.c b/hw/mips/mips_int.c
index 6c32e466a3..eef2fd2cd1 100644
--- a/hw/mips/mips_int.c
+++ b/hw/mips/mips_int.c
@@ -36,7 +36,7 

[PULL 4/6] qemu/main-loop: rename qemu_cond_wait_iothread() to qemu_cond_wait_bql()

2024-01-08 Thread Stefan Hajnoczi
The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
instead; it is already widely used and unambiguous.
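The renamed helpers keep ordinary pthread condition-variable semantics: the lock is dropped atomically while the thread blocks and re-acquired before the call returns. A minimal stand-alone sketch, with a plain mutex standing in for the BQL (demo_* names are illustrative, not QEMU APIs):

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t demo_bql = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_cond = PTHREAD_COND_INITIALIZER;
static bool demo_done;

/* Analogue of qemu_cond_wait_bql(): wait on @cond with the global lock,
 * which is released while blocked and re-held on return. */
static void demo_cond_wait_bql(pthread_cond_t *cond)
{
    pthread_cond_wait(cond, &demo_bql);
}

static void *demo_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_bql);
    demo_done = true;
    pthread_cond_signal(&demo_cond);
    pthread_mutex_unlock(&demo_bql);
    return NULL;
}

static bool demo_wait_for_done(void)
{
    pthread_t t;

    pthread_mutex_lock(&demo_bql);
    pthread_create(&t, NULL, demo_worker, NULL);
    while (!demo_done) {               /* re-check: wakeups may be spurious */
        demo_cond_wait_bql(&demo_cond);
    }
    pthread_mutex_unlock(&demo_bql);
    pthread_join(t, NULL);
    return demo_done;
}
```

The while loop mirrors the callers touched by this patch (e.g. rr_wait_io_event()), which re-test their predicate after every wakeup.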

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Paul Durrant 
Reviewed-by: Harsh Prateek Bora 
Reviewed-by: Akihiko Odaki 
Message-id: 20240102153529.486531-4-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/main-loop.h  | 10 +-
 accel/tcg/tcg-accel-ops-rr.c  |  4 ++--
 hw/display/virtio-gpu.c   |  2 +-
 hw/ppc/spapr_events.c |  2 +-
 system/cpu-throttle.c |  2 +-
 system/cpus.c |  4 ++--
 target/i386/nvmm/nvmm-accel-ops.c |  2 +-
 target/i386/whpx/whpx-accel-ops.c |  2 +-
 8 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index c26ad2a029..5764db157c 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -371,17 +371,17 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(BQLLockAuto, bql_auto_unlock)
 = bql_auto_lock(__FILE__, __LINE__)
 
 /*
- * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
+ * qemu_cond_wait_bql: Wait on condition for the Big QEMU Lock (BQL)
  *
- * This function atomically releases the main loop mutex and causes
+ * This function atomically releases the Big QEMU Lock (BQL) and causes
  * the calling thread to block on the condition.
  */
-void qemu_cond_wait_iothread(QemuCond *cond);
+void qemu_cond_wait_bql(QemuCond *cond);
 
 /*
- * qemu_cond_timedwait_iothread: like the previous, but with timeout
+ * qemu_cond_timedwait_bql: like the previous, but with timeout
  */
-void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
+void qemu_cond_timedwait_bql(QemuCond *cond, int ms);
 
 /* internal interfaces */
 
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index c4ea372a3f..5794e5a9ce 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -111,7 +111,7 @@ static void rr_wait_io_event(void)
 
 while (all_cpu_threads_idle()) {
 rr_stop_kick_timer();
-qemu_cond_wait_iothread(first_cpu->halt_cond);
+qemu_cond_wait_bql(first_cpu->halt_cond);
 }
 
 rr_start_kick_timer();
@@ -198,7 +198,7 @@ static void *rr_cpu_thread_fn(void *arg)
 
 /* wait for initial kick-off after machine start */
 while (first_cpu->stopped) {
-qemu_cond_wait_iothread(first_cpu->halt_cond);
+qemu_cond_wait_bql(first_cpu->halt_cond);
 
 /* process any pending work */
 CPU_FOREACH(cpu) {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index bae1c2a803..f8a675eb30 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1512,7 +1512,7 @@ void virtio_gpu_reset(VirtIODevice *vdev)
 g->reset_finished = false;
 qemu_bh_schedule(g->reset_bh);
 while (!g->reset_finished) {
-qemu_cond_wait_iothread(&g->reset_cond);
+qemu_cond_wait_bql(&g->reset_cond);
 }
 } else {
 virtio_gpu_reset_bh(g);
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index deb4641505..cb0587 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -899,7 +899,7 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 }
 return;
 }
-qemu_cond_wait_iothread(&spapr->fwnmi_machine_check_interlock_cond);
+qemu_cond_wait_bql(&spapr->fwnmi_machine_check_interlock_cond);
 if (spapr->fwnmi_machine_check_addr == -1) {
 /*
  * If the machine was reset while waiting for the interlock,
diff --git a/system/cpu-throttle.c b/system/cpu-throttle.c
index 786a9a5639..c951a6c65e 100644
--- a/system/cpu-throttle.c
+++ b/system/cpu-throttle.c
@@ -54,7 +54,7 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
 endtime_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns;
 while (sleeptime_ns > 0 && !cpu->stop) {
 if (sleeptime_ns > SCALE_MS) {
-qemu_cond_timedwait_iothread(cpu->halt_cond,
+qemu_cond_timedwait_bql(cpu->halt_cond,
  sleeptime_ns / SCALE_MS);
 } else {
 bql_unlock();
diff --git a/system/cpus.c b/system/cpus.c
index 1ede629f1f..68d161d96b 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -533,12 +533,12 @@ void bql_unlock(void)
 qemu_mutex_unlock(&bql);
 }
 
-void qemu_cond_wait_iothread(QemuCond *cond)
+void qemu_cond_wait_bql(QemuCond *cond)
 {
 qemu_cond_wait(cond, &bql);
 }
 
-void qemu_cond_timedwait_iothread(QemuCond *cond, int ms)
+void qemu_cond_timedwait_bql(QemuCond *cond, int ms)
 {
 qemu_cond_timedwait(cond, &bql, ms);
 }
diff --git a/target/i386/nvmm/nvmm-accel-ops.c b/target/i386/nvmm/nvmm-accel-ops.c
index f9d5e9a37a..6b2bfd

[PULL 2/6] system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()

2024-01-08 Thread Stefan Hajnoczi
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Paul Durrant 
Acked-by: Fabiano Rosas 
Acked-by: David Woodhouse 
Reviewed-by: Cédric Le Goater 
Acked-by: Peter Xu 
Acked-by: Eric Farman 
Reviewed-by: Harsh Prateek Bora 
Acked-by: Hyman Huang 
Reviewed-by: Akihiko Odaki 
Message-id: 20240102153529.486531-2-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio-wait.h |   2 +-
 include/qemu/main-loop.h |  39 +
 include/qemu/thread.h|   2 +-
 accel/accel-blocker.c|  10 +--
 accel/dummy-cpus.c   |   8 +-
 accel/hvf/hvf-accel-ops.c|   4 +-
 accel/kvm/kvm-accel-ops.c|   4 +-
 accel/kvm/kvm-all.c  |  22 ++---
 accel/tcg/cpu-exec.c |  26 +++---
 accel/tcg/cputlb.c   |  16 ++--
 accel/tcg/tcg-accel-ops-icount.c |   4 +-
 accel/tcg/tcg-accel-ops-mttcg.c  |  12 +--
 accel/tcg/tcg-accel-ops-rr.c |  14 ++--
 accel/tcg/tcg-accel-ops.c|   2 +-
 accel/tcg/translate-all.c|   2 +-
 cpu-common.c |   4 +-
 dump/dump.c  |   4 +-
 hw/core/cpu-common.c |   6 +-
 hw/i386/intel_iommu.c|   6 +-
 hw/i386/kvm/xen_evtchn.c |  16 ++--
 hw/i386/kvm/xen_overlay.c|   2 +-
 hw/i386/kvm/xen_xenstore.c   |   2 +-
 hw/intc/arm_gicv3_cpuif.c|   2 +-
 hw/intc/s390_flic.c  |  18 ++--
 hw/misc/edu.c|   4 +-
 hw/misc/imx6_src.c   |   2 +-
 hw/misc/imx7_src.c   |   2 +-
 hw/net/xen_nic.c |   8 +-
 hw/ppc/pegasos2.c|   2 +-
 hw/ppc/ppc.c |   4 +-
 hw/ppc/spapr.c   |   2 +-
 hw/ppc/spapr_rng.c   |   4 +-
 hw/ppc/spapr_softmmu.c   |   4 +-
 hw/remote/mpqemu-link.c  |  20 ++---
 hw/remote/vfio-user-obj.c|   2 +-
 hw/s390x/s390-skeys.c|   2 +-
 migration/block-dirty-bitmap.c   |   4 +-
 migration/block.c|  16 ++--
 migration/colo.c |  60 +++---
 migration/dirtyrate.c|  12 +--
 migration/migration.c|  52 ++--
 migration/ram.c  |  12 +--
 replay/replay-internal.c |   2 +-
 semihosting/console.c|   8 +-
 stubs/iothread-lock.c|   6 +-
 system/cpu-throttle.c|   4 +-
 system/cpus.c|  51 ++--
 system/dirtylimit.c  |   4 +-
 system/memory.c  |   2 +-
 system/physmem.c |   8 +-
 system/runstate.c|   2 +-
 system/watchpoint.c  |   4 +-
 target/arm/arm-powerctl.c|  14 ++--
 target/arm/helper.c  |   4 +-
 target/arm/hvf/hvf.c |   8 +-
 target/arm/kvm.c |   8 +-
 target/arm/ptw.c |   6 +-
 target/arm/tcg/helper-a64.c  |   8 +-
 target/arm/tcg/m_helper.c|   6 +-
 target/arm/tcg/op_helper.c   |  24 +++---
 target/arm/tcg/psci.c|   2 +-
 target/hppa/int_helper.c |   8 +-
 target/i386/hvf/hvf.c|   6 +-
 target/i386/kvm/hyperv.c |   4 +-
 target/i386/kvm/kvm.c|  28 +++
 target/i386/kvm/xen-emu.c|  14 ++--
 target/i386/nvmm/nvmm-accel-ops.c|   4 +-
 target/i386/nvmm/nvmm-all.c  |  20 ++---
 target/i386/tcg/sysemu/fpu_helper.c  |   6 +-
 target/i386/tcg/sysemu/misc_helper.c |   4 +-
 target/i386/whpx/whpx-accel-ops.c|   4 +-
 target/i386/whpx/whpx-all.c  |  24 +++---
 target/loongarch/tcg/csr_helper.c|   4 +-
 target/mips/kvm.c|   4 +-
 target/mips/tcg/sysemu/cp0_helper.c  |   4 +-
 target/openrisc/sys_helper.c |  16 ++--
 target/ppc/ex

[PULL 1/6] iothread: Remove unused Error** argument in aio_context_set_aio_params

2024-01-08 Thread Stefan Hajnoczi
From: Philippe Mathieu-Daudé 

aio_context_set_aio_params() doesn't use its undocumented
Error** argument. Remove it to simplify.

Note this removes a use of "unchecked Error**" in
iothread_set_aio_context_params().
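The "unchecked Error **" being removed is the pattern where a function dereferences its errp parameter directly, which is only safe if every caller passes a non-NULL pointer. A simplified sketch of the hazard and the post-patch shape, using stand-in types and names rather than QEMU's real Error API:

```c
#include <stddef.h>

/* Minimal stand-in for QEMU's Error type; not the real API. */
typedef struct Error { const char *msg; } Error;

static void failing_call(Error **errp)
{
    static Error err = { "failed" };
    if (errp) {               /* callers may pass NULL to ignore errors */
        *errp = &err;
    }
}

/* Unchecked pattern: dereferences errp directly, so a NULL errp would
 * crash.  QEMU guards such dereferences with ERRP_GUARD(). */
static int update_params_unchecked(Error **errp)
{
    failing_call(errp);
    if (*errp) {              /* undefined behaviour when errp == NULL */
        return -1;
    }
    return 0;
}

/* After the patch: the parameter that was never filled in is simply gone,
 * so callers no longer need to allocate and check an error they never get. */
static int set_aio_params(int max_batch)
{
    return max_batch > 0 ? 0 : -1;
}
```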

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Markus Armbruster 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20231120171806.19361-1-phi...@linaro.org>
---
 include/block/aio.h | 3 +--
 iothread.c  | 3 +--
 util/aio-posix.c| 3 +--
 util/aio-win32.c| 3 +--
 util/main-loop.c| 5 +
 5 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index af05512a7d..c802a392e5 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -699,8 +699,7 @@ void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
  * @max_batch: maximum number of requests in a batch, 0 means that the
  * engine will use its default
  */
-void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch,
-Error **errp);
+void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch);
 
 /**
  * aio_context_set_thread_pool_params:
diff --git a/iothread.c b/iothread.c
index b753286414..6c1fc8c856 100644
--- a/iothread.c
+++ b/iothread.c
@@ -170,8 +170,7 @@ static void iothread_set_aio_context_params(EventLoopBase *base, Error **errp)
 }
 
 aio_context_set_aio_params(iothread->ctx,
-   iothread->parent_obj.aio_max_batch,
-   errp);
+   iothread->parent_obj.aio_max_batch);
 
 aio_context_set_thread_pool_params(iothread->ctx, base->thread_pool_min,
base->thread_pool_max, errp);
diff --git a/util/aio-posix.c b/util/aio-posix.c
index 7f2c99729d..266c9dd35f 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -777,8 +777,7 @@ void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
 aio_notify(ctx);
 }
 
-void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch,
-Error **errp)
+void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch)
 {
 /*
  * No thread synchronization here, it doesn't matter if an incorrect value
diff --git a/util/aio-win32.c b/util/aio-win32.c
index 948ef47a4d..d144f9391f 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -438,7 +438,6 @@ void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
 }
 }
 
-void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch,
-Error **errp)
+void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch)
 {
 }
diff --git a/util/main-loop.c b/util/main-loop.c
index 797b640c41..63b4cda84a 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -192,10 +192,7 @@ static void main_loop_update_params(EventLoopBase *base, Error **errp)
 return;
 }
 
-aio_context_set_aio_params(qemu_aio_context, base->aio_max_batch, errp);
-if (*errp) {
-return;
-}
+aio_context_set_aio_params(qemu_aio_context, base->aio_max_batch);
 
 aio_context_set_thread_pool_params(qemu_aio_context, base->thread_pool_min,
base->thread_pool_max, errp);
-- 
2.43.0




[PULL 0/6] Block patches

2024-01-08 Thread Stefan Hajnoczi
The following changes since commit ffd454c67e38cc6df792733ebc5d967eee28ac0d:

  Merge tag 'pull-vfio-20240107' of https://github.com/legoater/qemu into staging (2024-01-08 10:28:42 +)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 0b2675c473f68f13bc5ca1dd1c43ce421542e7b8:

  Rename "QEMU global mutex" to "BQL" in comments and docs (2024-01-08 10:45:43 -0500)


Pull request



Philippe Mathieu-Daudé (1):
  iothread: Remove unused Error** argument in aio_context_set_aio_params

Stefan Hajnoczi (5):
  system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
  qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to BQL_LOCK_GUARD
  qemu/main-loop: rename qemu_cond_wait_iothread() to
qemu_cond_wait_bql()
  Replace "iothread lock" with "BQL" in comments
  Rename "QEMU global mutex" to "BQL" in comments and docs

 docs/devel/multi-thread-tcg.rst  |   7 +-
 docs/devel/qapi-code-gen.rst |   2 +-
 docs/devel/replay.rst|   2 +-
 docs/devel/reset.rst |   2 +-
 docs/devel/multiple-iothreads.txt|  14 ++--
 hw/display/qxl.h |   2 +-
 include/block/aio-wait.h |   2 +-
 include/block/aio.h  |   3 +-
 include/block/blockjob.h |   6 +-
 include/exec/cpu-common.h|   2 +-
 include/exec/memory.h|   4 +-
 include/exec/ramblock.h  |   2 +-
 include/io/task.h|   2 +-
 include/migration/register.h |   8 +-
 include/qemu/coroutine-core.h|   2 +-
 include/qemu/coroutine.h |   2 +-
 include/qemu/main-loop.h |  68 ---
 include/qemu/thread.h|   2 +-
 target/arm/internals.h   |   4 +-
 accel/accel-blocker.c|  10 +--
 accel/dummy-cpus.c   |   8 +-
 accel/hvf/hvf-accel-ops.c|   4 +-
 accel/kvm/kvm-accel-ops.c|   4 +-
 accel/kvm/kvm-all.c  |  22 ++---
 accel/tcg/cpu-exec.c |  26 +++---
 accel/tcg/cputlb.c   |  20 ++---
 accel/tcg/tcg-accel-ops-icount.c |   6 +-
 accel/tcg/tcg-accel-ops-mttcg.c  |  12 +--
 accel/tcg/tcg-accel-ops-rr.c |  18 ++--
 accel/tcg/tcg-accel-ops.c|   2 +-
 accel/tcg/translate-all.c|   2 +-
 cpu-common.c |   4 +-
 dump/dump.c  |   4 +-
 hw/block/dataplane/virtio-blk.c  |   8 +-
 hw/block/virtio-blk.c|   2 +-
 hw/core/cpu-common.c |   6 +-
 hw/display/virtio-gpu.c  |   2 +-
 hw/i386/intel_iommu.c|   6 +-
 hw/i386/kvm/xen_evtchn.c |  30 +++
 hw/i386/kvm/xen_gnttab.c |   2 +-
 hw/i386/kvm/xen_overlay.c|   2 +-
 hw/i386/kvm/xen_xenstore.c   |   2 +-
 hw/intc/arm_gicv3_cpuif.c|   2 +-
 hw/intc/s390_flic.c  |  18 ++--
 hw/mips/mips_int.c   |   2 +-
 hw/misc/edu.c|   4 +-
 hw/misc/imx6_src.c   |   2 +-
 hw/misc/imx7_src.c   |   2 +-
 hw/net/xen_nic.c |   8 +-
 hw/ppc/pegasos2.c|   2 +-
 hw/ppc/ppc.c |   6 +-
 hw/ppc/spapr.c   |   2 +-
 hw/ppc/spapr_events.c|   2 +-
 hw/ppc/spapr_rng.c   |   4 +-
 hw/ppc/spapr_softmmu.c   |   4 +-
 hw/remote/mpqemu-link.c  |  22 ++---
 hw/remote/vfio-user-obj.c|   2 +-
 hw/s390x/s390-skeys.c|   2 +-
 hw/scsi/virtio-scsi-dataplane.c  |   6 +-
 iothread.c   |   3 +-
 migration/block-dirty-bitmap.c   |  14 ++--
 migration/block.c|  38 -
 migration/colo.c |  62 +++---
 migration/dirtyrate.c|  12 +--
 migration/migration.c|  54 ++--
 migration/ram.c  |  16 ++--
 net/tap.c|   2 +-
 replay/replay-internal.c |   2 +-
 semihosting/console.c|   8 +-
 stubs/iothread-lock.c|   6 +-
 system/cpu-throttle.c|   6 +-
 system/cpus.c|  55 +++--
 system/dirtylimit.c  |   4 +-
 system/memory.c  |   2 +-
 system/physmem.c |  14 ++--
 system/runstate.c|   2 +-
 system/watchpoint.c  |   4 +-
 target/arm/arm-powerctl.c|  14 ++--
 target/arm/helper.c  |   6 +-
 target/arm/hvf/hvf.c |   8 +-
 target/arm/kvm.c  

Re: [PATCH v3 0/5] Make Big QEMU Lock naming consistent

2024-01-08 Thread Stefan Hajnoczi
On Tue, Jan 02, 2024 at 10:35:24AM -0500, Stefan Hajnoczi wrote:
> v3:
> - Rebase
> - Define bql_lock() macro on a single line [Akihiko Odaki]
> v2:
> - Rename APIs bql_*() [PeterX]
> - Spell out "Big QEMU Lock (BQL)" in doc comments [PeterX]
> - Rename "iolock" variables in hw/remote/mpqemu-link.c [Harsh]
> - Fix bql_auto_lock() indentation in Patch 2 [Ilya]
> - "with BQL taken" -> "with the BQL taken" [Philippe]
> - "under BQL" -> "under the BQL" [Philippe]
> 
> The Big QEMU Lock ("BQL") has two other names: "iothread lock" and "QEMU global
> mutex". The term "iothread lock" is easily confused with the unrelated --object
> iothread (iothread.c).
> 
> This series updates the code and documentation to consistently use "BQL". This
> makes the code easier to understand.
> 
> Stefan Hajnoczi (5):
>   system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
>   qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to BQL_LOCK_GUARD
>   qemu/main-loop: rename qemu_cond_wait_iothread() to
> qemu_cond_wait_bql()
>   Replace "iothread lock" with "BQL" in comments
>   Rename "QEMU global mutex" to "BQL" in comments and docs
> 
>  docs/devel/multi-thread-tcg.rst  |   7 +-
>  docs/devel/qapi-code-gen.rst |   2 +-
>  docs/devel/replay.rst|   2 +-
>  docs/devel/reset.rst |   2 +-
>  docs/devel/multiple-iothreads.txt|  14 ++--
>  hw/display/qxl.h |   2 +-
>  include/block/aio-wait.h |   2 +-
>  include/block/blockjob.h |   6 +-
>  include/exec/cpu-common.h|   2 +-
>  include/exec/memory.h|   4 +-
>  include/exec/ramblock.h  |   2 +-
>  include/io/task.h|   2 +-
>  include/migration/register.h |   8 +-
>  include/qemu/coroutine-core.h|   2 +-
>  include/qemu/coroutine.h |   2 +-
>  include/qemu/main-loop.h |  68 ---
>  include/qemu/thread.h|   2 +-
>  target/arm/internals.h   |   4 +-
>  accel/accel-blocker.c|  10 +--
>  accel/dummy-cpus.c   |   8 +-
>  accel/hvf/hvf-accel-ops.c|   4 +-
>  accel/kvm/kvm-accel-ops.c|   4 +-
>  accel/kvm/kvm-all.c  |  22 ++---
>  accel/tcg/cpu-exec.c |  26 +++---
>  accel/tcg/cputlb.c   |  20 ++---
>  accel/tcg/tcg-accel-ops-icount.c |   6 +-
>  accel/tcg/tcg-accel-ops-mttcg.c  |  12 +--
>  accel/tcg/tcg-accel-ops-rr.c |  18 ++--
>  accel/tcg/tcg-accel-ops.c|   2 +-
>  accel/tcg/translate-all.c|   2 +-
>  cpu-common.c |   4 +-
>  dump/dump.c  |   4 +-
>  hw/block/dataplane/virtio-blk.c  |   8 +-
>  hw/block/virtio-blk.c|   2 +-
>  hw/core/cpu-common.c |   6 +-
>  hw/display/virtio-gpu.c  |   2 +-
>  hw/i386/intel_iommu.c|   6 +-
>  hw/i386/kvm/xen_evtchn.c |  30 +++
>  hw/i386/kvm/xen_gnttab.c |   2 +-
>  hw/i386/kvm/xen_overlay.c|   2 +-
>  hw/i386/kvm/xen_xenstore.c   |   2 +-
>  hw/intc/arm_gicv3_cpuif.c|   2 +-
>  hw/intc/s390_flic.c  |  18 ++--
>  hw/mips/mips_int.c   |   2 +-
>  hw/misc/edu.c|   4 +-
>  hw/misc/imx6_src.c   |   2 +-
>  hw/misc/imx7_src.c   |   2 +-
>  hw/net/xen_nic.c |   8 +-
>  hw/ppc/pegasos2.c|   2 +-
>  hw/ppc/ppc.c |   6 +-
>  hw/ppc/spapr.c   |   2 +-
>  hw/ppc/spapr_events.c|   2 +-
>  hw/ppc/spapr_rng.c   |   4 +-
>  hw/ppc/spapr_softmmu.c   |   4 +-
>  hw/remote/mpqemu-link.c  |  22 ++---
>  hw/remote/vfio-user-obj.c|   2 +-
>  hw/s390x/s390-skeys.c|   2 +-
>  hw/scsi/virtio-scsi-dataplane.c  |   6 +-
>  migration/block-dirty-bitmap.c   |  14 ++--
>  migration/block.c|  38 -
>  migration/colo.c |  62 +++---
>  migration/dirtyrate.c|  12 +--
>  migration/migration.c|  54 ++--
>  migration/ram.c  |  16 ++--
>  net/tap.c|   2 +-
>  replay/replay-internal.c |   2 +-
>  semihosting/console.c|   8 +-
>  stubs/iothread-lock.c  

[PATCH v3 1/5] system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()

2024-01-02 Thread Stefan Hajnoczi
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
--object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Paul Durrant 
Acked-by: Fabiano Rosas 
Acked-by: David Woodhouse 
Reviewed-by: Cédric Le Goater 
Acked-by: Peter Xu 
Acked-by: Eric Farman 
Reviewed-by: Harsh Prateek Bora 
Acked-by: Hyman Huang 
---
 include/block/aio-wait.h |   2 +-
 include/qemu/main-loop.h |  39 +
 include/qemu/thread.h|   2 +-
 accel/accel-blocker.c|  10 +--
 accel/dummy-cpus.c   |   8 +-
 accel/hvf/hvf-accel-ops.c|   4 +-
 accel/kvm/kvm-accel-ops.c|   4 +-
 accel/kvm/kvm-all.c  |  22 ++---
 accel/tcg/cpu-exec.c |  26 +++---
 accel/tcg/cputlb.c   |  16 ++--
 accel/tcg/tcg-accel-ops-icount.c |   4 +-
 accel/tcg/tcg-accel-ops-mttcg.c  |  12 +--
 accel/tcg/tcg-accel-ops-rr.c |  14 ++--
 accel/tcg/tcg-accel-ops.c|   2 +-
 accel/tcg/translate-all.c|   2 +-
 cpu-common.c |   4 +-
 dump/dump.c  |   4 +-
 hw/core/cpu-common.c |   6 +-
 hw/i386/intel_iommu.c|   6 +-
 hw/i386/kvm/xen_evtchn.c |  16 ++--
 hw/i386/kvm/xen_overlay.c|   2 +-
 hw/i386/kvm/xen_xenstore.c   |   2 +-
 hw/intc/arm_gicv3_cpuif.c|   2 +-
 hw/intc/s390_flic.c  |  18 ++--
 hw/misc/edu.c|   4 +-
 hw/misc/imx6_src.c   |   2 +-
 hw/misc/imx7_src.c   |   2 +-
 hw/net/xen_nic.c |   8 +-
 hw/ppc/pegasos2.c|   2 +-
 hw/ppc/ppc.c |   4 +-
 hw/ppc/spapr.c   |   2 +-
 hw/ppc/spapr_rng.c   |   4 +-
 hw/ppc/spapr_softmmu.c   |   4 +-
 hw/remote/mpqemu-link.c  |  20 ++---
 hw/remote/vfio-user-obj.c|   2 +-
 hw/s390x/s390-skeys.c|   2 +-
 migration/block-dirty-bitmap.c   |   4 +-
 migration/block.c|  16 ++--
 migration/colo.c |  60 +++---
 migration/dirtyrate.c|  12 +--
 migration/migration.c|  52 ++--
 migration/ram.c  |  12 +--
 replay/replay-internal.c |   2 +-
 semihosting/console.c|   8 +-
 stubs/iothread-lock.c|   6 +-
 system/cpu-throttle.c|   4 +-
 system/cpus.c|  51 ++--
 system/dirtylimit.c  |   4 +-
 system/memory.c  |   2 +-
 system/physmem.c |   8 +-
 system/runstate.c|   2 +-
 system/watchpoint.c  |   4 +-
 target/arm/arm-powerctl.c|  14 ++--
 target/arm/helper.c  |   4 +-
 target/arm/hvf/hvf.c |   8 +-
 target/arm/kvm.c |   8 +-
 target/arm/ptw.c |   6 +-
 target/arm/tcg/helper-a64.c  |   8 +-
 target/arm/tcg/m_helper.c|   6 +-
 target/arm/tcg/op_helper.c   |  24 +++---
 target/arm/tcg/psci.c|   2 +-
 target/hppa/int_helper.c |   8 +-
 target/i386/hvf/hvf.c|   6 +-
 target/i386/kvm/hyperv.c |   4 +-
 target/i386/kvm/kvm.c|  28 +++
 target/i386/kvm/xen-emu.c|  14 ++--
 target/i386/nvmm/nvmm-accel-ops.c|   4 +-
 target/i386/nvmm/nvmm-all.c  |  20 ++---
 target/i386/tcg/sysemu/fpu_helper.c  |   6 +-
 target/i386/tcg/sysemu/misc_helper.c |   4 +-
 target/i386/whpx/whpx-accel-ops.c|   4 +-
 target/i386/whpx/whpx-all.c  |  24 +++---
 target/loongarch/csr_helper.c|   4 +-
 target/mips/kvm.c|   4 +-
 target/mips/tcg/sysemu/cp0_helper.c  |   4 +-
 target/openrisc/sys_helper.c |  16 ++--
 target/ppc/excp_helper.c |  12 +--
 target/ppc/kvm.c |   4 +-
 target/ppc/misc_helper.c  

[PATCH v3 5/5] Rename "QEMU global mutex" to "BQL" in comments and docs

2024-01-02 Thread Stefan Hajnoczi
The term "QEMU global mutex" is identical to the more widely used Big
QEMU Lock ("BQL"). Update the code comments and documentation to use
"BQL" instead of "QEMU global mutex".

Signed-off-by: Stefan Hajnoczi 
Acked-by: Markus Armbruster 
Reviewed-by: Philippe Mathieu-Daudé 
---
 docs/devel/multi-thread-tcg.rst   |  7 +++
 docs/devel/qapi-code-gen.rst  |  2 +-
 docs/devel/replay.rst |  2 +-
 docs/devel/multiple-iothreads.txt | 14 +++---
 include/block/blockjob.h  |  6 +++---
 include/io/task.h |  2 +-
 include/qemu/coroutine-core.h |  2 +-
 include/qemu/coroutine.h  |  2 +-
 hw/block/dataplane/virtio-blk.c   |  8 
 hw/block/virtio-blk.c |  2 +-
 hw/scsi/virtio-scsi-dataplane.c   |  6 +++---
 net/tap.c |  2 +-
 12 files changed, 27 insertions(+), 28 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.rst b/docs/devel/multi-thread-tcg.rst
index c9541a7b20..7302c3bf53 100644
--- a/docs/devel/multi-thread-tcg.rst
+++ b/docs/devel/multi-thread-tcg.rst
@@ -226,10 +226,9 @@ instruction. This could be a future optimisation.
 Emulated hardware state
 ---
 
-Currently thanks to KVM work any access to IO memory is automatically
-protected by the global iothread mutex, also known as the BQL (Big
-QEMU Lock). Any IO region that doesn't use global mutex is expected to
-do its own locking.
+Currently thanks to KVM work any access to IO memory is automatically protected
+by the BQL (Big QEMU Lock). Any IO region that doesn't use the BQL is expected
+to do its own locking.
 
 However IO memory isn't the only way emulated hardware state can be
 modified. Some architectures have model specific registers that
diff --git a/docs/devel/qapi-code-gen.rst b/docs/devel/qapi-code-gen.rst
index 7f78183cd4..ea8228518c 100644
--- a/docs/devel/qapi-code-gen.rst
+++ b/docs/devel/qapi-code-gen.rst
@@ -594,7 +594,7 @@ blocking the guest and other background operations.
 Coroutine safety can be hard to prove, similar to thread safety.  Common
 pitfalls are:
 
-- The global mutex isn't held across ``qemu_coroutine_yield()``, so
+- The BQL isn't held across ``qemu_coroutine_yield()``, so
   operations that used to assume that they execute atomically may have
   to be more careful to protect against changes in the global state.
 
diff --git a/docs/devel/replay.rst b/docs/devel/replay.rst
index 0244be8b9c..effd856f0c 100644
--- a/docs/devel/replay.rst
+++ b/docs/devel/replay.rst
@@ -184,7 +184,7 @@ modes.
 Reading and writing requests are created by CPU thread of QEMU. Later these
 requests proceed to block layer which creates "bottom halves". Bottom
 halves consist of callback and its parameters. They are processed when
-main loop locks the global mutex. These locks are not synchronized with
+main loop locks the BQL. These locks are not synchronized with
 replaying process because main loop also processes the events that do not
 affect the virtual machine state (like user interaction with monitor).
 
diff --git a/docs/devel/multiple-iothreads.txt 
b/docs/devel/multiple-iothreads.txt
index 4865196bde..de85767b12 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -5,7 +5,7 @@ the COPYING file in the top-level directory.
 
 
 This document explains the IOThread feature and how to write code that runs
-outside the QEMU global mutex.
+outside the BQL.
 
 The main loop and IOThreads
 ---
@@ -29,13 +29,13 @@ scalability bottleneck on hosts with many CPUs.  Work can 
be spread across
 several IOThreads instead of just one main loop.  When set up correctly this
 can improve I/O latency and reduce jitter seen by the guest.
 
-The main loop is also deeply associated with the QEMU global mutex, which is a
-scalability bottleneck in itself.  vCPU threads and the main loop use the QEMU
-global mutex to serialize execution of QEMU code.  This mutex is necessary
-because a lot of QEMU's code historically was not thread-safe.
+The main loop is also deeply associated with the BQL, which is a
+scalability bottleneck in itself.  vCPU threads and the main loop use the BQL
+to serialize execution of QEMU code.  This mutex is necessary because a lot of
+QEMU's code historically was not thread-safe.
 
 The fact that all I/O processing is done in a single main loop and that the
-QEMU global mutex is contended by all vCPU threads and the main loop explain
+BQL is contended by all vCPU threads and the main loop explain
 why it is desirable to place work into IOThreads.
 
 The experimental virtio-blk data-plane implementation has been benchmarked and
@@ -66,7 +66,7 @@ There are several old APIs that use the main loop AioContext:
 
 Since they implicitly work on the main loop they cannot be used in code that
 runs in an IOThread.  They might cause a crash or deadlock if called from an
-

[PATCH v3 2/5] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to BQL_LOCK_GUARD

2024-01-02 Thread Stefan Hajnoczi
The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
instead, it is already widely used and unambiguous.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Paul Durrant 
Acked-by: David Woodhouse 
Reviewed-by: Cédric Le Goater 
Acked-by: Ilya Leoshkevich 
---
 include/qemu/main-loop.h  | 19 +--
 hw/i386/kvm/xen_evtchn.c  | 14 +++---
 hw/i386/kvm/xen_gnttab.c  |  2 +-
 hw/mips/mips_int.c|  2 +-
 hw/ppc/ppc.c  |  2 +-
 target/i386/kvm/xen-emu.c |  2 +-
 target/ppc/excp_helper.c  |  2 +-
 target/ppc/helper_regs.c  |  2 +-
 target/riscv/cpu_helper.c |  4 ++--
 9 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 72ebc0cb3a..c26ad2a029 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -343,33 +343,32 @@ void bql_lock_impl(const char *file, int line);
 void bql_unlock(void);
 
 /**
- * QEMU_IOTHREAD_LOCK_GUARD
+ * BQL_LOCK_GUARD
  *
  * Wrap a block of code in a conditional bql_{lock,unlock}.
  */
-typedef struct IOThreadLockAuto IOThreadLockAuto;
+typedef struct BQLLockAuto BQLLockAuto;
 
-static inline IOThreadLockAuto *qemu_iothread_auto_lock(const char *file,
-int line)
+static inline BQLLockAuto *bql_auto_lock(const char *file, int line)
 {
 if (bql_locked()) {
 return NULL;
 }
 bql_lock_impl(file, line);
 /* Anything non-NULL causes the cleanup function to be called */
-return (IOThreadLockAuto *)(uintptr_t)1;
+return (BQLLockAuto *)(uintptr_t)1;
 }
 
-static inline void qemu_iothread_auto_unlock(IOThreadLockAuto *l)
+static inline void bql_auto_unlock(BQLLockAuto *l)
 {
 bql_unlock();
 }
 
-G_DEFINE_AUTOPTR_CLEANUP_FUNC(IOThreadLockAuto, qemu_iothread_auto_unlock)
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(BQLLockAuto, bql_auto_unlock)
 
-#define QEMU_IOTHREAD_LOCK_GUARD() \
-g_autoptr(IOThreadLockAuto) _iothread_lock_auto __attribute__((unused)) \
-= qemu_iothread_auto_lock(__FILE__, __LINE__)
+#define BQL_LOCK_GUARD() \
+g_autoptr(BQLLockAuto) _bql_lock_auto __attribute__((unused)) \
+= bql_auto_lock(__FILE__, __LINE__)
 
 /*
  * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index d7d15cfaf7..bd077eda6d 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1127,7 +1127,7 @@ int xen_evtchn_reset_op(struct evtchn_reset *reset)
 return -ESRCH;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 return xen_evtchn_soft_reset();
 }
 
@@ -1145,7 +1145,7 @@ int xen_evtchn_close_op(struct evtchn_close *close)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 qemu_mutex_lock(&s->port_lock);
 
 ret = close_port(s, close->port, &flush_kvm_routes);
@@ -1272,7 +1272,7 @@ int xen_evtchn_bind_pirq_op(struct evtchn_bind_pirq *pirq)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 
 if (s->pirq[pirq->pirq].port) {
 return -EBUSY;
@@ -1824,7 +1824,7 @@ int xen_physdev_map_pirq(struct physdev_map_pirq *map)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (map->domid != DOMID_SELF && map->domid != xen_domid) {
@@ -1884,7 +1884,7 @@ int xen_physdev_unmap_pirq(struct physdev_unmap_pirq 
*unmap)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 qemu_mutex_lock(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
@@ -1924,7 +1924,7 @@ int xen_physdev_eoi_pirq(struct physdev_eoi *eoi)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
@@ -1956,7 +1956,7 @@ int xen_physdev_query_pirq(struct 
physdev_irq_status_query *query)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
index 0a24f53f20..d9477ae927 100644
--- a/hw/i386/kvm/xen_gnttab.c
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -176,7 +176,7 @@ int xen_gnttab_map_page(uint64_t idx, uint64_t gfn)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->gnt_lock);
 
 xen_overlay_do_map_page(&s->gnt_aliases[idx], gpa);
diff --git a/hw/mips/mips_int.c b/hw/mips/mips_int.c
index 6c32e466a3..eef2fd2cd1 100644
--- a/hw/mips/mips_int.c
+++ b/hw/mips/mips_int.c
@@ -36,7 +36,7 @@ static void cpu_mips_irq_request(void *opaque, int irq, int 
level)
 return;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 
 if (level) {
 e

[PATCH v3 4/5] Replace "iothread lock" with "BQL" in comments

2024-01-02 Thread Stefan Hajnoczi
The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL)
in their names. Update the code comments to use "BQL" instead of
"iothread lock".

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
---
 docs/devel/reset.rst |  2 +-
 hw/display/qxl.h |  2 +-
 include/exec/cpu-common.h|  2 +-
 include/exec/memory.h|  4 ++--
 include/exec/ramblock.h  |  2 +-
 include/migration/register.h |  8 
 target/arm/internals.h   |  4 ++--
 accel/tcg/cputlb.c   |  4 ++--
 accel/tcg/tcg-accel-ops-icount.c |  2 +-
 hw/remote/mpqemu-link.c  |  2 +-
 migration/block-dirty-bitmap.c   | 10 +-
 migration/block.c| 22 +++---
 migration/colo.c |  2 +-
 migration/migration.c|  2 +-
 migration/ram.c  |  4 ++--
 system/physmem.c |  6 +++---
 target/arm/helper.c  |  2 +-
 ui/spice-core.c  |  2 +-
 util/rcu.c   |  2 +-
 audio/coreaudio.m|  4 ++--
 ui/cocoa.m   |  6 +++---
 21 files changed, 47 insertions(+), 47 deletions(-)

diff --git a/docs/devel/reset.rst b/docs/devel/reset.rst
index 38ed1790f7..d4e79718ba 100644
--- a/docs/devel/reset.rst
+++ b/docs/devel/reset.rst
@@ -19,7 +19,7 @@ Triggering reset
 
 This section documents the APIs which "users" of a resettable object should use
 to control it. All resettable control functions must be called while holding
-the iothread lock.
+the BQL.
 
 You can apply a reset to an object using ``resettable_assert_reset()``. You 
need
 to call ``resettable_release_reset()`` to release the object from reset. To
diff --git a/hw/display/qxl.h b/hw/display/qxl.h
index fdac14edad..e0a85a5ca4 100644
--- a/hw/display/qxl.h
+++ b/hw/display/qxl.h
@@ -159,7 +159,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(PCIQXLDevice, PCI_QXL)
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 41115d8919..fef3138d29 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -92,7 +92,7 @@ RAMBlock *qemu_ram_block_by_name(const char *name);
  *
  * By the time this function returns, the returned pointer is not protected
  * by RCU anymore.  If the caller is not within an RCU critical section and
- * does not hold the iothread lock, it must have other means of protecting the
+ * does not hold the BQL, it must have other means of protecting the
  * pointer, such as a reference to the memory region that owns the RAMBlock.
  */
 RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index f172e82ac9..991ab8c6e8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1962,7 +1962,7 @@ int memory_region_get_fd(MemoryRegion *mr);
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
@@ -1979,7 +1979,7 @@ MemoryRegion *memory_region_from_host(void *ptr, 
ram_addr_t *offset);
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 69c6a53902..3eb79723c6 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -34,7 +34,7 @@ struct RAMBlock {
 ram_addr_t max_length;
 void (*resized)(const char*, uint64_t length, void *host);
 uint32_t flags;
-/* Protected by iothread lock.  */
+/* Protected by the BQL.  */
 char idstr[256];
 /* RCU-enabled, writes protected by the ramlist lock */
 QLIST_ENTRY(RAMBlock) next;
diff --git a/include/migration/register.h b/include/migration/register.h
index fed1d04a3c..9ab1f79512 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -17,7 +17,7 @@
 #include "hw/vmstate-if.h"
 
 typedef struct SaveVMHan

[PATCH v3 3/5] qemu/main-loop: rename qemu_cond_wait_iothread() to qemu_cond_wait_bql()

2024-01-02 Thread Stefan Hajnoczi
The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
instead, it is already widely used and unambiguous.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/qemu/main-loop.h  | 10 +-
 accel/tcg/tcg-accel-ops-rr.c  |  4 ++--
 hw/display/virtio-gpu.c   |  2 +-
 hw/ppc/spapr_events.c |  2 +-
 system/cpu-throttle.c |  2 +-
 system/cpus.c |  4 ++--
 target/i386/nvmm/nvmm-accel-ops.c |  2 +-
 target/i386/whpx/whpx-accel-ops.c |  2 +-
 8 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index c26ad2a029..5764db157c 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -371,17 +371,17 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(BQLLockAuto, 
bql_auto_unlock)
 = bql_auto_lock(__FILE__, __LINE__)
 
 /*
- * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
+ * qemu_cond_wait_bql: Wait on condition for the Big QEMU Lock (BQL)
  *
- * This function atomically releases the main loop mutex and causes
+ * This function atomically releases the Big QEMU Lock (BQL) and causes
  * the calling thread to block on the condition.
  */
-void qemu_cond_wait_iothread(QemuCond *cond);
+void qemu_cond_wait_bql(QemuCond *cond);
 
 /*
- * qemu_cond_timedwait_iothread: like the previous, but with timeout
+ * qemu_cond_timedwait_bql: like the previous, but with timeout
  */
-void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
+void qemu_cond_timedwait_bql(QemuCond *cond, int ms);
 
 /* internal interfaces */
 
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index c4ea372a3f..5794e5a9ce 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -111,7 +111,7 @@ static void rr_wait_io_event(void)
 
 while (all_cpu_threads_idle()) {
 rr_stop_kick_timer();
-qemu_cond_wait_iothread(first_cpu->halt_cond);
+qemu_cond_wait_bql(first_cpu->halt_cond);
 }
 
 rr_start_kick_timer();
@@ -198,7 +198,7 @@ static void *rr_cpu_thread_fn(void *arg)
 
 /* wait for initial kick-off after machine start */
 while (first_cpu->stopped) {
-qemu_cond_wait_iothread(first_cpu->halt_cond);
+qemu_cond_wait_bql(first_cpu->halt_cond);
 
 /* process any pending work */
 CPU_FOREACH(cpu) {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index b016d3bac8..67c5be1a4e 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1512,7 +1512,7 @@ void virtio_gpu_reset(VirtIODevice *vdev)
 g->reset_finished = false;
 qemu_bh_schedule(g->reset_bh);
 while (!g->reset_finished) {
-qemu_cond_wait_iothread(&g->reset_cond);
+qemu_cond_wait_bql(&g->reset_cond);
 }
 } else {
 virtio_gpu_reset_bh(g);
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index deb4641505..cb0587 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -899,7 +899,7 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 }
 return;
 }
-qemu_cond_wait_iothread(&spapr->fwnmi_machine_check_interlock_cond);
+qemu_cond_wait_bql(&spapr->fwnmi_machine_check_interlock_cond);
 if (spapr->fwnmi_machine_check_addr == -1) {
 /*
  * If the machine was reset while waiting for the interlock,
diff --git a/system/cpu-throttle.c b/system/cpu-throttle.c
index 786a9a5639..c951a6c65e 100644
--- a/system/cpu-throttle.c
+++ b/system/cpu-throttle.c
@@ -54,7 +54,7 @@ static void cpu_throttle_thread(CPUState *cpu, 
run_on_cpu_data opaque)
 endtime_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns;
 while (sleeptime_ns > 0 && !cpu->stop) {
 if (sleeptime_ns > SCALE_MS) {
-qemu_cond_timedwait_iothread(cpu->halt_cond,
+qemu_cond_timedwait_bql(cpu->halt_cond,
  sleeptime_ns / SCALE_MS);
 } else {
 bql_unlock();
diff --git a/system/cpus.c b/system/cpus.c
index 9b68dc9c7c..c8e2772b5f 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -514,12 +514,12 @@ void bql_unlock(void)
 qemu_mutex_unlock(&bql);
 }
 
-void qemu_cond_wait_iothread(QemuCond *cond)
+void qemu_cond_wait_bql(QemuCond *cond)
 {
 qemu_cond_wait(cond, &bql);
 }
 
-void qemu_cond_timedwait_iothread(QemuCond *cond, int ms)
+void qemu_cond_timedwait_bql(QemuCond *cond, int ms)
 {
 qemu_cond_timedwait(cond, &bql, ms);
 }
diff --git a/target/i386/nvmm/nvmm-accel-ops.c 
b/target/i386/nvmm/nvmm-accel-ops.c
index f9d5e9a37a..6b2bfd9b9c 100644
--- a/target/i386/nvmm/nvmm-accel-ops.c
+++ b/target/i386/nvmm/nvmm-accel-ops.c
@@ -48,7 +48,7 @@ static void *qemu_nvmm_cpu_thread_fn(void *arg)
 

[PATCH v3 0/5] Make Big QEMU Lock naming consistent

2024-01-02 Thread Stefan Hajnoczi
v3:
- Rebase
- Define bql_lock() macro on a single line [Akihiko Odaki]
v2:
- Rename APIs bql_*() [PeterX]
- Spell out "Big QEMU Lock (BQL)" in doc comments [PeterX]
- Rename "iolock" variables in hw/remote/mpqemu-link.c [Harsh]
- Fix bql_auto_lock() indentation in Patch 2 [Ilya]
- "with BQL taken" -> "with the BQL taken" [Philippe]
- "under BQL" -> "under the BQL" [Philippe]

The Big QEMU Lock ("BQL") has two other names: "iothread lock" and "QEMU global
mutex". The term "iothread lock" is easily confused with the unrelated --object
iothread (iothread.c).

This series updates the code and documentation to consistently use "BQL". This
makes the code easier to understand.

Stefan Hajnoczi (5):
  system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
  qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to BQL_LOCK_GUARD
  qemu/main-loop: rename qemu_cond_wait_iothread() to
qemu_cond_wait_bql()
  Replace "iothread lock" with "BQL" in comments
  Rename "QEMU global mutex" to "BQL" in comments and docs

 docs/devel/multi-thread-tcg.rst  |   7 +-
 docs/devel/qapi-code-gen.rst |   2 +-
 docs/devel/replay.rst|   2 +-
 docs/devel/reset.rst |   2 +-
 docs/devel/multiple-iothreads.txt|  14 ++--
 hw/display/qxl.h |   2 +-
 include/block/aio-wait.h |   2 +-
 include/block/blockjob.h |   6 +-
 include/exec/cpu-common.h|   2 +-
 include/exec/memory.h|   4 +-
 include/exec/ramblock.h  |   2 +-
 include/io/task.h|   2 +-
 include/migration/register.h |   8 +-
 include/qemu/coroutine-core.h|   2 +-
 include/qemu/coroutine.h |   2 +-
 include/qemu/main-loop.h |  68 ---
 include/qemu/thread.h|   2 +-
 target/arm/internals.h   |   4 +-
 accel/accel-blocker.c|  10 +--
 accel/dummy-cpus.c   |   8 +-
 accel/hvf/hvf-accel-ops.c|   4 +-
 accel/kvm/kvm-accel-ops.c|   4 +-
 accel/kvm/kvm-all.c  |  22 ++---
 accel/tcg/cpu-exec.c |  26 +++---
 accel/tcg/cputlb.c   |  20 ++---
 accel/tcg/tcg-accel-ops-icount.c |   6 +-
 accel/tcg/tcg-accel-ops-mttcg.c  |  12 +--
 accel/tcg/tcg-accel-ops-rr.c |  18 ++--
 accel/tcg/tcg-accel-ops.c|   2 +-
 accel/tcg/translate-all.c|   2 +-
 cpu-common.c |   4 +-
 dump/dump.c  |   4 +-
 hw/block/dataplane/virtio-blk.c  |   8 +-
 hw/block/virtio-blk.c|   2 +-
 hw/core/cpu-common.c |   6 +-
 hw/display/virtio-gpu.c  |   2 +-
 hw/i386/intel_iommu.c|   6 +-
 hw/i386/kvm/xen_evtchn.c |  30 +++
 hw/i386/kvm/xen_gnttab.c |   2 +-
 hw/i386/kvm/xen_overlay.c|   2 +-
 hw/i386/kvm/xen_xenstore.c   |   2 +-
 hw/intc/arm_gicv3_cpuif.c|   2 +-
 hw/intc/s390_flic.c  |  18 ++--
 hw/mips/mips_int.c   |   2 +-
 hw/misc/edu.c|   4 +-
 hw/misc/imx6_src.c   |   2 +-
 hw/misc/imx7_src.c   |   2 +-
 hw/net/xen_nic.c |   8 +-
 hw/ppc/pegasos2.c|   2 +-
 hw/ppc/ppc.c |   6 +-
 hw/ppc/spapr.c   |   2 +-
 hw/ppc/spapr_events.c|   2 +-
 hw/ppc/spapr_rng.c   |   4 +-
 hw/ppc/spapr_softmmu.c   |   4 +-
 hw/remote/mpqemu-link.c  |  22 ++---
 hw/remote/vfio-user-obj.c|   2 +-
 hw/s390x/s390-skeys.c|   2 +-
 hw/scsi/virtio-scsi-dataplane.c  |   6 +-
 migration/block-dirty-bitmap.c   |  14 ++--
 migration/block.c|  38 -
 migration/colo.c |  62 +++---
 migration/dirtyrate.c|  12 +--
 migration/migration.c|  54 ++--
 migration/ram.c  |  16 ++--
 net/tap.c|   2 +-
 replay/replay-internal.c |   2 +-
 semihosting/console.c|   8 +-
 stubs/iothread-lock.c|   6 +-
 system/cpu-throttle.c|   6 +-
 system/cpus.c|  55 +++--
 system/dirtylimit.c  |   4 +-
 system/memory.c  |   2 +-
 system/physmem.c |  14 ++--
 system/runstate.c|   2 +-
 system/watchpoint.c  |   4 +-
 target/arm/arm-powerctl.c|  14 ++--
 target/arm/helper.c  |   6 +-
 target/arm/hvf/hvf.c |   8 +-
 target/arm/kvm.c |   8 +-
 target/arm/

Re: [PATCH v2 1/5] system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()

2024-01-02 Thread Stefan Hajnoczi
On Wed, Dec 13, 2023 at 03:37:00PM +0900, Akihiko Odaki wrote:
> On 2023/12/13 0:39, Stefan Hajnoczi wrote:
> > @@ -312,58 +312,58 @@ bool qemu_in_main_thread(void);
> >   } while (0)
> >   /**
> > - * qemu_mutex_lock_iothread: Lock the main loop mutex.
> > + * bql_lock: Lock the Big QEMU Lock (BQL).
> >*
> > - * This function locks the main loop mutex.  The mutex is taken by
> > + * This function locks the Big QEMU Lock (BQL).  The lock is taken by
> >* main() in vl.c and always taken except while waiting on
> > - * external events (such as with select).  The mutex should be taken
> > + * external events (such as with select).  The lock should be taken
> >* by threads other than the main loop thread when calling
> >* qemu_bh_new(), qemu_set_fd_handler() and basically all other
> >* functions documented in this file.
> >*
> > - * NOTE: tools currently are single-threaded and qemu_mutex_lock_iothread
> > + * NOTE: tools currently are single-threaded and bql_lock
> >* is a no-op there.
> >*/
> > -#define qemu_mutex_lock_iothread()  \
> > -qemu_mutex_lock_iothread_impl(__FILE__, __LINE__)
> > -void qemu_mutex_lock_iothread_impl(const char *file, int line);
> > +#define bql_lock()  \
> > +bql_lock_impl(__FILE__, __LINE__)
> 
> This line break is no longer necessary.

Will fix in v3.

Stefan




Re: [PATCH v2 04/14] aio: make aio_context_acquire()/aio_context_release() a no-op

2023-12-20 Thread Stefan Hajnoczi
On Tue, 19 Dec 2023 at 13:20, Kevin Wolf  wrote:
>
> Am 19.12.2023 um 16:28 hat Kevin Wolf geschrieben:
> > Am 05.12.2023 um 19:20 hat Stefan Hajnoczi geschrieben:
> > > aio_context_acquire()/aio_context_release() has been replaced by
> > > fine-grained locking to protect state shared by multiple threads. The
> > > AioContext lock still plays the role of balancing locking in
> > > AIO_WAIT_WHILE() and many functions in QEMU either require that the
> > > AioContext lock is held or not held for this reason. In other words, the
> > > AioContext lock is purely there for consistency with itself and serves
> > > no real purpose anymore.
> > >
> > > Stop actually acquiring/releasing the lock in
> > > aio_context_acquire()/aio_context_release() so that subsequent patches
> > > can remove callers across the codebase incrementally.
> > >
> > > I have performed "make check" and qemu-iotests stress tests across
> > > x86-64, ppc64le, and aarch64 to confirm that there are no failures as a
> > > result of eliminating the lock.
> > >
> > > Signed-off-by: Stefan Hajnoczi 
> > > Reviewed-by: Eric Blake 
> > > Acked-by: Kevin Wolf 
> >
> > I knew why I wasn't confident enough to give a R-b... This crashes
> > qemu-storage-daemon in the qemu-iotests case graph-changes-while-io.
> >
> > qemu-storage-daemon: ../nbd/server.c:2542: nbd_co_receive_request: 
> > Assertion `client->recv_coroutine == qemu_coroutine_self()' failed.
> >
> > (gdb) bt
> > #0  0x7fdb00529884 in __pthread_kill_implementation () from 
> > /lib64/libc.so.6
> > #1  0x7fdb004d8afe in raise () from /lib64/libc.so.6
> > #2  0x7fdb004c187f in abort () from /lib64/libc.so.6
> > #3  0x7fdb004c179b in __assert_fail_base.cold () from /lib64/libc.so.6
> > #4  0x7fdb004d1187 in __assert_fail () from /lib64/libc.so.6
> > #5  0x557f9f9534eb in nbd_co_receive_request (errp=0x7fdafc25eec0, 
> > request=0x7fdafc25ef10, req=0x7fdaf00159c0) at ../nbd/server.c:2542
> > #6  nbd_trip (opaque=0x557fa0b33fa0) at ../nbd/server.c:2962
> > #7  0x557f9faa416b in coroutine_trampoline (i0=, 
> > i1=) at ../util/coroutine-ucontext.c:177
> > #8  0x7fdb004efe90 in ?? () from /lib64/libc.so.6
> > #9  0x7fdafc35f680 in ?? ()
> > #10 0x in ?? ()
> > (gdb) p *client
> > $2 = {refcount = 4, close_fn = 0x557f9f95dc40 , 
> > exp = 0x557fa0b30590, tlscreds = 0x0, tlsauthz = 0x0, sioc = 
> > 0x557fa0b33d90, ioc = 0x557fa0b33d90,
> >   recv_coroutine = 0x7fdaf0015eb0, send_lock = {locked = 0, ctx = 0x0, 
> > from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, 
> > sequence = 0, holder = 0x0},
> >   send_coroutine = 0x0, read_yielding = false, quiescing = false, next = 
> > {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x557fa0b305e8}}, 
> > nb_requests = 1, closing = false,
> >   check_align = 1, mode = NBD_MODE_EXTENDED, contexts = {exp = 
> > 0x557fa0b30590, count = 1, base_allocation = true, allocation_depth = 
> > false, bitmaps = 0x0}, opt = 7, optlen = 0}
> > (gdb) p co_tls_current
> > $3 = (Coroutine *) 0x7fdaf00061d0
>
> This one isn't easy to debug...
>
> The first problem here is that two nbd_trip() coroutines are scheduled
> in the same iothread, and creating the second one overwrites
> client->recv_coroutine, which triggers the assertion in the first one.
>
> This can be fixed by introducing a new mutex in NBDClient and taking it
> in nbd_client_receive_next_request() so that there is no race between
> checking client->recv_coroutine != NULL and setting it to a new
> coroutine. (Not entirely sure why two different threads are doing this,
> maybe the main thread reentering in drained_end and the iothread waiting
> for the next request?)
>
> However, I'm seeing new assertion failures when I do that:
> client->quiescing isn't set in the -EAGAIN case in nbd_trip(). I haven't
> really figured out yet where this comes from. Taking the new NBDClient
> lock in the drain functions and in nbd_trip() doesn't seem to be enough
> to fix it anyway. Or maybe I didn't quite find the right places to take
> it.

bdrv_graph_wrlock() -> bdrv_drain_all_begin_nopoll() followed by
bdrv_drain_all_end() causes this issue.

It's a race condition where nbd_trip() in the IOThread sees
client->quiescing == true for a moment but then the drained region
ends before nbd_trip() re-acquires the lock and reaches
assert(client->quiescing).

The "nopoll" part of bdrv_drain_all_begin_nopoll() seems to be the
issue. We cannot assume all requests have quiesced when .drained_end()
is called.

I'm running more tests now to be sure I have a working solution. Will
send patches soon.

Stefan



Re: [PATCH v2 04/14] aio: make aio_context_acquire()/aio_context_release() a no-op

2023-12-20 Thread Stefan Hajnoczi
On Wed, 20 Dec 2023 at 04:32, Kevin Wolf  wrote:
>
> Am 19.12.2023 um 22:23 hat Stefan Hajnoczi geschrieben:
> > The following hack makes the test pass but there are larger safety
> > issues that I'll need to look at on Wednesday:
>
> I see, you're taking the same approach as in the SCSI layer: Don't make
> things thread-safe, but just always access them from the same thread.

Yes, but it feels like a hack to me. You pointed out that other parts
also don't look thread-safe (e.g. the clients list) and I agree. I've
started annotating the code and will try to come up with a full fix
today.

Stefan



Re: [PATCH v2 04/14] aio: make aio_context_acquire()/aio_context_release() a no-op

2023-12-19 Thread Stefan Hajnoczi
The following hack makes the test pass but there are larger safety
issues that I'll need to look at on Wednesday:

diff --git a/nbd/server.c b/nbd/server.c
index 895cf0a752..cf4b7d5c6d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1617,7 +1617,7 @@ static void nbd_drained_begin(void *opaque)
 }
 }

-static void nbd_drained_end(void *opaque)
+static void nbd_resume_clients(void *opaque)
 {
 NBDExport *exp = opaque;
 NBDClient *client;
@@ -1628,6 +1628,15 @@ static void nbd_drained_end(void *opaque)
 }
 }

+static void nbd_drained_end(void *opaque)
+{
+NBDExport *exp = opaque;
+
+/* TODO how to make sure exp doesn't go away? */
+/* TODO what if AioContext changes before this runs? */
+aio_bh_schedule_oneshot(nbd_export_aio_context(exp), nbd_resume_clients, exp);
+}
+
 static bool nbd_drained_poll(void *opaque)
 {
 NBDExport *exp = opaque;



Re: [PATCH v2 06/14] block: remove AioContext locking

2023-12-19 Thread Stefan Hajnoczi
On Tue, 19 Dec 2023 at 10:59, Kevin Wolf  wrote:
>
> Am 05.12.2023 um 19:20 hat Stefan Hajnoczi geschrieben:
> > This is the big patch that removes
> > aio_context_acquire()/aio_context_release() from the block layer and
> > affected block layer users.
> >
> > There isn't a clean way to split this patch and the reviewers are likely
> > the same group of people, so I decided to do it in one patch.
> >
> > Signed-off-by: Stefan Hajnoczi 
> > Reviewed-by: Eric Blake 
> > Reviewed-by: Kevin Wolf 
> > Reviewed-by: Paul Durrant 
>
> > diff --git a/migration/block.c b/migration/block.c
> > index a15f9bddcb..2bcfcbfdf6 100644
> > --- a/migration/block.c
> > +++ b/migration/block.c
> > @@ -313,22 +311,10 @@ static int mig_save_device_bulk(QEMUFile *f, 
> > BlkMigDevState *bmds)
> >  block_mig_state.submitted++;
> >  blk_mig_unlock();
> >
> > -/* We do not know if bs is under the main thread (and thus does
> > - * not acquire the AioContext when doing AIO) or rather under
> > - * dataplane.  Thus acquire both the iothread mutex and the
> > - * AioContext.
> > - *
> > - * This is ugly and will disappear when we make bdrv_* thread-safe,
> > - * without the need to acquire the AioContext.
> > - */
> > -qemu_mutex_lock_iothread();
> > -aio_context_acquire(blk_get_aio_context(bmds->blk));
> >  bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, cur_sector * 
> > BDRV_SECTOR_SIZE,
> >  nr_sectors * BDRV_SECTOR_SIZE);
> >  blk->aiocb = blk_aio_preadv(bb, cur_sector * BDRV_SECTOR_SIZE, 
> > &blk->qiov,
> >  0, blk_mig_read_cb, blk);
> > -aio_context_release(blk_get_aio_context(bmds->blk));
> > -qemu_mutex_unlock_iothread();
> >
> >  bmds->cur_sector = cur_sector + nr_sectors;
> >  return (bmds->cur_sector >= total_sectors);
>
> With this hunk applied, qemu-iotests 183 fails:
>
> (gdb) bt
> #0  0x55aaa7d47c09 in bdrv_graph_co_rdlock () at ../block/graph-lock.c:176
> #1  0x55aaa7d3de2e in graph_lockable_auto_lock (x=) at 
> /home/kwolf/source/qemu/include/block/graph-lock.h:215
> #2  blk_co_do_preadv_part (blk=0x7f38a4000f30, offset=0, bytes=1048576, 
> qiov=0x7f38a40250f0, qiov_offset=qiov_offset@entry=0, flags=0) at 
> ../block/block-backend.c:1340
> #3  0x55aaa7d3e006 in blk_aio_read_entry (opaque=0x7f38a4025140) at 
> ../block/block-backend.c:1620
> #4  0x55aaa7e7aa5b in coroutine_trampoline (i0=, 
> i1=) at ../util/coroutine-ucontext.c:177
> #5  0x7f38d14dbe90 in __start_context () at /lib64/libc.so.6
> #6  0x7f38b3dfa060 in  ()
> #7  0x in  ()
>
> qemu_get_current_aio_context() returns NULL now. I don't completely
> understand why it depends on the BQL, but adding the BQL locking back
> fixes it.

Thanks for letting me know. I have reviewed migration/block.c and
agree that taking the BQL is correct.

I have inlined the fix below and will resend this patch.

Stefan
---
diff --git a/migration/block.c b/migration/block.c
index 2bcfcbfdf6..6ec6a1d6e6 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -311,10 +311,17 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
 block_mig_state.submitted++;
 blk_mig_unlock();

+/*
+ * The migration thread does not have an AioContext. Lock the BQL so that
+ * I/O runs in the main loop AioContext (see
+ * qemu_get_current_aio_context()).
+ */
+qemu_mutex_lock_iothread();
 bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, cur_sector * BDRV_SECTOR_SIZE,
 nr_sectors * BDRV_SECTOR_SIZE);
 blk->aiocb = blk_aio_preadv(bb, cur_sector * BDRV_SECTOR_SIZE, &blk->qiov,
 0, blk_mig_read_cb, blk);
+qemu_mutex_unlock_iothread();

 bmds->cur_sector = cur_sector + nr_sectors;
 return (bmds->cur_sector >= total_sectors);



Re: [PATCH] fix qemu build with xen-4.18.0

2023-12-12 Thread Stefan Hajnoczi
On Tue, 12 Dec 2023 at 11:02, Volodymyr Babchuk
 wrote:
>
>
> Hi Stefan,
>
> Stefan Hajnoczi  writes:
>
> > On Tue, 12 Dec 2023 at 10:36, Volodymyr Babchuk
> >  wrote:
> >>
> >> Hi Anthony
> >>
> >> Anthony PERARD  writes:
> >>
> >> > On Fri, Dec 08, 2023 at 02:49:27PM -0800, Stefano Stabellini wrote:
> >> >> On Fri, 8 Dec 2023, Daniel P. Berrangé wrote:
> >> >> > On Thu, Dec 07, 2023 at 11:12:48PM +, Michael Young wrote:
> >> >> > > Builds of qemu-8.2.0rc2 with xen-4.18.0 are currently failing
> >> >> > > with errors like
> >> >> > > ../hw/arm/xen_arm.c:74:5: error: ‘GUEST_VIRTIO_MMIO_SPI_LAST’ 
> >> >> > > undeclared (first use in this function)
> >> >> > >74 |(GUEST_VIRTIO_MMIO_SPI_LAST - 
> >> >> > > GUEST_VIRTIO_MMIO_SPI_FIRST)
> >> >> > >   | ^~
> >> >> > >
> >> >> > > as there is an incorrect comparison in include/hw/xen/xen_native.h
> >> >> > > which means that settings like GUEST_VIRTIO_MMIO_SPI_LAST
> >> >> > > aren't being defined for xen-4.18.0
> >> >> >
> >> >> > The conditions in arch-arm.h for xen 4.18 show:
> >> >> >
> >> >> > $ cppi arch-arm.h | grep -E '(#.*if)|MMIO'
> >> >> > #ifndef __XEN_PUBLIC_ARCH_ARM_H__
> >> >> > # if defined(__XEN__) || defined(__XEN_TOOLS__) || defined(__GNUC__)
> >> >> > # endif
> >> >> > # ifndef __ASSEMBLY__
> >> >> > #  if defined(__XEN__) || defined(__XEN_TOOLS__)
> >> >> > #   if defined(__GNUC__) && !defined(__STRICT_ANSI__)
> >> >> > #   endif
> >> >> > #  endif /* __XEN__ || __XEN_TOOLS__ */
> >> >> > # endif
> >> >> > # if defined(__XEN__) || defined(__XEN_TOOLS__)
> >> >> > #  define PSR_MODE_BIT  0x10U /* Set iff AArch32 */
> >> >> > /* Virtio MMIO mappings */
> >> >> > #  define GUEST_VIRTIO_MMIO_BASE   xen_mk_ullong(0x0200)
> >> >> > #  define GUEST_VIRTIO_MMIO_SIZE   xen_mk_ullong(0x0010)
> >> >> > #  define GUEST_VIRTIO_MMIO_SPI_FIRST   33
> >> >> > #  define GUEST_VIRTIO_MMIO_SPI_LAST43
> >> >> > # endif
> >> >> > # ifndef __ASSEMBLY__
> >> >> > # endif
> >> >> > #endif /*  __XEN_PUBLIC_ARCH_ARM_H__ */
> >> >> >
> >> >> > So the MMIO constants are available if __XEN__ or __XEN_TOOLS__
> >> >> > are defined. This is no different to the condition that was
> >> >> > present in Xen 4.17.
> >> >> >
> >> >> > What you didn't mention was that the Fedora build failure is
> >> >> > seen on an x86_64 host, when building the aarch64 target QEMU,
> >> >> > and I think this is the key issue.
> >> >>
> >> >> Hi Daniel, thanks for looking into it.
> >> >>
> >> >> - you are building on a x86_64 host
> >> >> - the target is aarch64
> >> >> - the target is the aarch64 Xen PVH machine (xen_arm.c)
> >> >>
> >> >> But is the resulting QEMU binary expected to be an x86 binary? Or are
> >> >> you cross compiling ARM binaries on a x86 host?
> >> >>
> >> >> In other words, is the resulting QEMU binary expected to run on ARM or
> >> >> x86?
> >> >>
> >> >>
> >> >> > Are we expecting to build Xen support for non-arch native QEMU
> >> >> > system binaries or not ?
> >> >>
> >> >> The ARM xenpvh machine (xen_arm.c) is meant to work with Xen on ARM, not
> >> >> Xen on x86.  So this is only expected to work if you are
> >> >> cross-compiling. But you can cross-compile both Xen and QEMU, and I am
> >> >> pretty sure that Yocto is able to build Xen, Xen userspace tools, and
> >> >> QEMU for Xen/ARM on an x86 host today.
> >> >>
> >> >>
> >> >> > The constants are defined in arch-arm.h, which is only included
> >> >> > under:
> >> >> >
> >> >> >   #if defined(__i386__) || defined(__x86_64__)
> >> >> >   #include "arch-x86/xen.h"
&

Re: [PATCH] fix qemu build with xen-4.18.0

2023-12-12 Thread Stefan Hajnoczi
On Tue, 12 Dec 2023 at 10:36, Volodymyr Babchuk
 wrote:
>
> Hi Anthony
>
> Anthony PERARD  writes:
>
> > On Fri, Dec 08, 2023 at 02:49:27PM -0800, Stefano Stabellini wrote:
> >> On Fri, 8 Dec 2023, Daniel P. Berrangé wrote:
> >> > On Thu, Dec 07, 2023 at 11:12:48PM +, Michael Young wrote:
> >> > > Builds of qemu-8.2.0rc2 with xen-4.18.0 are currently failing
> >> > > with errors like
> >> > > ../hw/arm/xen_arm.c:74:5: error: ‘GUEST_VIRTIO_MMIO_SPI_LAST’ 
> >> > > undeclared (first use in this function)
> >> > >74 |(GUEST_VIRTIO_MMIO_SPI_LAST - GUEST_VIRTIO_MMIO_SPI_FIRST)
> >> > >   | ^~
> >> > >
> >> > > as there is an incorrect comparison in include/hw/xen/xen_native.h
> >> > > which means that settings like GUEST_VIRTIO_MMIO_SPI_LAST
> >> > > aren't being defined for xen-4.18.0
> >> >
> >> > The conditions in arch-arm.h for xen 4.18 show:
> >> >
> >> > $ cppi arch-arm.h | grep -E '(#.*if)|MMIO'
> >> > #ifndef __XEN_PUBLIC_ARCH_ARM_H__
> >> > # if defined(__XEN__) || defined(__XEN_TOOLS__) || defined(__GNUC__)
> >> > # endif
> >> > # ifndef __ASSEMBLY__
> >> > #  if defined(__XEN__) || defined(__XEN_TOOLS__)
> >> > #   if defined(__GNUC__) && !defined(__STRICT_ANSI__)
> >> > #   endif
> >> > #  endif /* __XEN__ || __XEN_TOOLS__ */
> >> > # endif
> >> > # if defined(__XEN__) || defined(__XEN_TOOLS__)
> >> > #  define PSR_MODE_BIT  0x10U /* Set iff AArch32 */
> >> > /* Virtio MMIO mappings */
> >> > #  define GUEST_VIRTIO_MMIO_BASE   xen_mk_ullong(0x0200)
> >> > #  define GUEST_VIRTIO_MMIO_SIZE   xen_mk_ullong(0x0010)
> >> > #  define GUEST_VIRTIO_MMIO_SPI_FIRST   33
> >> > #  define GUEST_VIRTIO_MMIO_SPI_LAST43
> >> > # endif
> >> > # ifndef __ASSEMBLY__
> >> > # endif
> >> > #endif /*  __XEN_PUBLIC_ARCH_ARM_H__ */
> >> >
> >> > So the MMIO constants are available if __XEN__ or __XEN_TOOLS__
> >> > are defined. This is no different to the condition that was
> >> > present in Xen 4.17.
> >> >
> >> > What you didn't mention was that the Fedora build failure is
> >> > seen on an x86_64 host, when building the aarch64 target QEMU,
> >> > and I think this is the key issue.
> >>
> >> Hi Daniel, thanks for looking into it.
> >>
> >> - you are building on a x86_64 host
> >> - the target is aarch64
> >> - the target is the aarch64 Xen PVH machine (xen_arm.c)
> >>
> >> But is the resulting QEMU binary expected to be an x86 binary? Or are
> >> you cross compiling ARM binaries on a x86 host?
> >>
> >> In other words, is the resulting QEMU binary expected to run on ARM or
> >> x86?
> >>
> >>
> >> > Are we expecting to build Xen support for non-arch native QEMU
> >> > system binaries or not ?
> >>
> >> The ARM xenpvh machine (xen_arm.c) is meant to work with Xen on ARM, not
> >> Xen on x86.  So this is only expected to work if you are
> >> cross-compiling. But you can cross-compile both Xen and QEMU, and I am
> >> pretty sure that Yocto is able to build Xen, Xen userspace tools, and
> >> QEMU for Xen/ARM on an x86 host today.
> >>
> >>
> >> > The constants are defined in arch-arm.h, which is only included
> >> > under:
> >> >
> >> >   #if defined(__i386__) || defined(__x86_64__)
> >> >   #include "arch-x86/xen.h"
> >> >   #elif defined(__arm__) || defined (__aarch64__)
> >> >   #include "arch-arm.h"
> >> >   #else
> >> >   #error "Unsupported architecture"
> >> >   #endif
> >> >
> >> >
> >> > When we are building on an x86_64 host, we not going to get
> >> > arch-arm.h included, even if we're trying to build the aarch64
> >> > system emulator.
> >> >
> >> > I don't know how this is supposed to work ?
> >>
> >> It looks like a host vs. target architecture mismatch: the #if defined
> >> (__aarch64__) check should pass I think.
> >
> >
> > Building qemu with something like:
> > ./configure --enable-xen --cpu=x86_64
> > used to work. Can we fix that? It still works with v8.1.0.
> > At least, it works on x86; I never really tried to build qemu for arm.
> > Notice that there's no "--target-list" on the configure command line.
> > I don't know if --cpu is useful here.
> >
> > Looks like the first commit where the build doesn't work is
> > 7899f6589b78 ("xen_arm: Add virtual PCIe host bridge support").
>
> I am currently trying to upstream this patch. It is in the QEMU mailing
> list but it was never accepted. It is not reviewed in fact. I'll take a
> look at it, but I don't understand how you got it in the first place.

Hi Volodymyr,
Paolo Bonzini sent a pull request with similar code changes this
morning and I have merged it into the qemu.git/staging branch:
https://gitlab.com/qemu-project/qemu/-/commit/eaae59af4035770975b0ce9364b587223a909501

If you spot something that is not correct, please reply here.

Thanks!

Stefan

>
> --
> WBR, Volodymyr



[PATCH v2 5/5] Rename "QEMU global mutex" to "BQL" in comments and docs

2023-12-12 Thread Stefan Hajnoczi
The term "QEMU global mutex" is identical to the more widely used Big
QEMU Lock ("BQL"). Update the code comments and documentation to use
"BQL" instead of "QEMU global mutex".

Signed-off-by: Stefan Hajnoczi 
Acked-by: Markus Armbruster 
Reviewed-by: Philippe Mathieu-Daudé 
---
 docs/devel/multi-thread-tcg.rst   |  7 +++
 docs/devel/qapi-code-gen.rst  |  2 +-
 docs/devel/replay.rst |  2 +-
 docs/devel/multiple-iothreads.txt | 16 
 include/block/blockjob.h  |  6 +++---
 include/io/task.h |  2 +-
 include/qemu/coroutine-core.h |  2 +-
 include/qemu/coroutine.h  |  2 +-
 hw/block/dataplane/virtio-blk.c   |  8 
 hw/block/virtio-blk.c |  2 +-
 hw/scsi/virtio-scsi-dataplane.c   |  6 +++---
 net/tap.c |  2 +-
 12 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.rst b/docs/devel/multi-thread-tcg.rst
index c9541a7b20..7302c3bf53 100644
--- a/docs/devel/multi-thread-tcg.rst
+++ b/docs/devel/multi-thread-tcg.rst
@@ -226,10 +226,9 @@ instruction. This could be a future optimisation.
 Emulated hardware state
 ---
 
-Currently thanks to KVM work any access to IO memory is automatically
-protected by the global iothread mutex, also known as the BQL (Big
-QEMU Lock). Any IO region that doesn't use global mutex is expected to
-do its own locking.
+Currently thanks to KVM work any access to IO memory is automatically protected
+by the BQL (Big QEMU Lock). Any IO region that doesn't use the BQL is expected
+to do its own locking.
 
 However IO memory isn't the only way emulated hardware state can be
 modified. Some architectures have model specific registers that
diff --git a/docs/devel/qapi-code-gen.rst b/docs/devel/qapi-code-gen.rst
index 7f78183cd4..ea8228518c 100644
--- a/docs/devel/qapi-code-gen.rst
+++ b/docs/devel/qapi-code-gen.rst
@@ -594,7 +594,7 @@ blocking the guest and other background operations.
 Coroutine safety can be hard to prove, similar to thread safety.  Common
 pitfalls are:
 
-- The global mutex isn't held across ``qemu_coroutine_yield()``, so
+- The BQL isn't held across ``qemu_coroutine_yield()``, so
   operations that used to assume that they execute atomically may have
   to be more careful to protect against changes in the global state.
 
diff --git a/docs/devel/replay.rst b/docs/devel/replay.rst
index 0244be8b9c..effd856f0c 100644
--- a/docs/devel/replay.rst
+++ b/docs/devel/replay.rst
@@ -184,7 +184,7 @@ modes.
 Reading and writing requests are created by CPU thread of QEMU. Later these
 requests proceed to block layer which creates "bottom halves". Bottom
 halves consist of callback and its parameters. They are processed when
-main loop locks the global mutex. These locks are not synchronized with
+main loop locks the BQL. These locks are not synchronized with
 replaying process because main loop also processes the events that do not
 affect the virtual machine state (like user interaction with monitor).
 
diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
index a3e949f6b3..828e5527a3 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -5,7 +5,7 @@ the COPYING file in the top-level directory.
 
 
 This document explains the IOThread feature and how to write code that runs
-outside the QEMU global mutex.
+outside the BQL.
 
 The main loop and IOThreads
 ---
@@ -29,13 +29,13 @@ scalability bottleneck on hosts with many CPUs.  Work can be spread across
 several IOThreads instead of just one main loop.  When set up correctly this
 can improve I/O latency and reduce jitter seen by the guest.
 
-The main loop is also deeply associated with the QEMU global mutex, which is a
-scalability bottleneck in itself.  vCPU threads and the main loop use the QEMU
-global mutex to serialize execution of QEMU code.  This mutex is necessary
-because a lot of QEMU's code historically was not thread-safe.
+The main loop is also deeply associated with the BQL, which is a
+scalability bottleneck in itself.  vCPU threads and the main loop use the BQL
+to serialize execution of QEMU code.  This mutex is necessary because a lot of
+QEMU's code historically was not thread-safe.
 
 The fact that all I/O processing is done in a single main loop and that the
-QEMU global mutex is contended by all vCPU threads and the main loop explain
+BQL is contended by all vCPU threads and the main loop explain
 why it is desirable to place work into IOThreads.
 
 The experimental virtio-blk data-plane implementation has been benchmarked and
@@ -66,7 +66,7 @@ There are several old APIs that use the main loop AioContext:
 
 Since they implicitly work on the main loop they cannot be used in code that
 runs in an IOThread.  They might cause a crash or deadlock if called from an
-

[PATCH v2 2/5] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to BQL_LOCK_GUARD

2023-12-12 Thread Stefan Hajnoczi
The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
instead, it is already widely used and unambiguous.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Paul Durrant 
Acked-by: David Woodhouse 
Reviewed-by: Cédric Le Goater 
Acked-by: Ilya Leoshkevich 
---
 include/qemu/main-loop.h  | 19 +--
 hw/i386/kvm/xen_evtchn.c  | 14 +++---
 hw/i386/kvm/xen_gnttab.c  |  2 +-
 hw/mips/mips_int.c|  2 +-
 hw/ppc/ppc.c  |  2 +-
 target/i386/kvm/xen-emu.c |  2 +-
 target/ppc/excp_helper.c  |  2 +-
 target/ppc/helper_regs.c  |  2 +-
 target/riscv/cpu_helper.c |  4 ++--
 9 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 596a206acd..1da0fb186e 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -344,33 +344,32 @@ void bql_lock_impl(const char *file, int line);
 void bql_unlock(void);
 
 /**
- * QEMU_IOTHREAD_LOCK_GUARD
+ * BQL_LOCK_GUARD
  *
  * Wrap a block of code in a conditional bql_{lock,unlock}.
  */
-typedef struct IOThreadLockAuto IOThreadLockAuto;
+typedef struct BQLLockAuto BQLLockAuto;
 
-static inline IOThreadLockAuto *qemu_iothread_auto_lock(const char *file,
-int line)
+static inline BQLLockAuto *bql_auto_lock(const char *file, int line)
 {
 if (bql_locked()) {
 return NULL;
 }
 bql_lock_impl(file, line);
 /* Anything non-NULL causes the cleanup function to be called */
-return (IOThreadLockAuto *)(uintptr_t)1;
+return (BQLLockAuto *)(uintptr_t)1;
 }
 
-static inline void qemu_iothread_auto_unlock(IOThreadLockAuto *l)
+static inline void bql_auto_unlock(BQLLockAuto *l)
 {
 bql_unlock();
 }
 
-G_DEFINE_AUTOPTR_CLEANUP_FUNC(IOThreadLockAuto, qemu_iothread_auto_unlock)
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(BQLLockAuto, bql_auto_unlock)
 
-#define QEMU_IOTHREAD_LOCK_GUARD() \
-g_autoptr(IOThreadLockAuto) _iothread_lock_auto __attribute__((unused)) \
-= qemu_iothread_auto_lock(__FILE__, __LINE__)
+#define BQL_LOCK_GUARD() \
+g_autoptr(BQLLockAuto) _bql_lock_auto __attribute__((unused)) \
+= bql_auto_lock(__FILE__, __LINE__)
 
 /*
  * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index d7d15cfaf7..bd077eda6d 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1127,7 +1127,7 @@ int xen_evtchn_reset_op(struct evtchn_reset *reset)
 return -ESRCH;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 return xen_evtchn_soft_reset();
 }
 
@@ -1145,7 +1145,7 @@ int xen_evtchn_close_op(struct evtchn_close *close)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 qemu_mutex_lock(&s->port_lock);
 
 ret = close_port(s, close->port, &flush_kvm_routes);
@@ -1272,7 +1272,7 @@ int xen_evtchn_bind_pirq_op(struct evtchn_bind_pirq *pirq)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 
 if (s->pirq[pirq->pirq].port) {
 return -EBUSY;
@@ -1824,7 +1824,7 @@ int xen_physdev_map_pirq(struct physdev_map_pirq *map)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (map->domid != DOMID_SELF && map->domid != xen_domid) {
@@ -1884,7 +1884,7 @@ int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 qemu_mutex_lock(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
@@ -1924,7 +1924,7 @@ int xen_physdev_eoi_pirq(struct physdev_eoi *eoi)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
@@ -1956,7 +1956,7 @@ int xen_physdev_query_pirq(struct physdev_irq_status_query *query)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
index 0a24f53f20..d9477ae927 100644
--- a/hw/i386/kvm/xen_gnttab.c
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -176,7 +176,7 @@ int xen_gnttab_map_page(uint64_t idx, uint64_t gfn)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->gnt_lock);
 
 xen_overlay_do_map_page(&s->gnt_aliases[idx], gpa);
diff --git a/hw/mips/mips_int.c b/hw/mips/mips_int.c
index 6c32e466a3..eef2fd2cd1 100644
--- a/hw/mips/mips_int.c
+++ b/hw/mips/mips_int.c
@@ -36,7 +36,7 @@ static void cpu_mips_irq_request(void *opaque, int irq, int level)
 return;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+BQL_LOCK_GUARD();
 
 if (level) {
 e

[PATCH v2 4/5] Replace "iothread lock" with "BQL" in comments

2023-12-12 Thread Stefan Hajnoczi
The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL)
in their names. Update the code comments to use "BQL" instead of
"iothread lock".

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
---
 docs/devel/reset.rst |  2 +-
 hw/display/qxl.h |  2 +-
 include/exec/cpu-common.h|  2 +-
 include/exec/memory.h|  4 ++--
 include/exec/ramblock.h  |  2 +-
 include/migration/register.h |  8 
 target/arm/internals.h   |  4 ++--
 accel/tcg/cputlb.c   |  4 ++--
 accel/tcg/tcg-accel-ops-icount.c |  2 +-
 hw/remote/mpqemu-link.c  |  2 +-
 migration/block-dirty-bitmap.c   | 10 +-
 migration/block.c| 24 
 migration/colo.c |  2 +-
 migration/migration.c|  2 +-
 migration/ram.c  |  4 ++--
 system/physmem.c |  6 +++---
 target/arm/helper.c  |  2 +-
 ui/spice-core.c  |  2 +-
 util/rcu.c   |  2 +-
 audio/coreaudio.m|  4 ++--
 ui/cocoa.m   |  6 +++---
 21 files changed, 48 insertions(+), 48 deletions(-)

diff --git a/docs/devel/reset.rst b/docs/devel/reset.rst
index 38ed1790f7..d4e79718ba 100644
--- a/docs/devel/reset.rst
+++ b/docs/devel/reset.rst
@@ -19,7 +19,7 @@ Triggering reset
 
 This section documents the APIs which "users" of a resettable object should use
 to control it. All resettable control functions must be called while holding
-the iothread lock.
+the BQL.
 
You can apply a reset to an object using ``resettable_assert_reset()``. You need
 to call ``resettable_release_reset()`` to release the object from reset. To
diff --git a/hw/display/qxl.h b/hw/display/qxl.h
index fdac14edad..e0a85a5ca4 100644
--- a/hw/display/qxl.h
+++ b/hw/display/qxl.h
@@ -159,7 +159,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(PCIQXLDevice, PCI_QXL)
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 41115d8919..fef3138d29 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -92,7 +92,7 @@ RAMBlock *qemu_ram_block_by_name(const char *name);
  *
  * By the time this function returns, the returned pointer is not protected
  * by RCU anymore.  If the caller is not within an RCU critical section and
- * does not hold the iothread lock, it must have other means of protecting the
+ * does not hold the BQL, it must have other means of protecting the
  * pointer, such as a reference to the memory region that owns the RAMBlock.
  */
 RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 831f7c996d..ad6466b07e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1962,7 +1962,7 @@ int memory_region_get_fd(MemoryRegion *mr);
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
@@ -1979,7 +1979,7 @@ MemoryRegion *memory_region_from_host(void *ptr, ram_addr_t *offset);
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 69c6a53902..3eb79723c6 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -34,7 +34,7 @@ struct RAMBlock {
 ram_addr_t max_length;
 void (*resized)(const char*, uint64_t length, void *host);
 uint32_t flags;
-/* Protected by iothread lock.  */
+/* Protected by the BQL.  */
 char idstr[256];
 /* RCU-enabled, writes protected by the ramlist lock */
 QLIST_ENTRY(RAMBlock) next;
diff --git a/include/migration/register.h b/include/migration/register.h
index fed1d04a3c..9ab1f79512 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -17,7 +17,7 @@
 #include "hw/vmstate-if.h"
 
 typedef struct Save

[PATCH v2 1/5] system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()

2023-12-12 Thread Stefan Hajnoczi
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Paul Durrant 
Acked-by: Fabiano Rosas 
Acked-by: David Woodhouse 
Reviewed-by: Cédric Le Goater 
Acked-by: Peter Xu 
Acked-by: Eric Farman 
Reviewed-by: Harsh Prateek Bora 
---
 include/block/aio-wait.h |   2 +-
 include/qemu/main-loop.h |  40 -
 include/qemu/thread.h|   2 +-
 accel/accel-blocker.c|  10 +--
 accel/dummy-cpus.c   |   8 +-
 accel/hvf/hvf-accel-ops.c|   4 +-
 accel/kvm/kvm-accel-ops.c|   4 +-
 accel/kvm/kvm-all.c  |  22 ++---
 accel/tcg/cpu-exec.c |  26 +++---
 accel/tcg/cputlb.c   |  16 ++--
 accel/tcg/tcg-accel-ops-icount.c |   4 +-
 accel/tcg/tcg-accel-ops-mttcg.c  |  12 +--
 accel/tcg/tcg-accel-ops-rr.c |  14 ++--
 accel/tcg/tcg-accel-ops.c|   2 +-
 accel/tcg/translate-all.c|   2 +-
 cpu-common.c |   4 +-
 dump/dump.c  |   4 +-
 hw/core/cpu-common.c |   6 +-
 hw/i386/intel_iommu.c|   6 +-
 hw/i386/kvm/xen_evtchn.c |  16 ++--
 hw/i386/kvm/xen_overlay.c|   2 +-
 hw/i386/kvm/xen_xenstore.c   |   2 +-
 hw/intc/arm_gicv3_cpuif.c|   2 +-
 hw/intc/s390_flic.c  |  18 ++--
 hw/misc/edu.c|   4 +-
 hw/misc/imx6_src.c   |   2 +-
 hw/misc/imx7_src.c   |   2 +-
 hw/net/xen_nic.c |   8 +-
 hw/ppc/pegasos2.c|   2 +-
 hw/ppc/ppc.c |   4 +-
 hw/ppc/spapr.c   |   2 +-
 hw/ppc/spapr_rng.c   |   4 +-
 hw/ppc/spapr_softmmu.c   |   4 +-
 hw/remote/mpqemu-link.c  |  20 ++---
 hw/remote/vfio-user-obj.c|   2 +-
 hw/s390x/s390-skeys.c|   2 +-
 migration/block-dirty-bitmap.c   |   4 +-
 migration/block.c|  16 ++--
 migration/colo.c |  60 +++---
 migration/dirtyrate.c|  12 +--
 migration/migration.c|  52 ++--
 migration/ram.c  |  12 +--
 replay/replay-internal.c |   2 +-
 semihosting/console.c|   8 +-
 stubs/iothread-lock.c|   6 +-
 system/cpu-throttle.c|   4 +-
 system/cpus.c|  51 ++--
 system/dirtylimit.c  |   4 +-
 system/memory.c  |   2 +-
 system/physmem.c |   8 +-
 system/runstate.c|   2 +-
 system/watchpoint.c  |   4 +-
 target/arm/arm-powerctl.c|  14 ++--
 target/arm/helper.c  |   4 +-
 target/arm/hvf/hvf.c |   8 +-
 target/arm/kvm.c |   4 +-
 target/arm/kvm64.c   |   4 +-
 target/arm/ptw.c |   6 +-
 target/arm/tcg/helper-a64.c  |   8 +-
 target/arm/tcg/m_helper.c|   6 +-
 target/arm/tcg/op_helper.c   |  24 +++---
 target/arm/tcg/psci.c|   2 +-
 target/hppa/int_helper.c |   8 +-
 target/i386/hvf/hvf.c|   6 +-
 target/i386/kvm/hyperv.c |   4 +-
 target/i386/kvm/kvm.c|  28 +++
 target/i386/kvm/xen-emu.c|  14 ++--
 target/i386/nvmm/nvmm-accel-ops.c|   4 +-
 target/i386/nvmm/nvmm-all.c  |  20 ++---
 target/i386/tcg/sysemu/fpu_helper.c  |   6 +-
 target/i386/tcg/sysemu/misc_helper.c |   4 +-
 target/i386/whpx/whpx-accel-ops.c|   4 +-
 target/i386/whpx/whpx-all.c  |  24 +++---
 target/loongarch/csr_helper.c|   4 +-
 target/mips/kvm.c|   4 +-
 target/mips/tcg/sysemu/cp0_helper.c  |   4 +-
 target/openrisc/sys_helper.c |  16 ++--
 target/ppc/excp_helper.c |  12 +--
 target/ppc/kvm.c |   4 +-
 target/ppc/m

[PATCH v2 0/5] Make Big QEMU Lock naming consistent

2023-12-12 Thread Stefan Hajnoczi
v2:
- Rename APIs bql_*() [PeterX]
- Spell out "Big QEMU Lock (BQL)" in doc comments [PeterX]
- Rename "iolock" variables in hw/remote/mpqemu-link.c [Harsh]
- Fix bql_auto_lock() indentation in Patch 2 [Ilya]
- "with BQL taken" -> "with the BQL taken" [Philippe]
- "under BQL" -> "under the BQL" [Philippe]

The Big QEMU Lock ("BQL") has two other names: "iothread lock" and "QEMU global
mutex". The term "iothread lock" is easily confused with the unrelated --object
iothread (iothread.c).

This series updates the code and documentation to consistently use "BQL". This
makes the code easier to understand.

Stefan Hajnoczi (5):
  system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
  qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to BQL_LOCK_GUARD
  qemu/main-loop: rename qemu_cond_wait_iothread() to
qemu_cond_wait_bql()
  Replace "iothread lock" with "BQL" in comments
  Rename "QEMU global mutex" to "BQL" in comments and docs

 docs/devel/multi-thread-tcg.rst  |   7 +-
 docs/devel/qapi-code-gen.rst |   2 +-
 docs/devel/replay.rst|   2 +-
 docs/devel/reset.rst |   2 +-
 docs/devel/multiple-iothreads.txt|  16 ++--
 hw/display/qxl.h |   2 +-
 include/block/aio-wait.h |   2 +-
 include/block/blockjob.h |   6 +-
 include/exec/cpu-common.h|   2 +-
 include/exec/memory.h|   4 +-
 include/exec/ramblock.h  |   2 +-
 include/io/task.h|   2 +-
 include/migration/register.h |   8 +-
 include/qemu/coroutine-core.h|   2 +-
 include/qemu/coroutine.h |   2 +-
 include/qemu/main-loop.h |  69 
 include/qemu/thread.h|   2 +-
 target/arm/internals.h   |   4 +-
 accel/accel-blocker.c|  10 +--
 accel/dummy-cpus.c   |   8 +-
 accel/hvf/hvf-accel-ops.c|   4 +-
 accel/kvm/kvm-accel-ops.c|   4 +-
 accel/kvm/kvm-all.c  |  22 ++---
 accel/tcg/cpu-exec.c |  26 +++---
 accel/tcg/cputlb.c   |  20 ++---
 accel/tcg/tcg-accel-ops-icount.c |   6 +-
 accel/tcg/tcg-accel-ops-mttcg.c  |  12 +--
 accel/tcg/tcg-accel-ops-rr.c |  18 ++--
 accel/tcg/tcg-accel-ops.c|   2 +-
 accel/tcg/translate-all.c|   2 +-
 cpu-common.c |   4 +-
 dump/dump.c  |   4 +-
 hw/block/dataplane/virtio-blk.c  |   8 +-
 hw/block/virtio-blk.c|   2 +-
 hw/core/cpu-common.c |   6 +-
 hw/display/virtio-gpu.c  |   2 +-
 hw/i386/intel_iommu.c|   6 +-
 hw/i386/kvm/xen_evtchn.c |  30 +++
 hw/i386/kvm/xen_gnttab.c |   2 +-
 hw/i386/kvm/xen_overlay.c|   2 +-
 hw/i386/kvm/xen_xenstore.c   |   2 +-
 hw/intc/arm_gicv3_cpuif.c|   2 +-
 hw/intc/s390_flic.c  |  18 ++--
 hw/mips/mips_int.c   |   2 +-
 hw/misc/edu.c|   4 +-
 hw/misc/imx6_src.c   |   2 +-
 hw/misc/imx7_src.c   |   2 +-
 hw/net/xen_nic.c |   8 +-
 hw/ppc/pegasos2.c|   2 +-
 hw/ppc/ppc.c |   6 +-
 hw/ppc/spapr.c   |   2 +-
 hw/ppc/spapr_events.c|   2 +-
 hw/ppc/spapr_rng.c   |   4 +-
 hw/ppc/spapr_softmmu.c   |   4 +-
 hw/remote/mpqemu-link.c  |  22 ++---
 hw/remote/vfio-user-obj.c|   2 +-
 hw/s390x/s390-skeys.c|   2 +-
 hw/scsi/virtio-scsi-dataplane.c  |   6 +-
 migration/block-dirty-bitmap.c   |  14 ++--
 migration/block.c|  40 -
 migration/colo.c |  62 +++---
 migration/dirtyrate.c|  12 +--
 migration/migration.c|  54 ++--
 migration/ram.c  |  16 ++--
 net/tap.c|   2 +-
 replay/replay-internal.c |   2 +-
 semihosting/console.c|   8 +-
 stubs/iothread-lock.c|   6 +-
 system/cpu-throttle.c|   6 +-
 system/cpus.c|  55 +++--
 system/dirtylimit.c  |   4 +-
 system/memory.c  |   2 +-
 system/physmem.c |  14 ++--
 system/runstate.c|   2 +-
 system/watchpoint.c  |   4 +-
 target/arm/arm-powerctl.c|  14 ++--
 target/arm/helper.c  |   6 +-
 target/arm/hvf/hvf.c |   8 +-
 target/arm/kvm.c |   4 +-
 target/arm/kvm64.c   |   4 +-
 target/arm/ptw.c |   6

[PATCH v2 3/5] qemu/main-loop: rename qemu_cond_wait_iothread() to qemu_cond_wait_bql()

2023-12-12 Thread Stefan Hajnoczi
The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
instead; it is already widely used and unambiguous.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Cédric Le Goater 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/qemu/main-loop.h  | 10 +-
 accel/tcg/tcg-accel-ops-rr.c  |  4 ++--
 hw/display/virtio-gpu.c   |  2 +-
 hw/ppc/spapr_events.c |  2 +-
 system/cpu-throttle.c |  2 +-
 system/cpus.c |  4 ++--
 target/i386/nvmm/nvmm-accel-ops.c |  2 +-
 target/i386/whpx/whpx-accel-ops.c |  2 +-
 8 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 1da0fb186e..662fdca566 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -372,17 +372,17 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(BQLLockAuto, bql_auto_unlock)
 = bql_auto_lock(__FILE__, __LINE__)
 
 /*
- * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
+ * qemu_cond_wait_bql: Wait on condition for the Big QEMU Lock (BQL)
  *
- * This function atomically releases the main loop mutex and causes
+ * This function atomically releases the Big QEMU Lock (BQL) and causes
  * the calling thread to block on the condition.
  */
-void qemu_cond_wait_iothread(QemuCond *cond);
+void qemu_cond_wait_bql(QemuCond *cond);
 
 /*
- * qemu_cond_timedwait_iothread: like the previous, but with timeout
+ * qemu_cond_timedwait_bql: like the previous, but with timeout
  */
-void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
+void qemu_cond_timedwait_bql(QemuCond *cond, int ms);
 
 /* internal interfaces */
 
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index c4ea372a3f..5794e5a9ce 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -111,7 +111,7 @@ static void rr_wait_io_event(void)
 
 while (all_cpu_threads_idle()) {
 rr_stop_kick_timer();
-qemu_cond_wait_iothread(first_cpu->halt_cond);
+qemu_cond_wait_bql(first_cpu->halt_cond);
 }
 
 rr_start_kick_timer();
@@ -198,7 +198,7 @@ static void *rr_cpu_thread_fn(void *arg)
 
 /* wait for initial kick-off after machine start */
 while (first_cpu->stopped) {
-qemu_cond_wait_iothread(first_cpu->halt_cond);
+qemu_cond_wait_bql(first_cpu->halt_cond);
 
 /* process any pending work */
 CPU_FOREACH(cpu) {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index b016d3bac8..67c5be1a4e 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1512,7 +1512,7 @@ void virtio_gpu_reset(VirtIODevice *vdev)
 g->reset_finished = false;
 qemu_bh_schedule(g->reset_bh);
 while (!g->reset_finished) {
-qemu_cond_wait_iothread(&g->reset_cond);
+qemu_cond_wait_bql(&g->reset_cond);
 }
 } else {
 virtio_gpu_reset_bh(g);
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index deb4641505..cb0587 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -899,7 +899,7 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 }
 return;
 }
-qemu_cond_wait_iothread(&spapr->fwnmi_machine_check_interlock_cond);
+qemu_cond_wait_bql(&spapr->fwnmi_machine_check_interlock_cond);
 if (spapr->fwnmi_machine_check_addr == -1) {
 /*
  * If the machine was reset while waiting for the interlock,
diff --git a/system/cpu-throttle.c b/system/cpu-throttle.c
index 786a9a5639..c951a6c65e 100644
--- a/system/cpu-throttle.c
+++ b/system/cpu-throttle.c
@@ -54,7 +54,7 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
 endtime_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns;
 while (sleeptime_ns > 0 && !cpu->stop) {
 if (sleeptime_ns > SCALE_MS) {
-qemu_cond_timedwait_iothread(cpu->halt_cond,
+qemu_cond_timedwait_bql(cpu->halt_cond,
  sleeptime_ns / SCALE_MS);
 } else {
 bql_unlock();
diff --git a/system/cpus.c b/system/cpus.c
index 9b68dc9c7c..c8e2772b5f 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -514,12 +514,12 @@ void bql_unlock(void)
 qemu_mutex_unlock(&bql);
 }
 
-void qemu_cond_wait_iothread(QemuCond *cond)
+void qemu_cond_wait_bql(QemuCond *cond)
 {
 qemu_cond_wait(cond, &bql);
 }
 
-void qemu_cond_timedwait_iothread(QemuCond *cond, int ms)
+void qemu_cond_timedwait_bql(QemuCond *cond, int ms)
 {
 qemu_cond_timedwait(cond, &bql, ms);
 }
diff --git a/target/i386/nvmm/nvmm-accel-ops.c b/target/i386/nvmm/nvmm-accel-ops.c
index f9d5e9a37a..6b2bfd9b9c 100644
--- a/target/i386/nvmm/nvmm-accel-ops.c
+++ b/target/i386/nvmm/nvmm-accel-ops.c
@@ -48,7 +48,7 @@ static void *qemu_nvmm_cpu_thread_fn(void *arg)
 

Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-12-07 Thread Stefan Hajnoczi
On Fri, Dec 01, 2023 at 10:42:43AM +0530, Harsh Prateek Bora wrote:
> On 11/30/23 02:56, Stefan Hajnoczi wrote:
> > diff --git a/hw/remote/mpqemu-link.c b/hw/remote/mpqemu-link.c
> > index 9bd98e8219..ffb2c25145 100644
> > --- a/hw/remote/mpqemu-link.c
> > +++ b/hw/remote/mpqemu-link.c
> > @@ -33,7 +33,7 @@
> >*/
> >   bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
> >   {
> > -bool iolock = qemu_mutex_iothread_locked();
> > +bool iolock = qemu_bql_locked();
> 
> Should var name (one more below) be updated to reflect this update ?

Yes. I'll grep for that tree-wide because there might be other
instances.

Stefan


signature.asc
Description: PGP signature


[PATCH v2 14/14] block: remove outdated AioContext locking comments

2023-12-05 Thread Stefan Hajnoczi
The AioContext lock no longer exists.

There is one noteworthy change:

  - * More specifically, these functions use BDRV_POLL_WHILE(bs), which
  - * requires the caller to be either in the main thread and hold
  - * the BlockdriverState (bs) AioContext lock, or directly in the
  - * home thread that runs the bs AioContext. Calling them from
  - * another thread in another AioContext would cause deadlocks.
  + * More specifically, these functions use BDRV_POLL_WHILE(bs), which requires
  + * the caller to be either in the main thread or directly in the home thread
  + * that runs the bs AioContext. Calling them from another thread in another
  + * AioContext would cause deadlocks.

I am not sure whether deadlocks are still possible. Maybe they have just
moved to the fine-grained locks that have replaced the AioContext lock. Since
I am not sure if the deadlocks are gone, I have kept the substance
unchanged and just removed mention of the AioContext.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
---
 include/block/block-common.h |  3 --
 include/block/block-io.h |  9 ++--
 include/block/block_int-common.h |  2 -
 block.c  | 73 ++--
 block/block-backend.c|  8 ---
 block/export/vhost-user-blk-server.c |  4 --
 tests/qemu-iotests/202   |  2 +-
 tests/qemu-iotests/203   |  3 +-
 8 files changed, 22 insertions(+), 82 deletions(-)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index d7599564db..a846023a09 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -70,9 +70,6 @@
  * automatically takes the graph rdlock when calling the wrapped function. In
  * the same way, no_co_wrapper_bdrv_wrlock functions automatically take the
  * graph wrlock.
- *
- * If the first parameter of the function is a BlockDriverState, BdrvChild or
- * BlockBackend pointer, the AioContext lock for it is taken in the wrapper.
  */
 #define no_co_wrapper
 #define no_co_wrapper_bdrv_rdlock
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 8eb39a858b..b49e0537dd 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -332,11 +332,10 @@ bdrv_co_copy_range(BdrvChild *src, int64_t src_offset,
  * "I/O or GS" API functions. These functions can run without
  * the BQL, but only in one specific iothread/main loop.
  *
- * More specifically, these functions use BDRV_POLL_WHILE(bs), which
- * requires the caller to be either in the main thread and hold
- * the BlockdriverState (bs) AioContext lock, or directly in the
- * home thread that runs the bs AioContext. Calling them from
- * another thread in another AioContext would cause deadlocks.
+ * More specifically, these functions use BDRV_POLL_WHILE(bs), which requires
+ * the caller to be either in the main thread or directly in the home thread
+ * that runs the bs AioContext. Calling them from another thread in another
+ * AioContext would cause deadlocks.
  *
  * Therefore, these functions are not proper I/O, because they
  * can't run in *any* iothreads, but only in a specific one.
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 4e31d161c5..151279d481 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -1192,8 +1192,6 @@ struct BlockDriverState {
 /* The error object in use for blocking operations on backing_hd */
 Error *backing_blocker;
 
-/* Protected by AioContext lock */
-
 /*
  * If we are reading a disk image, give its size in sectors.
  * Generally read-only; it is written to by load_snapshot and
diff --git a/block.c b/block.c
index 434b7f4d72..a097772238 100644
--- a/block.c
+++ b/block.c
@@ -1616,11 +1616,6 @@ out:
 g_free(gen_node_name);
 }
 
-/*
- * The caller must always hold @bs AioContext lock, because this function calls
- * bdrv_refresh_total_sectors() which polls when called from non-coroutine
- * context.
- */
 static int no_coroutine_fn GRAPH_UNLOCKED
 bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
  QDict *options, int open_flags, Error **errp)
@@ -2901,7 +2896,7 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
  * Replaces the node that a BdrvChild points to without updating permissions.
  *
  * If @new_bs is non-NULL, the parent of @child must already be drained through
- * @child and the caller must hold the AioContext lock for @new_bs.
+ * @child.
  */
 static void GRAPH_WRLOCK
 bdrv_replace_child_noperm(BdrvChild *child, BlockDriverState *new_bs)
static TransactionActionDrv bdrv_attach_child_common_drv = {
  *
  * Returns new created child.
  *
- * The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
- * @child_bs can move to a different AioContext in this function. Callers must
- * make sure that their AioContext locking is stil

[PATCH v2 13/14] job: remove outdated AioContext locking comments

2023-12-05 Thread Stefan Hajnoczi
The AioContext lock no longer exists.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
---
 include/qemu/job.h | 20 
 1 file changed, 20 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index e502787dd8..9ea98b5927 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -67,8 +67,6 @@ typedef struct Job {
 
 /**
  * The completion function that will be called when the job completes.
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
  */
 BlockCompletionFunc *cb;
 
@@ -264,9 +262,6 @@ struct JobDriver {
  *
  * This callback will not be invoked if the job has already failed.
  * If it fails, abort and then clean will be called.
- *
- * Called with AioContext lock held, since many callbacs implementations
- * use bdrv_* functions that require to hold the lock.
  */
 int (*prepare)(Job *job);
 
@@ -277,9 +272,6 @@ struct JobDriver {
  *
  * All jobs will complete with a call to either .commit() or .abort() but
  * never both.
- *
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
  */
 void (*commit)(Job *job);
 
@@ -290,9 +282,6 @@ struct JobDriver {
  *
  * All jobs will complete with a call to either .commit() or .abort() but
  * never both.
- *
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
  */
 void (*abort)(Job *job);
 
@@ -301,9 +290,6 @@ struct JobDriver {
  * .commit() or .abort(). Regardless of which callback is invoked after
  * completion, .clean() will always be called, even if the job does not
  * belong to a transaction group.
- *
- * Called with AioContext lock held, since many callbacs implementations
- * use bdrv_* functions that require to hold the lock.
  */
 void (*clean)(Job *job);
 
@@ -318,17 +304,12 @@ struct JobDriver {
  * READY).
  * (If the callback is NULL, the job is assumed to terminate
  * without I/O.)
- *
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
  */
 bool (*cancel)(Job *job, bool force);
 
 
 /**
  * Called when the job is freed.
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
  */
 void (*free)(Job *job);
 };
@@ -424,7 +405,6 @@ void job_ref_locked(Job *job);
  * Release a reference that was previously acquired with job_ref_locked() or
  * job_create(). If it's the last reference to the object, it will be freed.
  *
- * Takes AioContext lock internally to invoke a job->driver callback.
  * Called with job lock held.
  */
 void job_unref_locked(Job *job);
-- 
2.43.0




[PATCH v2 11/14] docs: remove AioContext lock from IOThread docs

2023-12-05 Thread Stefan Hajnoczi
Encourage the use of locking primitives and stop mentioning the
AioContext lock since it is being removed.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
---
 docs/devel/multiple-iothreads.txt | 45 +++
 1 file changed, 15 insertions(+), 30 deletions(-)

diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
index a3e949f6b3..4865196bde 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -88,27 +88,18 @@ loop, depending on which AioContext instance the caller passes in.
 
 How to synchronize with an IOThread
 ---
-AioContext is not thread-safe so some rules must be followed when using file
-descriptors, event notifiers, timers, or BHs across threads:
+Variables that can be accessed by multiple threads require some form of
+synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc.
 
-1. AioContext functions can always be called safely.  They handle their
-own locking internally.
-
-2. Other threads wishing to access the AioContext must use
-aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
-context is acquired no other thread can access it or run event loop iterations
-in this AioContext.
-
-Legacy code sometimes nests aio_context_acquire()/aio_context_release() calls.
-Do not use nesting anymore, it is incompatible with the BDRV_POLL_WHILE() macro
-used in the block layer and can lead to hangs.
-
-There is currently no lock ordering rule if a thread needs to acquire multiple
-AioContexts simultaneously.  Therefore, it is only safe for code holding the
-QEMU global mutex to acquire other AioContexts.
+AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(),
+aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger
+activity in an IOThread.
 
 Side note: the best way to schedule a function call across threads is to call
-aio_bh_schedule_oneshot().  No acquire/release or locking is needed.
+aio_bh_schedule_oneshot().
+
+The main loop thread can wait synchronously for a condition using
+AIO_WAIT_WHILE().
 
 AioContext and the block layer
 --
@@ -124,22 +115,16 @@ Block layer code must therefore expect to run in an IOThread and avoid using
 old APIs that implicitly use the main loop.  See the "How to program for
 IOThreads" above for information on how to do that.
 
-If main loop code such as a QMP function wishes to access a BlockDriverState
-it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
-that callbacks in the IOThread do not run in parallel.
-
 Code running in the monitor typically needs to ensure that past
 requests from the guest are completed.  When a block device is running
 in an IOThread, the IOThread can also process requests from the guest
 (via ioeventfd).  To achieve both objects, wrap the code between
 bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
-section".  The functions must be called between aio_context_acquire()
-and aio_context_release().  You can freely release and re-acquire the
-AioContext within a drained section.
+section".
 
-Long-running jobs (usually in the form of coroutines) are best scheduled in
-the BlockDriverState's AioContext to avoid the need to acquire/release around
-each bdrv_*() call.  The functions bdrv_add/remove_aio_context_notifier,
-or alternatively blk_add/remove_aio_context_notifier if you use BlockBackends,
-can be used to get a notification whenever bdrv_try_change_aio_context() moves a
+Long-running jobs (usually in the form of coroutines) are often scheduled in
+the BlockDriverState's AioContext.  The functions
+bdrv_add/remove_aio_context_notifier, or alternatively
+blk_add/remove_aio_context_notifier if you use BlockBackends, can be used to
+get a notification whenever bdrv_try_change_aio_context() moves a
 BlockDriverState to a different AioContext.
-- 
2.43.0




[PATCH v2 10/14] aio: remove aio_context_acquire()/aio_context_release() API

2023-12-05 Thread Stefan Hajnoczi
Delete these functions because nothing calls these functions anymore.

I introduced these APIs in commit 98563fc3ec44 ("aio: add
aio_context_acquire() and aio_context_release()") in 2014. It's with a
sigh of relief that I delete these APIs almost 10 years later.

Thanks to Paolo Bonzini's vision for multi-queue QEMU, we got an
understanding of where the code needed to go in order to remove the
limitations that the original dataplane and the IOThread/AioContext
approach that followed it.

Emanuele Giuseppe Esposito had the splendid determination to convert
large parts of the codebase so that they no longer needed the AioContext
lock. This was a painstaking process, both in the actual code changes
required and the iterations of code review that Emanuele eked out of
Kevin and me over many months.

Kevin Wolf tackled multitudes of graph locking conversions to protect
in-flight I/O from run-time changes to the block graph as well as the
clang Thread Safety Analysis annotations that allow the compiler to
check whether the graph lock is being used correctly.

And me, well, I'm just here to add some pizzazz to the QEMU multi-queue
block layer :). Thank you to everyone who helped with this effort,
including Eric Blake, code reviewer extraordinaire, and others who I've
forgotten to mention.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
---
 include/block/aio.h | 17 -
 util/async.c| 10 --
 2 files changed, 27 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index f08b358077..af05512a7d 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -278,23 +278,6 @@ void aio_context_ref(AioContext *ctx);
  */
 void aio_context_unref(AioContext *ctx);
 
-/* Take ownership of the AioContext.  If the AioContext will be shared between
- * threads, and a thread does not want to be interrupted, it will have to
- * take ownership around calls to aio_poll().  Otherwise, aio_poll()
- * automatically takes care of calling aio_context_acquire and
- * aio_context_release.
- *
- * Note that this is separate from bdrv_drained_begin/bdrv_drained_end.  A
- * thread still has to call those to avoid being interrupted by the guest.
- *
- * Bottom halves, timers and callbacks can be created or removed without
- * acquiring the AioContext.
- */
-void aio_context_acquire(AioContext *ctx);
-
-/* Relinquish ownership of the AioContext. */
-void aio_context_release(AioContext *ctx);
-
 /**
  * aio_bh_schedule_oneshot_full: Allocate a new bottom half structure that will
  * run only once and as soon as possible.
diff --git a/util/async.c b/util/async.c
index dfd44ef612..460529057c 100644
--- a/util/async.c
+++ b/util/async.c
@@ -719,16 +719,6 @@ void aio_context_unref(AioContext *ctx)
 g_source_unref(&ctx->source);
 }
 
-void aio_context_acquire(AioContext *ctx)
-{
-/* TODO remove this function */
-}
-
-void aio_context_release(AioContext *ctx)
-{
-/* TODO remove this function */
-}
-
 QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
 
 AioContext *qemu_get_current_aio_context(void)
-- 
2.43.0




[PATCH v2 07/14] block: remove bdrv_co_lock()

2023-12-05 Thread Stefan Hajnoczi
The bdrv_co_lock() and bdrv_co_unlock() functions are already no-ops.
Remove them.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-global-state.h | 14 --
 block.c| 10 --
 blockdev.c |  4 
 3 files changed, 28 deletions(-)

diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 0327f1c605..4ec0b217f0 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -267,20 +267,6 @@ int bdrv_debug_remove_breakpoint(BlockDriverState *bs, const char *tag);
 int bdrv_debug_resume(BlockDriverState *bs, const char *tag);
 bool bdrv_debug_is_suspended(BlockDriverState *bs, const char *tag);
 
-/**
- * Locks the AioContext of @bs if it's not the current AioContext. This avoids
- * double locking which could lead to deadlocks: This is a coroutine_fn, so we
- * know we already own the lock of the current AioContext.
- *
- * May only be called in the main thread.
- */
-void coroutine_fn bdrv_co_lock(BlockDriverState *bs);
-
-/**
- * Unlocks the AioContext of @bs if it's not the current AioContext.
- */
-void coroutine_fn bdrv_co_unlock(BlockDriverState *bs);
-
 bool bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
GHashTable *visited, Transaction *tran,
Error **errp);
diff --git a/block.c b/block.c
index 91ace5d2d5..434b7f4d72 100644
--- a/block.c
+++ b/block.c
@@ -7431,16 +7431,6 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx)
 bdrv_dec_in_flight(bs);
 }
 
-void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
-{
-/* TODO removed in next patch */
-}
-
-void coroutine_fn bdrv_co_unlock(BlockDriverState *bs)
-{
-/* TODO removed in next patch */
-}
-
 static void bdrv_do_remove_aio_context_notifier(BdrvAioNotifier *ban)
 {
 GLOBAL_STATE_CODE();
diff --git a/blockdev.c b/blockdev.c
index 8a1b28f830..3a5e7222ec 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2264,18 +2264,14 @@ void coroutine_fn qmp_block_resize(const char *device, const char *node_name,
 return;
 }
 
-bdrv_co_lock(bs);
 bdrv_drained_begin(bs);
-bdrv_co_unlock(bs);
 
 old_ctx = bdrv_co_enter(bs);
 blk_co_truncate(blk, size, false, PREALLOC_MODE_OFF, 0, errp);
 bdrv_co_leave(bs, old_ctx);
 
-bdrv_co_lock(bs);
 bdrv_drained_end(bs);
 blk_co_unref(blk);
-bdrv_co_unlock(bs);
 }
 
 void qmp_block_stream(const char *job_id, const char *device,
-- 
2.43.0




[PATCH v2 12/14] scsi: remove outdated AioContext lock comment

2023-12-05 Thread Stefan Hajnoczi
The SCSI subsystem no longer uses the AioContext lock. Request
processing runs exclusively in the BlockBackend's AioContext since
"scsi: only access SCSIDevice->requests from one thread" and hence the
lock is unnecessary.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
---
 hw/scsi/scsi-disk.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 61be3d395a..2e7e1e9a1c 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -355,7 +355,6 @@ done:
 scsi_req_unref(&r->req);
 }
 
-/* Called with AioContext lock held */
 static void scsi_dma_complete(void *opaque, int ret)
 {
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
-- 
2.43.0




[PATCH v2 05/14] graph-lock: remove AioContext locking

2023-12-05 Thread Stefan Hajnoczi
Stop acquiring/releasing the AioContext lock in
bdrv_graph_wrlock()/bdrv_graph_unlock() since the lock no longer has any
effect.

The distinction between bdrv_graph_wrunlock() and
bdrv_graph_wrunlock_ctx() becomes meaningless and they can be collapsed
into one function.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 include/block/graph-lock.h | 21 ++---
 block.c| 50 +++---
 block/backup.c |  4 +--
 block/blklogwrites.c   |  8 ++---
 block/blkverify.c  |  4 +--
 block/block-backend.c  | 11 +++
 block/commit.c | 16 +-
 block/graph-lock.c | 44 ++
 block/mirror.c | 22 ++---
 block/qcow2.c  |  4 +--
 block/quorum.c |  8 ++---
 block/replication.c| 14 -
 block/snapshot.c   |  4 +--
 block/stream.c | 12 +++
 block/vmdk.c   | 20 ++--
 blockdev.c |  8 ++---
 blockjob.c | 12 +++
 tests/unit/test-bdrv-drain.c   | 40 
 tests/unit/test-bdrv-graph-mod.c   | 20 ++--
 scripts/block-coroutine-wrapper.py |  4 +--
 20 files changed, 133 insertions(+), 193 deletions(-)

diff --git a/include/block/graph-lock.h b/include/block/graph-lock.h
index 22b5db1ed9..d7545e82d0 100644
--- a/include/block/graph-lock.h
+++ b/include/block/graph-lock.h
@@ -110,34 +110,17 @@ void unregister_aiocontext(AioContext *ctx);
  *
  * The wrlock can only be taken from the main loop, with BQL held, as only the
  * main loop is allowed to modify the graph.
- *
- * If @bs is non-NULL, its AioContext is temporarily released.
- *
- * This function polls. Callers must not hold the lock of any AioContext other
- * than the current one and the one of @bs.
  */
 void no_coroutine_fn TSA_ACQUIRE(graph_lock) TSA_NO_TSA
-bdrv_graph_wrlock(BlockDriverState *bs);
+bdrv_graph_wrlock(void);
 
 /*
  * bdrv_graph_wrunlock:
  * Write finished, reset global has_writer to 0 and restart
  * all readers that are waiting.
- *
- * If @bs is non-NULL, its AioContext is temporarily released.
  */
 void no_coroutine_fn TSA_RELEASE(graph_lock) TSA_NO_TSA
-bdrv_graph_wrunlock(BlockDriverState *bs);
-
-/*
- * bdrv_graph_wrunlock_ctx:
- * Write finished, reset global has_writer to 0 and restart
- * all readers that are waiting.
- *
- * If @ctx is non-NULL, its lock is temporarily released.
- */
-void no_coroutine_fn TSA_RELEASE(graph_lock) TSA_NO_TSA
-bdrv_graph_wrunlock_ctx(AioContext *ctx);
+bdrv_graph_wrunlock(void);
 
 /*
  * bdrv_graph_co_rdlock:
diff --git a/block.c b/block.c
index bfb0861ec6..25e1ebc606 100644
--- a/block.c
+++ b/block.c
@@ -1708,12 +1708,12 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
 open_failed:
 bs->drv = NULL;
 
-bdrv_graph_wrlock(NULL);
+bdrv_graph_wrlock();
 if (bs->file != NULL) {
 bdrv_unref_child(bs, bs->file);
 assert(!bs->file);
 }
-bdrv_graph_wrunlock(NULL);
+bdrv_graph_wrunlock();
 
 g_free(bs->opaque);
 bs->opaque = NULL;
@@ -3575,9 +3575,9 @@ int bdrv_set_backing_hd(BlockDriverState *bs, 
BlockDriverState *backing_hd,
 
 bdrv_ref(drain_bs);
 bdrv_drained_begin(drain_bs);
-bdrv_graph_wrlock(backing_hd);
+bdrv_graph_wrlock();
 ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
-bdrv_graph_wrunlock(backing_hd);
+bdrv_graph_wrunlock();
 bdrv_drained_end(drain_bs);
 bdrv_unref(drain_bs);
 
@@ -3790,13 +3790,13 @@ BdrvChild *bdrv_open_child(const char *filename,
 return NULL;
 }
 
-bdrv_graph_wrlock(NULL);
+bdrv_graph_wrlock();
 ctx = bdrv_get_aio_context(bs);
 aio_context_acquire(ctx);
 child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
   errp);
 aio_context_release(ctx);
-bdrv_graph_wrunlock(NULL);
+bdrv_graph_wrunlock();
 
 return child;
 }
@@ -4650,9 +4650,9 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
 aio_context_release(ctx);
 }
 
-bdrv_graph_wrlock(NULL);
+bdrv_graph_wrlock();
 tran_commit(tran);
-bdrv_graph_wrunlock(NULL);
+bdrv_graph_wrunlock();
 
 QTAILQ_FOREACH_REVERSE(bs_entry, bs_queue, entry) {
 BlockDriverState *bs = bs_entry->state.bs;
@@ -4669,9 +4669,9 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
 goto cleanup;
 
 abort:
-bdrv_graph_wrlock(NULL);
+bdrv_graph_wrlock();
 tran_abort(tran);
-bdrv_graph_wrunlock(NULL);
+bdrv_graph_wrunlock();
 
 QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
 if (bs_entry->prepared) {
@@ -4852,12 +4852,12 @@ bdrv_reop

[PATCH v2 09/14] aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED()

2023-12-05 Thread Stefan Hajnoczi
Now that the AioContext lock no longer exists, AIO_WAIT_WHILE() and
AIO_WAIT_WHILE_UNLOCKED() are equivalent.

A future patch will get rid of AIO_WAIT_WHILE_UNLOCKED().

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
---
 include/block/aio-wait.h | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index 5449b6d742..157f105916 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -63,9 +63,6 @@ extern AioWait global_aio_wait;
  * @ctx: the aio context, or NULL if multiple aio contexts (for which the
  *   caller does not hold a lock) are involved in the polling condition.
  * @cond: wait while this conditional expression is true
- * @unlock: whether to unlock and then lock again @ctx. This applies
- * only when waiting for another AioContext from the main loop.
- * Otherwise it's ignored.
  *
  * Wait while a condition is true.  Use this to implement synchronous
  * operations that require event loop activity.
@@ -78,7 +75,7 @@ extern AioWait global_aio_wait;
  * wait on conditions between two IOThreads since that could lead to deadlock,
  * go via the main loop instead.
  */
-#define AIO_WAIT_WHILE_INTERNAL(ctx, cond, unlock) ({  \
+#define AIO_WAIT_WHILE_INTERNAL(ctx, cond) ({  \
 bool waited_ = false;  \
 AioWait *wait_ = &global_aio_wait; \
 AioContext *ctx_ = (ctx);  \
@@ -95,13 +92,7 @@ extern AioWait global_aio_wait;
 assert(qemu_get_current_aio_context() ==   \
qemu_get_aio_context());\
 while ((cond)) {   \
-if (unlock && ctx_) {  \
-aio_context_release(ctx_); \
-}  \
 aio_poll(qemu_get_aio_context(), true);\
-if (unlock && ctx_) {  \
-aio_context_acquire(ctx_); \
-}  \
 waited_ = true;\
 }  \
 }  \
@@ -109,10 +100,11 @@ extern AioWait global_aio_wait;
 waited_; })
 
 #define AIO_WAIT_WHILE(ctx, cond)  \
-AIO_WAIT_WHILE_INTERNAL(ctx, cond, true)
+AIO_WAIT_WHILE_INTERNAL(ctx, cond)
 
+/* TODO replace this with AIO_WAIT_WHILE() in a future patch */
 #define AIO_WAIT_WHILE_UNLOCKED(ctx, cond) \
-AIO_WAIT_WHILE_INTERNAL(ctx, cond, false)
+AIO_WAIT_WHILE_INTERNAL(ctx, cond)
 
 /**
  * aio_wait_kick:
-- 
2.43.0




[PATCH v2 03/14] tests: remove aio_context_acquire() tests

2023-12-05 Thread Stefan Hajnoczi
The aio_context_acquire() API is being removed. Drop the test case that
calls the API.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 tests/unit/test-aio.c | 67 +--
 1 file changed, 1 insertion(+), 66 deletions(-)

diff --git a/tests/unit/test-aio.c b/tests/unit/test-aio.c
index 337b6e4ea7..e77d86be87 100644
--- a/tests/unit/test-aio.c
+++ b/tests/unit/test-aio.c
@@ -100,76 +100,12 @@ static void event_ready_cb(EventNotifier *e)
 
 /* Tests using aio_*.  */
 
-typedef struct {
-QemuMutex start_lock;
-EventNotifier notifier;
-bool thread_acquired;
-} AcquireTestData;
-
-static void *test_acquire_thread(void *opaque)
-{
-AcquireTestData *data = opaque;
-
-/* Wait for other thread to let us start */
-qemu_mutex_lock(&data->start_lock);
-qemu_mutex_unlock(&data->start_lock);
-
-/* event_notifier_set might be called either before or after
- * the main thread's call to poll().  The test case's outcome
- * should be the same in either case.
- */
-event_notifier_set(&data->notifier);
-aio_context_acquire(ctx);
-aio_context_release(ctx);
-
-data->thread_acquired = true; /* success, we got here */
-
-return NULL;
-}
-
 static void set_event_notifier(AioContext *nctx, EventNotifier *notifier,
EventNotifierHandler *handler)
 {
 aio_set_event_notifier(nctx, notifier, handler, NULL, NULL);
 }
 
-static void dummy_notifier_read(EventNotifier *n)
-{
-event_notifier_test_and_clear(n);
-}
-
-static void test_acquire(void)
-{
-QemuThread thread;
-AcquireTestData data;
-
-/* Dummy event notifier ensures aio_poll() will block */
-event_notifier_init(&data.notifier, false);
-set_event_notifier(ctx, &data.notifier, dummy_notifier_read);
-g_assert(!aio_poll(ctx, false)); /* consume aio_notify() */
-
-qemu_mutex_init(&data.start_lock);
-qemu_mutex_lock(&data.start_lock);
-data.thread_acquired = false;
-
-qemu_thread_create(&thread, "test_acquire_thread",
-   test_acquire_thread,
-   &data, QEMU_THREAD_JOINABLE);
-
-/* Block in aio_poll(), let other thread kick us and acquire context */
-aio_context_acquire(ctx);
-qemu_mutex_unlock(&data.start_lock); /* let the thread run */
-g_assert(aio_poll(ctx, true));
-g_assert(!data.thread_acquired);
-aio_context_release(ctx);
-
-qemu_thread_join(&thread);
-set_event_notifier(ctx, &data.notifier, NULL);
-event_notifier_cleanup(&data.notifier);
-
-g_assert(data.thread_acquired);
-}
-
 static void test_bh_schedule(void)
 {
 BHTestData data = { .n = 0 };
@@ -879,7 +815,7 @@ static void test_worker_thread_co_enter(void)
 qemu_thread_get_self(&this_thread);
 co = qemu_coroutine_create(co_check_current_thread, &this_thread);
 
-qemu_thread_create(&worker_thread, "test_acquire_thread",
+qemu_thread_create(&worker_thread, "test_aio_co_enter",
test_aio_co_enter,
co, QEMU_THREAD_JOINABLE);
 
@@ -899,7 +835,6 @@ int main(int argc, char **argv)
 while (g_main_context_iteration(NULL, false));
 
 g_test_init(&argc, &argv, NULL);
-g_test_add_func("/aio/acquire", test_acquire);
 g_test_add_func("/aio/bh/schedule", test_bh_schedule);
 g_test_add_func("/aio/bh/schedule10",   test_bh_schedule10);
 g_test_add_func("/aio/bh/cancel",   test_bh_cancel);
-- 
2.43.0




[PATCH v2 08/14] scsi: remove AioContext locking

2023-12-05 Thread Stefan Hajnoczi
The AioContext lock no longer has any effect. Remove it.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
---
 include/hw/virtio/virtio-scsi.h | 14 --
 hw/scsi/scsi-bus.c  |  2 --
 hw/scsi/scsi-disk.c | 31 +--
 hw/scsi/virtio-scsi.c   | 18 --
 4 files changed, 5 insertions(+), 60 deletions(-)

diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index da8cb928d9..7f0573b1bf 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -101,20 +101,6 @@ struct VirtIOSCSI {
 uint32_t host_features;
 };
 
-static inline void virtio_scsi_acquire(VirtIOSCSI *s)
-{
-if (s->ctx) {
-aio_context_acquire(s->ctx);
-}
-}
-
-static inline void virtio_scsi_release(VirtIOSCSI *s)
-{
-if (s->ctx) {
-aio_context_release(s->ctx);
-}
-}
-
 void virtio_scsi_common_realize(DeviceState *dev,
 VirtIOHandleOutput ctrl,
 VirtIOHandleOutput evt,
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index f3ec11f892..df68a44b6a 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1731,9 +1731,7 @@ void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense)
 {
 scsi_device_for_each_req_async(sdev, scsi_device_purge_one_req, NULL);
 
-aio_context_acquire(blk_get_aio_context(sdev->conf.blk));
 blk_drain(sdev->conf.blk);
-aio_context_release(blk_get_aio_context(sdev->conf.blk));
 scsi_device_set_ua(sdev, sense);
 }
 
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index a5048e0aaf..61be3d395a 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2339,14 +2339,10 @@ static void scsi_disk_reset(DeviceState *dev)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev.qdev, dev);
 uint64_t nb_sectors;
-AioContext *ctx;
 
 scsi_device_purge_requests(&s->qdev, SENSE_CODE(RESET));
 
-ctx = blk_get_aio_context(s->qdev.conf.blk);
-aio_context_acquire(ctx);
 blk_get_geometry(s->qdev.conf.blk, &nb_sectors);
-aio_context_release(ctx);
 
 nb_sectors /= s->qdev.blocksize / BDRV_SECTOR_SIZE;
 if (nb_sectors) {
@@ -2545,15 +2541,13 @@ static void scsi_unrealize(SCSIDevice *dev)
 static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
-AioContext *ctx = NULL;
+
 /* can happen for devices without drive. The error message for missing
  * backend will be issued in scsi_realize
  */
 if (s->qdev.conf.blk) {
-ctx = blk_get_aio_context(s->qdev.conf.blk);
-aio_context_acquire(ctx);
 if (!blkconf_blocksizes(&s->qdev.conf, errp)) {
-goto out;
+return;
 }
 }
 s->qdev.blocksize = s->qdev.conf.logical_block_size;
@@ -2562,16 +2556,11 @@ static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
 s->product = g_strdup("QEMU HARDDISK");
 }
 scsi_realize(&s->qdev, errp);
-out:
-if (ctx) {
-aio_context_release(ctx);
-}
 }
 
 static void scsi_cd_realize(SCSIDevice *dev, Error **errp)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
-AioContext *ctx;
 int ret;
 uint32_t blocksize = 2048;
 
@@ -2587,8 +2576,6 @@ static void scsi_cd_realize(SCSIDevice *dev, Error **errp)
 blocksize = dev->conf.physical_block_size;
 }
 
-ctx = blk_get_aio_context(dev->conf.blk);
-aio_context_acquire(ctx);
 s->qdev.blocksize = blocksize;
 s->qdev.type = TYPE_ROM;
 s->features |= 1 << SCSI_DISK_F_REMOVABLE;
@@ -2596,7 +2583,6 @@ static void scsi_cd_realize(SCSIDevice *dev, Error **errp)
 s->product = g_strdup("QEMU CD-ROM");
 }
 scsi_realize(&s->qdev, errp);
-aio_context_release(ctx);
 }
 
 
@@ -2727,7 +2713,6 @@ static int get_device_type(SCSIDiskState *s)
 static void scsi_block_realize(SCSIDevice *dev, Error **errp)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
-AioContext *ctx;
 int sg_version;
 int rc;
 
@@ -2742,9 +2727,6 @@ static void scsi_block_realize(SCSIDevice *dev, Error **errp)
   "be removed in a future version");
 }
 
-ctx = blk_get_aio_context(s->qdev.conf.blk);
-aio_context_acquire(ctx);
-
 /* check we are using a driver managing SG_IO (version 3 and after) */
 rc = blk_ioctl(s->qdev.conf.blk, SG_GET_VERSION_NUM, &sg_version);
 if (rc < 0) {
@@ -2752,18 +2734,18 @@ static void scsi_block_realize(SCSIDevice *dev, Error **errp)
 if (rc != -EPERM) {
 error_append_hint(errp, "Is this a SCSI device?\n");
 }
-goto out;
+return;
 }
 if (sg_version < 3) {
 error_setg(errp, "scsi ge

[PATCH v2 02/14] scsi: assert that callbacks run in the correct AioContext

2023-12-05 Thread Stefan Hajnoczi
Since the removal of AioContext locking, the correctness of the code
relies on running requests from a single AioContext at any given time.

Add assertions that verify that callbacks are invoked in the correct
AioContext.

Signed-off-by: Stefan Hajnoczi 
---
 hw/scsi/scsi-disk.c  | 14 ++
 system/dma-helpers.c |  3 +++
 2 files changed, 17 insertions(+)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 2c1bbb3530..a5048e0aaf 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -273,6 +273,10 @@ static void scsi_aio_complete(void *opaque, int ret)
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+/* The request must only run in the BlockBackend's AioContext */
+assert(blk_get_aio_context(s->qdev.conf.blk) ==
+   qemu_get_current_aio_context());
+
 assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
 
@@ -370,8 +374,13 @@ static void scsi_dma_complete(void *opaque, int ret)
 
 static void scsi_read_complete_noio(SCSIDiskReq *r, int ret)
 {
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 uint32_t n;
 
+/* The request must only run in the BlockBackend's AioContext */
+assert(blk_get_aio_context(s->qdev.conf.blk) ==
+   qemu_get_current_aio_context());
+
 assert(r->req.aiocb == NULL);
 if (scsi_disk_req_check_error(r, ret, false)) {
 goto done;
@@ -496,8 +505,13 @@ static void scsi_read_data(SCSIRequest *req)
 
 static void scsi_write_complete_noio(SCSIDiskReq *r, int ret)
 {
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 uint32_t n;
 
+/* The request must only run in the BlockBackend's AioContext */
+assert(blk_get_aio_context(s->qdev.conf.blk) ==
+   qemu_get_current_aio_context());
+
 assert (r->req.aiocb == NULL);
 if (scsi_disk_req_check_error(r, ret, false)) {
 goto done;
diff --git a/system/dma-helpers.c b/system/dma-helpers.c
index 528117f256..9b221cf94e 100644
--- a/system/dma-helpers.c
+++ b/system/dma-helpers.c
@@ -119,6 +119,9 @@ static void dma_blk_cb(void *opaque, int ret)
 
 trace_dma_blk_cb(dbs, ret);
 
+/* DMAAIOCB is not thread-safe and must be accessed only from dbs->ctx */
+assert(ctx == qemu_get_current_aio_context());
+
 dbs->acb = NULL;
 dbs->offset += dbs->iov.size;
 
-- 
2.43.0




[PATCH v2 04/14] aio: make aio_context_acquire()/aio_context_release() a no-op

2023-12-05 Thread Stefan Hajnoczi
aio_context_acquire()/aio_context_release() has been replaced by
fine-grained locking to protect state shared by multiple threads. The
AioContext lock still plays the role of balancing locking in
AIO_WAIT_WHILE() and many functions in QEMU either require that the
AioContext lock is held or not held for this reason. In other words, the
AioContext lock is purely there for consistency with itself and serves
no real purpose anymore.

Stop actually acquiring/releasing the lock in
aio_context_acquire()/aio_context_release() so that subsequent patches
can remove callers across the codebase incrementally.

I have performed "make check" and qemu-iotests stress tests across
x86-64, ppc64le, and aarch64 to confirm that there are no failures as a
result of eliminating the lock.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Acked-by: Kevin Wolf 
---
 util/async.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/async.c b/util/async.c
index 8f90ddc304..04ee83d220 100644
--- a/util/async.c
+++ b/util/async.c
@@ -725,12 +725,12 @@ void aio_context_unref(AioContext *ctx)
 
 void aio_context_acquire(AioContext *ctx)
 {
-qemu_rec_mutex_lock(&ctx->lock);
+/* TODO remove this function */
 }
 
 void aio_context_release(AioContext *ctx)
 {
-qemu_rec_mutex_unlock(&ctx->lock);
+/* TODO remove this function */
 }
 
 QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
-- 
2.43.0




[PATCH v2 01/14] virtio-scsi: replace AioContext lock with tmf_bh_lock

2023-12-05 Thread Stefan Hajnoczi
Protect the Task Management Function BH state with a lock. The TMF BH
runs in the main loop thread. An IOThread might process a TMF at the
same time as the TMF BH is running. Therefore tmf_bh_list and tmf_bh
must be protected by a lock.

Run TMF request completion in the IOThread using aio_wait_bh_oneshot().
This avoids more locking to protect the virtqueue and SCSI layer state.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Kevin Wolf 
---
 include/hw/virtio/virtio-scsi.h |  3 +-
 hw/scsi/virtio-scsi.c   | 62 ++---
 2 files changed, 43 insertions(+), 22 deletions(-)

diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 779568ab5d..da8cb928d9 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -85,8 +85,9 @@ struct VirtIOSCSI {
 
 /*
  * TMFs deferred to main loop BH. These fields are protected by
- * virtio_scsi_acquire().
+ * tmf_bh_lock.
  */
+QemuMutex tmf_bh_lock;
 QEMUBH *tmf_bh;
 QTAILQ_HEAD(, VirtIOSCSIReq) tmf_bh_list;
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 9c751bf296..4f8d35facc 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -123,6 +123,30 @@ static void virtio_scsi_complete_req(VirtIOSCSIReq *req)
 virtio_scsi_free_req(req);
 }
 
+static void virtio_scsi_complete_req_bh(void *opaque)
+{
+VirtIOSCSIReq *req = opaque;
+
+virtio_scsi_complete_req(req);
+}
+
+/*
+ * Called from virtio_scsi_do_one_tmf_bh() in main loop thread. The main loop
+ * thread cannot touch the virtqueue since that could race with an IOThread.
+ */
+static void virtio_scsi_complete_req_from_main_loop(VirtIOSCSIReq *req)
+{
+VirtIOSCSI *s = req->dev;
+
+if (!s->ctx || s->ctx == qemu_get_aio_context()) {
+/* No need to schedule a BH when there is no IOThread */
+virtio_scsi_complete_req(req);
+} else {
+/* Run request completion in the IOThread */
+aio_wait_bh_oneshot(s->ctx, virtio_scsi_complete_req_bh, req);
+}
+}
+
 static void virtio_scsi_bad_req(VirtIOSCSIReq *req)
 {
    virtio_error(VIRTIO_DEVICE(req->dev), "wrong size for virtio-scsi headers");
@@ -338,10 +362,7 @@ static void virtio_scsi_do_one_tmf_bh(VirtIOSCSIReq *req)
 
 out:
 object_unref(OBJECT(d));
-
-virtio_scsi_acquire(s);
-virtio_scsi_complete_req(req);
-virtio_scsi_release(s);
+virtio_scsi_complete_req_from_main_loop(req);
 }
 
 /* Some TMFs must be processed from the main loop thread */
@@ -354,18 +375,16 @@ static void virtio_scsi_do_tmf_bh(void *opaque)
 
 GLOBAL_STATE_CODE();
 
-virtio_scsi_acquire(s);
+WITH_QEMU_LOCK_GUARD(&s->tmf_bh_lock) {
+QTAILQ_FOREACH_SAFE(req, &s->tmf_bh_list, next, tmp) {
+QTAILQ_REMOVE(&s->tmf_bh_list, req, next);
+QTAILQ_INSERT_TAIL(&reqs, req, next);
+}
 
-QTAILQ_FOREACH_SAFE(req, &s->tmf_bh_list, next, tmp) {
-QTAILQ_REMOVE(&s->tmf_bh_list, req, next);
-QTAILQ_INSERT_TAIL(&reqs, req, next);
+qemu_bh_delete(s->tmf_bh);
+s->tmf_bh = NULL;
 }
 
-qemu_bh_delete(s->tmf_bh);
-s->tmf_bh = NULL;
-
-virtio_scsi_release(s);
-
 QTAILQ_FOREACH_SAFE(req, &reqs, next, tmp) {
 QTAILQ_REMOVE(&reqs, req, next);
 virtio_scsi_do_one_tmf_bh(req);
@@ -379,8 +398,7 @@ static void virtio_scsi_reset_tmf_bh(VirtIOSCSI *s)
 
 GLOBAL_STATE_CODE();
 
-virtio_scsi_acquire(s);
-
+/* Called after ioeventfd has been stopped, so tmf_bh_lock is not needed */
 if (s->tmf_bh) {
 qemu_bh_delete(s->tmf_bh);
 s->tmf_bh = NULL;
@@ -393,19 +411,19 @@ static void virtio_scsi_reset_tmf_bh(VirtIOSCSI *s)
 req->resp.tmf.response = VIRTIO_SCSI_S_TARGET_FAILURE;
 virtio_scsi_complete_req(req);
 }
-
-virtio_scsi_release(s);
 }
 
 static void virtio_scsi_defer_tmf_to_bh(VirtIOSCSIReq *req)
 {
 VirtIOSCSI *s = req->dev;
 
-QTAILQ_INSERT_TAIL(&s->tmf_bh_list, req, next);
+WITH_QEMU_LOCK_GUARD(&s->tmf_bh_lock) {
+QTAILQ_INSERT_TAIL(&s->tmf_bh_list, req, next);
 
-if (!s->tmf_bh) {
-s->tmf_bh = qemu_bh_new(virtio_scsi_do_tmf_bh, s);
-qemu_bh_schedule(s->tmf_bh);
+if (!s->tmf_bh) {
+s->tmf_bh = qemu_bh_new(virtio_scsi_do_tmf_bh, s);
+qemu_bh_schedule(s->tmf_bh);
+}
 }
 }
 
@@ -1235,6 +1253,7 @@ static void virtio_scsi_device_realize(DeviceState *dev, Error **errp)
 Error *err = NULL;
 
 QTAILQ_INIT(&s->tmf_bh_list);
+qemu_mutex_init(&s->tmf_bh_lock);
 
 virtio_scsi_common_realize(dev,
virtio_scsi_handle_ctrl,
@@ -1277,6 +1296,7 @@ static void virtio_scsi_device_unrealize(DeviceState *dev)
 
 qbus_set_hotplug_handler(BUS(&s->bus), NULL);
 virtio_scsi_common_unrealize(dev);
+qemu_mutex_destroy(&s->tmf_bh_lock);
 }
 
 static Property virtio_scsi_properties[] = {
-- 
2.43.0




[PATCH v2 00/14] aio: remove AioContext lock

2023-12-05 Thread Stefan Hajnoczi
v2:
- Add Patch 2 "scsi: assert that callbacks run in the correct AioContext" 
[Kevin]
- Add Patch 7 "block: remove bdrv_co_lock()" [Eric and Kevin]
- Remove stray goto label in Patch 8 [Kevin]
- Fix "eeked" -> "eked" typo in Patch 10 [Eric]

This series removes the AioContext locking APIs from QEMU.
aio_context_acquire() and aio_context_release() are currently only needed to
support the locking discipline required by AIO_WAIT_WHILE() (except for a stray
user that I converted in Patch 1). AIO_WAIT_WHILE() doesn't really need the
AioContext lock anymore, so it's possible to remove the API. This is a nice
simplification because the AioContext locking rules were sometimes tricky or
underspecified, leading to many bugs over the years.

This patch series removes these APIs across the codebase and cleans up the
documentation/comments that refer to them.

Patch 1 is an AioContext lock user I forgot to convert in my earlier SCSI
conversion series.

Patch 2 adds an assertion to the SCSI code to ensure that callbacks are invoked
in the correct AioContext.

Patch 3 removes tests for the AioContext lock because they will no longer be
needed when the lock is gone.

Patches 4-10 remove the AioContext lock. These can be reviewed by categorizing
the call sites into 1. places that take the lock because they call an API that
requires the lock (ultimately AIO_WAIT_WHILE()) and 2. places that take the
lock to protect state. There should be no instances of case 2 left. If you see
one, you've found a bug in this patch series!

Patches 11-14 remove comments.

Based-on: 20231204164259.1515217-1-stefa...@redhat.com ("[PATCH v2 0/4] scsi: 
eliminate AioContext lock")
Since SCSI needs to stop relying on the AioContext lock before we can remove
the lock.

Stefan Hajnoczi (14):
  virtio-scsi: replace AioContext lock with tmf_bh_lock
  scsi: assert that callbacks run in the correct AioContext
  tests: remove aio_context_acquire() tests
  aio: make aio_context_acquire()/aio_context_release() a no-op
  graph-lock: remove AioContext locking
  block: remove AioContext locking
  block: remove bdrv_co_lock()
  scsi: remove AioContext locking
  aio-wait: draw equivalence between AIO_WAIT_WHILE() and
AIO_WAIT_WHILE_UNLOCKED()
  aio: remove aio_context_acquire()/aio_context_release() API
  docs: remove AioContext lock from IOThread docs
  scsi: remove outdated AioContext lock comment
  job: remove outdated AioContext locking comments
  block: remove outdated AioContext locking comments

 docs/devel/multiple-iothreads.txt|  45 ++--
 include/block/aio-wait.h |  16 +-
 include/block/aio.h  |  17 --
 include/block/block-common.h |   3 -
 include/block/block-global-state.h   |  23 +-
 include/block/block-io.h |  12 +-
 include/block/block_int-common.h |   2 -
 include/block/graph-lock.h   |  21 +-
 include/block/snapshot.h |   2 -
 include/hw/virtio/virtio-scsi.h  |  17 +-
 include/qemu/job.h   |  20 --
 block.c  | 363 ---
 block/backup.c   |   4 +-
 block/blklogwrites.c |   8 +-
 block/blkverify.c|   4 +-
 block/block-backend.c|  33 +--
 block/commit.c   |  16 +-
 block/copy-before-write.c|  22 +-
 block/export/export.c|  22 +-
 block/export/vhost-user-blk-server.c |   4 -
 block/graph-lock.c   |  44 +---
 block/io.c   |  45 +---
 block/mirror.c   |  41 +--
 block/monitor/bitmap-qmp-cmds.c  |  20 +-
 block/monitor/block-hmp-cmds.c   |  29 ---
 block/qapi-sysemu.c  |  27 +-
 block/qapi.c |  18 +-
 block/qcow2.c|   4 +-
 block/quorum.c   |   8 +-
 block/raw-format.c   |   5 -
 block/replication.c  |  72 +-
 block/snapshot.c |  26 +-
 block/stream.c   |  12 +-
 block/vmdk.c |  20 +-
 block/write-threshold.c  |   6 -
 blockdev.c   | 319 +--
 blockjob.c   |  30 +--
 hw/block/dataplane/virtio-blk.c  |  10 -
 hw/block/dataplane/xen-block.c   |  17 +-
 hw/block/virtio-blk.c|  45 +---
 hw/core/qdev-properties-system.c |   9 -
 hw/scsi/scsi-bus.c   |   2 -
 hw/scsi/scsi-disk.c  |  46 ++--
 hw/scsi/virtio-scsi.c|  80 +++---
 job.c|  16 --
 migration/block.c|  33 +--
 migration/migration-hmp-cmds.c   |   3 -
 migration/savevm.c   |  22 --
 net/colo-compare.c   |   2 -
 qemu-img.c   |   4 -
 qemu-io.c

Re: [PATCH 05/12] block: remove AioContext locking

2023-12-04 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 03:31:37PM -0600, Eric Blake wrote:
> On Wed, Nov 29, 2023 at 02:55:46PM -0500, Stefan Hajnoczi wrote:
> > This is the big patch that removes
> > aio_context_acquire()/aio_context_release() from the block layer and
> > affected block layer users.
> > 
> > There isn't a clean way to split this patch and the reviewers are likely
> > the same group of people, so I decided to do it in one patch.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> 
> > +++ b/block.c
> > @@ -7585,29 +7433,12 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx)
> >  
> >  void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
> >  {
> > -AioContext *ctx = bdrv_get_aio_context(bs);
> > -
> > -/* In the main thread, bs->aio_context won't change concurrently */
> > -assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> > -
> > -/*
> > - * We're in coroutine context, so we already hold the lock of the main
> > - * loop AioContext. Don't lock it twice to avoid deadlocks.
> > - */
> > -assert(qemu_in_coroutine());
> 
> Is this assertion worth keeping in the short term?...

Probably not because coroutine vs non-coroutine functions don't change
in this patch series, so it's unlikely that this will break.

> 
> > -if (ctx != qemu_get_aio_context()) {
> > -aio_context_acquire(ctx);
> > -}
> > +/* TODO removed in next patch */
> >  }
> 
> ...I guess I'll see in the next patch.
> 
> >  
> >  void coroutine_fn bdrv_co_unlock(BlockDriverState *bs)
> >  {
> > -AioContext *ctx = bdrv_get_aio_context(bs);
> > -
> > -assert(qemu_in_coroutine());
> > -if (ctx != qemu_get_aio_context()) {
> > -aio_context_release(ctx);
> > -}
> > +/* TODO removed in next patch */
> >  }
> 
> Same comment.
> 
> > +++ b/blockdev.c
> > @@ -1395,7 +1352,6 @@ static void external_snapshot_action(TransactionAction *action,
> >  /* File name of the new image (for 'blockdev-snapshot-sync') */
> >  const char *new_image_file;
> >  ExternalSnapshotState *state = g_new0(ExternalSnapshotState, 1);
> > -AioContext *aio_context;
> >  uint64_t perm, shared;
> >  
> >  /* TODO We'll eventually have to take a writer lock in this function */
> 
> I'm guessing removal of the locking gets us one step closer to
> implementing this TODO at a later time?  Or is it now a stale comment?
> Either way, it doesn't affect this patch.

I'm not sure. Kevin can answer questions about the graph lock.

> > +++ b/tests/unit/test-blockjob.c
> 
> > -static void test_complete_in_standby(void)
> > -{
> 
> > @@ -531,13 +402,5 @@ int main(int argc, char **argv)
> >  g_test_add_func("/blockjob/cancel/standby", test_cancel_standby);
> >  g_test_add_func("/blockjob/cancel/pending", test_cancel_pending);
> >  g_test_add_func("/blockjob/cancel/concluded", test_cancel_concluded);
> > -
> > -/*
> > - * This test is flaky and sometimes fails in CI and otherwise:
> > - * don't run unless user opts in via environment variable.
> > - */
> > -if (getenv("QEMU_TEST_FLAKY_TESTS")) {
> > -g_test_add_func("/blockjob/complete_in_standby", test_complete_in_standby);
> > -}
> 
> Looks like you ripped out this entire test, because it is no longer
> viable.  I might have mentioned it in the commit message, or squashed
> the removal of this test into the earlier 02/12 patch.

I have sent a separate patch to remove this test and once it's merged
this hunk will disappear this patch series:
https://lore.kernel.org/qemu-devel/20231127170210.422728-1-stefa...@redhat.com/

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 06/12] scsi: remove AioContext locking

2023-12-04 Thread Stefan Hajnoczi
On Mon, Dec 04, 2023 at 01:23:09PM +0100, Kevin Wolf wrote:
> Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> > The AioContext lock no longer has any effect. Remove it.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/hw/virtio/virtio-scsi.h | 14 --
> >  hw/scsi/scsi-bus.c  |  2 --
> >  hw/scsi/scsi-disk.c | 28 
> >  hw/scsi/virtio-scsi.c   | 18 --
> >  4 files changed, 4 insertions(+), 58 deletions(-)
> 
> > @@ -2531,13 +2527,11 @@ static void scsi_unrealize(SCSIDevice *dev)
> >  static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
> >  {
> >  SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
> > -AioContext *ctx = NULL;
> > +
> >  /* can happen for devices without drive. The error message for missing
> >   * backend will be issued in scsi_realize
> >   */
> >  if (s->qdev.conf.blk) {
> > -ctx = blk_get_aio_context(s->qdev.conf.blk);
> > -aio_context_acquire(ctx);
> >  if (!blkconf_blocksizes(&s->qdev.conf, errp)) {
> >  goto out;
> >  }
> > @@ -2549,15 +2543,11 @@ static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
> >  }
> >  scsi_realize(&s->qdev, errp);
> >  out:
> > -if (ctx) {
> > -aio_context_release(ctx);
> > -}
> >  }
> 
> This doesn't build for me:
> 
../hw/scsi/scsi-disk.c:2545:1: error: label at end of compound statement is a C2x extension [-Werror,-Wc2x-extensions]
> }
> ^
> 1 error generated.

Will fix in v2. Thanks!

Stefan




Re: [PATCH 05/12] block: remove AioContext locking

2023-12-04 Thread Stefan Hajnoczi
On Mon, Dec 04, 2023 at 03:33:57PM +0100, Kevin Wolf wrote:
> Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> > This is the big patch that removes
> > aio_context_acquire()/aio_context_release() from the block layer and
> > affected block layer users.
> > 
> > There isn't a clean way to split this patch and the reviewers are likely
> > the same group of people, so I decided to do it in one patch.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> > @@ -7585,29 +7433,12 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx)
> >  
> >  void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
> >  {
> > -AioContext *ctx = bdrv_get_aio_context(bs);
> > -
> > -/* In the main thread, bs->aio_context won't change concurrently */
> > -assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> > -
> > -/*
> > - * We're in coroutine context, so we already hold the lock of the main
> > - * loop AioContext. Don't lock it twice to avoid deadlocks.
> > - */
> > -assert(qemu_in_coroutine());
> > -if (ctx != qemu_get_aio_context()) {
> > -aio_context_acquire(ctx);
> > -}
> > +/* TODO removed in next patch */
> >  }
> 
> It's still there at the end of the series.

Will fix in v2. Thanks!




Re: [PATCH 01/12] virtio-scsi: replace AioContext lock with tmf_bh_lock

2023-12-04 Thread Stefan Hajnoczi
On Mon, Dec 04, 2023 at 01:46:13PM +0100, Kevin Wolf wrote:
> Am 29.11.2023 um 20:55 hat Stefan Hajnoczi geschrieben:
> > Protect the Task Management Function BH state with a lock. The TMF BH
> > runs in the main loop thread. An IOThread might process a TMF at the
> > same time as the TMF BH is running. Therefore tmf_bh_list and tmf_bh
> > must be protected by a lock.
> > 
> > Run TMF request completion in the IOThread using aio_wait_bh_oneshot().
> > This avoids more locking to protect the virtqueue and SCSI layer state.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> The second part reminds me that the implicit protection of the virtqueue
> and SCSI data structures by having all accesses in a single thread is
> hard to review and I think we wanted to put some assertions there to
> check that we're really running in the right thread. I don't think we
> have done that so far, so I suppose after this patch would be the place
> in the series to add them, before we remove the protection by the
> AioContext lock?

Thanks for reminding me. I will add assertions in the next revision of
this series.

Stefan




Re: [PATCH 01/12] virtio-scsi: replace AioContext lock with tmf_bh_lock

2023-12-04 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 09:25:52AM -0600, Eric Blake wrote:
> On Wed, Nov 29, 2023 at 02:55:42PM -0500, Stefan Hajnoczi wrote:
> > Protect the Task Management Function BH state with a lock. The TMF BH
> > runs in the main loop thread. An IOThread might process a TMF at the
> > same time as the TMF BH is running. Therefore tmf_bh_list and tmf_bh
> > must be protected by a lock.
> > 
> > Run TMF request completion in the IOThread using aio_wait_bh_oneshot().
> > This avoids more locking to protect the virtqueue and SCSI layer state.
> 
> Are we trying to get this into 8.2?

No. 8.2 still has the AioContext lock and is therefore safe without this
patch.

Stefan

> 
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/hw/virtio/virtio-scsi.h |  3 +-
> >  hw/scsi/virtio-scsi.c   | 62 ++---
> >  2 files changed, 43 insertions(+), 22 deletions(-)
> >
> 
> Reviewed-by: Eric Blake 
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.
> Virtualization:  qemu.org | libguestfs.org
> 




Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 03:08:49PM -0500, Peter Xu wrote:
> On Wed, Nov 29, 2023 at 04:26:20PM -0500, Stefan Hajnoczi wrote:
> > The Big QEMU Lock (BQL) has many names and they are confusing. The
> > actual QemuMutex variable is called qemu_global_mutex but it's commonly
> > referred to as the BQL in discussions and some code comments. The
> > locking APIs, however, are called qemu_mutex_lock_iothread() and
> > qemu_mutex_unlock_iothread().
> > 
> > The "iothread" name is historic and comes from when the main thread was
> > split into into KVM vcpu threads and the "iothread" (now called the main
> > loop thread). I have contributed to the confusion myself by introducing
> > a separate --object iothread, a separate concept unrelated to the BQL.
> > 
> > The "iothread" name is no longer appropriate for the BQL. Rename the
> > locking APIs to:
> > - void qemu_bql_lock(void)
> > - void qemu_bql_unlock(void)
> > - bool qemu_bql_locked(void)
> > 
> > There are more APIs with "iothread" in their names. Subsequent patches
> > will rename them. There are also comments and documentation that will be
> > updated in later patches.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> Acked-by: Peter Xu 
> 
> Two nickpicks:
> 
>   - BQL contains "QEMU" as the 2nd character, so maybe easier to further
> rename qemu_bql into bql_?

Philippe wondered whether the variable name should end with _mutex (or
_lock is common too), so an alternative might be big_qemu_lock. That's
imperfect because it doesn't start with the usual qemu_ prefix.
qemu_big_lock is better in that regard but inconsistent with our BQL
abbreviation.

I don't like putting an underscore at the end. It's unusual and would
make me wonder what that means.

Naming is hard, but please discuss and I'm open to changing the BQL
variable's name to whatever we all agree on.

> 
>   - Could we keep the full spell of BQL at some places, so people can still
> reference it if not familiar?  IIUC most of the BQL helpers will root
> back to the major three functions (_lock, _unlock, _locked), perhaps
> add a comment of "BQL stands for..." over these three functions as
> comment?

Yes, I'll update the doc comments to say "Big QEMU Lock (BQL)" for each
of these functions.

Stefan




Re: [PATCH 6/6] Rename "QEMU global mutex" to "BQL" in comments and docs

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 02:49:48PM +0100, Philippe Mathieu-Daudé wrote:
> On 29/11/23 22:26, Stefan Hajnoczi wrote:
> > The term "QEMU global mutex" is identical to the more widely used Big
> > QEMU Lock ("BQL"). Update the code comments and documentation to use
> > "BQL" instead of "QEMU global mutex".
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   docs/devel/multi-thread-tcg.rst   |  7 +++
> >   docs/devel/qapi-code-gen.rst  |  2 +-
> >   docs/devel/replay.rst |  2 +-
> >   docs/devel/multiple-iothreads.txt | 16 
> >   include/block/blockjob.h  |  6 +++---
> >   include/io/task.h |  2 +-
> >   include/qemu/coroutine-core.h |  2 +-
> >   include/qemu/coroutine.h  |  2 +-
> >   hw/block/dataplane/virtio-blk.c   |  8 
> >   hw/block/virtio-blk.c |  2 +-
> >   hw/scsi/virtio-scsi-dataplane.c   |  6 +++---
> >   net/tap.c |  2 +-
> >   12 files changed, 28 insertions(+), 29 deletions(-)
> 
> 
> > diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> > index e594c10d23..b2bc7c04d6 100644
> > --- a/include/block/blockjob.h
> > +++ b/include/block/blockjob.h
> > @@ -54,7 +54,7 @@ typedef struct BlockJob {
> >   /**
> >* Speed that was set with @block_job_set_speed.
> > - * Always modified and read under QEMU global mutex 
> > (GLOBAL_STATE_CODE).
> > + * Always modified and read under BQL (GLOBAL_STATE_CODE).
> 
> "under the BQL"
> 
> >*/
> >   int64_t speed;
> > @@ -66,7 +66,7 @@ typedef struct BlockJob {
> >   /**
> >* Block other operations when block job is running.
> > - * Always modified and read under QEMU global mutex 
> > (GLOBAL_STATE_CODE).
> > + * Always modified and read under BQL (GLOBAL_STATE_CODE).
> 
> Ditto,
> 
> >*/
> >   Error *blocker;
> > @@ -89,7 +89,7 @@ typedef struct BlockJob {
> >   /**
> >* BlockDriverStates that are involved in this block job.
> > - * Always modified and read under QEMU global mutex 
> > (GLOBAL_STATE_CODE).
> > + * Always modified and read under BQL (GLOBAL_STATE_CODE).
> 
> Ditto.
> 
> >*/
> >   GSList *nodes;
> >   } BlockJob;

Will fix in v2.

Stefan




Re: [PATCH 5/6] Replace "iothread lock" with "BQL" in comments

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 02:47:49PM +0100, Philippe Mathieu-Daudé wrote:
> Hi Stefan,
> 
> On 29/11/23 22:26, Stefan Hajnoczi wrote:
> > The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL)
> > in their names. Update the code comments to use "BQL" instead of
> > "iothread lock".
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   docs/devel/reset.rst |  2 +-
> >   hw/display/qxl.h |  2 +-
> >   include/exec/cpu-common.h|  2 +-
> >   include/exec/memory.h|  4 ++--
> >   include/exec/ramblock.h  |  2 +-
> >   include/migration/register.h |  8 
> >   target/arm/internals.h   |  4 ++--
> >   accel/tcg/cputlb.c   |  4 ++--
> >   accel/tcg/tcg-accel-ops-icount.c |  2 +-
> >   hw/remote/mpqemu-link.c  |  2 +-
> >   migration/block-dirty-bitmap.c   | 10 +-
> >   migration/block.c| 24 
> >   migration/colo.c |  2 +-
> >   migration/migration.c|  2 +-
> >   migration/ram.c  |  4 ++--
> >   system/physmem.c |  6 +++---
> >   target/arm/helper.c  |  2 +-
> >   target/arm/tcg/m_helper.c|  2 +-
> >   ui/spice-core.c  |  2 +-
> >   util/rcu.c   |  2 +-
> >   audio/coreaudio.m|  4 ++--
> >   ui/cocoa.m   |  6 +++---
> >   22 files changed, 49 insertions(+), 49 deletions(-)
> 
> 
> > diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> > index 69c6a53902..a2bc0a345d 100644
> > --- a/include/exec/ramblock.h
> > +++ b/include/exec/ramblock.h
> > @@ -34,7 +34,7 @@ struct RAMBlock {
> >   ram_addr_t max_length;
> >   void (*resized)(const char*, uint64_t length, void *host);
> >   uint32_t flags;
> > -/* Protected by iothread lock.  */
> > +/* Protected by BQL.  */
> 
> There is only one single BQL, so preferably:
> 
> "by the BQL"
> 
> >   char idstr[256];
> >   /* RCU-enabled, writes protected by the ramlist lock */
> >   QLIST_ENTRY(RAMBlock) next;
> 
> 
> 
> 
> > -/* Called with iothread lock taken.  */
> > +/* Called with BQL taken.  */
> 
> "with the BQL" (other uses)

I will try to change these for v2. It's a pre-existing issue though
because there was only ever one "iothread lock" too.

Stefan




Re: [PATCH 4/6] system/cpus: rename qemu_global_mutex to qemu_bql

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 02:44:07PM +0100, Philippe Mathieu-Daudé wrote:
> Hi Stefan,
> 
> On 29/11/23 22:26, Stefan Hajnoczi wrote:
> > The APIs using qemu_global_mutex now follow the Big QEMU Lock (BQL)
> > nomenclature. It's a little strange that the actual QemuMutex variable
> > that embodies the BQL is called qemu_global_mutex instead of qemu_bql.
> > Rename it for consistency.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   system/cpus.c | 20 ++--
> >   1 file changed, 10 insertions(+), 10 deletions(-)
> > 
> > diff --git a/system/cpus.c b/system/cpus.c
> > index eb24a4db8e..138720a540 100644
> > --- a/system/cpus.c
> > +++ b/system/cpus.c
> > @@ -65,7 +65,7 @@
> >   #endif /* CONFIG_LINUX */
> > -static QemuMutex qemu_global_mutex;
> > +static QemuMutex qemu_bql;
> 
> I thought we were using _cond/_sem/_mutex suffixes, but
> this is not enforced:

I'm open to alternative names. Here are some I can think of:
- big_qemu_lock (although grepping for "bql" won't find it)
- qemu_bql_mutex

If there is no strong feeling about this then let's leave it at
qemu_bql. Otherwise, please discuss.

Thanks,
Stefan




Re: [PATCH 2/6] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to QEMU_BQL_LOCK_GUARD

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 10:14:47AM +0100, Ilya Leoshkevich wrote:
> On Wed, 2023-11-29 at 16:26 -0500, Stefan Hajnoczi wrote:
> > The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
> > instead; it is already widely used and unambiguous.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/qemu/main-loop.h  | 20 ++--
> >  hw/i386/kvm/xen_evtchn.c  | 14 +++---
> >  hw/i386/kvm/xen_gnttab.c  |  2 +-
> >  hw/mips/mips_int.c    |  2 +-
> >  hw/ppc/ppc.c  |  2 +-
> >  target/i386/kvm/xen-emu.c |  2 +-
> >  target/ppc/excp_helper.c  |  2 +-
> >  target/ppc/helper_regs.c  |  2 +-
> >  target/riscv/cpu_helper.c |  4 ++--
> >  9 files changed, 25 insertions(+), 25 deletions(-)
> > 
> > diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> > index d6f75e57bd..0b6a3e4824 100644
> > --- a/include/qemu/main-loop.h
> > +++ b/include/qemu/main-loop.h
> > @@ -344,13 +344,13 @@ void qemu_bql_lock_impl(const char *file, int
> > line);
> >  void qemu_bql_unlock(void);
> >  
> >  /**
> > - * QEMU_IOTHREAD_LOCK_GUARD
> > + * QEMU_BQL_LOCK_GUARD
> >   *
> > - * Wrap a block of code in a conditional
> > qemu_mutex_{lock,unlock}_iothread.
> > + * Wrap a block of code in a conditional qemu_bql_{lock,unlock}.
> >   */
> > -typedef struct IOThreadLockAuto IOThreadLockAuto;
> > +typedef struct BQLLockAuto BQLLockAuto;
> >  
> > -static inline IOThreadLockAuto *qemu_iothread_auto_lock(const char
> > *file,
> > +static inline BQLLockAuto *qemu_bql_auto_lock(const char *file,
> >  int line)
> 
> The padding is not correct anymore.

Good point, I didn't check the formatting after search-and-replace. I
will fix this across the patch series in v2.

Stefan




[PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-29 Thread Stefan Hajnoczi
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void qemu_bql_lock(void)
- void qemu_bql_unlock(void)
- bool qemu_bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio-wait.h |   2 +-
 include/qemu/main-loop.h |  26 +++---
 accel/accel-blocker.c|  10 +--
 accel/dummy-cpus.c   |   8 +-
 accel/hvf/hvf-accel-ops.c|   4 +-
 accel/kvm/kvm-accel-ops.c|   4 +-
 accel/kvm/kvm-all.c  |  22 ++---
 accel/tcg/cpu-exec.c |  26 +++---
 accel/tcg/cputlb.c   |  16 ++--
 accel/tcg/tcg-accel-ops-icount.c |   4 +-
 accel/tcg/tcg-accel-ops-mttcg.c  |  12 +--
 accel/tcg/tcg-accel-ops-rr.c |  14 ++--
 accel/tcg/tcg-accel-ops.c|   2 +-
 accel/tcg/translate-all.c|   2 +-
 cpu-common.c |   4 +-
 dump/dump.c  |   4 +-
 hw/core/cpu-common.c |   6 +-
 hw/i386/intel_iommu.c|   6 +-
 hw/i386/kvm/xen_evtchn.c |  16 ++--
 hw/i386/kvm/xen_overlay.c|   2 +-
 hw/i386/kvm/xen_xenstore.c   |   2 +-
 hw/intc/arm_gicv3_cpuif.c|   2 +-
 hw/intc/s390_flic.c  |  18 ++--
 hw/misc/edu.c|   4 +-
 hw/misc/imx6_src.c   |   2 +-
 hw/misc/imx7_src.c   |   2 +-
 hw/net/xen_nic.c |   8 +-
 hw/ppc/pegasos2.c|   2 +-
 hw/ppc/ppc.c |   4 +-
 hw/ppc/spapr.c   |   2 +-
 hw/ppc/spapr_rng.c   |   4 +-
 hw/ppc/spapr_softmmu.c   |   4 +-
 hw/remote/mpqemu-link.c  |  12 +--
 hw/remote/vfio-user-obj.c|   2 +-
 hw/s390x/s390-skeys.c|   2 +-
 migration/block-dirty-bitmap.c   |   4 +-
 migration/block.c|  16 ++--
 migration/colo.c |  60 +++---
 migration/dirtyrate.c|  12 +--
 migration/migration.c|  52 ++--
 migration/ram.c  |  12 +--
 replay/replay-internal.c |   2 +-
 semihosting/console.c|   8 +-
 stubs/iothread-lock.c|   6 +-
 system/cpu-throttle.c|   4 +-
 system/cpus.c|  28 +++
 system/dirtylimit.c  |   4 +-
 system/memory.c  |   2 +-
 system/physmem.c |   8 +-
 system/runstate.c|   2 +-
 system/watchpoint.c  |   4 +-
 target/arm/arm-powerctl.c|  14 ++--
 target/arm/helper.c  |   4 +-
 target/arm/hvf/hvf.c |   8 +-
 target/arm/kvm.c |   4 +-
 target/arm/kvm64.c   |   4 +-
 target/arm/ptw.c |   6 +-
 target/arm/tcg/helper-a64.c  |   8 +-
 target/arm/tcg/m_helper.c|   4 +-
 target/arm/tcg/op_helper.c   |  24 +++---
 target/arm/tcg/psci.c|   2 +-
 target/hppa/int_helper.c |   8 +-
 target/i386/hvf/hvf.c|   6 +-
 target/i386/kvm/hyperv.c |   4 +-
 target/i386/kvm/kvm.c|  28 +++
 target/i386/kvm/xen-emu.c|  14 ++--
 target/i386/nvmm/nvmm-accel-ops.c|   4 +-
 target/i386/nvmm/nvmm-all.c  |  20 ++---
 target/i386/tcg/sysemu/fpu_helper.c  |   6 +-
 target/i386/tcg/sysemu/misc_helper.c |   4 +-
 target/i386/whpx/whpx-accel-ops.c|   4 +-
 target/i386/whpx/whpx-all.c  |  24 +++---
 target/loongarch/csr_helper.c|   4 +-
 target/mips/kvm.c|   4 +-
 target/mips/tcg/sysemu/cp0_helper.c  |   4 +-
 target/openrisc/sys_helper.c |  16 ++--
 target/ppc/excp_helper.c |  12 +--
 target/ppc/kvm.c |   4 +-
 target/ppc/misc_helper.c |   8 +-
 target/ppc/timebase_helper.c |   8 +-
 target/s390x/kvm/kvm.c   |   4 +-
 target/s390x/tcg/misc_helper.c   | 118 +--
 target/sparc/i

[PATCH 6/6] Rename "QEMU global mutex" to "BQL" in comments and docs

2023-11-29 Thread Stefan Hajnoczi
The term "QEMU global mutex" is identical to the more widely used Big
QEMU Lock ("BQL"). Update the code comments and documentation to use
"BQL" instead of "QEMU global mutex".

Signed-off-by: Stefan Hajnoczi 
---
 docs/devel/multi-thread-tcg.rst   |  7 +++
 docs/devel/qapi-code-gen.rst  |  2 +-
 docs/devel/replay.rst |  2 +-
 docs/devel/multiple-iothreads.txt | 16 
 include/block/blockjob.h  |  6 +++---
 include/io/task.h |  2 +-
 include/qemu/coroutine-core.h |  2 +-
 include/qemu/coroutine.h  |  2 +-
 hw/block/dataplane/virtio-blk.c   |  8 
 hw/block/virtio-blk.c |  2 +-
 hw/scsi/virtio-scsi-dataplane.c   |  6 +++---
 net/tap.c |  2 +-
 12 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/docs/devel/multi-thread-tcg.rst b/docs/devel/multi-thread-tcg.rst
index c9541a7b20..7302c3bf53 100644
--- a/docs/devel/multi-thread-tcg.rst
+++ b/docs/devel/multi-thread-tcg.rst
@@ -226,10 +226,9 @@ instruction. This could be a future optimisation.
 Emulated hardware state
 ---
 
-Currently thanks to KVM work any access to IO memory is automatically
-protected by the global iothread mutex, also known as the BQL (Big
-QEMU Lock). Any IO region that doesn't use global mutex is expected to
-do its own locking.
+Currently thanks to KVM work any access to IO memory is automatically protected
+by the BQL (Big QEMU Lock). Any IO region that doesn't use the BQL is expected
+to do its own locking.
 
 However IO memory isn't the only way emulated hardware state can be
 modified. Some architectures have model specific registers that
diff --git a/docs/devel/qapi-code-gen.rst b/docs/devel/qapi-code-gen.rst
index 7f78183cd4..ea8228518c 100644
--- a/docs/devel/qapi-code-gen.rst
+++ b/docs/devel/qapi-code-gen.rst
@@ -594,7 +594,7 @@ blocking the guest and other background operations.
 Coroutine safety can be hard to prove, similar to thread safety.  Common
 pitfalls are:
 
-- The global mutex isn't held across ``qemu_coroutine_yield()``, so
+- The BQL isn't held across ``qemu_coroutine_yield()``, so
   operations that used to assume that they execute atomically may have
   to be more careful to protect against changes in the global state.
 
diff --git a/docs/devel/replay.rst b/docs/devel/replay.rst
index 0244be8b9c..effd856f0c 100644
--- a/docs/devel/replay.rst
+++ b/docs/devel/replay.rst
@@ -184,7 +184,7 @@ modes.
 Reading and writing requests are created by CPU thread of QEMU. Later these
 requests proceed to block layer which creates "bottom halves". Bottom
 halves consist of callback and its parameters. They are processed when
-main loop locks the global mutex. These locks are not synchronized with
+main loop locks the BQL. These locks are not synchronized with
 replaying process because main loop also processes the events that do not
 affect the virtual machine state (like user interaction with monitor).
 
diff --git a/docs/devel/multiple-iothreads.txt 
b/docs/devel/multiple-iothreads.txt
index a3e949f6b3..828e5527a3 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -5,7 +5,7 @@ the COPYING file in the top-level directory.
 
 
 This document explains the IOThread feature and how to write code that runs
-outside the QEMU global mutex.
+outside the BQL.
 
 The main loop and IOThreads
 ---
@@ -29,13 +29,13 @@ scalability bottleneck on hosts with many CPUs.  Work can 
be spread across
 several IOThreads instead of just one main loop.  When set up correctly this
 can improve I/O latency and reduce jitter seen by the guest.
 
-The main loop is also deeply associated with the QEMU global mutex, which is a
-scalability bottleneck in itself.  vCPU threads and the main loop use the QEMU
-global mutex to serialize execution of QEMU code.  This mutex is necessary
-because a lot of QEMU's code historically was not thread-safe.
+The main loop is also deeply associated with the BQL, which is a
+scalability bottleneck in itself.  vCPU threads and the main loop use the BQL
+to serialize execution of QEMU code.  This mutex is necessary because a lot of
+QEMU's code historically was not thread-safe.
 
 The fact that all I/O processing is done in a single main loop and that the
-QEMU global mutex is contended by all vCPU threads and the main loop explain
+BQL is contended by all vCPU threads and the main loop explain
 why it is desirable to place work into IOThreads.
 
 The experimental virtio-blk data-plane implementation has been benchmarked and
@@ -66,7 +66,7 @@ There are several old APIs that use the main loop AioContext:
 
 Since they implicitly work on the main loop they cannot be used in code that
 runs in an IOThread.  They might cause a crash or deadlock if called from an
-IOThread since the QEMU global mutex is not held.
+IOThread since the 

[PATCH 3/6] qemu/main-loop: rename qemu_cond_wait_iothread() to qemu_cond_wait_bql()

2023-11-29 Thread Stefan Hajnoczi
The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
instead; it is already widely used and unambiguous.

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/main-loop.h  | 8 
 accel/tcg/tcg-accel-ops-rr.c  | 4 ++--
 hw/display/virtio-gpu.c   | 2 +-
 hw/ppc/spapr_events.c | 2 +-
 system/cpu-throttle.c | 2 +-
 system/cpus.c | 4 ++--
 target/i386/nvmm/nvmm-accel-ops.c | 2 +-
 target/i386/whpx/whpx-accel-ops.c | 2 +-
 8 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 0b6a3e4824..ec2a70f041 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -373,17 +373,17 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(BQLLockAuto, 
qemu_bql_auto_unlock)
 = qemu_bql_auto_lock(__FILE__, __LINE__)
 
 /*
- * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
+ * qemu_cond_wait_bql: Wait on condition for the main loop mutex
  *
  * This function atomically releases the main loop mutex and causes
  * the calling thread to block on the condition.
  */
-void qemu_cond_wait_iothread(QemuCond *cond);
+void qemu_cond_wait_bql(QemuCond *cond);
 
 /*
- * qemu_cond_timedwait_iothread: like the previous, but with timeout
+ * qemu_cond_timedwait_bql: like the previous, but with timeout
  */
-void qemu_cond_timedwait_iothread(QemuCond *cond, int ms);
+void qemu_cond_timedwait_bql(QemuCond *cond, int ms);
 
 /* internal interfaces */
 
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index c21215a094..1e5a688085 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -111,7 +111,7 @@ static void rr_wait_io_event(void)
 
 while (all_cpu_threads_idle()) {
 rr_stop_kick_timer();
-qemu_cond_wait_iothread(first_cpu->halt_cond);
+qemu_cond_wait_bql(first_cpu->halt_cond);
 }
 
 rr_start_kick_timer();
@@ -198,7 +198,7 @@ static void *rr_cpu_thread_fn(void *arg)
 
 /* wait for initial kick-off after machine start */
 while (first_cpu->stopped) {
-qemu_cond_wait_iothread(first_cpu->halt_cond);
+qemu_cond_wait_bql(first_cpu->halt_cond);
 
 /* process any pending work */
 CPU_FOREACH(cpu) {
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index b016d3bac8..67c5be1a4e 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -1512,7 +1512,7 @@ void virtio_gpu_reset(VirtIODevice *vdev)
 g->reset_finished = false;
 qemu_bh_schedule(g->reset_bh);
 while (!g->reset_finished) {
-qemu_cond_wait_iothread(&g->reset_cond);
+qemu_cond_wait_bql(&g->reset_cond);
 }
 } else {
 virtio_gpu_reset_bh(g);
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index deb4641505..cb0587 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -899,7 +899,7 @@ void spapr_mce_req_event(PowerPCCPU *cpu, bool recovered)
 }
 return;
 }
-qemu_cond_wait_iothread(&spapr->fwnmi_machine_check_interlock_cond);
+qemu_cond_wait_bql(&spapr->fwnmi_machine_check_interlock_cond);
 if (spapr->fwnmi_machine_check_addr == -1) {
 /*
  * If the machine was reset while waiting for the interlock,
diff --git a/system/cpu-throttle.c b/system/cpu-throttle.c
index e98836311b..1d2b73369e 100644
--- a/system/cpu-throttle.c
+++ b/system/cpu-throttle.c
@@ -54,7 +54,7 @@ static void cpu_throttle_thread(CPUState *cpu, 
run_on_cpu_data opaque)
 endtime_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + sleeptime_ns;
 while (sleeptime_ns > 0 && !cpu->stop) {
 if (sleeptime_ns > SCALE_MS) {
-qemu_cond_timedwait_iothread(cpu->halt_cond,
+qemu_cond_timedwait_bql(cpu->halt_cond,
  sleeptime_ns / SCALE_MS);
 } else {
 qemu_bql_unlock();
diff --git a/system/cpus.c b/system/cpus.c
index d5b98c11f5..eb24a4db8e 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -513,12 +513,12 @@ void qemu_bql_unlock(void)
 qemu_mutex_unlock(&qemu_global_mutex);
 }
 
-void qemu_cond_wait_iothread(QemuCond *cond)
+void qemu_cond_wait_bql(QemuCond *cond)
 {
 qemu_cond_wait(cond, &qemu_global_mutex);
 }
 
-void qemu_cond_timedwait_iothread(QemuCond *cond, int ms)
+void qemu_cond_timedwait_bql(QemuCond *cond, int ms)
 {
 qemu_cond_timedwait(cond, &qemu_global_mutex, ms);
 }
diff --git a/target/i386/nvmm/nvmm-accel-ops.c 
b/target/i386/nvmm/nvmm-accel-ops.c
index 387ccfcce5..0fe8a76820 100644
--- a/target/i386/nvmm/nvmm-accel-ops.c
+++ b/target/i386/nvmm/nvmm-accel-ops.c
@@ -48,7 +48,7 @@ static void *qemu_nvmm_cpu_thread_fn(void *arg)
 }
 }
 while (cpu_thread_is_idle(cpu)) {

[PATCH 0/6] Make Big QEMU Lock naming consistent

2023-11-29 Thread Stefan Hajnoczi
The Big QEMU Lock ("BQL") has two other names: "iothread lock" and "QEMU global
mutex". The term "iothread lock" is easily confused with the unrelated --object
iothread (iothread.c).

This series updates the code and documentation to consistently use "BQL". This
makes the code easier to understand.

Stefan Hajnoczi (6):
  system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()
  qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to QEMU_BQL_LOCK_GUARD
  qemu/main-loop: rename qemu_cond_wait_iothread() to
qemu_cond_wait_bql()
  system/cpus: rename qemu_global_mutex to qemu_bql
  Replace "iothread lock" with "BQL" in comments
  Rename "QEMU global mutex" to "BQL" in comments and docs

 docs/devel/multi-thread-tcg.rst  |   7 +-
 docs/devel/qapi-code-gen.rst |   2 +-
 docs/devel/replay.rst|   2 +-
 docs/devel/reset.rst |   2 +-
 docs/devel/multiple-iothreads.txt|  16 ++--
 hw/display/qxl.h |   2 +-
 include/block/aio-wait.h |   2 +-
 include/block/blockjob.h |   6 +-
 include/exec/cpu-common.h|   2 +-
 include/exec/memory.h|   4 +-
 include/exec/ramblock.h  |   2 +-
 include/io/task.h|   2 +-
 include/migration/register.h |   8 +-
 include/qemu/coroutine-core.h|   2 +-
 include/qemu/coroutine.h |   2 +-
 include/qemu/main-loop.h |  54 ++--
 target/arm/internals.h   |   4 +-
 accel/accel-blocker.c|  10 +--
 accel/dummy-cpus.c   |   8 +-
 accel/hvf/hvf-accel-ops.c|   4 +-
 accel/kvm/kvm-accel-ops.c|   4 +-
 accel/kvm/kvm-all.c  |  22 ++---
 accel/tcg/cpu-exec.c |  26 +++---
 accel/tcg/cputlb.c   |  20 ++---
 accel/tcg/tcg-accel-ops-icount.c |   6 +-
 accel/tcg/tcg-accel-ops-mttcg.c  |  12 +--
 accel/tcg/tcg-accel-ops-rr.c |  18 ++--
 accel/tcg/tcg-accel-ops.c|   2 +-
 accel/tcg/translate-all.c|   2 +-
 cpu-common.c |   4 +-
 dump/dump.c  |   4 +-
 hw/block/dataplane/virtio-blk.c  |   8 +-
 hw/block/virtio-blk.c|   2 +-
 hw/core/cpu-common.c |   6 +-
 hw/display/virtio-gpu.c  |   2 +-
 hw/i386/intel_iommu.c|   6 +-
 hw/i386/kvm/xen_evtchn.c |  30 +++
 hw/i386/kvm/xen_gnttab.c |   2 +-
 hw/i386/kvm/xen_overlay.c|   2 +-
 hw/i386/kvm/xen_xenstore.c   |   2 +-
 hw/intc/arm_gicv3_cpuif.c|   2 +-
 hw/intc/s390_flic.c  |  18 ++--
 hw/mips/mips_int.c   |   2 +-
 hw/misc/edu.c|   4 +-
 hw/misc/imx6_src.c   |   2 +-
 hw/misc/imx7_src.c   |   2 +-
 hw/net/xen_nic.c |   8 +-
 hw/ppc/pegasos2.c|   2 +-
 hw/ppc/ppc.c |   6 +-
 hw/ppc/spapr.c   |   2 +-
 hw/ppc/spapr_events.c|   2 +-
 hw/ppc/spapr_rng.c   |   4 +-
 hw/ppc/spapr_softmmu.c   |   4 +-
 hw/remote/mpqemu-link.c  |  14 ++--
 hw/remote/vfio-user-obj.c|   2 +-
 hw/s390x/s390-skeys.c|   2 +-
 hw/scsi/virtio-scsi-dataplane.c  |   6 +-
 migration/block-dirty-bitmap.c   |  14 ++--
 migration/block.c|  40 -
 migration/colo.c |  62 +++---
 migration/dirtyrate.c|  12 +--
 migration/migration.c|  54 ++--
 migration/ram.c  |  16 ++--
 net/tap.c|   2 +-
 replay/replay-internal.c |   2 +-
 semihosting/console.c|   8 +-
 stubs/iothread-lock.c|   6 +-
 system/cpu-throttle.c|   6 +-
 system/cpus.c|  52 ++--
 system/dirtylimit.c  |   4 +-
 system/memory.c  |   2 +-
 system/physmem.c |  14 ++--
 system/runstate.c|   2 +-
 system/watchpoint.c  |   4 +-
 target/arm/arm-powerctl.c|  14 ++--
 target/arm/helper.c  |   6 +-
 target/arm/hvf/hvf.c |   8 +-
 target/arm/kvm.c |   4 +-
 target/arm/kvm64.c   |   4 +-
 target/arm/ptw.c |   6 +-
 target/arm/tcg/helper-a64.c  |   8 +-
 target/arm/tcg/m_helper.c|   6 +-
 target/arm/tcg/op_helper.c   |  24 +++---
 target/arm/tcg/psci.c|   2 +-
 target/hppa/int_helper.c |   8 +-
 target/i386/hvf/hvf.c|   6 +-
 target/i386/kvm/hyperv.c |   4 +-
 target/i386/kvm/kvm.c  

[PATCH 5/6] Replace "iothread lock" with "BQL" in comments

2023-11-29 Thread Stefan Hajnoczi
The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL)
in their names. Update the code comments to use "BQL" instead of
"iothread lock".

Signed-off-by: Stefan Hajnoczi 
---
 docs/devel/reset.rst |  2 +-
 hw/display/qxl.h |  2 +-
 include/exec/cpu-common.h|  2 +-
 include/exec/memory.h|  4 ++--
 include/exec/ramblock.h  |  2 +-
 include/migration/register.h |  8 
 target/arm/internals.h   |  4 ++--
 accel/tcg/cputlb.c   |  4 ++--
 accel/tcg/tcg-accel-ops-icount.c |  2 +-
 hw/remote/mpqemu-link.c  |  2 +-
 migration/block-dirty-bitmap.c   | 10 +-
 migration/block.c| 24 
 migration/colo.c |  2 +-
 migration/migration.c|  2 +-
 migration/ram.c  |  4 ++--
 system/physmem.c |  6 +++---
 target/arm/helper.c  |  2 +-
 target/arm/tcg/m_helper.c|  2 +-
 ui/spice-core.c  |  2 +-
 util/rcu.c   |  2 +-
 audio/coreaudio.m|  4 ++--
 ui/cocoa.m   |  6 +++---
 22 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/docs/devel/reset.rst b/docs/devel/reset.rst
index 38ed1790f7..d4e79718ba 100644
--- a/docs/devel/reset.rst
+++ b/docs/devel/reset.rst
@@ -19,7 +19,7 @@ Triggering reset
 
 This section documents the APIs which "users" of a resettable object should use
 to control it. All resettable control functions must be called while holding
-the iothread lock.
+the BQL.
 
 You can apply a reset to an object using ``resettable_assert_reset()``. You 
need
 to call ``resettable_release_reset()`` to release the object from reset. To
diff --git a/hw/display/qxl.h b/hw/display/qxl.h
index fdac14edad..e0a85a5ca4 100644
--- a/hw/display/qxl.h
+++ b/hw/display/qxl.h
@@ -159,7 +159,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(PCIQXLDevice, PCI_QXL)
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 41115d8919..fef3138d29 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -92,7 +92,7 @@ RAMBlock *qemu_ram_block_by_name(const char *name);
  *
  * By the time this function returns, the returned pointer is not protected
  * by RCU anymore.  If the caller is not within an RCU critical section and
- * does not hold the iothread lock, it must have other means of protecting the
+ * does not hold the BQL, it must have other means of protecting the
  * pointer, such as a reference to the memory region that owns the RAMBlock.
  */
 RAMBlock *qemu_ram_block_from_host(void *ptr, bool round_offset,
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 831f7c996d..ad6466b07e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -1962,7 +1962,7 @@ int memory_region_get_fd(MemoryRegion *mr);
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
@@ -1979,7 +1979,7 @@ MemoryRegion *memory_region_from_host(void *ptr, 
ram_addr_t *offset);
  *
  * Use with care; by the time this function returns, the returned pointer is
  * not protected by RCU anymore.  If the caller is not within an RCU critical
- * section and does not hold the iothread lock, it must have other means of
+ * section and does not hold the BQL, it must have other means of
  * protecting the pointer, such as a reference to the region that includes
  * the incoming ram_addr_t.
  *
diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
index 69c6a53902..a2bc0a345d 100644
--- a/include/exec/ramblock.h
+++ b/include/exec/ramblock.h
@@ -34,7 +34,7 @@ struct RAMBlock {
 ram_addr_t max_length;
 void (*resized)(const char*, uint64_t length, void *host);
 uint32_t flags;
-/* Protected by iothread lock.  */
+/* Protected by BQL.  */
 char idstr[256];
 /* RCU-enabled, writes protected by the ramlist lock */
 QLIST_ENTRY(RAMBlock) next;
diff --git a/include/migration/register.h b/include/migration/register.h
index fed1d04a3c..9ab1f79512 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -17,7 +17,7 @@
 #include "hw/vmstate-if.h"
 
 typedef struct Save

[PATCH 4/6] system/cpus: rename qemu_global_mutex to qemu_bql

2023-11-29 Thread Stefan Hajnoczi
The APIs using qemu_global_mutex now follow the Big QEMU Lock (BQL)
nomenclature. It's a little strange that the actual QemuMutex variable
that embodies the BQL is called qemu_global_mutex instead of qemu_bql.
Rename it for consistency.

Signed-off-by: Stefan Hajnoczi 
---
 system/cpus.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/system/cpus.c b/system/cpus.c
index eb24a4db8e..138720a540 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -65,7 +65,7 @@
 
 #endif /* CONFIG_LINUX */
 
-static QemuMutex qemu_global_mutex;
+static QemuMutex qemu_bql;
 
 /*
  * The chosen accelerator is supposed to register this.
@@ -389,14 +389,14 @@ void qemu_init_cpu_loop(void)
 qemu_init_sigbus();
 qemu_cond_init(&qemu_cpu_cond);
 qemu_cond_init(&qemu_pause_cond);
-qemu_mutex_init(&qemu_global_mutex);
+qemu_mutex_init(&qemu_bql);
 
 qemu_thread_get_self(&io_thread);
 }
 
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, run_on_cpu_data data)
 {
-do_run_on_cpu(cpu, func, data, &qemu_global_mutex);
+do_run_on_cpu(cpu, func, data, &qemu_bql);
 }
 
 static void qemu_cpu_stop(CPUState *cpu, bool exit)
@@ -428,7 +428,7 @@ void qemu_wait_io_event(CPUState *cpu)
 slept = true;
 qemu_plugin_vcpu_idle_cb(cpu);
 }
-qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
+qemu_cond_wait(cpu->halt_cond, &qemu_bql);
 }
 if (slept) {
 qemu_plugin_vcpu_resume_cb(cpu);
@@ -502,7 +502,7 @@ void qemu_bql_lock_impl(const char *file, int line)
 QemuMutexLockFunc bql_lock = qatomic_read(&qemu_bql_mutex_lock_func);
 
 g_assert(!qemu_bql_locked());
-bql_lock(&qemu_global_mutex, file, line);
+bql_lock(&qemu_bql, file, line);
 set_bql_locked(true);
 }
 
@@ -510,17 +510,17 @@ void qemu_bql_unlock(void)
 {
 g_assert(qemu_bql_locked());
 set_bql_locked(false);
-qemu_mutex_unlock(&qemu_global_mutex);
+qemu_mutex_unlock(&qemu_bql);
 }
 
 void qemu_cond_wait_bql(QemuCond *cond)
 {
-qemu_cond_wait(cond, &qemu_global_mutex);
+qemu_cond_wait(cond, &qemu_bql);
 }
 
 void qemu_cond_timedwait_bql(QemuCond *cond, int ms)
 {
-qemu_cond_timedwait(cond, &qemu_global_mutex, ms);
+qemu_cond_timedwait(cond, &qemu_bql, ms);
 }
 
 /* signal CPU creation */
@@ -571,7 +571,7 @@ void pause_all_vcpus(void)
 replay_mutex_unlock();
 
 while (!all_vcpus_paused()) {
-qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
+qemu_cond_wait(&qemu_pause_cond, &qemu_bql);
 CPU_FOREACH(cpu) {
 qemu_cpu_kick(cpu);
 }
@@ -649,7 +649,7 @@ void qemu_init_vcpu(CPUState *cpu)
 cpus_accel->create_vcpu_thread(cpu);
 
 while (!cpu->created) {
-qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
+qemu_cond_wait(&qemu_cpu_cond, &qemu_bql);
 }
 }
 
-- 
2.42.0
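[Editorial note: for readers unfamiliar with the pattern being renamed above, the BQL is an ordinary mutex wrapped in helper functions that also track, per thread, whether the lock is held, so misuse can be caught with assertions. The following is a minimal standalone analogue using plain pthreads, not the actual QEMU implementation; the names bql_lock(), bql_unlock(), bql_locked() and bump_counter() are illustrative only.]

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* The one global lock, analogous to qemu_bql in system/cpus.c. */
static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread flag so the helpers can assert correct usage. */
static __thread bool bql_held;

static bool bql_locked(void)
{
    return bql_held;
}

static void bql_lock(void)
{
    assert(!bql_locked());      /* catches recursive locking */
    pthread_mutex_lock(&bql);
    bql_held = true;
}

static void bql_unlock(void)
{
    assert(bql_locked());       /* catches unlocking when not held */
    bql_held = false;
    pthread_mutex_unlock(&bql);
}

/* Example: a critical section guarded by the lock. */
static int counter;

static int bump_counter(void)
{
    bql_lock();
    int v = ++counter;
    bql_unlock();
    return v;
}
```

The rename in the patch changes only the variable name; the lock/unlock discipline sketched here is unchanged.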




[PATCH 2/6] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to QEMU_BQL_LOCK_GUARD

2023-11-29 Thread Stefan Hajnoczi
The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
instead, it is already widely used and unambiguous.

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/main-loop.h  | 20 ++--
 hw/i386/kvm/xen_evtchn.c  | 14 +++---
 hw/i386/kvm/xen_gnttab.c  |  2 +-
 hw/mips/mips_int.c|  2 +-
 hw/ppc/ppc.c  |  2 +-
 target/i386/kvm/xen-emu.c |  2 +-
 target/ppc/excp_helper.c  |  2 +-
 target/ppc/helper_regs.c  |  2 +-
 target/riscv/cpu_helper.c |  4 ++--
 9 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index d6f75e57bd..0b6a3e4824 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -344,13 +344,13 @@ void qemu_bql_lock_impl(const char *file, int line);
 void qemu_bql_unlock(void);
 
 /**
- * QEMU_IOTHREAD_LOCK_GUARD
+ * QEMU_BQL_LOCK_GUARD
  *
- * Wrap a block of code in a conditional qemu_mutex_{lock,unlock}_iothread.
+ * Wrap a block of code in a conditional qemu_bql_{lock,unlock}.
  */
-typedef struct IOThreadLockAuto IOThreadLockAuto;
+typedef struct BQLLockAuto BQLLockAuto;
 
-static inline IOThreadLockAuto *qemu_iothread_auto_lock(const char *file,
+static inline BQLLockAuto *qemu_bql_auto_lock(const char *file,
 int line)
 {
 if (qemu_bql_locked()) {
@@ -358,19 +358,19 @@ static inline IOThreadLockAuto *qemu_iothread_auto_lock(const char *file,
 }
 qemu_bql_lock_impl(file, line);
 /* Anything non-NULL causes the cleanup function to be called */
-return (IOThreadLockAuto *)(uintptr_t)1;
+return (BQLLockAuto *)(uintptr_t)1;
 }
 
-static inline void qemu_iothread_auto_unlock(IOThreadLockAuto *l)
+static inline void qemu_bql_auto_unlock(BQLLockAuto *l)
 {
 qemu_bql_unlock();
 }
 
-G_DEFINE_AUTOPTR_CLEANUP_FUNC(IOThreadLockAuto, qemu_iothread_auto_unlock)
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(BQLLockAuto, qemu_bql_auto_unlock)
 
-#define QEMU_IOTHREAD_LOCK_GUARD() \
-g_autoptr(IOThreadLockAuto) _iothread_lock_auto __attribute__((unused)) \
-= qemu_iothread_auto_lock(__FILE__, __LINE__)
+#define QEMU_BQL_LOCK_GUARD() \
+g_autoptr(BQLLockAuto) _bql_lock_auto __attribute__((unused)) \
+= qemu_bql_auto_lock(__FILE__, __LINE__)
 
 /*
  * qemu_cond_wait_iothread: Wait on condition for the main loop mutex
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 07d0ff0253..3ab686bd79 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1127,7 +1127,7 @@ int xen_evtchn_reset_op(struct evtchn_reset *reset)
 return -ESRCH;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+QEMU_BQL_LOCK_GUARD();
 return xen_evtchn_soft_reset();
 }
 
@@ -1145,7 +1145,7 @@ int xen_evtchn_close_op(struct evtchn_close *close)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+QEMU_BQL_LOCK_GUARD();
 qemu_mutex_lock(&s->port_lock);
 
 ret = close_port(s, close->port, &flush_kvm_routes);
@@ -1272,7 +1272,7 @@ int xen_evtchn_bind_pirq_op(struct evtchn_bind_pirq *pirq)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+QEMU_BQL_LOCK_GUARD();
 
 if (s->pirq[pirq->pirq].port) {
 return -EBUSY;
@@ -1824,7 +1824,7 @@ int xen_physdev_map_pirq(struct physdev_map_pirq *map)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+QEMU_BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (map->domid != DOMID_SELF && map->domid != xen_domid) {
@@ -1884,7 +1884,7 @@ int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+QEMU_BQL_LOCK_GUARD();
 qemu_mutex_lock(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
@@ -1924,7 +1924,7 @@ int xen_physdev_eoi_pirq(struct physdev_eoi *eoi)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+QEMU_BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
@@ -1956,7 +1956,7 @@ int xen_physdev_query_pirq(struct physdev_irq_status_query *query)
 return -ENOTSUP;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+QEMU_BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->port_lock);
 
 if (!pirq_inuse(s, pirq)) {
diff --git a/hw/i386/kvm/xen_gnttab.c b/hw/i386/kvm/xen_gnttab.c
index 0a24f53f20..ee5f8cf257 100644
--- a/hw/i386/kvm/xen_gnttab.c
+++ b/hw/i386/kvm/xen_gnttab.c
@@ -176,7 +176,7 @@ int xen_gnttab_map_page(uint64_t idx, uint64_t gfn)
 return -EINVAL;
 }
 
-QEMU_IOTHREAD_LOCK_GUARD();
+QEMU_BQL_LOCK_GUARD();
 QEMU_LOCK_GUARD(&s->gnt_lock);
 
 xen_overlay_do_map_page(&s->gnt_aliases[idx], gpa);
diff --git a/hw/mips/mips_int.c b/hw/mips/mips_int.c
index 6c32e466a3..c2454f9724 100644
--- a/hw/mips/mips_int.c
+++ b/hw/mips/mips_int.c
@@ -36,7 +36,7 @@ static void cpu_mips_irq_request(void

[PATCH 12/12] block: remove outdated AioContext locking comments

2023-11-29 Thread Stefan Hajnoczi
The AioContext lock no longer exists.

There is one noteworthy change:

  - * More specifically, these functions use BDRV_POLL_WHILE(bs), which
  - * requires the caller to be either in the main thread and hold
  - * the BlockdriverState (bs) AioContext lock, or directly in the
  - * home thread that runs the bs AioContext. Calling them from
  - * another thread in another AioContext would cause deadlocks.
  + * More specifically, these functions use BDRV_POLL_WHILE(bs), which requires
  + * the caller to be either in the main thread or directly in the home thread
  + * that runs the bs AioContext. Calling them from another thread in another
  + * AioContext would cause deadlocks.

I am not sure whether deadlocks are still possible. Maybe they have just
moved to the fine-grained locks that have replaced the AioContext. Since
I am not sure if the deadlocks are gone, I have kept the substance
unchanged and just removed mention of the AioContext.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-common.h |  3 --
 include/block/block-io.h |  9 ++--
 include/block/block_int-common.h |  2 -
 block.c  | 73 ++--
 block/block-backend.c|  8 ---
 block/export/vhost-user-blk-server.c |  4 --
 tests/qemu-iotests/202   |  2 +-
 tests/qemu-iotests/203   |  3 +-
 8 files changed, 22 insertions(+), 82 deletions(-)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index d7599564db..a846023a09 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -70,9 +70,6 @@
  * automatically takes the graph rdlock when calling the wrapped function. In
  * the same way, no_co_wrapper_bdrv_wrlock functions automatically take the
  * graph wrlock.
- *
- * If the first parameter of the function is a BlockDriverState, BdrvChild or
- * BlockBackend pointer, the AioContext lock for it is taken in the wrapper.
  */
 #define no_co_wrapper
 #define no_co_wrapper_bdrv_rdlock
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 8eb39a858b..b49e0537dd 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -332,11 +332,10 @@ bdrv_co_copy_range(BdrvChild *src, int64_t src_offset,
  * "I/O or GS" API functions. These functions can run without
  * the BQL, but only in one specific iothread/main loop.
  *
- * More specifically, these functions use BDRV_POLL_WHILE(bs), which
- * requires the caller to be either in the main thread and hold
- * the BlockdriverState (bs) AioContext lock, or directly in the
- * home thread that runs the bs AioContext. Calling them from
- * another thread in another AioContext would cause deadlocks.
+ * More specifically, these functions use BDRV_POLL_WHILE(bs), which requires
+ * the caller to be either in the main thread or directly in the home thread
+ * that runs the bs AioContext. Calling them from another thread in another
+ * AioContext would cause deadlocks.
  *
  * Therefore, these functions are not proper I/O, because they
  * can't run in *any* iothreads, but only in a specific one.
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 4e31d161c5..151279d481 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -1192,8 +1192,6 @@ struct BlockDriverState {
 /* The error object in use for blocking operations on backing_hd */
 Error *backing_blocker;
 
-/* Protected by AioContext lock */
-
 /*
  * If we are reading a disk image, give its size in sectors.
  * Generally read-only; it is written to by load_snapshot and
diff --git a/block.c b/block.c
index 91ace5d2d5..e773584dfd 100644
--- a/block.c
+++ b/block.c
@@ -1616,11 +1616,6 @@ out:
 g_free(gen_node_name);
 }
 
-/*
- * The caller must always hold @bs AioContext lock, because this function calls
- * bdrv_refresh_total_sectors() which polls when called from non-coroutine
- * context.
- */
 static int no_coroutine_fn GRAPH_UNLOCKED
 bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
  QDict *options, int open_flags, Error **errp)
@@ -2901,7 +2896,7 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
  * Replaces the node that a BdrvChild points to without updating permissions.
  *
  * If @new_bs is non-NULL, the parent of @child must already be drained through
- * @child and the caller must hold the AioContext lock for @new_bs.
+ * @child.
  */
 static void GRAPH_WRLOCK
 bdrv_replace_child_noperm(BdrvChild *child, BlockDriverState *new_bs)
@@ -3041,9 +3036,8 @@ static TransactionActionDrv bdrv_attach_child_common_drv = {
  *
  * Returns new created child.
  *
- * The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
- * @child_bs can move to a different AioContext in this function. Callers must
- * make sure that their AioContext locking is still correct after this.
+ * 

[PATCH 11/12] job: remove outdated AioContext locking comments

2023-11-29 Thread Stefan Hajnoczi
The AioContext lock no longer exists.

Signed-off-by: Stefan Hajnoczi 
---
 include/qemu/job.h | 20 
 1 file changed, 20 deletions(-)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index e502787dd8..9ea98b5927 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -67,8 +67,6 @@ typedef struct Job {
 
 /**
  * The completion function that will be called when the job completes.
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
  */
 BlockCompletionFunc *cb;
 
@@ -264,9 +262,6 @@ struct JobDriver {
  *
  * This callback will not be invoked if the job has already failed.
  * If it fails, abort and then clean will be called.
- *
- * Called with AioContext lock held, since many callbacs implementations
- * use bdrv_* functions that require to hold the lock.
  */
 int (*prepare)(Job *job);
 
@@ -277,9 +272,6 @@ struct JobDriver {
  *
  * All jobs will complete with a call to either .commit() or .abort() but
  * never both.
- *
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
  */
 void (*commit)(Job *job);
 
@@ -290,9 +282,6 @@ struct JobDriver {
  *
  * All jobs will complete with a call to either .commit() or .abort() but
  * never both.
- *
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
  */
 void (*abort)(Job *job);
 
@@ -301,9 +290,6 @@ struct JobDriver {
  * .commit() or .abort(). Regardless of which callback is invoked after
  * completion, .clean() will always be called, even if the job does not
  * belong to a transaction group.
- *
- * Called with AioContext lock held, since many callbacs implementations
- * use bdrv_* functions that require to hold the lock.
  */
 void (*clean)(Job *job);
 
@@ -318,17 +304,12 @@ struct JobDriver {
  * READY).
  * (If the callback is NULL, the job is assumed to terminate
  * without I/O.)
- *
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
  */
 bool (*cancel)(Job *job, bool force);
 
 
 /**
  * Called when the job is freed.
- * Called with AioContext lock held, since many callback implementations
- * use bdrv_* functions that require to hold the lock.
  */
 void (*free)(Job *job);
 };
@@ -424,7 +405,6 @@ void job_ref_locked(Job *job);
  * Release a reference that was previously acquired with job_ref_locked() or
  * job_create(). If it's the last reference to the object, it will be freed.
  *
- * Takes AioContext lock internally to invoke a job->driver callback.
  * Called with job lock held.
  */
 void job_unref_locked(Job *job);
-- 
2.42.0
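[Editorial note: the JobDriver callbacks whose comments are deleted above follow a fixed lifecycle that the surrounding comments still describe: exactly one of .commit() or .abort() runs, and .clean() always runs afterwards. The following standalone sketch of that dispatch logic is illustrative only; the real completion path in job.c is more involved, and the names job_finalize_one() and Trace are invented for this example.]

```c
#include <assert.h>
#include <stddef.h>

typedef struct JobDriver {
    int  (*prepare)(void *job);   /* may fail; skipped if job already failed */
    void (*commit)(void *job);    /* runs on success */
    void (*abort)(void *job);     /* runs on failure */
    void (*clean)(void *job);     /* always runs, after commit or abort */
} JobDriver;

/* Run the completion sequence: either commit or abort, then clean. */
static int job_finalize_one(const JobDriver *drv, void *job, int ret)
{
    if (ret == 0 && drv->prepare) {
        ret = drv->prepare(job);
    }
    if (ret == 0) {
        if (drv->commit) {
            drv->commit(job);
        }
    } else if (drv->abort) {
        drv->abort(job);
    }
    if (drv->clean) {
        drv->clean(job);
    }
    return ret;
}

/* A driver that records which callbacks ran, for demonstration. */
typedef struct { int committed, aborted, cleaned; } Trace;

static void trace_commit(void *j) { ((Trace *)j)->committed++; }
static void trace_abort(void *j)  { ((Trace *)j)->aborted++; }
static void trace_clean(void *j)  { ((Trace *)j)->cleaned++; }

static const JobDriver trace_driver = {
    .commit = trace_commit,
    .abort  = trace_abort,
    .clean  = trace_clean,
};
```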




[PATCH 10/12] scsi: remove outdated AioContext lock comment

2023-11-29 Thread Stefan Hajnoczi
The SCSI subsystem no longer uses the AioContext lock. Request
processing runs exclusively in the BlockBackend's AioContext since
"scsi: only access SCSIDevice->requests from one thread" and hence the
lock is unnecessary.

Signed-off-by: Stefan Hajnoczi 
---
 hw/scsi/scsi-disk.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index f1bd5f5c6e..ef0d21d737 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -351,7 +351,6 @@ done:
 scsi_req_unref(&r->req);
 }
 
-/* Called with AioContext lock held */
 static void scsi_dma_complete(void *opaque, int ret)
 {
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
-- 
2.42.0




[PATCH 06/12] scsi: remove AioContext locking

2023-11-29 Thread Stefan Hajnoczi
The AioContext lock no longer has any effect. Remove it.

Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio-scsi.h | 14 --
 hw/scsi/scsi-bus.c  |  2 --
 hw/scsi/scsi-disk.c | 28 
 hw/scsi/virtio-scsi.c   | 18 --
 4 files changed, 4 insertions(+), 58 deletions(-)

diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index da8cb928d9..7f0573b1bf 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -101,20 +101,6 @@ struct VirtIOSCSI {
 uint32_t host_features;
 };
 
-static inline void virtio_scsi_acquire(VirtIOSCSI *s)
-{
-if (s->ctx) {
-aio_context_acquire(s->ctx);
-}
-}
-
-static inline void virtio_scsi_release(VirtIOSCSI *s)
-{
-if (s->ctx) {
-aio_context_release(s->ctx);
-}
-}
-
 void virtio_scsi_common_realize(DeviceState *dev,
 VirtIOHandleOutput ctrl,
 VirtIOHandleOutput evt,
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index b8bfde9565..0031164cc3 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1725,9 +1725,7 @@ void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense)
 {
 scsi_device_for_each_req_async(sdev, scsi_device_purge_one_req, NULL);
 
-aio_context_acquire(blk_get_aio_context(sdev->conf.blk));
 blk_drain(sdev->conf.blk);
-aio_context_release(blk_get_aio_context(sdev->conf.blk));
 scsi_device_set_ua(sdev, sense);
 }
 
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 2c1bbb3530..f1bd5f5c6e 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2325,14 +2325,10 @@ static void scsi_disk_reset(DeviceState *dev)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev.qdev, dev);
 uint64_t nb_sectors;
-AioContext *ctx;
 
 scsi_device_purge_requests(&s->qdev, SENSE_CODE(RESET));
 
-ctx = blk_get_aio_context(s->qdev.conf.blk);
-aio_context_acquire(ctx);
 blk_get_geometry(s->qdev.conf.blk, &nb_sectors);
-aio_context_release(ctx);
 
 nb_sectors /= s->qdev.blocksize / BDRV_SECTOR_SIZE;
 if (nb_sectors) {
@@ -2531,13 +2527,11 @@ static void scsi_unrealize(SCSIDevice *dev)
 static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
-AioContext *ctx = NULL;
+
 /* can happen for devices without drive. The error message for missing
  * backend will be issued in scsi_realize
  */
 if (s->qdev.conf.blk) {
-ctx = blk_get_aio_context(s->qdev.conf.blk);
-aio_context_acquire(ctx);
 if (!blkconf_blocksizes(&s->qdev.conf, errp)) {
 goto out;
 }
@@ -2549,15 +2543,11 @@ static void scsi_hd_realize(SCSIDevice *dev, Error **errp)
 }
 scsi_realize(&s->qdev, errp);
 out:
-if (ctx) {
-aio_context_release(ctx);
-}
 }
 
 static void scsi_cd_realize(SCSIDevice *dev, Error **errp)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
-AioContext *ctx;
 int ret;
 uint32_t blocksize = 2048;
 
@@ -2573,8 +2563,6 @@ static void scsi_cd_realize(SCSIDevice *dev, Error **errp)
 blocksize = dev->conf.physical_block_size;
 }
 
-ctx = blk_get_aio_context(dev->conf.blk);
-aio_context_acquire(ctx);
 s->qdev.blocksize = blocksize;
 s->qdev.type = TYPE_ROM;
 s->features |= 1 << SCSI_DISK_F_REMOVABLE;
@@ -2582,7 +2570,6 @@ static void scsi_cd_realize(SCSIDevice *dev, Error **errp)
 s->product = g_strdup("QEMU CD-ROM");
 }
 scsi_realize(&s->qdev, errp);
-aio_context_release(ctx);
 }
 
 
@@ -2713,7 +2700,6 @@ static int get_device_type(SCSIDiskState *s)
 static void scsi_block_realize(SCSIDevice *dev, Error **errp)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
-AioContext *ctx;
 int sg_version;
 int rc;
 
@@ -2728,9 +2714,6 @@ static void scsi_block_realize(SCSIDevice *dev, Error **errp)
   "be removed in a future version");
 }
 
-ctx = blk_get_aio_context(s->qdev.conf.blk);
-aio_context_acquire(ctx);
-
 /* check we are using a driver managing SG_IO (version 3 and after) */
 rc = blk_ioctl(s->qdev.conf.blk, SG_GET_VERSION_NUM, &sg_version);
 if (rc < 0) {
@@ -2738,18 +2721,18 @@ static void scsi_block_realize(SCSIDevice *dev, Error **errp)
 if (rc != -EPERM) {
 error_append_hint(errp, "Is this a SCSI device?\n");
 }
-goto out;
+return;
 }
 if (sg_version < 3) {
 error_setg(errp, "scsi generic interface too old");
-goto out;
+return;
 }
 
 /* get device type from INQUIRY data */
 rc = get_device_type(s);
 if (rc < 0) {
 erro

[PATCH 07/12] aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED()

2023-11-29 Thread Stefan Hajnoczi
Now that the AioContext lock no longer exists, AIO_WAIT_WHILE() and
AIO_WAIT_WHILE_UNLOCKED() are equivalent.

A future patch will get rid of AIO_WAIT_WHILE_UNLOCKED().

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio-wait.h | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index 5449b6d742..157f105916 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -63,9 +63,6 @@ extern AioWait global_aio_wait;
  * @ctx: the aio context, or NULL if multiple aio contexts (for which the
  *   caller does not hold a lock) are involved in the polling condition.
  * @cond: wait while this conditional expression is true
- * @unlock: whether to unlock and then lock again @ctx. This applies
- * only when waiting for another AioContext from the main loop.
- * Otherwise it's ignored.
  *
  * Wait while a condition is true.  Use this to implement synchronous
  * operations that require event loop activity.
@@ -78,7 +75,7 @@ extern AioWait global_aio_wait;
  * wait on conditions between two IOThreads since that could lead to deadlock,
  * go via the main loop instead.
  */
-#define AIO_WAIT_WHILE_INTERNAL(ctx, cond, unlock) ({  \
+#define AIO_WAIT_WHILE_INTERNAL(ctx, cond) ({  \
 bool waited_ = false;  \
 AioWait *wait_ = &global_aio_wait; \
 AioContext *ctx_ = (ctx);  \
@@ -95,13 +92,7 @@ extern AioWait global_aio_wait;
 assert(qemu_get_current_aio_context() ==   \
qemu_get_aio_context());\
 while ((cond)) {   \
-if (unlock && ctx_) {  \
-aio_context_release(ctx_); \
-}  \
 aio_poll(qemu_get_aio_context(), true);\
-if (unlock && ctx_) {  \
-aio_context_acquire(ctx_); \
-}  \
 waited_ = true;\
 }  \
 }  \
@@ -109,10 +100,11 @@ extern AioWait global_aio_wait;
 waited_; })
 
 #define AIO_WAIT_WHILE(ctx, cond)  \
-AIO_WAIT_WHILE_INTERNAL(ctx, cond, true)
+AIO_WAIT_WHILE_INTERNAL(ctx, cond)
 
+/* TODO replace this with AIO_WAIT_WHILE() in a future patch */
 #define AIO_WAIT_WHILE_UNLOCKED(ctx, cond) \
-AIO_WAIT_WHILE_INTERNAL(ctx, cond, false)
+AIO_WAIT_WHILE_INTERNAL(ctx, cond)
 
 /**
  * aio_wait_kick:
-- 
2.42.0
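[Editorial note: the simplified macro above keeps the same basic shape after the unlock parameter is dropped: poll the main loop while the condition holds, and report whether any waiting happened. The following freestanding analogue of that control flow uses a toy event counter instead of aio_poll(); it is not QEMU code, and WAIT_WHILE(), toy_poll() and drain() are invented names. Like the real macro, it relies on the GCC/clang statement-expression extension.]

```c
#include <stdbool.h>

/* Toy stand-in for aio_poll(qemu_get_aio_context(), true): each call
 * dispatches one pending "event" that makes progress on some work item. */
static int pending_events;

static void toy_poll(void)
{
    if (pending_events > 0) {
        pending_events--;       /* pretend a completion callback ran */
    }
}

/* Analogue of AIO_WAIT_WHILE(ctx, cond) with no unlock parameter:
 * loop until cond is false, return whether we waited at all. */
#define WAIT_WHILE(cond) ({      \
    bool waited_ = false;        \
    while ((cond)) {             \
        toy_poll();              \
        waited_ = true;          \
    }                            \
    waited_; })

static bool drain(int events)
{
    pending_events = events;
    return WAIT_WHILE(pending_events > 0);
}
```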




[PATCH 04/12] graph-lock: remove AioContext locking

2023-11-29 Thread Stefan Hajnoczi
Stop acquiring/releasing the AioContext lock in
bdrv_graph_wrlock()/bdrv_graph_unlock() since the lock no longer has any
effect.

The distinction between bdrv_graph_wrunlock() and
bdrv_graph_wrunlock_ctx() becomes meaningless and they can be collapsed
into one function.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/graph-lock.h | 21 ++---
 block.c| 50 +++---
 block/backup.c |  4 +--
 block/blklogwrites.c   |  8 ++---
 block/blkverify.c  |  4 +--
 block/block-backend.c  | 11 +++
 block/commit.c | 16 +-
 block/graph-lock.c | 44 ++
 block/mirror.c | 22 ++---
 block/qcow2.c  |  4 +--
 block/quorum.c |  8 ++---
 block/replication.c| 14 -
 block/snapshot.c   |  4 +--
 block/stream.c | 12 +++
 block/vmdk.c   | 20 ++--
 blockdev.c |  8 ++---
 blockjob.c | 12 +++
 tests/unit/test-bdrv-drain.c   | 40 
 tests/unit/test-bdrv-graph-mod.c   | 20 ++--
 scripts/block-coroutine-wrapper.py |  4 +--
 20 files changed, 133 insertions(+), 193 deletions(-)

diff --git a/include/block/graph-lock.h b/include/block/graph-lock.h
index 22b5db1ed9..d7545e82d0 100644
--- a/include/block/graph-lock.h
+++ b/include/block/graph-lock.h
@@ -110,34 +110,17 @@ void unregister_aiocontext(AioContext *ctx);
  *
  * The wrlock can only be taken from the main loop, with BQL held, as only the
  * main loop is allowed to modify the graph.
- *
- * If @bs is non-NULL, its AioContext is temporarily released.
- *
- * This function polls. Callers must not hold the lock of any AioContext other
- * than the current one and the one of @bs.
  */
 void no_coroutine_fn TSA_ACQUIRE(graph_lock) TSA_NO_TSA
-bdrv_graph_wrlock(BlockDriverState *bs);
+bdrv_graph_wrlock(void);
 
 /*
  * bdrv_graph_wrunlock:
  * Write finished, reset global has_writer to 0 and restart
  * all readers that are waiting.
- *
- * If @bs is non-NULL, its AioContext is temporarily released.
  */
 void no_coroutine_fn TSA_RELEASE(graph_lock) TSA_NO_TSA
-bdrv_graph_wrunlock(BlockDriverState *bs);
-
-/*
- * bdrv_graph_wrunlock_ctx:
- * Write finished, reset global has_writer to 0 and restart
- * all readers that are waiting.
- *
- * If @ctx is non-NULL, its lock is temporarily released.
- */
-void no_coroutine_fn TSA_RELEASE(graph_lock) TSA_NO_TSA
-bdrv_graph_wrunlock_ctx(AioContext *ctx);
+bdrv_graph_wrunlock(void);
 
 /*
  * bdrv_graph_co_rdlock:
diff --git a/block.c b/block.c
index bfb0861ec6..25e1ebc606 100644
--- a/block.c
+++ b/block.c
@@ -1708,12 +1708,12 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
 open_failed:
 bs->drv = NULL;
 
-bdrv_graph_wrlock(NULL);
+bdrv_graph_wrlock();
 if (bs->file != NULL) {
 bdrv_unref_child(bs, bs->file);
 assert(!bs->file);
 }
-bdrv_graph_wrunlock(NULL);
+bdrv_graph_wrunlock();
 
 g_free(bs->opaque);
 bs->opaque = NULL;
@@ -3575,9 +3575,9 @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
 
 bdrv_ref(drain_bs);
 bdrv_drained_begin(drain_bs);
-bdrv_graph_wrlock(backing_hd);
+bdrv_graph_wrlock();
 ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
-bdrv_graph_wrunlock(backing_hd);
+bdrv_graph_wrunlock();
 bdrv_drained_end(drain_bs);
 bdrv_unref(drain_bs);
 
@@ -3790,13 +3790,13 @@ BdrvChild *bdrv_open_child(const char *filename,
 return NULL;
 }
 
-bdrv_graph_wrlock(NULL);
+bdrv_graph_wrlock();
 ctx = bdrv_get_aio_context(bs);
 aio_context_acquire(ctx);
 child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
   errp);
 aio_context_release(ctx);
-bdrv_graph_wrunlock(NULL);
+bdrv_graph_wrunlock();
 
 return child;
 }
@@ -4650,9 +4650,9 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
 aio_context_release(ctx);
 }
 
-bdrv_graph_wrlock(NULL);
+bdrv_graph_wrlock();
 tran_commit(tran);
-bdrv_graph_wrunlock(NULL);
+bdrv_graph_wrunlock();
 
 QTAILQ_FOREACH_REVERSE(bs_entry, bs_queue, entry) {
 BlockDriverState *bs = bs_entry->state.bs;
@@ -4669,9 +4669,9 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
 goto cleanup;
 
 abort:
-bdrv_graph_wrlock(NULL);
+bdrv_graph_wrlock();
 tran_abort(tran);
-bdrv_graph_wrunlock(NULL);
+bdrv_graph_wrunlock();
 
 QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
 if (bs_entry->prepared) {
@@ -4852,12 +4852,12 @@ bdrv_reopen_parse_file_or_backing(BDRVRe

[PATCH 09/12] docs: remove AioContext lock from IOThread docs

2023-11-29 Thread Stefan Hajnoczi
Encourage the use of locking primitives and stop mentioning the
AioContext lock since it is being removed.

Signed-off-by: Stefan Hajnoczi 
---
 docs/devel/multiple-iothreads.txt | 45 +++
 1 file changed, 15 insertions(+), 30 deletions(-)

diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
index a3e949f6b3..4865196bde 100644
--- a/docs/devel/multiple-iothreads.txt
+++ b/docs/devel/multiple-iothreads.txt
@@ -88,27 +88,18 @@ loop, depending on which AioContext instance the caller passes in.
 
 How to synchronize with an IOThread
 ---
-AioContext is not thread-safe so some rules must be followed when using file
-descriptors, event notifiers, timers, or BHs across threads:
+Variables that can be accessed by multiple threads require some form of
+synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc.
 
-1. AioContext functions can always be called safely.  They handle their
-own locking internally.
-
-2. Other threads wishing to access the AioContext must use
-aio_context_acquire()/aio_context_release() for mutual exclusion.  Once the
-context is acquired no other thread can access it or run event loop iterations
-in this AioContext.
-
-Legacy code sometimes nests aio_context_acquire()/aio_context_release() calls.
-Do not use nesting anymore, it is incompatible with the BDRV_POLL_WHILE() macro
-used in the block layer and can lead to hangs.
-
-There is currently no lock ordering rule if a thread needs to acquire multiple
-AioContexts simultaneously.  Therefore, it is only safe for code holding the
-QEMU global mutex to acquire other AioContexts.
+AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(),
+aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger
+activity in an IOThread.
 
 Side note: the best way to schedule a function call across threads is to call
-aio_bh_schedule_oneshot().  No acquire/release or locking is needed.
+aio_bh_schedule_oneshot().
+
+The main loop thread can wait synchronously for a condition using
+AIO_WAIT_WHILE().
 
 AioContext and the block layer
 --
@@ -124,22 +115,16 @@ Block layer code must therefore expect to run in an IOThread and avoid using
 old APIs that implicitly use the main loop.  See the "How to program for
 IOThreads" above for information on how to do that.
 
-If main loop code such as a QMP function wishes to access a BlockDriverState
-it must first call aio_context_acquire(bdrv_get_aio_context(bs)) to ensure
-that callbacks in the IOThread do not run in parallel.
-
 Code running in the monitor typically needs to ensure that past
 requests from the guest are completed.  When a block device is running
 in an IOThread, the IOThread can also process requests from the guest
 (via ioeventfd).  To achieve both objects, wrap the code between
 bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
-section".  The functions must be called between aio_context_acquire()
-and aio_context_release().  You can freely release and re-acquire the
-AioContext within a drained section.
+section".
 
-Long-running jobs (usually in the form of coroutines) are best scheduled in
-the BlockDriverState's AioContext to avoid the need to acquire/release around
-each bdrv_*() call.  The functions bdrv_add/remove_aio_context_notifier,
-or alternatively blk_add/remove_aio_context_notifier if you use BlockBackends,
+can be used to get a notification whenever bdrv_try_change_aio_context() moves a
+Long-running jobs (usually in the form of coroutines) are often scheduled in
+the BlockDriverState's AioContext.  The functions
+bdrv_add/remove_aio_context_notifier, or alternatively
+blk_add/remove_aio_context_notifier if you use BlockBackends, can be used to
+get a notification whenever bdrv_try_change_aio_context() moves a
 BlockDriverState to a different AioContext.
-- 
2.42.0
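[Editorial note: the documentation above says the best way to schedule a function call across threads is aio_bh_schedule_oneshot(). The following standalone sketch shows the same idea with a mutex/condvar mailbox and a worker thread that runs one posted function; real QEMU instead uses a lock-free bottom-half list woken by an event notifier. OneShot, oneshot_schedule(), iothread_run() and run_demo() are invented names for this illustration.]

```c
#include <pthread.h>
#include <stddef.h>

/* One-shot cross-thread call: the "IOThread" waits for a function to be
 * posted, runs it once, and exits. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    void (*fn)(void *);
    void *opaque;
} OneShot;

static void oneshot_schedule(OneShot *os, void (*fn)(void *), void *opaque)
{
    pthread_mutex_lock(&os->lock);
    os->fn = fn;
    os->opaque = opaque;
    pthread_cond_signal(&os->cond);
    pthread_mutex_unlock(&os->lock);
}

static void *iothread_run(void *arg)
{
    OneShot *os = arg;

    pthread_mutex_lock(&os->lock);
    while (!os->fn) {                   /* predicate loop: no lost wakeups */
        pthread_cond_wait(&os->cond, &os->lock);
    }
    os->fn(os->opaque);                 /* runs in this thread, not the caller's */
    pthread_mutex_unlock(&os->lock);
    return NULL;
}

static void set_flag(void *opaque)
{
    *(int *)opaque = 1;
}

/* Spawn a thread, schedule set_flag() in it, join, return the flag. */
static int run_demo(void)
{
    OneShot os;
    pthread_t tid;
    int flag = 0;

    pthread_mutex_init(&os.lock, NULL);
    pthread_cond_init(&os.cond, NULL);
    os.fn = NULL;
    os.opaque = NULL;

    pthread_create(&tid, NULL, iothread_run, &os);
    oneshot_schedule(&os, set_flag, &flag);
    pthread_join(tid, NULL);
    return flag;
}
```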




[PATCH 08/12] aio: remove aio_context_acquire()/aio_context_release() API

2023-11-29 Thread Stefan Hajnoczi
Delete these functions because nothing calls these functions anymore.

I introduced these APIs in commit 98563fc3ec44 ("aio: add
aio_context_acquire() and aio_context_release()") in 2014. It's with a
sigh of relief that I delete these APIs almost 10 years later.

Thanks to Paolo Bonzini's vision for multi-queue QEMU, we got an
understanding of where the code needed to go in order to remove the
limitations that the original dataplane and the IOThread/AioContext
approach that followed it.

Emanuele Giuseppe Esposito had the splendid determination to convert
large parts of the codebase so that they no longer needed the AioContext
lock. This was a painstaking process, both in the actual code changes
required and the iterations of code review that Emanuele eked out of
Kevin and me over many months.

Kevin Wolf tackled multitudes of graph locking conversions to protect
in-flight I/O from run-time changes to the block graph as well as the
clang Thread Safety Analysis annotations that allow the compiler to
check whether the graph lock is being used correctly.

And me, well, I'm just here to add some pizzazz to the QEMU multi-queue
block layer :). Thank you to everyone who helped with this effort,
including Eric Blake, code reviewer extraordinaire, and others who I've
forgotten to mention.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h | 17 -
 util/async.c| 10 --
 2 files changed, 27 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index f08b358077..af05512a7d 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -278,23 +278,6 @@ void aio_context_ref(AioContext *ctx);
  */
 void aio_context_unref(AioContext *ctx);
 
-/* Take ownership of the AioContext.  If the AioContext will be shared between
- * threads, and a thread does not want to be interrupted, it will have to
- * take ownership around calls to aio_poll().  Otherwise, aio_poll()
- * automatically takes care of calling aio_context_acquire and
- * aio_context_release.
- *
- * Note that this is separate from bdrv_drained_begin/bdrv_drained_end.  A
- * thread still has to call those to avoid being interrupted by the guest.
- *
- * Bottom halves, timers and callbacks can be created or removed without
- * acquiring the AioContext.
- */
-void aio_context_acquire(AioContext *ctx);
-
-/* Relinquish ownership of the AioContext. */
-void aio_context_release(AioContext *ctx);
-
 /**
  * aio_bh_schedule_oneshot_full: Allocate a new bottom half structure that will
  * run only once and as soon as possible.
diff --git a/util/async.c b/util/async.c
index dfd44ef612..460529057c 100644
--- a/util/async.c
+++ b/util/async.c
@@ -719,16 +719,6 @@ void aio_context_unref(AioContext *ctx)
 g_source_unref(&ctx->source);
 }
 
-void aio_context_acquire(AioContext *ctx)
-{
-/* TODO remove this function */
-}
-
-void aio_context_release(AioContext *ctx)
-{
-/* TODO remove this function */
-}
-
 QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
 
 AioContext *qemu_get_current_aio_context(void)
-- 
2.42.0




[PATCH 00/12] aio: remove AioContext lock

2023-11-29 Thread Stefan Hajnoczi
This series removes the AioContext locking APIs from QEMU.
aio_context_acquire() and aio_context_release() are currently only needed to
support the locking discipline required by AIO_WAIT_WHILE() (except for a stray
user that I converted in Patch 1). AIO_WAIT_WHILE() doesn't really need the
AioContext lock anymore, so it's possible to remove the API. This is a nice
simplification because the AioContext locking rules were sometimes tricky or
underspecified, leading to many bugs over the years.

This patch series removes these APIs across the codebase and cleans up the
documentation/comments that refer to them.

Patch 1 is an AioContext lock user I forgot to convert in my earlier SCSI
conversion series.

Patch 2 removes tests for the AioContext lock because they will no longer be
needed when the lock is gone.

Patches 3-9 remove the AioContext lock. These can be reviewed by categorizing
the call sites into 1. places that take the lock because they call an API that
requires the lock (ultimately AIO_WAIT_WHILE()) and 2. places that take the
lock to protect state. There should be no instances of case 2 left. If you see
one, you've found a bug in this patch series!

Patches 10-12 remove comments.

Based-on: 20231123194931.171598-1-stefa...@redhat.com ("[PATCH 0/4] scsi: eliminate AioContext lock")
Since SCSI needs to stop relying on the AioContext lock before we can remove
the lock.

Stefan Hajnoczi (12):
  virtio-scsi: replace AioContext lock with tmf_bh_lock
  tests: remove aio_context_acquire() tests
  aio: make aio_context_acquire()/aio_context_release() a no-op
  graph-lock: remove AioContext locking
  block: remove AioContext locking
  scsi: remove AioContext locking
  aio-wait: draw equivalence between AIO_WAIT_WHILE() and
AIO_WAIT_WHILE_UNLOCKED()
  aio: remove aio_context_acquire()/aio_context_release() API
  docs: remove AioContext lock from IOThread docs
  scsi: remove outdated AioContext lock comment
  job: remove outdated AioContext locking comments
  block: remove outdated AioContext locking comments

 docs/devel/multiple-iothreads.txt|  45 ++--
 include/block/aio-wait.h |  16 +-
 include/block/aio.h  |  17 --
 include/block/block-common.h |   3 -
 include/block/block-global-state.h   |   9 +-
 include/block/block-io.h |  12 +-
 include/block/block_int-common.h |   2 -
 include/block/graph-lock.h   |  21 +-
 include/block/snapshot.h |   2 -
 include/hw/virtio/virtio-scsi.h  |  17 +-
 include/qemu/job.h   |  20 --
 block.c  | 357 ---
 block/backup.c   |   4 +-
 block/blklogwrites.c |   8 +-
 block/blkverify.c|   4 +-
 block/block-backend.c|  33 +--
 block/commit.c   |  16 +-
 block/copy-before-write.c|  22 +-
 block/export/export.c|  22 +-
 block/export/vhost-user-blk-server.c |   4 -
 block/graph-lock.c   |  44 +---
 block/io.c   |  45 +---
 block/mirror.c   |  41 +--
 block/monitor/bitmap-qmp-cmds.c  |  20 +-
 block/monitor/block-hmp-cmds.c   |  29 ---
 block/qapi-sysemu.c  |  27 +-
 block/qapi.c |  18 +-
 block/qcow2.c|   4 +-
 block/quorum.c   |   8 +-
 block/raw-format.c   |   5 -
 block/replication.c  |  72 +-
 block/snapshot.c |  26 +-
 block/stream.c   |  12 +-
 block/vmdk.c |  20 +-
 block/write-threshold.c  |   6 -
 blockdev.c   | 315 +--
 blockjob.c   |  30 +--
 hw/block/dataplane/virtio-blk.c  |  10 -
 hw/block/dataplane/xen-block.c   |  17 +-
 hw/block/virtio-blk.c|  45 +---
 hw/core/qdev-properties-system.c |   9 -
 hw/scsi/scsi-bus.c   |   2 -
 hw/scsi/scsi-disk.c  |  29 +--
 hw/scsi/virtio-scsi.c|  80 +++---
 job.c|  16 --
 migration/block.c|  33 +--
 migration/migration-hmp-cmds.c   |   3 -
 migration/savevm.c   |  22 --
 net/colo-compare.c   |   2 -
 qemu-img.c   |   4 -
 qemu-io.c|  10 +-
 qemu-nbd.c   |   2 -
 replay/replay-debugging.c|   4 -
 tests/unit/test-aio.c|  67 +
 tests/unit/test-bdrv-drain.c |  91 ++-
 tests/unit/test-bdrv-graph-mod.c |  26 +-
 tests/unit/test-block-iothread.c |  31 ---
 tests/unit/test-blockjob.c   | 137 --
 tests/unit/test-replication.c|  11 -
 util/async.c |  14 --
 util/vhost-user-server.c  

[PATCH 03/12] aio: make aio_context_acquire()/aio_context_release() a no-op

2023-11-29 Thread Stefan Hajnoczi
aio_context_acquire()/aio_context_release() has been replaced by
fine-grained locking to protect state shared by multiple threads. The
AioContext lock still plays the role of balancing locking in
AIO_WAIT_WHILE() and many functions in QEMU either require that the
AioContext lock is held or not held for this reason. In other words, the
AioContext lock is purely there for consistency with itself and serves
no real purpose anymore.

Stop actually acquiring/releasing the lock in
aio_context_acquire()/aio_context_release() so that subsequent patches
can remove callers across the codebase incrementally.

I have performed "make check" and qemu-iotests stress tests across
x86-64, ppc64le, and aarch64 to confirm that there are no failures as a
result of eliminating the lock.

Signed-off-by: Stefan Hajnoczi 
---
 util/async.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/async.c b/util/async.c
index 8f90ddc304..04ee83d220 100644
--- a/util/async.c
+++ b/util/async.c
@@ -725,12 +725,12 @@ void aio_context_unref(AioContext *ctx)
 
 void aio_context_acquire(AioContext *ctx)
 {
-qemu_rec_mutex_lock(&ctx->lock);
+/* TODO remove this function */
 }
 
 void aio_context_release(AioContext *ctx)
 {
-qemu_rec_mutex_unlock(&ctx->lock);
+/* TODO remove this function */
 }
 
 QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
-- 
2.42.0




[PATCH 01/12] virtio-scsi: replace AioContext lock with tmf_bh_lock

2023-11-29 Thread Stefan Hajnoczi
Protect the Task Management Function BH state with a lock. The TMF BH
runs in the main loop thread. An IOThread might process a TMF at the
same time as the TMF BH is running. Therefore tmf_bh_list and tmf_bh
must be protected by a lock.

Run TMF request completion in the IOThread using aio_wait_bh_oneshot().
This avoids more locking to protect the virtqueue and SCSI layer state.

Signed-off-by: Stefan Hajnoczi 
---
 include/hw/virtio/virtio-scsi.h |  3 +-
 hw/scsi/virtio-scsi.c   | 62 ++---
 2 files changed, 43 insertions(+), 22 deletions(-)

diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 779568ab5d..da8cb928d9 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -85,8 +85,9 @@ struct VirtIOSCSI {
 
 /*
  * TMFs deferred to main loop BH. These fields are protected by
- * virtio_scsi_acquire().
+ * tmf_bh_lock.
  */
+QemuMutex tmf_bh_lock;
 QEMUBH *tmf_bh;
 QTAILQ_HEAD(, VirtIOSCSIReq) tmf_bh_list;
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 9c751bf296..4f8d35facc 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -123,6 +123,30 @@ static void virtio_scsi_complete_req(VirtIOSCSIReq *req)
 virtio_scsi_free_req(req);
 }
 
+static void virtio_scsi_complete_req_bh(void *opaque)
+{
+VirtIOSCSIReq *req = opaque;
+
+virtio_scsi_complete_req(req);
+}
+
+/*
+ * Called from virtio_scsi_do_one_tmf_bh() in main loop thread. The main loop
+ * thread cannot touch the virtqueue since that could race with an IOThread.
+ */
+static void virtio_scsi_complete_req_from_main_loop(VirtIOSCSIReq *req)
+{
+VirtIOSCSI *s = req->dev;
+
+if (!s->ctx || s->ctx == qemu_get_aio_context()) {
+/* No need to schedule a BH when there is no IOThread */
+virtio_scsi_complete_req(req);
+} else {
+/* Run request completion in the IOThread */
+aio_wait_bh_oneshot(s->ctx, virtio_scsi_complete_req_bh, req);
+}
+}
+
 static void virtio_scsi_bad_req(VirtIOSCSIReq *req)
 {
     virtio_error(VIRTIO_DEVICE(req->dev), "wrong size for virtio-scsi headers");
@@ -338,10 +362,7 @@ static void virtio_scsi_do_one_tmf_bh(VirtIOSCSIReq *req)
 
 out:
 object_unref(OBJECT(d));
-
-virtio_scsi_acquire(s);
-virtio_scsi_complete_req(req);
-virtio_scsi_release(s);
+virtio_scsi_complete_req_from_main_loop(req);
 }
 
 /* Some TMFs must be processed from the main loop thread */
@@ -354,18 +375,16 @@ static void virtio_scsi_do_tmf_bh(void *opaque)
 
 GLOBAL_STATE_CODE();
 
-virtio_scsi_acquire(s);
+WITH_QEMU_LOCK_GUARD(&s->tmf_bh_lock) {
+QTAILQ_FOREACH_SAFE(req, &s->tmf_bh_list, next, tmp) {
+QTAILQ_REMOVE(&s->tmf_bh_list, req, next);
+QTAILQ_INSERT_TAIL(&reqs, req, next);
+}
 
-QTAILQ_FOREACH_SAFE(req, &s->tmf_bh_list, next, tmp) {
-QTAILQ_REMOVE(&s->tmf_bh_list, req, next);
-QTAILQ_INSERT_TAIL(&reqs, req, next);
+qemu_bh_delete(s->tmf_bh);
+s->tmf_bh = NULL;
 }
 
-qemu_bh_delete(s->tmf_bh);
-s->tmf_bh = NULL;
-
-virtio_scsi_release(s);
-
 QTAILQ_FOREACH_SAFE(req, &reqs, next, tmp) {
 QTAILQ_REMOVE(&reqs, req, next);
 virtio_scsi_do_one_tmf_bh(req);
@@ -379,8 +398,7 @@ static void virtio_scsi_reset_tmf_bh(VirtIOSCSI *s)
 
 GLOBAL_STATE_CODE();
 
-virtio_scsi_acquire(s);
-
+/* Called after ioeventfd has been stopped, so tmf_bh_lock is not needed */
 if (s->tmf_bh) {
 qemu_bh_delete(s->tmf_bh);
 s->tmf_bh = NULL;
@@ -393,19 +411,19 @@ static void virtio_scsi_reset_tmf_bh(VirtIOSCSI *s)
 req->resp.tmf.response = VIRTIO_SCSI_S_TARGET_FAILURE;
 virtio_scsi_complete_req(req);
 }
-
-virtio_scsi_release(s);
 }
 
 static void virtio_scsi_defer_tmf_to_bh(VirtIOSCSIReq *req)
 {
 VirtIOSCSI *s = req->dev;
 
-QTAILQ_INSERT_TAIL(&s->tmf_bh_list, req, next);
+WITH_QEMU_LOCK_GUARD(&s->tmf_bh_lock) {
+QTAILQ_INSERT_TAIL(&s->tmf_bh_list, req, next);
 
-if (!s->tmf_bh) {
-s->tmf_bh = qemu_bh_new(virtio_scsi_do_tmf_bh, s);
-qemu_bh_schedule(s->tmf_bh);
+if (!s->tmf_bh) {
+s->tmf_bh = qemu_bh_new(virtio_scsi_do_tmf_bh, s);
+qemu_bh_schedule(s->tmf_bh);
+}
 }
 }
 
@@ -1235,6 +1253,7 @@ static void virtio_scsi_device_realize(DeviceState *dev, Error **errp)
 Error *err = NULL;
 
 QTAILQ_INIT(&s->tmf_bh_list);
+qemu_mutex_init(&s->tmf_bh_lock);
 
 virtio_scsi_common_realize(dev,
virtio_scsi_handle_ctrl,
@@ -1277,6 +1296,7 @@ static void virtio_scsi_device_unrealize(DeviceState *dev)
 
 qbus_set_hotplug_handler(BUS(&s->bus), NULL);
 virtio_scsi_common_unrealize(dev);
+qemu_mutex_destroy(&s->tmf_bh_lock);
 }
 
 static Property virtio_scsi_properties[] = {
-- 
2.42.0




[PATCH 02/12] tests: remove aio_context_acquire() tests

2023-11-29 Thread Stefan Hajnoczi
The aio_context_acquire() API is being removed. Drop the test case that
calls the API.

Signed-off-by: Stefan Hajnoczi 
---
 tests/unit/test-aio.c | 67 +--
 1 file changed, 1 insertion(+), 66 deletions(-)

diff --git a/tests/unit/test-aio.c b/tests/unit/test-aio.c
index 337b6e4ea7..e77d86be87 100644
--- a/tests/unit/test-aio.c
+++ b/tests/unit/test-aio.c
@@ -100,76 +100,12 @@ static void event_ready_cb(EventNotifier *e)
 
 /* Tests using aio_*.  */
 
-typedef struct {
-QemuMutex start_lock;
-EventNotifier notifier;
-bool thread_acquired;
-} AcquireTestData;
-
-static void *test_acquire_thread(void *opaque)
-{
-AcquireTestData *data = opaque;
-
-/* Wait for other thread to let us start */
-qemu_mutex_lock(&data->start_lock);
-qemu_mutex_unlock(&data->start_lock);
-
-/* event_notifier_set might be called either before or after
- * the main thread's call to poll().  The test case's outcome
- * should be the same in either case.
- */
-event_notifier_set(&data->notifier);
-aio_context_acquire(ctx);
-aio_context_release(ctx);
-
-data->thread_acquired = true; /* success, we got here */
-
-return NULL;
-}
-
 static void set_event_notifier(AioContext *nctx, EventNotifier *notifier,
EventNotifierHandler *handler)
 {
 aio_set_event_notifier(nctx, notifier, handler, NULL, NULL);
 }
 
-static void dummy_notifier_read(EventNotifier *n)
-{
-event_notifier_test_and_clear(n);
-}
-
-static void test_acquire(void)
-{
-QemuThread thread;
-AcquireTestData data;
-
-/* Dummy event notifier ensures aio_poll() will block */
-event_notifier_init(&data.notifier, false);
-set_event_notifier(ctx, &data.notifier, dummy_notifier_read);
-g_assert(!aio_poll(ctx, false)); /* consume aio_notify() */
-
-qemu_mutex_init(&data.start_lock);
-qemu_mutex_lock(&data.start_lock);
-data.thread_acquired = false;
-
-qemu_thread_create(&thread, "test_acquire_thread",
-   test_acquire_thread,
-   &data, QEMU_THREAD_JOINABLE);
-
-/* Block in aio_poll(), let other thread kick us and acquire context */
-aio_context_acquire(ctx);
-qemu_mutex_unlock(&data.start_lock); /* let the thread run */
-g_assert(aio_poll(ctx, true));
-g_assert(!data.thread_acquired);
-aio_context_release(ctx);
-
-qemu_thread_join(&thread);
-set_event_notifier(ctx, &data.notifier, NULL);
-event_notifier_cleanup(&data.notifier);
-
-g_assert(data.thread_acquired);
-}
-
 static void test_bh_schedule(void)
 {
 BHTestData data = { .n = 0 };
@@ -879,7 +815,7 @@ static void test_worker_thread_co_enter(void)
 qemu_thread_get_self(&this_thread);
 co = qemu_coroutine_create(co_check_current_thread, &this_thread);
 
-qemu_thread_create(&worker_thread, "test_acquire_thread",
+qemu_thread_create(&worker_thread, "test_aio_co_enter",
test_aio_co_enter,
co, QEMU_THREAD_JOINABLE);
 
@@ -899,7 +835,6 @@ int main(int argc, char **argv)
 while (g_main_context_iteration(NULL, false));
 
 g_test_init(&argc, &argv, NULL);
-g_test_add_func("/aio/acquire", test_acquire);
 g_test_add_func("/aio/bh/schedule", test_bh_schedule);
 g_test_add_func("/aio/bh/schedule10",   test_bh_schedule10);
 g_test_add_func("/aio/bh/cancel",   test_bh_cancel);
-- 
2.42.0




Re: [PULL 00/15] xenfv.for-upstream queue

2023-11-07 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any user-visible changes.




Re: [PULL 0/7] xenfv-stable queue

2023-11-06 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any user-visible changes.




Re: [RFC PATCH v3 08/78] hw/block: add fallthrough pseudo-keyword

2023-10-16 Thread Stefan Hajnoczi
On Fri, Oct 13, 2023 at 11:45:36AM +0300, Emmanouil Pitsidianakis wrote:
> In preparation of raising -Wimplicit-fallthrough to 5, replace all
> fall-through comments with the fallthrough attribute pseudo-keyword.
> 
> Signed-off-by: Emmanouil Pitsidianakis 
> ---
>  hw/block/dataplane/xen-block.c | 4 ++--
>  hw/block/m25p80.c  | 2 +-
>  hw/block/onenand.c | 2 +-
>  hw/block/pflash_cfi01.c| 1 +
>  hw/block/pflash_cfi02.c| 6 --
>  5 files changed, 9 insertions(+), 6 deletions(-)

Reviewed-by: Stefan Hajnoczi 




[PATCH v3 3/4] virtio: use defer_call() in virtio_irqfd_notify()

2023-09-13 Thread Stefan Hajnoczi
virtio-blk and virtio-scsi invoke virtio_irqfd_notify() to send Used
Buffer Notifications from an IOThread. This involves an eventfd
write(2) syscall. Calling this repeatedly when completing multiple I/O
requests in a row is wasteful.

Use the defer_call() API to batch together virtio_irqfd_notify() calls
made during thread pool (aio=threads), Linux AIO (aio=native), and
io_uring (aio=io_uring) completion processing.

Behavior is unchanged for emulated devices that do not use
defer_call_begin()/defer_call_end() since defer_call() immediately
invokes the callback when called outside a
defer_call_begin()/defer_call_end() region.

fio rw=randread bs=4k iodepth=64 numjobs=8 IOPS increases by ~9% with a
single IOThread and 8 vCPUs. iodepth=1 decreases by ~1% but this could
be noise. Detailed performance data and configuration specifics are
available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd

This duplicates the BH that virtio-blk uses for batching. The next
commit will remove it.

Reviewed-by: Eric Blake 
Signed-off-by: Stefan Hajnoczi 
---
 block/io_uring.c   |  6 ++
 block/linux-aio.c  |  4 
 hw/virtio/virtio.c | 13 -
 util/thread-pool.c |  5 +
 hw/virtio/trace-events |  1 +
 5 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index 3a1e1f45b3..7cdd00e9f1 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -125,6 +125,9 @@ static void luring_process_completions(LuringState *s)
 {
 struct io_uring_cqe *cqes;
 int total_bytes;
+
+defer_call_begin();
+
 /*
  * Request completion callbacks can run the nested event loop.
  * Schedule ourselves so the nested event loop will "see" remaining
@@ -217,7 +220,10 @@ end:
 aio_co_wake(luringcb->co);
 }
 }
+
 qemu_bh_cancel(s->completion_bh);
+
+defer_call_end();
 }
 
 static int ioq_submit(LuringState *s)
diff --git a/block/linux-aio.c b/block/linux-aio.c
index a2670b3e46..ec05d946f3 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -205,6 +205,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
 {
 struct io_event *events;
 
+defer_call_begin();
+
 /* Reschedule so nested event loops see currently pending completions */
 qemu_bh_schedule(s->completion_bh);
 
@@ -231,6 +233,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
  * own `for` loop.  If we are the last all counters dropped to zero. */
 s->event_max = 0;
 s->event_idx = 0;
+
+defer_call_end();
 }
 
 static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 969c25f4cf..d9aeed7012 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -15,6 +15,7 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-virtio.h"
 #include "trace.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
@@ -2426,6 +2427,16 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
 }
 }
 
+/* Batch irqs while inside a defer_call_begin()/defer_call_end() section */
+static void virtio_notify_irqfd_deferred_fn(void *opaque)
+{
+EventNotifier *notifier = opaque;
+VirtQueue *vq = container_of(notifier, VirtQueue, guest_notifier);
+
+trace_virtio_notify_irqfd_deferred_fn(vq->vdev, vq);
+event_notifier_set(notifier);
+}
+
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 {
 WITH_RCU_READ_LOCK_GUARD() {
@@ -2452,7 +2463,7 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
  * to an atomic operation.
  */
 virtio_set_isr(vq->vdev, 0x1);
-event_notifier_set(&vq->guest_notifier);
+defer_call(virtio_notify_irqfd_deferred_fn, &vq->guest_notifier);
 }
 
 static void virtio_irq(VirtQueue *vq)
diff --git a/util/thread-pool.c b/util/thread-pool.c
index e3d8292d14..d84961779a 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -15,6 +15,7 @@
  * GNU GPL, version 2 or (at your option) any later version.
  */
 #include "qemu/osdep.h"
+#include "qemu/defer-call.h"
 #include "qemu/queue.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine.h"
@@ -175,6 +176,8 @@ static void thread_pool_completion_bh(void *opaque)
 ThreadPool *pool = opaque;
 ThreadPoolElement *elem, *next;
 
+defer_call_begin(); /* cb() may use defer_call() to coalesce work */
+
 restart:
 QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
 if (elem->state != THREAD_DONE) {
@@ -208,6 +211,8 @@ restart:
 qemu_aio_unref(elem);
 }
 }
+
+defer_call_end();
 }
 
 static void thread_pool_cancel(BlockAIOCB *acb)
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 7109cf1a3

[PATCH v3 2/4] util/defer-call: move defer_call() to util/

2023-09-13 Thread Stefan Hajnoczi
The networking subsystem may wish to use defer_call(), so move the code
to util/ where it can be reused.

As a reminder of what defer_call() does:

This API defers a function call within a defer_call_begin()/defer_call_end()
section, allowing multiple calls to batch up. This is a performance
optimization that is used in the block layer to submit several I/O requests
at once instead of individually:

  defer_call_begin(); <-- start of section
  ...
  defer_call(my_func, my_obj); <-- deferred my_func(my_obj) call
  defer_call(my_func, my_obj); <-- another
  defer_call(my_func, my_obj); <-- another
  ...
  defer_call_end(); <-- end of section, my_func(my_obj) is called once

Suggested-by: Ilya Maximets 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |  3 ++-
 include/qemu/defer-call.h | 16 
 include/sysemu/block-backend-io.h |  4 
 block/blkio.c |  1 +
 block/io_uring.c  |  1 +
 block/linux-aio.c |  1 +
 block/nvme.c  |  1 +
 hw/block/dataplane/xen-block.c|  1 +
 hw/block/virtio-blk.c |  1 +
 hw/scsi/virtio-scsi.c |  1 +
 block/plug.c => util/defer-call.c |  2 +-
 block/meson.build |  1 -
 util/meson.build  |  1 +
 13 files changed, 27 insertions(+), 7 deletions(-)
 create mode 100644 include/qemu/defer-call.h
 rename block/plug.c => util/defer-call.c (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 00562f924f..acda735326 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2685,12 +2685,13 @@ S: Supported
 F: util/async.c
 F: util/aio-*.c
 F: util/aio-*.h
+F: util/defer-call.c
 F: util/fdmon-*.c
 F: block/io.c
-F: block/plug.c
 F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
+F: include/qemu/defer-call.h
 F: scripts/qemugdb/aio.py
 F: tests/unit/test-fdmon-epoll.c
 T: git https://github.com/stefanha/qemu.git block
diff --git a/include/qemu/defer-call.h b/include/qemu/defer-call.h
new file mode 100644
index 00..e2c1d24572
--- /dev/null
+++ b/include/qemu/defer-call.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Deferred calls
+ *
+ * Copyright Red Hat.
+ */
+
+#ifndef QEMU_DEFER_CALL_H
+#define QEMU_DEFER_CALL_H
+
+/* See documentation in util/defer-call.c */
+void defer_call_begin(void);
+void defer_call_end(void);
+void defer_call(void (*fn)(void *), void *opaque);
+
+#endif /* QEMU_DEFER_CALL_H */
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index cfcfd85c1d..d174275a5c 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,10 +100,6 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-void defer_call_begin(void);
-void defer_call_end(void);
-void defer_call(void (*fn)(void *), void *opaque);
-
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
diff --git a/block/blkio.c b/block/blkio.c
index 7cf6d61f47..0a0a6c0f5f 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -13,6 +13,7 @@
 #include "block/block_int.h"
 #include "exec/memory.h"
 #include "exec/cpu-common.h" /* for qemu_ram_get_fd() */
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
diff --git a/block/io_uring.c b/block/io_uring.c
index 8429f341be..3a1e1f45b3 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -15,6 +15,7 @@
 #include "block/block.h"
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "sysemu/block-backend.h"
 #include "trace.h"
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 49a37174c2..a2670b3e46 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -14,6 +14,7 @@
 #include "block/raw-aio.h"
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "sysemu/block-backend.h"
 
diff --git a/block/nvme.c b/block/nvme.c
index dfbd1085fd..96b3f8f2fa 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -16,6 +16,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index e9dd8f8a99..c4bb28c66f 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/

[PATCH v3 1/4] block: rename blk_io_plug_call() API to defer_call()

2023-09-13 Thread Stefan Hajnoczi
Prepare to move the blk_io_plug_call() API out of the block layer so
that other subsystems can use this deferred call mechanism. Rename it
to defer_call() but leave the code in block/plug.c.

The next commit will move the code out of the block layer.

Suggested-by: Ilya Maximets 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Paul Durrant 
Signed-off-by: Stefan Hajnoczi 
---
 include/sysemu/block-backend-io.h |   6 +-
 block/blkio.c |   8 +--
 block/io_uring.c  |   4 +-
 block/linux-aio.c |   4 +-
 block/nvme.c  |   4 +-
 block/plug.c  | 109 +++---
 hw/block/dataplane/xen-block.c|  10 +--
 hw/block/virtio-blk.c |   4 +-
 hw/scsi/virtio-scsi.c |   6 +-
 9 files changed, 76 insertions(+), 79 deletions(-)

diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index be4dcef59d..cfcfd85c1d 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,9 +100,9 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-void blk_io_plug(void);
-void blk_io_unplug(void);
-void blk_io_plug_call(void (*fn)(void *), void *opaque);
+void defer_call_begin(void);
+void defer_call_end(void);
+void defer_call(void (*fn)(void *), void *opaque);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
diff --git a/block/blkio.c b/block/blkio.c
index 1dd495617c..7cf6d61f47 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -312,10 +312,10 @@ static void blkio_detach_aio_context(BlockDriverState *bs)
 }
 
 /*
- * Called by blk_io_unplug() or immediately if not plugged. Called without
- * blkio_lock.
+ * Called by defer_call_end() or immediately if not in a deferred section.
+ * Called without blkio_lock.
  */
-static void blkio_unplug_fn(void *opaque)
+static void blkio_deferred_fn(void *opaque)
 {
 BDRVBlkioState *s = opaque;
 
@@ -332,7 +332,7 @@ static void blkio_submit_io(BlockDriverState *bs)
 {
 BDRVBlkioState *s = bs->opaque;
 
-blk_io_plug_call(blkio_unplug_fn, s);
+defer_call(blkio_deferred_fn, s);
 }
 
 static int coroutine_fn
diff --git a/block/io_uring.c b/block/io_uring.c
index 69d9820928..8429f341be 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -306,7 +306,7 @@ static void ioq_init(LuringQueue *io_q)
 io_q->blocked = false;
 }
 
-static void luring_unplug_fn(void *opaque)
+static void luring_deferred_fn(void *opaque)
 {
 LuringState *s = opaque;
 trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
@@ -367,7 +367,7 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
 return ret;
 }
 
-blk_io_plug_call(luring_unplug_fn, s);
+defer_call(luring_deferred_fn, s);
 }
 return 0;
 }
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 1a51503271..49a37174c2 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -353,7 +353,7 @@ static uint64_t laio_max_batch(LinuxAioState *s, uint64_t dev_max_batch)
 return max_batch;
 }
 
-static void laio_unplug_fn(void *opaque)
+static void laio_deferred_fn(void *opaque)
 {
 LinuxAioState *s = opaque;
 
@@ -393,7 +393,7 @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
 if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
 ioq_submit(s);
 } else {
-blk_io_plug_call(laio_unplug_fn, s);
+defer_call(laio_deferred_fn, s);
 }
 }
 
diff --git a/block/nvme.c b/block/nvme.c
index b6e95f0b7e..dfbd1085fd 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -476,7 +476,7 @@ static void nvme_trace_command(const NvmeCmd *cmd)
 }
 }
 
-static void nvme_unplug_fn(void *opaque)
+static void nvme_deferred_fn(void *opaque)
 {
 NVMeQueuePair *q = opaque;
 
@@ -503,7 +503,7 @@ static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
 q->need_kick++;
 qemu_mutex_unlock(&q->lock);
 
-blk_io_plug_call(nvme_unplug_fn, q);
+defer_call(nvme_deferred_fn, q);
 }
 
 static void nvme_admin_cmd_sync_cb(void *opaque, int ret)
diff --git a/block/plug.c b/block/plug.c
index 98a155d2f4..f26173559c 100644
--- a/block/plug.c
+++ b/block/plug.c
@@ -1,24 +1,21 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 /*
- * Block I/O plugging
+ * Deferred calls
  *
  * Copyright Red Hat.
  *
- * This API defers a function call within a blk_io_plug()/blk_io_unplug()
+ * This API defers a function call within a defer_call_begin()/defer_call_end()
  * section, allowing multiple calls to batch up. This is a performance
  * optimization that is used in the block layer to submit several I/O requests
  * at once instead of individually:
  *
- *   blk_io_plug(); <-- start of plugged region
+ *   d

[PATCH v3 4/4] virtio-blk: remove batch notification BH

2023-09-13 Thread Stefan Hajnoczi
There is a batching mechanism for virtio-blk Used Buffer Notifications
that is no longer needed because the previous commit added batching to
virtio_notify_irqfd().

Note that this mechanism was rarely used in practice because it is only
enabled when EVENT_IDX is not negotiated by the driver. Modern drivers
enable EVENT_IDX.

Reviewed-by: Eric Blake 
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/dataplane/virtio-blk.c | 48 +
 1 file changed, 1 insertion(+), 47 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index da36fcfd0b..f83bb0f116 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -31,9 +31,6 @@ struct VirtIOBlockDataPlane {
 
 VirtIOBlkConf *conf;
 VirtIODevice *vdev;
-QEMUBH *bh; /* bh for guest notification */
-unsigned long *batch_notify_vqs;
-bool batch_notifications;
 
 /* Note that these EventNotifiers are assigned by value.  This is
  * fine as long as you do not call event_notifier_cleanup on them
@@ -47,36 +44,7 @@ struct VirtIOBlockDataPlane {
 /* Raise an interrupt to signal guest, if necessary */
 void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, VirtQueue *vq)
 {
-if (s->batch_notifications) {
-set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
-qemu_bh_schedule(s->bh);
-} else {
-virtio_notify_irqfd(s->vdev, vq);
-}
-}
-
-static void notify_guest_bh(void *opaque)
-{
-VirtIOBlockDataPlane *s = opaque;
-unsigned nvqs = s->conf->num_queues;
-unsigned long bitmap[BITS_TO_LONGS(nvqs)];
-unsigned j;
-
-memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
-memset(s->batch_notify_vqs, 0, sizeof(bitmap));
-
-for (j = 0; j < nvqs; j += BITS_PER_LONG) {
-unsigned long bits = bitmap[j / BITS_PER_LONG];
-
-while (bits != 0) {
-unsigned i = j + ctzl(bits);
-VirtQueue *vq = virtio_get_queue(s->vdev, i);
-
-virtio_notify_irqfd(s->vdev, vq);
-
-bits &= bits - 1; /* clear right-most bit */
-}
-}
+virtio_notify_irqfd(s->vdev, vq);
 }
 
 /* Context: QEMU global mutex held */
@@ -126,9 +94,6 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 } else {
 s->ctx = qemu_get_aio_context();
 }
-s->bh = aio_bh_new_guarded(s->ctx, notify_guest_bh, s,
-   &DEVICE(vdev)->mem_reentrancy_guard);
-s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
 *dataplane = s;
 
@@ -146,8 +111,6 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
 
 vblk = VIRTIO_BLK(s->vdev);
 assert(!vblk->dataplane_started);
-g_free(s->batch_notify_vqs);
-qemu_bh_delete(s->bh);
 if (s->iothread) {
 object_unref(OBJECT(s->iothread));
 }
@@ -173,12 +136,6 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 
 s->starting = true;
 
-if (!virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
-s->batch_notifications = true;
-} else {
-s->batch_notifications = false;
-}
-
 /* Set up guest notifier (irq) */
 r = k->set_guest_notifiers(qbus->parent, nvqs, true);
 if (r != 0) {
@@ -370,9 +327,6 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
 aio_context_release(s->ctx);
 
-qemu_bh_cancel(s->bh);
-notify_guest_bh(s); /* final chance to notify guest */
-
 /* Clean up guest notifier (irq) */
 k->set_guest_notifiers(qbus->parent, nvqs, false);
 
-- 
2.41.0
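
The notify_guest_bh() loop deleted above walks a bitmap of pending virtqueues with ctzl() plus the `bits &= bits - 1` clear-lowest-bit idiom. A standalone sketch of that idiom, assuming GCC/Clang's `__builtin_ctzl` (the name `visit_set_bits` is hypothetical, not QEMU code):

```c
#include <stddef.h>

/*
 * Record the index of every set bit in "bits", lowest first, into out[]
 * (up to cap entries) and return the total number of set bits. This is
 * the same ctz + clear-right-most-bit walk used by notify_guest_bh().
 */
static unsigned visit_set_bits(unsigned long bits, unsigned *out, unsigned cap)
{
    unsigned n = 0;

    while (bits != 0) {
        unsigned i = __builtin_ctzl(bits); /* index of lowest set bit */

        if (n < cap) {
            out[n] = i;
        }
        n++;
        bits &= bits - 1; /* clear right-most bit */
    }
    return n;
}
```

Each iteration touches only one set bit, so the loop cost scales with the number of pending virtqueues rather than the total queue count.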




[PATCH v3 0/4] virtio-blk: use blk_io_plug_call() instead of notification BH

2023-09-13 Thread Stefan Hajnoczi
v3:
- Add comment pointing to API documentation in .c file [Philippe]
- Add virtio_notify_irqfd_deferred_fn trace event [Ilya]
- Remove outdated #include [Ilya]
v2:
- Rename blk_io_plug() to defer_call() and move it to util/ so the net
  subsystem can use it [Ilya]
- Add defer_call_begin()/end() to thread_pool_completion_bh() to match Linux
  AIO and io_uring completion batching

Replace the seldom-used virtio-blk notification BH mechanism with
blk_io_plug(). This is part of an effort to enable the multi-queue block layer
in virtio-blk. The notification BH was not multi-queue friendly.

The blk_io_plug() mechanism improves fio rw=randread bs=4k iodepth=64 numjobs=8
IOPS by ~9% with a single IOThread and 8 vCPUs (this is not even a multi-queue
block layer configuration) compared to no completion batching. iodepth=1
decreases by ~1% but this could be noise. Benchmark details are available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd

Stefan Hajnoczi (4):
  block: rename blk_io_plug_call() API to defer_call()
  util/defer-call: move defer_call() to util/
  virtio: use defer_call() in virtio_irqfd_notify()
  virtio-blk: remove batch notification BH

 MAINTAINERS   |   3 +-
 include/qemu/defer-call.h |  16 +++
 include/sysemu/block-backend-io.h |   4 -
 block/blkio.c |   9 +-
 block/io_uring.c  |  11 ++-
 block/linux-aio.c |   9 +-
 block/nvme.c  |   5 +-
 block/plug.c  | 159 --
 hw/block/dataplane/virtio-blk.c   |  48 +
 hw/block/dataplane/xen-block.c|  11 ++-
 hw/block/virtio-blk.c |   5 +-
 hw/scsi/virtio-scsi.c |   7 +-
 hw/virtio/virtio.c|  13 ++-
 util/defer-call.c | 156 +
 util/thread-pool.c|   5 +
 block/meson.build |   1 -
 hw/virtio/trace-events|   1 +
 util/meson.build  |   1 +
 18 files changed, 231 insertions(+), 233 deletions(-)
 create mode 100644 include/qemu/defer-call.h
 delete mode 100644 block/plug.c
 create mode 100644 util/defer-call.c

-- 
2.41.0




Re: [PATCH v2 2/4] util/defer-call: move defer_call() to util/

2023-09-13 Thread Stefan Hajnoczi
On Fri, Aug 18, 2023 at 10:31:40AM +0200, Philippe Mathieu-Daudé wrote:
> Hi Stefan,
> 
> On 17/8/23 17:58, Stefan Hajnoczi wrote:
> > The networking subsystem may wish to use defer_call(), so move the code
> > to util/ where it can be reused.
> > 
> > As a reminder of what defer_call() does:
> > 
> > This API defers a function call within a defer_call_begin()/defer_call_end()
> > section, allowing multiple calls to batch up. This is a performance
> > optimization that is used in the block layer to submit several I/O requests
> > at once instead of individually:
> > 
> >defer_call_begin(); <-- start of section
> >...
> >defer_call(my_func, my_obj); <-- deferred my_func(my_obj) call
> >defer_call(my_func, my_obj); <-- another
> >defer_call(my_func, my_obj); <-- another
> >...
> >defer_call_end(); <-- end of section, my_func(my_obj) is called once
> > 
> > Suggested-by: Ilya Maximets 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   MAINTAINERS   |  3 ++-
> >   include/qemu/defer-call.h | 15 +++
> >   include/sysemu/block-backend-io.h |  4 
> >   block/blkio.c |  1 +
> >   block/io_uring.c  |  1 +
> >   block/linux-aio.c |  1 +
> >   block/nvme.c  |  1 +
> >   hw/block/dataplane/xen-block.c|  1 +
> >   hw/block/virtio-blk.c |  1 +
> >   hw/scsi/virtio-scsi.c |  1 +
> >   block/plug.c => util/defer-call.c |  2 +-
> >   block/meson.build |  1 -
> >   util/meson.build  |  1 +
> >   13 files changed, 26 insertions(+), 7 deletions(-)
> >   create mode 100644 include/qemu/defer-call.h
> >   rename block/plug.c => util/defer-call.c (99%)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 6111b6b4d9..7cd7132ffc 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2676,12 +2676,13 @@ S: Supported
> >   F: util/async.c
> >   F: util/aio-*.c
> >   F: util/aio-*.h
> > +F: util/defer-call.c
> 
> If used by network/other backends, maybe worth adding a
> brand new section instead, rather than "Block I/O path".

Changes to defer-call.c will go through my block tree. We don't split
out the event loop (async.c, aio-*.c, etc) either even though it's
shared by other subsystems. The important thing is that
scripts/get_maintainer.pl identifies the maintainers.

I'd rather not create lots of micro-subsystems in MAINTAINERS that
duplicate my email and block git repo URL.

> 
> >   F: util/fdmon-*.c
> >   F: block/io.c
> > -F: block/plug.c
> >   F: migration/block*
> >   F: include/block/aio.h
> >   F: include/block/aio-wait.h
> > +F: include/qemu/defer-call.h
> >   F: scripts/qemugdb/aio.py
> >   F: tests/unit/test-fdmon-epoll.c
> >   T: git https://github.com/stefanha/qemu.git block
> > diff --git a/include/qemu/defer-call.h b/include/qemu/defer-call.h
> > new file mode 100644
> > index 00..291f86c987
> > --- /dev/null
> > +++ b/include/qemu/defer-call.h
> > @@ -0,0 +1,15 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +/*
> > + * Deferred calls
> > + *
> > + * Copyright Red Hat.
> > + */
> > +
> > +#ifndef QEMU_DEFER_CALL_H
> > +#define QEMU_DEFER_CALL_H
> > +
> 
> Please add smth like:
> 
>/* See documentation in util/defer-call.c */

Sure, will fix.

> 
> > +void defer_call_begin(void);
> > +void defer_call_end(void);
> > +void defer_call(void (*fn)(void *), void *opaque);
> > +
> > +#endif /* QEMU_DEFER_CALL_H */
> 
> Reviewed-by: Philippe Mathieu-Daudé 
> 


signature.asc
Description: PGP signature


Re: [PATCH 4/7] block/dirty-bitmap: Clean up local variable shadowing

2023-08-31 Thread Stefan Hajnoczi
On Thu, Aug 31, 2023 at 03:25:43PM +0200, Markus Armbruster wrote:
> Local variables shadowing other local variables or parameters make the
> code needlessly hard to understand.  Tracked down with -Wshadow=local.
> Clean up: delete inner declarations when they are actually redundant,
> else rename variables.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  block/monitor/bitmap-qmp-cmds.c | 2 +-
>  block/qcow2-bitmap.c| 3 +--
>  2 files changed, 2 insertions(+), 3 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 5/7] block/vdi: Clean up local variable shadowing

2023-08-31 Thread Stefan Hajnoczi
On Thu, Aug 31, 2023 at 03:25:44PM +0200, Markus Armbruster wrote:
> Local variables shadowing other local variables or parameters make the
> code needlessly hard to understand.  Tracked down with -Wshadow=local.
> Clean up: delete inner declarations when they are actually redundant,
> else rename variables.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  block/vdi.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


Re: [PATCH 6/7] block: Clean up local variable shadowing

2023-08-31 Thread Stefan Hajnoczi
On Thu, Aug 31, 2023 at 03:25:45PM +0200, Markus Armbruster wrote:
> Local variables shadowing other local variables or parameters make the
> code needlessly hard to understand.  Tracked down with -Wshadow=local.
> Clean up: delete inner declarations when they are actually redundant,
> else rename variables.
> 
> Signed-off-by: Markus Armbruster 
> ---
>  block.c  |  7 ---
>  block/rbd.c  |  2 +-
>  block/stream.c   |  1 -
>  block/vvfat.c| 34 +-
>  hw/block/xen-block.c |  6 +++---
>  5 files changed, 25 insertions(+), 25 deletions(-)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature
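
The shadowing cleanups reviewed above fix the pattern that `gcc -Wshadow=local` reports: an inner declaration reusing an outer variable's name, silently hiding it. A minimal sketch of the bug class and the fix (hypothetical function names, not code from the series):

```c
/*
 * The inner "count" shadows the outer one, so the increments are lost
 * and the function always returns 0; -Wshadow=local flags the inner
 * declaration.
 */
static int count_positive_shadowed(const int *a, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > 0) {
            int count = 1; /* BUG: shadows the outer count */
            (void)count;
        }
    }
    return count;
}

/* Fixed as in the series: delete the redundant inner declaration. */
static int count_positive(const int *a, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] > 0) {
            count++;
        }
    }
    return count;
}
```

Building with `-Wshadow=local` (GCC 7+) turns this class of bug into a compile-time diagnostic instead of a silent wrong result.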


[PATCH v2 4/4] virtio-blk: remove batch notification BH

2023-08-17 Thread Stefan Hajnoczi
There is a batching mechanism for virtio-blk Used Buffer Notifications
that is no longer needed because the previous commit added batching to
virtio_notify_irqfd().

Note that this mechanism was rarely used in practice because it is only
enabled when EVENT_IDX is not negotiated by the driver. Modern drivers
enable EVENT_IDX.

Signed-off-by: Stefan Hajnoczi 
---
 hw/block/dataplane/virtio-blk.c | 48 +
 1 file changed, 1 insertion(+), 47 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index da36fcfd0b..f83bb0f116 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -31,9 +31,6 @@ struct VirtIOBlockDataPlane {
 
 VirtIOBlkConf *conf;
 VirtIODevice *vdev;
-QEMUBH *bh; /* bh for guest notification */
-unsigned long *batch_notify_vqs;
-bool batch_notifications;
 
 /* Note that these EventNotifiers are assigned by value.  This is
  * fine as long as you do not call event_notifier_cleanup on them
@@ -47,36 +44,7 @@ struct VirtIOBlockDataPlane {
 /* Raise an interrupt to signal guest, if necessary */
 void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, VirtQueue *vq)
 {
-if (s->batch_notifications) {
-set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
-qemu_bh_schedule(s->bh);
-} else {
-virtio_notify_irqfd(s->vdev, vq);
-}
-}
-
-static void notify_guest_bh(void *opaque)
-{
-VirtIOBlockDataPlane *s = opaque;
-unsigned nvqs = s->conf->num_queues;
-unsigned long bitmap[BITS_TO_LONGS(nvqs)];
-unsigned j;
-
-memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
-memset(s->batch_notify_vqs, 0, sizeof(bitmap));
-
-for (j = 0; j < nvqs; j += BITS_PER_LONG) {
-unsigned long bits = bitmap[j / BITS_PER_LONG];
-
-while (bits != 0) {
-unsigned i = j + ctzl(bits);
-VirtQueue *vq = virtio_get_queue(s->vdev, i);
-
-virtio_notify_irqfd(s->vdev, vq);
-
-bits &= bits - 1; /* clear right-most bit */
-}
-}
+virtio_notify_irqfd(s->vdev, vq);
 }
 
 /* Context: QEMU global mutex held */
@@ -126,9 +94,6 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, 
VirtIOBlkConf *conf,
 } else {
 s->ctx = qemu_get_aio_context();
 }
-s->bh = aio_bh_new_guarded(s->ctx, notify_guest_bh, s,
-   &DEVICE(vdev)->mem_reentrancy_guard);
-s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
 *dataplane = s;
 
@@ -146,8 +111,6 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
 
 vblk = VIRTIO_BLK(s->vdev);
 assert(!vblk->dataplane_started);
-g_free(s->batch_notify_vqs);
-qemu_bh_delete(s->bh);
 if (s->iothread) {
 object_unref(OBJECT(s->iothread));
 }
@@ -173,12 +136,6 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 
 s->starting = true;
 
-if (!virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
-s->batch_notifications = true;
-} else {
-s->batch_notifications = false;
-}
-
 /* Set up guest notifier (irq) */
 r = k->set_guest_notifiers(qbus->parent, nvqs, true);
 if (r != 0) {
@@ -370,9 +327,6 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
 aio_context_release(s->ctx);
 
-qemu_bh_cancel(s->bh);
-notify_guest_bh(s); /* final chance to notify guest */
-
 /* Clean up guest notifier (irq) */
 k->set_guest_notifiers(qbus->parent, nvqs, false);
 
-- 
2.41.0




[PATCH v2 3/4] virtio: use defer_call() in virtio_irqfd_notify()

2023-08-17 Thread Stefan Hajnoczi
virtio-blk and virtio-scsi invoke virtio_irqfd_notify() to send Used
Buffer Notifications from an IOThread. This involves an eventfd
write(2) syscall. Calling this repeatedly when completing multiple I/O
requests in a row is wasteful.

Use the defer_call() API to batch together virtio_irqfd_notify() calls
made during thread pool (aio=threads), Linux AIO (aio=native), and
io_uring (aio=io_uring) completion processing.

Behavior is unchanged for emulated devices that do not use
defer_call_begin()/defer_call_end() since defer_call() immediately
invokes the callback when called outside a
defer_call_begin()/defer_call_end() region.

fio rw=randread bs=4k iodepth=64 numjobs=8 IOPS increases by ~9% with a
single IOThread and 8 vCPUs. iodepth=1 decreases by ~1% but this could
be noise. Detailed performance data and configuration specifics are
available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd

This duplicates the BH that virtio-blk uses for batching. The next
commit will remove it.

Signed-off-by: Stefan Hajnoczi 
---
 block/io_uring.c   |  6 ++
 block/linux-aio.c  |  4 
 hw/virtio/virtio.c | 11 ++-
 util/thread-pool.c |  5 +
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index 3a1e1f45b3..7cdd00e9f1 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -125,6 +125,9 @@ static void luring_process_completions(LuringState *s)
 {
 struct io_uring_cqe *cqes;
 int total_bytes;
+
+defer_call_begin();
+
 /*
  * Request completion callbacks can run the nested event loop.
  * Schedule ourselves so the nested event loop will "see" remaining
@@ -217,7 +220,10 @@ end:
 aio_co_wake(luringcb->co);
 }
 }
+
 qemu_bh_cancel(s->completion_bh);
+
+defer_call_end();
 }
 
 static int ioq_submit(LuringState *s)
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 62380593c8..ab607ade6a 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -205,6 +205,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
 {
 struct io_event *events;
 
+defer_call_begin();
+
 /* Reschedule so nested event loops see currently pending completions */
 qemu_bh_schedule(s->completion_bh);
 
@@ -231,6 +233,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
  * own `for` loop.  If we are the last all counters droped to zero. */
 s->event_max = 0;
 s->event_idx = 0;
+
+defer_call_end();
 }
 
 static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 309038fd46..5eb1f91b41 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -15,6 +15,7 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-virtio.h"
 #include "trace.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
@@ -28,6 +29,7 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio-access.h"
+#include "sysemu/block-backend.h"
 #include "sysemu/dma.h"
 #include "sysemu/runstate.h"
 #include "virtio-qmp.h"
@@ -2426,6 +2428,13 @@ static bool virtio_should_notify(VirtIODevice *vdev, 
VirtQueue *vq)
 }
 }
 
+/* Batch irqs while inside a defer_call_begin()/defer_call_end() section */
+static void virtio_notify_irqfd_deferred_fn(void *opaque)
+{
+EventNotifier *notifier = opaque;
+event_notifier_set(notifier);
+}
+
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 {
 WITH_RCU_READ_LOCK_GUARD() {
@@ -2452,7 +2461,7 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue 
*vq)
  * to an atomic operation.
  */
 virtio_set_isr(vq->vdev, 0x1);
-event_notifier_set(&vq->guest_notifier);
+defer_call(virtio_notify_irqfd_deferred_fn, &vq->guest_notifier);
 }
 
 static void virtio_irq(VirtQueue *vq)
diff --git a/util/thread-pool.c b/util/thread-pool.c
index e3d8292d14..d84961779a 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -15,6 +15,7 @@
  * GNU GPL, version 2 or (at your option) any later version.
  */
 #include "qemu/osdep.h"
+#include "qemu/defer-call.h"
 #include "qemu/queue.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine.h"
@@ -175,6 +176,8 @@ static void thread_pool_completion_bh(void *opaque)
 ThreadPool *pool = opaque;
 ThreadPoolElement *elem, *next;
 
+defer_call_begin(); /* cb() may use defer_call() to coalesce work */
+
 restart:
 QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
 if (elem->state != THREAD_DONE) {
@@ -208,6 +211,8 @@ restart:
 qemu_aio_unref(elem);
 }
 }
+
+defer_call_end();
 }
 
 static void thread_pool_cancel(BlockAIOCB *acb)
-- 
2.41.0
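
The motivation above is that each virtio_irqfd_notify() costs an eventfd write(2) syscall, so completing N requests individually costs N syscalls while batching costs one. A Linux-only sketch of that cost difference using a raw eventfd (the helper names are hypothetical; QEMU wraps this in EventNotifier):

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* One write(2) per completed request: N requests -> N syscalls. */
static int notify_individually(int efd, int nreqs)
{
    uint64_t one = 1;
    int syscalls = 0;

    for (int i = 0; i < nreqs; i++) {
        if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            return -1;
        }
        syscalls++;
    }
    return syscalls;
}

/* Deferred: all completions in the batch share one notification. */
static int notify_batched(int efd, int nreqs)
{
    uint64_t one = 1;

    (void)nreqs; /* the guest re-scans the whole ring on one kick */
    if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
        return -1;
    }
    return 1;
}
```

The batched variant is what defer_call() achieves here: virtio_notify_irqfd_deferred_fn() runs once per defer_call_begin()/defer_call_end() section instead of once per request.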




[PATCH v2 2/4] util/defer-call: move defer_call() to util/

2023-08-17 Thread Stefan Hajnoczi
The networking subsystem may wish to use defer_call(), so move the code
to util/ where it can be reused.

As a reminder of what defer_call() does:

This API defers a function call within a defer_call_begin()/defer_call_end()
section, allowing multiple calls to batch up. This is a performance
optimization that is used in the block layer to submit several I/O requests
at once instead of individually:

  defer_call_begin(); <-- start of section
  ...
  defer_call(my_func, my_obj); <-- deferred my_func(my_obj) call
  defer_call(my_func, my_obj); <-- another
  defer_call(my_func, my_obj); <-- another
  ...
  defer_call_end(); <-- end of section, my_func(my_obj) is called once

Suggested-by: Ilya Maximets 
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |  3 ++-
 include/qemu/defer-call.h | 15 +++
 include/sysemu/block-backend-io.h |  4 
 block/blkio.c |  1 +
 block/io_uring.c  |  1 +
 block/linux-aio.c |  1 +
 block/nvme.c  |  1 +
 hw/block/dataplane/xen-block.c|  1 +
 hw/block/virtio-blk.c |  1 +
 hw/scsi/virtio-scsi.c |  1 +
 block/plug.c => util/defer-call.c |  2 +-
 block/meson.build |  1 -
 util/meson.build  |  1 +
 13 files changed, 26 insertions(+), 7 deletions(-)
 create mode 100644 include/qemu/defer-call.h
 rename block/plug.c => util/defer-call.c (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 6111b6b4d9..7cd7132ffc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2676,12 +2676,13 @@ S: Supported
 F: util/async.c
 F: util/aio-*.c
 F: util/aio-*.h
+F: util/defer-call.c
 F: util/fdmon-*.c
 F: block/io.c
-F: block/plug.c
 F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
+F: include/qemu/defer-call.h
 F: scripts/qemugdb/aio.py
 F: tests/unit/test-fdmon-epoll.c
 T: git https://github.com/stefanha/qemu.git block
diff --git a/include/qemu/defer-call.h b/include/qemu/defer-call.h
new file mode 100644
index 00..291f86c987
--- /dev/null
+++ b/include/qemu/defer-call.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Deferred calls
+ *
+ * Copyright Red Hat.
+ */
+
+#ifndef QEMU_DEFER_CALL_H
+#define QEMU_DEFER_CALL_H
+
+void defer_call_begin(void);
+void defer_call_end(void);
+void defer_call(void (*fn)(void *), void *opaque);
+
+#endif /* QEMU_DEFER_CALL_H */
diff --git a/include/sysemu/block-backend-io.h 
b/include/sysemu/block-backend-io.h
index cfcfd85c1d..d174275a5c 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,10 +100,6 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-void defer_call_begin(void);
-void defer_call_end(void);
-void defer_call(void (*fn)(void *), void *opaque);
-
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
diff --git a/block/blkio.c b/block/blkio.c
index 7cf6d61f47..0a0a6c0f5f 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -13,6 +13,7 @@
 #include "block/block_int.h"
 #include "exec/memory.h"
 #include "exec/cpu-common.h" /* for qemu_ram_get_fd() */
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
diff --git a/block/io_uring.c b/block/io_uring.c
index 8429f341be..3a1e1f45b3 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -15,6 +15,7 @@
 #include "block/block.h"
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "sysemu/block-backend.h"
 #include "trace.h"
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 9a08219db0..62380593c8 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -14,6 +14,7 @@
 #include "block/raw-aio.h"
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "sysemu/block-backend.h"
 
diff --git a/block/nvme.c b/block/nvme.c
index dfbd1085fd..96b3f8f2fa 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -16,6 +16,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index e9dd8f8a99..c4bb28c66f 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -19,6 +19,7 @@
  */
 
 #include "qe

[PATCH v2 1/4] block: rename blk_io_plug_call() API to defer_call()

2023-08-17 Thread Stefan Hajnoczi
Prepare to move the blk_io_plug_call() API out of the block layer so
that other subsystems can use this deferred call mechanism. Rename it
to defer_call() but leave the code in block/plug.c.

The next commit will move the code out of the block layer.

Suggested-by: Ilya Maximets 
Signed-off-by: Stefan Hajnoczi 
---
 include/sysemu/block-backend-io.h |   6 +-
 block/blkio.c |   8 +--
 block/io_uring.c  |   4 +-
 block/linux-aio.c |   4 +-
 block/nvme.c  |   4 +-
 block/plug.c  | 109 +++---
 hw/block/dataplane/xen-block.c|  10 +--
 hw/block/virtio-blk.c |   4 +-
 hw/scsi/virtio-scsi.c |   6 +-
 9 files changed, 76 insertions(+), 79 deletions(-)

diff --git a/include/sysemu/block-backend-io.h 
b/include/sysemu/block-backend-io.h
index be4dcef59d..cfcfd85c1d 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,9 +100,9 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-void blk_io_plug(void);
-void blk_io_unplug(void);
-void blk_io_plug_call(void (*fn)(void *), void *opaque);
+void defer_call_begin(void);
+void defer_call_end(void);
+void defer_call(void (*fn)(void *), void *opaque);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
diff --git a/block/blkio.c b/block/blkio.c
index 1dd495617c..7cf6d61f47 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -312,10 +312,10 @@ static void blkio_detach_aio_context(BlockDriverState *bs)
 }
 
 /*
- * Called by blk_io_unplug() or immediately if not plugged. Called without
- * blkio_lock.
+ * Called by defer_call_end() or immediately if not in a deferred section.
+ * Called without blkio_lock.
  */
-static void blkio_unplug_fn(void *opaque)
+static void blkio_deferred_fn(void *opaque)
 {
 BDRVBlkioState *s = opaque;
 
@@ -332,7 +332,7 @@ static void blkio_submit_io(BlockDriverState *bs)
 {
 BDRVBlkioState *s = bs->opaque;
 
-blk_io_plug_call(blkio_unplug_fn, s);
+defer_call(blkio_deferred_fn, s);
 }
 
 static int coroutine_fn
diff --git a/block/io_uring.c b/block/io_uring.c
index 69d9820928..8429f341be 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -306,7 +306,7 @@ static void ioq_init(LuringQueue *io_q)
 io_q->blocked = false;
 }
 
-static void luring_unplug_fn(void *opaque)
+static void luring_deferred_fn(void *opaque)
 {
 LuringState *s = opaque;
 trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
@@ -367,7 +367,7 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, 
LuringState *s,
 return ret;
 }
 
-blk_io_plug_call(luring_unplug_fn, s);
+defer_call(luring_deferred_fn, s);
 }
 return 0;
 }
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 561c71a9ae..9a08219db0 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -353,7 +353,7 @@ static uint64_t laio_max_batch(LinuxAioState *s, uint64_t 
dev_max_batch)
 return max_batch;
 }
 
-static void laio_unplug_fn(void *opaque)
+static void laio_deferred_fn(void *opaque)
 {
 LinuxAioState *s = opaque;
 
@@ -393,7 +393,7 @@ static int laio_do_submit(int fd, struct qemu_laiocb 
*laiocb, off_t offset,
 if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
 ioq_submit(s);
 } else {
-blk_io_plug_call(laio_unplug_fn, s);
+defer_call(laio_deferred_fn, s);
 }
 }
 
diff --git a/block/nvme.c b/block/nvme.c
index b6e95f0b7e..dfbd1085fd 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -476,7 +476,7 @@ static void nvme_trace_command(const NvmeCmd *cmd)
 }
 }
 
-static void nvme_unplug_fn(void *opaque)
+static void nvme_deferred_fn(void *opaque)
 {
 NVMeQueuePair *q = opaque;
 
@@ -503,7 +503,7 @@ static void nvme_submit_command(NVMeQueuePair *q, 
NVMeRequest *req,
 q->need_kick++;
 qemu_mutex_unlock(&q->lock);
 
-blk_io_plug_call(nvme_unplug_fn, q);
+defer_call(nvme_deferred_fn, q);
 }
 
 static void nvme_admin_cmd_sync_cb(void *opaque, int ret)
diff --git a/block/plug.c b/block/plug.c
index 98a155d2f4..f26173559c 100644
--- a/block/plug.c
+++ b/block/plug.c
@@ -1,24 +1,21 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 /*
- * Block I/O plugging
+ * Deferred calls
  *
  * Copyright Red Hat.
  *
- * This API defers a function call within a blk_io_plug()/blk_io_unplug()
+ * This API defers a function call within a defer_call_begin()/defer_call_end()
  * section, allowing multiple calls to batch up. This is a performance
  * optimization that is used in the block layer to submit several I/O requests
  * at once instead of individually:
  *
- *   blk_io_plug(); <-- start of plugged region
+ *   defer_call_begin(); <-- start of section
  *   ...
- * 

[PATCH v2 0/4] virtio-blk: use blk_io_plug_call() instead of notification BH

2023-08-17 Thread Stefan Hajnoczi
v2:
- Rename blk_io_plug() to defer_call() and move it to util/ so the net
  subsystem can use it [Ilya]
- Add defer_call_begin()/end() to thread_pool_completion_bh() to match Linux
  AIO and io_uring completion batching

Replace the seldom-used virtio-blk notification BH mechanism with
blk_io_plug(). This is part of an effort to enable the multi-queue block layer
in virtio-blk. The notification BH was not multi-queue friendly.

The blk_io_plug() mechanism improves fio rw=randread bs=4k iodepth=64 numjobs=8
IOPS by ~9% with a single IOThread and 8 vCPUs (this is not even a multi-queue
block layer configuration) compared to no completion batching. iodepth=1
decreases by ~1% but this could be noise. Benchmark details are available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd

Stefan Hajnoczi (4):
  block: rename blk_io_plug_call() API to defer_call()
  util/defer-call: move defer_call() to util/
  virtio: use defer_call() in virtio_irqfd_notify()
  virtio-blk: remove batch notification BH

 MAINTAINERS   |   3 +-
 include/qemu/defer-call.h |  15 +++
 include/sysemu/block-backend-io.h |   4 -
 block/blkio.c |   9 +-
 block/io_uring.c  |  11 ++-
 block/linux-aio.c |   9 +-
 block/nvme.c  |   5 +-
 block/plug.c  | 159 --
 hw/block/dataplane/virtio-blk.c   |  48 +
 hw/block/dataplane/xen-block.c|  11 ++-
 hw/block/virtio-blk.c |   5 +-
 hw/scsi/virtio-scsi.c |   7 +-
 hw/virtio/virtio.c|  11 ++-
 util/defer-call.c | 156 +
 util/thread-pool.c|   5 +
 block/meson.build |   1 -
 util/meson.build  |   1 +
 17 files changed, 227 insertions(+), 233 deletions(-)
 create mode 100644 include/qemu/defer-call.h
 delete mode 100644 block/plug.c
 create mode 100644 util/defer-call.c

-- 
2.41.0




Re: [PATCH] xen-block: fix segv on unrealize

2023-06-06 Thread Stefan Hajnoczi
Sorry!

Reviewed-by: Stefan Hajnoczi 



[PULL 1/8] block: add blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi
Introduce a new API for thread-local blk_io_plug() that does not
traverse the block graph. The goal is to make blk_io_plug() multi-queue
friendly.

Instead of having block drivers track whether or not we're in a plugged
section, provide an API that allows them to defer a function call until
we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
called multiple times with the same fn/opaque pair, then fn() is only
called once at the end of the function - resulting in batching.

This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
because the plug state is now thread-local.

Later patches convert block drivers to blk_io_plug_call() and then we
can finally remove .bdrv_co_io_plug() once all block drivers have been
converted.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-2-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |   1 +
 include/sysemu/block-backend-io.h |  13 +--
 block/block-backend.c |  22 -
 block/plug.c  | 159 ++
 hw/block/dataplane/xen-block.c|   8 +-
 hw/block/virtio-blk.c |   4 +-
 hw/scsi/virtio-scsi.c |   6 +-
 block/meson.build |   1 +
 8 files changed, 173 insertions(+), 41 deletions(-)
 create mode 100644 block/plug.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 4b025a7b63..89f274f85e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2650,6 +2650,7 @@ F: util/aio-*.c
 F: util/aio-*.h
 F: util/fdmon-*.c
 F: block/io.c
+F: block/plug.c
 F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
diff --git a/include/sysemu/block-backend-io.h 
b/include/sysemu/block-backend-io.h
index d62a7ee773..be4dcef59d 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,16 +100,9 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-/*
- * blk_io_plug/unplug are thread-local operations. This means that multiple
- * IOThreads can simultaneously call plug/unplug, but the caller must ensure
- * that each unplug() is called in the same IOThread of the matching plug().
- */
-void coroutine_fn blk_co_io_plug(BlockBackend *blk);
-void co_wrapper blk_io_plug(BlockBackend *blk);
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk);
-void co_wrapper blk_io_unplug(BlockBackend *blk);
+void blk_io_plug(void);
+void blk_io_unplug(void);
+void blk_io_plug_call(void (*fn)(void *), void *opaque);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index 241f643507..4009ed5fed 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2582,28 +2582,6 @@ void blk_add_insert_bs_notifier(BlockBackend *blk, 
Notifier *notify)
 notifier_list_add(&blk->insert_bs_notifiers, notify);
 }
 
-void coroutine_fn blk_co_io_plug(BlockBackend *blk)
-{
-BlockDriverState *bs = blk_bs(blk);
-IO_CODE();
-GRAPH_RDLOCK_GUARD();
-
-if (bs) {
-bdrv_co_io_plug(bs);
-}
-}
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk)
-{
-BlockDriverState *bs = blk_bs(blk);
-IO_CODE();
-GRAPH_RDLOCK_GUARD();
-
-if (bs) {
-bdrv_co_io_unplug(bs);
-}
-}
-
 BlockAcctStats *blk_get_stats(BlockBackend *blk)
 {
 IO_CODE();
diff --git a/block/plug.c b/block/plug.c
new file mode 100644
index 00..98a155d2f4
--- /dev/null
+++ b/block/plug.c
@@ -0,0 +1,159 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Block I/O plugging
+ *
+ * Copyright Red Hat.
+ *
+ * This API defers a function call within a blk_io_plug()/blk_io_unplug()
+ * section, allowing multiple calls to batch up. This is a performance
+ * optimization that is used in the block layer to submit several I/O requests
+ * at once instead of individually:
+ *
+ *   blk_io_plug(); <-- start of plugged region
+ *   ...
+ *   blk_io_plug_call(my_func, my_obj); <-- deferred my_func(my_obj) call
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   ...
+ *   blk_io_unplug(); <-- end of plugged region, my_func(my_obj) is called once
+ *
+ * This code is actually generic and not tied to the block layer. If another
+ * subsystem needs this functionality, it could be renamed.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/coroutine-tls.h"
+#include "qemu/notify.h"
+#include "qemu/thread.h"
+#include "sysemu/block-backend.h"
+
+/* A function call that has been deferred until unplug() */
+typedef struct {
+void (*fn)(void *);
+void *opaque;
+} UnplugFn;
+
+/* Per-

[PULL 8/8] qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa

2023-06-01 Thread Stefan Hajnoczi
From: Stefano Garzarella 

The virtio-blk-vhost-vdpa driver in libblkio 1.3.0 supports fd
passing through the new 'fd' property.

Since we now use qemu_open() on '@path' when the virtio-blk driver
supports fd passing, let's announce it.
This way, the management layer can pass the file descriptor of an
already opened vhost-vdpa character device, which is especially useful
when the device can only be accessed with certain privileges.

Add the '@fdset' feature only when the virtio-blk-vhost-vdpa driver
in libblkio supports it.

Suggested-by: Markus Armbruster 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
Message-id: 20230530071941.8954-3-sgarz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 qapi/block-core.json | 6 ++
 meson.build  | 4 
 2 files changed, 10 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 98d9116dae..4bf89171c6 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3955,10 +3955,16 @@
 #
 # @path: path to the vhost-vdpa character device.
 #
+# Features:
+# @fdset: Member @path supports the special "/dev/fdset/N" path
+# (since 8.1)
+#
 # Since: 7.2
 ##
 { 'struct': 'BlockdevOptionsVirtioBlkVhostVdpa',
   'data': { 'path': 'str' },
+  'features': [ { 'name' :'fdset',
+  'if': 'CONFIG_BLKIO_VHOST_VDPA_FD' } ],
   'if': 'CONFIG_BLKIO' }
 
 ##
diff --git a/meson.build b/meson.build
index bc76ea96bf..a61d3e9b06 100644
--- a/meson.build
+++ b/meson.build
@@ -2106,6 +2106,10 @@ config_host_data.set('CONFIG_LZO', lzo.found())
 config_host_data.set('CONFIG_MPATH', mpathpersist.found())
 config_host_data.set('CONFIG_MPATH_NEW_API', mpathpersist_new_api)
 config_host_data.set('CONFIG_BLKIO', blkio.found())
+if blkio.found()
+  config_host_data.set('CONFIG_BLKIO_VHOST_VDPA_FD',
+   blkio.version().version_compare('>=1.3.0'))
+endif
 config_host_data.set('CONFIG_CURL', curl.found())
 config_host_data.set('CONFIG_CURSES', curses.found())
 config_host_data.set('CONFIG_GBM', gbm.found())
-- 
2.40.1




[PULL 6/8] block: remove bdrv_co_io_plug() API

2023-06-01 Thread Stefan Hajnoczi
No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
function pointers.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-7-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/block-io.h |  3 ---
 include/block/block_int-common.h | 11 --
 block/io.c   | 37 
 3 files changed, 51 deletions(-)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index a27e471a87..43af816d75 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -259,9 +259,6 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, 
AioContext *old_ctx);
 
 AioContext *child_of_bds_get_parent_aio_context(BdrvChild *c);
 
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_plug(BlockDriverState *bs);
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_unplug(BlockDriverState *bs);
-
 bool coroutine_fn GRAPH_RDLOCK
 bdrv_co_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
uint32_t granularity, Error **errp);
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index b1cbc1e00c..74195c3004 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -768,11 +768,6 @@ struct BlockDriver {
 void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_debug_event)(
 BlockDriverState *bs, BlkdebugEvent event);
 
-/* io queue for linux-aio */
-void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_plug)(BlockDriverState 
*bs);
-void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_unplug)(
-BlockDriverState *bs);
-
 bool (*bdrv_supports_persistent_dirty_bitmap)(BlockDriverState *bs);
 
 bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_can_store_new_dirty_bitmap)(
@@ -1227,12 +1222,6 @@ struct BlockDriverState {
 unsigned int in_flight;
 unsigned int serialising_in_flight;
 
-/*
- * counter for nested bdrv_io_plug.
- * Accessed with atomic ops.
- */
-unsigned io_plugged;
-
 /* do we need to tell the quest if we have a volatile write cache? */
 int enable_write_cache;
 
diff --git a/block/io.c b/block/io.c
index 540bf8d26d..f2dfc7c405 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3223,43 +3223,6 @@ void *qemu_try_blockalign0(BlockDriverState *bs, size_t 
size)
 return mem;
 }
 
-void coroutine_fn bdrv_co_io_plug(BlockDriverState *bs)
-{
-BdrvChild *child;
-IO_CODE();
-assert_bdrv_graph_readable();
-
-QLIST_FOREACH(child, &bs->children, next) {
-bdrv_co_io_plug(child->bs);
-}
-
-if (qatomic_fetch_inc(&bs->io_plugged) == 0) {
-BlockDriver *drv = bs->drv;
-if (drv && drv->bdrv_co_io_plug) {
-drv->bdrv_co_io_plug(bs);
-}
-}
-}
-
-void coroutine_fn bdrv_co_io_unplug(BlockDriverState *bs)
-{
-BdrvChild *child;
-IO_CODE();
-assert_bdrv_graph_readable();
-
-assert(bs->io_plugged);
-if (qatomic_fetch_dec(&bs->io_plugged) == 1) {
-BlockDriver *drv = bs->drv;
-if (drv && drv->bdrv_co_io_unplug) {
-drv->bdrv_co_io_unplug(bs);
-}
-}
-
-QLIST_FOREACH(child, &bs->children, next) {
-bdrv_co_io_unplug(child->bs);
-}
-}
-
 /* Helper that undoes bdrv_register_buf() when it fails partway through */
 static void GRAPH_RDLOCK
 bdrv_register_buf_rollback(BlockDriverState *bs, void *host, size_t size,
-- 
2.40.1




[PULL 5/8] block/linux-aio: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Note that a dev_max_batch check is dropped in laio_io_unplug() because
the semantics of unplug_fn() are different from .bdrv_co_io_unplug():
1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
   not every time blk_io_unplug() is called.
2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
   way to get per-BlockDriverState fields like dev_max_batch.

Therefore this condition cannot be moved to laio_unplug_fn(). It is not
obvious that this condition affects performance in practice, so I am
removing it instead of trying to come up with a more complex mechanism
to preserve the condition.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Acked-by: Kevin Wolf 
Reviewed-by: Stefano Garzarella 
Message-id: 20230530180959.1108766-6-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/raw-aio.h |  7 ---
 block/file-posix.c  | 28 
 block/linux-aio.c   | 41 +++--
 3 files changed, 11 insertions(+), 65 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index da60ca13ef..0f63c2800c 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -62,13 +62,6 @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, 
QEMUIOVector *qiov,
 
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
-
-/*
- * laio_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void laio_io_plug(void);
-void laio_io_unplug(uint64_t dev_max_batch);
 #endif
 /* io_uring.c - Linux io_uring implementation */
 #ifdef CONFIG_LINUX_IO_URING
diff --git a/block/file-posix.c b/block/file-posix.c
index 7baa8491dd..ac1ed54811 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2550,26 +2550,6 @@ static int coroutine_fn raw_co_pwritev(BlockDriverState 
*bs, int64_t offset,
 return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
 }
 
-static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
-{
-BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-if (s->use_linux_aio) {
-laio_io_plug();
-}
-#endif
-}
-
-static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
-{
-BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-if (s->use_linux_aio) {
-laio_io_unplug(s->aio_max_batch);
-}
-#endif
-}
-
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
 {
 BDRVRawState *s = bs->opaque;
@@ -3914,8 +3894,6 @@ BlockDriver bdrv_file = {
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
 .bdrv_refresh_limits = raw_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
@@ -4286,8 +4264,6 @@ static BlockDriver bdrv_host_device = {
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
 .bdrv_refresh_limits = raw_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
@@ -4424,8 +4400,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
 .bdrv_refresh_limits= cdrom_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
@@ -4552,8 +4526,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
 .bdrv_refresh_limits= cdrom_refresh_limits,
-.bdrv_co_io_plug= raw_co_io_plug,
-.bdrv_co_io_unplug  = raw_co_io_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
 .bdrv_co_truncate   = raw_co_truncate,
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 916f001e32..561c71a9ae 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -15,6 +15,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 
 /* Only used for assertions.  */
 #include "qemu/coroutine_int.h"
@@ -46,7 +47,6 @@ struct qemu_laio

[PULL 2/8] block/nvme: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-3-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/nvme.c   | 44 
 block/trace-events |  1 -
 2 files changed, 12 insertions(+), 33 deletions(-)

diff --git a/block/nvme.c b/block/nvme.c
index 17937d398d..7ca85bc44a 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -25,6 +25,7 @@
 #include "qemu/vfio-helpers.h"
 #include "block/block-io.h"
 #include "block/block_int.h"
+#include "sysemu/block-backend.h"
 #include "sysemu/replay.h"
 #include "trace.h"
 
@@ -119,7 +120,6 @@ struct BDRVNVMeState {
 int blkshift;
 
 uint64_t max_transfer;
-bool plugged;
 
 bool supports_write_zeroes;
 bool supports_discard;
@@ -282,7 +282,7 @@ static void nvme_kick(NVMeQueuePair *q)
 {
 BDRVNVMeState *s = q->s;
 
-if (s->plugged || !q->need_kick) {
+if (!q->need_kick) {
 return;
 }
 trace_nvme_kick(s, q->index);
@@ -387,10 +387,6 @@ static bool nvme_process_completion(NVMeQueuePair *q)
 NvmeCqe *c;
 
 trace_nvme_process_completion(s, q->index, q->inflight);
-if (s->plugged) {
-trace_nvme_process_completion_queue_plugged(s, q->index);
-return false;
-}
 
 /*
  * Support re-entrancy when a request cb() function invokes aio_poll().
@@ -480,6 +476,15 @@ static void nvme_trace_command(const NvmeCmd *cmd)
 }
 }
 
+static void nvme_unplug_fn(void *opaque)
+{
+NVMeQueuePair *q = opaque;
+
+QEMU_LOCK_GUARD(&q->lock);
+nvme_kick(q);
+nvme_process_completion(q);
+}
+
 static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
 NvmeCmd *cmd, BlockCompletionFunc cb,
 void *opaque)
@@ -496,8 +501,7 @@ static void nvme_submit_command(NVMeQueuePair *q, 
NVMeRequest *req,
q->sq.tail * NVME_SQ_ENTRY_BYTES, cmd, sizeof(*cmd));
 q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE;
 q->need_kick++;
-nvme_kick(q);
-nvme_process_completion(q);
+blk_io_plug_call(nvme_unplug_fn, q);
 qemu_mutex_unlock(&q->lock);
 }
 
@@ -1567,27 +1571,6 @@ static void nvme_attach_aio_context(BlockDriverState *bs,
 }
 }
 
-static void coroutine_fn nvme_co_io_plug(BlockDriverState *bs)
-{
-BDRVNVMeState *s = bs->opaque;
-assert(!s->plugged);
-s->plugged = true;
-}
-
-static void coroutine_fn nvme_co_io_unplug(BlockDriverState *bs)
-{
-BDRVNVMeState *s = bs->opaque;
-assert(s->plugged);
-s->plugged = false;
-for (unsigned i = INDEX_IO(0); i < s->queue_count; i++) {
-NVMeQueuePair *q = s->queues[i];
-qemu_mutex_lock(&q->lock);
-nvme_kick(q);
-nvme_process_completion(q);
-qemu_mutex_unlock(&q->lock);
-}
-}
-
 static bool nvme_register_buf(BlockDriverState *bs, void *host, size_t size,
   Error **errp)
 {
@@ -1664,9 +1647,6 @@ static BlockDriver bdrv_nvme = {
 .bdrv_detach_aio_context  = nvme_detach_aio_context,
 .bdrv_attach_aio_context  = nvme_attach_aio_context,
 
-.bdrv_co_io_plug  = nvme_co_io_plug,
-.bdrv_co_io_unplug= nvme_co_io_unplug,
-
 .bdrv_register_buf= nvme_register_buf,
 .bdrv_unregister_buf  = nvme_unregister_buf,
 };
diff --git a/block/trace-events b/block/trace-events
index 32665158d6..048ad27519 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -141,7 +141,6 @@ nvme_kick(void *s, unsigned q_index) "s %p q #%u"
 nvme_dma_flush_queue_wait(void *s) "s %p"
 nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) 
"cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
 nvme_process_completion(void *s, unsigned q_index, int inflight) "s %p q #%u 
inflight %d"
-nvme_process_completion_queue_plugged(void *s, unsigned q_index) "s %p q #%u"
 nvme_complete_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int 
c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"
-- 
2.40.1




[PULL 7/8] block/blkio: use qemu_open() to support fd passing for virtio-blk

2023-06-01 Thread Stefan Hajnoczi
From: Stefano Garzarella 

Some virtio-blk drivers (e.g. virtio-blk-vhost-vdpa) support fd
passing. Let's expose this to the user, so the management layer
can pass the file descriptor of an already opened path.

If the libblkio virtio-blk driver supports fd passing, let's always
use qemu_open() to open the `path`, so we can handle fd passing
from the management layer through the "/dev/fdset/N" special path.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Stefano Garzarella 
Message-id: 20230530071941.8954-2-sgarz...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/blkio.c | 53 ++-
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 11be8787a3..527323d625 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -673,25 +673,60 @@ static int blkio_virtio_blk_common_open(BlockDriverState 
*bs,
 {
 const char *path = qdict_get_try_str(options, "path");
 BDRVBlkioState *s = bs->opaque;
-int ret;
+bool fd_supported = false;
+int fd, ret;
 
 if (!path) {
 error_setg(errp, "missing 'path' option");
 return -EINVAL;
 }
 
-ret = blkio_set_str(s->blkio, "path", path);
-qdict_del(options, "path");
-if (ret < 0) {
-error_setg_errno(errp, -ret, "failed to set path: %s",
- blkio_get_error_msg());
-return ret;
-}
-
 if (!(flags & BDRV_O_NOCACHE)) {
 error_setg(errp, "cache.direct=off is not supported");
 return -EINVAL;
 }
+
+if (blkio_get_int(s->blkio, "fd", &fd) == 0) {
+fd_supported = true;
+}
+
+/*
+ * If the libblkio driver supports fd passing, let's always use qemu_open()
+ * to open the `path`, so we can handle fd passing from the management
+ * layer through the "/dev/fdset/N" special path.
+ */
+if (fd_supported) {
+int open_flags;
+
+if (flags & BDRV_O_RDWR) {
+open_flags = O_RDWR;
+} else {
+open_flags = O_RDONLY;
+}
+
+fd = qemu_open(path, open_flags, errp);
+if (fd < 0) {
+return -EINVAL;
+}
+
+ret = blkio_set_int(s->blkio, "fd", fd);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "failed to set fd: %s",
+ blkio_get_error_msg());
+qemu_close(fd);
+return ret;
+}
+} else {
+ret = blkio_set_str(s->blkio, "path", path);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "failed to set path: %s",
+ blkio_get_error_msg());
+return ret;
+}
+}
+
+qdict_del(options, "path");
+
 return 0;
 }
 
-- 
2.40.1




[PULL 4/8] block/io_uring: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-5-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 include/block/raw-aio.h |  7 ---
 block/file-posix.c  | 10 --
 block/io_uring.c| 44 -
 block/trace-events  |  5 ++---
 4 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 0fe85ade77..da60ca13ef 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -81,13 +81,6 @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int 
fd, uint64_t offset,
   QEMUIOVector *qiov, int type);
 void luring_detach_aio_context(LuringState *s, AioContext *old_context);
 void luring_attach_aio_context(LuringState *s, AioContext *new_context);
-
-/*
- * luring_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void luring_io_plug(void);
-void luring_io_unplug(void);
 #endif
 
 #ifdef _WIN32
diff --git a/block/file-posix.c b/block/file-posix.c
index 0ab158efba..7baa8491dd 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2558,11 +2558,6 @@ static void coroutine_fn raw_co_io_plug(BlockDriverState 
*bs)
 laio_io_plug();
 }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-if (s->use_linux_io_uring) {
-luring_io_plug();
-}
-#endif
 }
 
 static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
@@ -2573,11 +2568,6 @@ static void coroutine_fn 
raw_co_io_unplug(BlockDriverState *bs)
 laio_io_unplug(s->aio_max_batch);
 }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-if (s->use_linux_io_uring) {
-luring_io_unplug();
-}
-#endif
 }
 
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
diff --git a/block/io_uring.c b/block/io_uring.c
index 3a77480e16..69d9820928 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -16,6 +16,7 @@
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 #include "trace.h"
 
 /* Only used for assertions.  */
@@ -41,7 +42,6 @@ typedef struct LuringAIOCB {
 } LuringAIOCB;
 
 typedef struct LuringQueue {
-int plugged;
 unsigned int in_queue;
 unsigned int in_flight;
 bool blocked;
@@ -267,7 +267,7 @@ static void 
luring_process_completions_and_submit(LuringState *s)
 {
 luring_process_completions(s);
 
-if (!s->io_q.plugged && s->io_q.in_queue > 0) {
+if (s->io_q.in_queue > 0) {
 ioq_submit(s);
 }
 }
@@ -301,29 +301,17 @@ static void qemu_luring_poll_ready(void *opaque)
 static void ioq_init(LuringQueue *io_q)
 {
 QSIMPLEQ_INIT(&io_q->submit_queue);
-io_q->plugged = 0;
 io_q->in_queue = 0;
 io_q->in_flight = 0;
 io_q->blocked = false;
 }
 
-void luring_io_plug(void)
+static void luring_unplug_fn(void *opaque)
 {
-AioContext *ctx = qemu_get_current_aio_context();
-LuringState *s = aio_get_linux_io_uring(ctx);
-trace_luring_io_plug(s);
-s->io_q.plugged++;
-}
-
-void luring_io_unplug(void)
-{
-AioContext *ctx = qemu_get_current_aio_context();
-LuringState *s = aio_get_linux_io_uring(ctx);
-assert(s->io_q.plugged);
-trace_luring_io_unplug(s, s->io_q.blocked, s->io_q.plugged,
-   s->io_q.in_queue, s->io_q.in_flight);
-if (--s->io_q.plugged == 0 &&
-!s->io_q.blocked && s->io_q.in_queue > 0) {
+LuringState *s = opaque;
+trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
+   s->io_q.in_flight);
+if (!s->io_q.blocked && s->io_q.in_queue > 0) {
 ioq_submit(s);
 }
 }
@@ -370,14 +358,16 @@ static int luring_do_submit(int fd, LuringAIOCB 
*luringcb, LuringState *s,
 
 QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
 s->io_q.in_queue++;
-trace_luring_do_submit(s, s->io_q.blocked, s->io_q.plugged,
-   s->io_q.in_queue, s->io_q.in_flight);
-if (!s->io_q.blocked &&
-(!s->io_q.plugged ||
- s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
-ret = ioq_submit(s);
-trace_luring_do_submit_done(s, ret);
-return ret;
+trace_luring_do_submit(s, s->io_q.blocked, s->io_q.in_queue,
+   s->io_q.in_flight);
+if (!s->io_q.blocked) {
+if (s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES) {
+ret = ioq_submit(s)

[PULL 3/8] block/blkio: convert to blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi
Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Eric Blake 
Reviewed-by: Stefano Garzarella 
Acked-by: Kevin Wolf 
Message-id: 20230530180959.1108766-4-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 
---
 block/blkio.c | 43 ---
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index 72117fa005..11be8787a3 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -17,6 +17,7 @@
 #include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
+#include "sysemu/block-backend.h"
 #include "exec/memory.h" /* for ram_block_discard_disable() */
 
 #include "block/block-io.h"
@@ -320,16 +321,30 @@ static void blkio_detach_aio_context(BlockDriverState *bs)
NULL, NULL, NULL);
 }
 
-/* Call with s->blkio_lock held to submit I/O after enqueuing a new request */
-static void blkio_submit_io(BlockDriverState *bs)
+/*
+ * Called by blk_io_unplug() or immediately if not plugged. Called without
+ * blkio_lock.
+ */
+static void blkio_unplug_fn(void *opaque)
 {
-if (qatomic_read(&bs->io_plugged) == 0) {
-BDRVBlkioState *s = bs->opaque;
+BDRVBlkioState *s = opaque;
 
+WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_do_io(s->blkioq, NULL, 0, 0, NULL);
 }
 }
 
+/*
+ * Schedule I/O submission after enqueuing a new request. Called without
+ * blkio_lock.
+ */
+static void blkio_submit_io(BlockDriverState *bs)
+{
+BDRVBlkioState *s = bs->opaque;
+
+blk_io_plug_call(blkio_unplug_fn, s);
+}
+
 static int coroutine_fn
 blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
@@ -340,9 +355,9 @@ blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, 
int64_t bytes)
 
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_discard(s->blkioq, offset, bytes, &cod, 0);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 return cod.ret;
 }
@@ -373,9 +388,9 @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, 
int64_t bytes,
 
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_readv(s->blkioq, offset, iov, iovcnt, &cod, 0);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 
 if (use_bounce_buffer) {
@@ -418,9 +433,9 @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState 
*bs, int64_t offset,
 
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_writev(s->blkioq, offset, iov, iovcnt, &cod, blkio_flags);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 
 if (use_bounce_buffer) {
@@ -439,9 +454,9 @@ static int coroutine_fn blkio_co_flush(BlockDriverState *bs)
 
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_flush(s->blkioq, &cod, 0);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 return cod.ret;
 }
@@ -467,22 +482,13 @@ static int coroutine_fn 
blkio_co_pwrite_zeroes(BlockDriverState *bs,
 
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 blkioq_write_zeroes(s->blkioq, offset, bytes, &cod, blkio_flags);
-blkio_submit_io(bs);
 }
 
+blkio_submit_io(bs);
 qemu_coroutine_yield();
 return cod.ret;
 }
 
-static void coroutine_fn blkio_co_io_unplug(BlockDriverState *bs)
-{
-BDRVBlkioState *s = bs->opaque;
-
-WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
-blkio_submit_io(bs);
-}
-}
-
 typedef enum {
 BMRR_OK,
 BMRR_SKIP,
@@ -1004,7 +1010,6 @@ static void blkio_refresh_limits(BlockDriverState *bs, 
Error **errp)
 .bdrv_co_pwritev = blkio_co_pwritev, \
 .bdrv_co_flush_to_disk   = blkio_co_flush, \
 .bdrv_co_pwrite_zeroes   = blkio_co_pwrite_zeroes, \
-.bdrv_co_io_unplug   = blkio_co_io_unplug, \
 .bdrv_refresh_limits = blkio_refresh_limits, \
 .bdrv_register_buf   = blkio_register_buf, \
 .bdrv_unregister_buf = blkio_unregister_buf, \
-- 
2.40.1




[PULL 0/8] Block patches

2023-06-01 Thread Stefan Hajnoczi
The following changes since commit c6a5fc2ac76c5ab709896ee1b0edd33685a67ed1:

  decodetree: Add --output-null for meson testing (2023-05-31 19:56:42 -0700)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 98b126f5e3228a346c774e569e26689943b401dd:

  qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa (2023-06-01 
11:08:21 -0400)


Pull request

- Stefano Garzarella's blkio block driver 'fd' parameter
- My thread-local blk_io_plug() series

----

Stefan Hajnoczi (6):
  block: add blk_io_plug_call() API
  block/nvme: convert to blk_io_plug_call() API
  block/blkio: convert to blk_io_plug_call() API
  block/io_uring: convert to blk_io_plug_call() API
  block/linux-aio: convert to blk_io_plug_call() API
  block: remove bdrv_co_io_plug() API

Stefano Garzarella (2):
  block/blkio: use qemu_open() to support fd passing for virtio-blk
  qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa

 MAINTAINERS   |   1 +
 qapi/block-core.json  |   6 ++
 meson.build   |   4 +
 include/block/block-io.h  |   3 -
 include/block/block_int-common.h  |  11 ---
 include/block/raw-aio.h   |  14 ---
 include/sysemu/block-backend-io.h |  13 +--
 block/blkio.c |  96 --
 block/block-backend.c |  22 -
 block/file-posix.c|  38 ---
 block/io.c|  37 ---
 block/io_uring.c  |  44 -
 block/linux-aio.c |  41 +++-
 block/nvme.c  |  44 +++--
 block/plug.c  | 159 ++
 hw/block/dataplane/xen-block.c|   8 +-
 hw/block/virtio-blk.c |   4 +-
 hw/scsi/virtio-scsi.c |   6 +-
 block/meson.build |   1 +
 block/trace-events|   6 +-
 20 files changed, 293 insertions(+), 265 deletions(-)
 create mode 100644 block/plug.c

-- 
2.40.1




Re: [PATCH v3 0/6] block: add blk_io_plug_call() API

2023-06-01 Thread Stefan Hajnoczi
On Tue, May 30, 2023 at 02:09:53PM -0400, Stefan Hajnoczi wrote:
> v3
> - Patch 5: Mention why dev_max_batch condition was dropped [Stefano]
> v2
> - Patch 1: "is not be freed" -> "is not freed" [Eric]
> - Patch 2: Remove unused nvme_process_completion_queue_plugged trace event
>   [Stefano]
> - Patch 3: Add missing #include and fix blkio_unplug_fn() prototype [Stefano]
> - Patch 4: Removed whitespace hunk [Eric]
> 
> The existing blk_io_plug() API is not block layer multi-queue friendly because
> the plug state is per-BlockDriverState.
> 
> Change blk_io_plug()'s implementation so it is thread-local. This is done by
> introducing the blk_io_plug_call() function that block drivers use to batch
> calls while plugged. It is relatively easy to convert block drivers from
> .bdrv_co_io_plug() to blk_io_plug_call().
> 
> Random read 4KB performance with virtio-blk on a host NVMe block device:
> 
> iodepth   iops   change vs today
> 145612   -4%
> 287967   +2%
> 4   129872   +0%
> 8   171096   -3%
> 16  194508   -4%
> 32  208947   -1%
> 64  217647   +0%
> 128 229629   +0%
> 
> The results are within the noise for these benchmarks. This is to be expected
> because the plugging behavior for a single thread hasn't changed in this patch
> series, only that the state is thread-local now.
> 
> The following graph compares several approaches:
> https://vmsplice.net/~stefan/blk_io_plug-thread-local.png
> - v7.2.0: before most of the multi-queue block layer changes landed.
> - with-blk_io_plug: today's post-8.0.0 QEMU.
> - blk_io_plug-thread-local: this patch series.
> - no-blk_io_plug: what happens when we simply remove plugging?
> - call-after-dispatch: what if we integrate plugging into the event loop? I
>   decided against this approach in the end because it's more likely to
>   introduce performance regressions since I/O submission is deferred until the
>   end of the event loop iteration.
> 
> Aside from the no-blk_io_plug case, which bottlenecks much earlier than the
> others, we see that all plugging approaches are more or less equivalent in 
> this
> benchmark. It is also clear that QEMU 8.0.0 has lower performance than 7.2.0.
> 
> The Ansible playbook, fio results, and a Jupyter notebook are available here:
> https://github.com/stefanha/qemu-perf/tree/remove-blk_io_plug
> 
> Stefan Hajnoczi (6):
>   block: add blk_io_plug_call() API
>   block/nvme: convert to blk_io_plug_call() API
>   block/blkio: convert to blk_io_plug_call() API
>   block/io_uring: convert to blk_io_plug_call() API
>   block/linux-aio: convert to blk_io_plug_call() API
>   block: remove bdrv_co_io_plug() API
> 
>  MAINTAINERS   |   1 +
>  include/block/block-io.h  |   3 -
>  include/block/block_int-common.h  |  11 ---
>  include/block/raw-aio.h   |  14 ---
>  include/sysemu/block-backend-io.h |  13 +--
>  block/blkio.c |  43 
>  block/block-backend.c |  22 -
>  block/file-posix.c|  38 ---
>  block/io.c|  37 ---
>  block/io_uring.c  |  44 -
>  block/linux-aio.c |  41 +++-
>  block/nvme.c  |  44 +++--
>  block/plug.c  | 159 ++
>  hw/block/dataplane/xen-block.c|   8 +-
>  hw/block/virtio-blk.c |   4 +-
>  hw/scsi/virtio-scsi.c |   6 +-
>  block/meson.build |   1 +
>  block/trace-events|   6 +-
>  18 files changed, 239 insertions(+), 256 deletions(-)
>  create mode 100644 block/plug.c
> 
> -- 
> 2.40.1
> 

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan


signature.asc
Description: PGP signature


  1   2   3   4   >