Re: [PATCH] xen/events: close evtchn after mapping cleanup

2024-02-08 Thread Maximilian Heyne
On Wed, Jan 24, 2024 at 04:31:28PM +, Maximilian Heyne wrote:
> shutdown_pirq and startup_pirq are not taking the
> irq_mapping_update_lock because they can't due to lock inversion. Both
> are called with the irq_desc->lock being taken. The lock order,
> however, is first irq_mapping_update_lock and then irq_desc->lock.
> 
> This opens multiple races:
> - shutdown_pirq can be interrupted by a function that allocates an event
>   channel:
> 
>   CPU0                        CPU1
>   shutdown_pirq {
>     xen_evtchn_close(e)
>                               __startup_pirq {
>                                 EVTCHNOP_bind_pirq
>                                   -> returns just freed evtchn e
>                                 set_evtchn_to_irq(e, irq)
>                               }
>     xen_irq_info_cleanup() {
>       set_evtchn_to_irq(e, -1)
>     }
>   }
> 
>   Assume here that event channel e refers to the same event channel
>   number.
>   After this race the evtchn_to_irq mapping for e is invalid (-1).
> 
> - __startup_pirq races with __unbind_from_irq in a similar way. Because
>   __startup_pirq doesn't take irq_mapping_update_lock it can grab the
>   evtchn that __unbind_from_irq is currently freeing and cleaning up. In
>   this case even though the event channel is allocated, its mapping can
>   be unset in evtchn_to_irq.
> 
> The fix is to first clean up the mappings and then close the event
> channel. In this way, when an event channel gets allocated, its
> potential previous evtchn_to_irq mappings are guaranteed to be unset already.
> This is also the reverse order of the allocation, where first the event
> channel is allocated and then the mappings are set up.
> 
> On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal
> [un]bind interfaces"), we hit a BUG like the following during probing of NVMe
> devices. The issue is that during nvme_setup_io_queues, pci_free_irq
> is called for every device which results in a call to shutdown_pirq.
> With many nvme devices it's therefore likely to hit this race during
> boot because multiple calls to shutdown_pirq and startup_pirq are
> potentially running in parallel.
> 
>   [ cut here ]
>   blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; 
> indirect descriptors: enabled; bounce buffer: enabled
>   kernel BUG at drivers/xen/events/events_base.c:499!
>   invalid opcode:  [#1] SMP PTI
>   CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 
> 5.10.201-191.748.amzn2.x86_64 #1
>   Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
>   Workqueue: nvme-reset-wq nvme_reset_work
>   RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
>   Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 
> 84 64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 
> 1f 84 00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
>   RSP: :c9000d533b08 EFLAGS: 00010046
>   RAX:  RBX:  RCX: 0006
>   RDX: 0028 RSI:  RDI: 
>   RBP: 888107419680 R08:  R09: 82d72b00
>   R10:  R11:  R12: 01ed
>   R13:  R14:  R15: 0002
>   FS:  () GS:88bc8b50() knlGS:
>   CS:  0010 DS:  ES:  CR0: 80050033
>   CR2:  CR3: 02610001 CR4: 001706e0
>   DR0:  DR1:  DR2: 
>   DR3:  DR6: fffe0ff0 DR7: 0400
>   Call Trace:
>    ? show_trace_log_lvl+0x1c1/0x2d9
>    ? show_trace_log_lvl+0x1c1/0x2d9
>    ? set_affinity_irq+0xdc/0x1c0
>    ? __die_body.cold+0x8/0xd
>    ? die+0x2b/0x50
>    ? do_trap+0x90/0x110
>    ? bind_evtchn_to_cpu+0xdf/0xf0
>    ? do_error_trap+0x65/0x80
>    ? bind_evtchn_to_cpu+0xdf/0xf0
>    ? exc_invalid_op+0x4e/0x70
>    ? bind_evtchn_to_cpu+0xdf/0xf0
>    ? asm_exc_invalid_op+0x12/0x20
>    ? bind_evtchn_to_cpu+0xdf/0xf0
>    ? bind_evtchn_to_cpu+0xc5/0xf0
>    set_affinity_irq+0xdc/0x1c0
>    irq_do_set_affinity+0x1d7/0x1f0
>    irq_setup_affinity+0xd6/0x1a0
>    irq_startup+0x8a/0xf0
>    __setup_irq+0x639/0x6d0
>    ? nvme_suspend+0x150/0x150
>    request_threaded_irq+0x10c/0x180
>    ? nvme_suspend+0x150/0x150
>    pci_request_irq+0xa8/0xf0
>    ? __blk_mq_free_request+0x74/0xa0
>    queue_request_irq+0x6f/0x80
>    nvme_create_queue+0x1af/0x200
>    nvme_create_io_queues+0xbd/0xf0
>    nvme_setup_io_queues+0x246/0x320
>    ? nvme_irq_check+0x30/0x30
>    nvme_reset_w

[PATCH] xen/events: close evtchn after mapping cleanup

2024-01-24 Thread Maximilian Heyne
shutdown_pirq and startup_pirq are not taking the
irq_mapping_update_lock because they can't due to lock inversion. Both
are called with the irq_desc->lock being taken. The lock order,
however, is first irq_mapping_update_lock and then irq_desc->lock.

This opens multiple races:
- shutdown_pirq can be interrupted by a function that allocates an event
  channel:

  CPU0                        CPU1
  shutdown_pirq {
    xen_evtchn_close(e)
                              __startup_pirq {
                                EVTCHNOP_bind_pirq
                                  -> returns just freed evtchn e
                                set_evtchn_to_irq(e, irq)
                              }
    xen_irq_info_cleanup() {
      set_evtchn_to_irq(e, -1)
    }
  }

  Assume here that event channel e refers to the same event channel
  number.
  After this race the evtchn_to_irq mapping for e is invalid (-1).

- __startup_pirq races with __unbind_from_irq in a similar way. Because
  __startup_pirq doesn't take irq_mapping_update_lock it can grab the
  evtchn that __unbind_from_irq is currently freeing and cleaning up. In
  this case even though the event channel is allocated, its mapping can
  be unset in evtchn_to_irq.

The fix is to first clean up the mappings and then close the event
channel. In this way, when an event channel gets allocated, its
potential previous evtchn_to_irq mappings are guaranteed to be unset already.
This is also the reverse order of the allocation, where first the event
channel is allocated and then the mappings are set up.
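
To illustrate the ordering argument, here is a minimal user-space sketch
that replays the interleaving above in a single thread. It is not kernel
code: the port allocator, the table size, the IRQ numbers 10 and 11, and
the function replay_shutdown_race() are all hypothetical stand-ins, and it
assumes the allocator hands out the lowest free port.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PORTS 4
#define NO_IRQ   (-1)

/* Toy stand-ins for the hypervisor's port allocator and evtchn_to_irq. */
static bool port_in_use[NR_PORTS];
static int  evtchn_to_irq[NR_PORTS];

/* Models EVTCHNOP_bind_pirq: assumed to return the lowest free port. */
static int alloc_port(void)
{
    for (int p = 0; p < NR_PORTS; p++) {
        if (!port_in_use[p]) {
            port_in_use[p] = true;
            return p;
        }
    }
    return -1;
}

/* Models xen_evtchn_close(): the port becomes available for reuse. */
static void close_port(int p)
{
    port_in_use[p] = false;
}

static void reset(void)
{
    for (int p = 0; p < NR_PORTS; p++) {
        port_in_use[p] = false;
        evtchn_to_irq[p] = NO_IRQ;
    }
}

/*
 * IRQ 10 owns a port and is shut down on "CPU0" while "CPU1" starts
 * IRQ 11 between the two shutdown steps. Returns the IRQ recorded in
 * evtchn_to_irq for the port that IRQ 11 was given.
 */
int replay_shutdown_race(bool cleanup_before_close)
{
    reset();

    int e = alloc_port();
    assert(e >= 0);
    evtchn_to_irq[e] = 10;

    /* CPU0: first shutdown step */
    if (cleanup_before_close)
        evtchn_to_irq[e] = NO_IRQ;
    else
        close_port(e);

    /* CPU1: startup of IRQ 11 runs in the window */
    int e2 = alloc_port();
    assert(e2 >= 0);
    evtchn_to_irq[e2] = 11;

    /* CPU0: second shutdown step */
    if (cleanup_before_close)
        close_port(e);
    else
        evtchn_to_irq[e] = NO_IRQ;

    return evtchn_to_irq[e2];
}
```

With the old order (close first), IRQ 11 receives the just-freed port and
its mapping is then wiped by the late cleanup. With the new order, the port
is still in use while its mapping is cleared, so IRQ 11 gets a different
port and its mapping survives.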

On a 5.10 kernel prior to commit 3fcdaf3d7634 ("xen/events: modify internal
[un]bind interfaces"), we hit a BUG like the following during probing of NVMe
devices. The issue is that during nvme_setup_io_queues, pci_free_irq
is called for every device which results in a call to shutdown_pirq.
With many nvme devices it's therefore likely to hit this race during
boot because multiple calls to shutdown_pirq and startup_pirq are
potentially running in parallel.

  [ cut here ]
  blkfront: xvda: barrier or flush: disabled; persistent grants: enabled; 
indirect descriptors: enabled; bounce buffer: enabled
  kernel BUG at drivers/xen/events/events_base.c:499!
  invalid opcode:  [#1] SMP PTI
  CPU: 44 PID: 375 Comm: kworker/u257:23 Not tainted 
5.10.201-191.748.amzn2.x86_64 #1
  Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
  Workqueue: nvme-reset-wq nvme_reset_work
  RIP: 0010:bind_evtchn_to_cpu+0xdf/0xf0
  Code: 5d 41 5e c3 cc cc cc cc 44 89 f7 e8 2b 55 ad ff 49 89 c5 48 85 c0 0f 84 
64 ff ff ff 4c 8b 68 30 41 83 fe ff 0f 85 60 ff ff ff <0f> 0b 66 66 2e 0f 1f 84 
00 00 00 00 00 0f 1f 40 00 0f 1f 44 00 00
  RSP: :c9000d533b08 EFLAGS: 00010046
  RAX:  RBX:  RCX: 0006
  RDX: 0028 RSI:  RDI: 
  RBP: 888107419680 R08:  R09: 82d72b00
  R10:  R11:  R12: 01ed
  R13:  R14:  R15: 0002
  FS:  () GS:88bc8b50() knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2:  CR3: 02610001 CR4: 001706e0
  DR0:  DR1:  DR2: 
  DR3:  DR6: fffe0ff0 DR7: 0400
  Call Trace:
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? show_trace_log_lvl+0x1c1/0x2d9
   ? set_affinity_irq+0xdc/0x1c0
   ? __die_body.cold+0x8/0xd
   ? die+0x2b/0x50
   ? do_trap+0x90/0x110
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? do_error_trap+0x65/0x80
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? exc_invalid_op+0x4e/0x70
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? asm_exc_invalid_op+0x12/0x20
   ? bind_evtchn_to_cpu+0xdf/0xf0
   ? bind_evtchn_to_cpu+0xc5/0xf0
   set_affinity_irq+0xdc/0x1c0
   irq_do_set_affinity+0x1d7/0x1f0
   irq_setup_affinity+0xd6/0x1a0
   irq_startup+0x8a/0xf0
   __setup_irq+0x639/0x6d0
   ? nvme_suspend+0x150/0x150
   request_threaded_irq+0x10c/0x180
   ? nvme_suspend+0x150/0x150
   pci_request_irq+0xa8/0xf0
   ? __blk_mq_free_request+0x74/0xa0
   queue_request_irq+0x6f/0x80
   nvme_create_queue+0x1af/0x200
   nvme_create_io_queues+0xbd/0xf0
   nvme_setup_io_queues+0x246/0x320
   ? nvme_irq_check+0x30/0x30
   nvme_reset_work+0x1c8/0x400
   process_one_work+0x1b0/0x350
   worker_thread+0x49/0x310
   ? process_one_work+0x350/0x350
   kthread+0x11b/0x140
   ? __kthread_bind_mask+0x60/0x60
   ret_from_fork+0x22/0x30
  Modules linked in:
  ---[ end trace a11715de1eee1873 ]---

Fixes: d46a78b05c0e ("xen: implement pirq type event channels")
Cc: sta...@vger.kernel.org
Co-debugged-by: Andrew Panyakin 
Signed-off-by: Maximilian Heyne 
---
 drivers/xen/events/events_base.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/events/events_base.c

Re: [PATCH] x86/pci/xen: populate MSI sysfs entries

2023-05-24 Thread Maximilian Heyne
On Wed, May 03, 2023 at 01:16:53PM +, Maximilian Heyne wrote:
> Commit bf5e758f02fc ("genirq/msi: Simplify sysfs handling") reworked the
> creation of sysfs entries for MSI IRQs. The creation used to be in
> msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs.
> Then it moved into __msi_domain_alloc_irqs which is an implementation of
> domain_alloc_irqs. However, Xen comes with the only other implementation
> of domain_alloc_irqs and hence doesn't run the sysfs population code
> anymore.
> 
> Commit 6c796996ee70 ("x86/pci/xen: Fixup fallout from the PCI/MSI
> overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info
> but that doesn't actually have an effect because Xen uses its own
> domain_alloc_irqs implementation.
> 
> Fix this by making use of the fallback functions for sysfs population.
> 
> Fixes: bf5e758f02fc ("genirq/msi: Simplify sysfs handling")
> Signed-off-by: Maximilian Heyne 


Any other feedback on this one? This is definitely a bug but I understand that
there might be different ways to fix it.



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879






[PATCH] x86/pci/xen: populate MSI sysfs entries

2023-05-03 Thread Maximilian Heyne
Commit bf5e758f02fc ("genirq/msi: Simplify sysfs handling") reworked the
creation of sysfs entries for MSI IRQs. The creation used to be in
msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs.
Then it moved into __msi_domain_alloc_irqs which is an implementation of
domain_alloc_irqs. However, Xen comes with the only other implementation
of domain_alloc_irqs and hence doesn't run the sysfs population code
anymore.

Commit 6c796996ee70 ("x86/pci/xen: Fixup fallout from the PCI/MSI
overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info
but that doesn't actually have an effect because Xen uses its own
domain_alloc_irqs implementation.

Fix this by making use of the fallback functions for sysfs population.
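
The effect can be sketched in user space as follows. This is only a toy
model under stated assumptions: struct toy_dev, toy_alloc_fn and all
toy_* functions are hypothetical names invented for illustration, with
toy_populate_sysfs() standing in for msi_device_populate_sysfs(). It shows
why an allocator that replaces the default one has to populate sysfs
itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy device; none of these names exist in the kernel. */
struct toy_dev {
    int  nr_irqs;
    bool sysfs_populated;
};

typedef int (*toy_alloc_fn)(struct toy_dev *dev, int nvec);

/* Stands in for msi_device_populate_sysfs(). */
int toy_populate_sysfs(struct toy_dev *dev)
{
    dev->sysfs_populated = true;
    return 0;
}

/* Default allocator: allocates and populates sysfs in one place. */
int toy_default_alloc(struct toy_dev *dev, int nvec)
{
    dev->nr_irqs = nvec;
    return toy_populate_sysfs(dev);
}

/* Override in the style of the Xen setup functions before this patch. */
int toy_xen_alloc_before(struct toy_dev *dev, int nvec)
{
    dev->nr_irqs = nvec;
    return 0;
}

/* Override after this patch: success path ends in the populate helper. */
int toy_xen_alloc_after(struct toy_dev *dev, int nvec)
{
    dev->nr_irqs = nvec;
    return toy_populate_sysfs(dev);
}

/* Dispatch like a domain op: use the override if set, else the default. */
bool toy_sysfs_visible_after_alloc(toy_alloc_fn override, int nvec)
{
    struct toy_dev dev = { 0, false };
    toy_alloc_fn alloc = override ? override : toy_default_alloc;

    assert(nvec > 0);
    if (alloc(&dev, nvec) != 0)
        return false;
    return dev.sysfs_populated;
}
```

Passing NULL selects the default allocator, which populates sysfs; the
"before" override silently skips it, and the "after" override restores
the entries.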

Fixes: bf5e758f02fc ("genirq/msi: Simplify sysfs handling")
Signed-off-by: Maximilian Heyne 
---
 arch/x86/pci/xen.c  | 8 +---
 include/linux/msi.h | 9 -
 kernel/irq/msi.c| 4 ++--
 3 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 8babce71915f..014c508e914d 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -198,7 +198,7 @@ static int xen_setup_msi_irqs(struct pci_dev *dev, int 
nvec, int type)
i++;
}
kfree(v);
-   return 0;
+   return msi_device_populate_sysfs(&dev->dev);
 
 error:
if (ret == -ENOSYS)
@@ -254,7 +254,7 @@ static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int 
nvec, int type)
dev_dbg(&dev->dev,
"xen: msi --> pirq=%d --> irq=%d\n", pirq, irq);
}
-   return 0;
+   return msi_device_populate_sysfs(&dev->dev);
 
 error:
dev_err(&dev->dev, "Failed to create MSI%s! ret=%d!\n",
@@ -346,7 +346,7 @@ static int xen_initdom_setup_msi_irqs(struct pci_dev *dev, 
int nvec, int type)
if (ret < 0)
goto out;
}
-   ret = 0;
+   ret = msi_device_populate_sysfs(&dev->dev);
 out:
return ret;
 }
@@ -394,6 +394,8 @@ static void xen_teardown_msi_irqs(struct pci_dev *dev)
xen_destroy_irq(msidesc->irq + i);
msidesc->irq = 0;
}
+
+   msi_device_destroy_sysfs(&dev->dev);
 }
 
 static void xen_pv_teardown_msi_irqs(struct pci_dev *dev)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index cdb14a1ef268..a50ea79522f8 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -383,6 +383,13 @@ int arch_setup_msi_irq(struct pci_dev *dev, struct 
msi_desc *desc);
 void arch_teardown_msi_irq(unsigned int irq);
 int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
 void arch_teardown_msi_irqs(struct pci_dev *dev);
+#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */
+
+/*
+ * Xen uses non-default msi_domain_ops and hence needs a way to populate sysfs
+ * entries of MSI IRQs.
+ */
+#if defined(CONFIG_PCI_XEN) || defined(CONFIG_PCI_MSI_ARCH_FALLBACKS)
 #ifdef CONFIG_SYSFS
 int msi_device_populate_sysfs(struct device *dev);
 void msi_device_destroy_sysfs(struct device *dev);
@@ -390,7 +397,7 @@ void msi_device_destroy_sysfs(struct device *dev);
 static inline int msi_device_populate_sysfs(struct device *dev) { return 0; }
 static inline void msi_device_destroy_sysfs(struct device *dev) { }
 #endif /* !CONFIG_SYSFS */
-#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS */
+#endif /* CONFIG_PCI_XEN || CONFIG_PCI_MSI_ARCH_FALLBACKS */
 
 /*
  * The restore hook is still available even for fully irq domain based
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 7a97bcb086bf..b4c31a5c1147 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -542,7 +542,7 @@ static int msi_sysfs_populate_desc(struct device *dev, 
struct msi_desc *desc)
return ret;
 }
 
-#ifdef CONFIG_PCI_MSI_ARCH_FALLBACKS
+#if defined(CONFIG_PCI_MSI_ARCH_FALLBACKS) || defined(CONFIG_PCI_XEN)
 /**
  * msi_device_populate_sysfs - Populate msi_irqs sysfs entries for a device
  * @dev:   The device (PCI, platform etc) which will get sysfs entries
@@ -574,7 +574,7 @@ void msi_device_destroy_sysfs(struct device *dev)
msi_for_each_desc(desc, dev, MSI_DESC_ALL)
msi_sysfs_remove_desc(dev, desc);
 }
-#endif /* CONFIG_PCI_MSI_ARCH_FALLBACK */
+#endif /* CONFIG_PCI_MSI_ARCH_FALLBACKS || CONFIG_PCI_XEN */
 #else /* CONFIG_SYSFS */
 static inline int msi_sysfs_create_group(struct device *dev) { return 0; }
 static inline int msi_sysfs_populate_desc(struct device *dev, struct msi_desc 
*desc) { return 0; }
-- 
2.39.2










Re: [PATCH v2 1/3] xen-blkback: Advertise feature-persistent as user requested

2022-09-02 Thread Maximilian Heyne
On Fri, Sep 02, 2022 at 09:53:22AM +, Pratyush Yadav wrote:
> On 31/08/22 04:58PM, SeongJae Park wrote:
> > The advertisement of the persistent grants feature (writing
> > 'feature-persistent' to xenbus) should mean not the decision for using
> > the feature but only the availability of the feature.  However, commit
> > aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent
> > grants") made a field of blkback, which was a place for saving only the
> > negotiation result, to be used for yet another purpose: caching of the
> > 'feature_persistent' parameter value.  As a result, the advertisement,
> > which should follow only the parameter value, becomes inconsistent.
> > 
> > This commit fixes the misuse of the semantic by making blkback save the
> > parameter value in a separate place and advertise the support based
> > only on the saved value.
> > 
> > Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of 
> > persistent grants")
> > Cc:  # 5.10.x
> > Suggested-by: Juergen Gross 
> > Signed-off-by: SeongJae Park 
> > ---
> >  drivers/block/xen-blkback/common.h | 3 +++
> >  drivers/block/xen-blkback/xenbus.c | 6 --
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/block/xen-blkback/common.h 
> > b/drivers/block/xen-blkback/common.h
> > index bda5c815e441..a28473470e66 100644
> > --- a/drivers/block/xen-blkback/common.h
> > +++ b/drivers/block/xen-blkback/common.h
> > @@ -226,6 +226,9 @@ struct xen_vbd {
> > sector_tsize;
> > unsigned intflush_support:1;
> > unsigned intdiscard_secure:1;
> > +   /* Connect-time cached feature_persistent parameter value */
> > +   unsigned intfeature_gnt_persistent_parm:1;
> 
> Continuing over from the previous version:
> 
> > > If feature_gnt_persistent_parm is always going to be equal to 
> > > feature_persistent, then why introduce it at all? Why not just use 
> > > feature_persistent directly? This way you avoid adding an extra flag 
> > > whose purpose is not immediately clear, and you also avoid all the 
> > > mess with setting this flag at the right time.
> >
> > Mainly because the parameter should be read twice (once for
> > advertisement, and once later just before the negotiation, for
> > checking if we advertised or not), and the user might change the
> > parameter value between the two reads.
> >
> > For the detailed available sequence of the race, you could refer to the 
> > prior conversation[1].
> >
> > [1] https://lore.kernel.org/linux-block/20200922111259.GJ19254@Air-de-Roger/
> 
> Okay, I see. Thanks for the pointer. But still, I think it would be 
> better to not maintain two copies of the value. How about doing:
> 
>   blkif->vbd.feature_gnt_persistent =
>   xenbus_read_unsigned(dev->nodename, "feature-persistent", 0) &&
>   xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
> 
> This makes it quite clear that we only enable persistent grants if 
> _both_ ends support it. We can do the same for blkfront.

Currently, the feature-persistent xenstore entry is written to from connect()
which is called after connect_ring(). So it's not available like this.  Perhaps
it's possible to delay the decision whether to use persistent grants until
connect().
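
A small user-space sketch of that ordering problem, under stated
assumptions: struct toy_node, toy_read_unsigned(), toy_write() and
toy_negotiate() are hypothetical names, the store holds a single value per
end, and a missing entry reads as the supplied default, as in the proposal
quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-value store standing in for a xenstore entry. */
struct toy_node {
    bool written;
    unsigned int value;
};

/* Read with a default, returned when the entry was never written. */
unsigned int toy_read_unsigned(const struct toy_node *node, unsigned int def)
{
    assert(node != NULL);
    return node->written ? node->value : def;
}

void toy_write(struct toy_node *node, unsigned int value)
{
    assert(node != NULL);
    node->written = true;
    node->value = value;
}

/*
 * Backend bring-up in the current order: connect_ring() first, connect()
 * second. With read_own_node set, connect_ring() consults the backend's
 * own entry (the suggested variant); otherwise it uses the module
 * parameter value directly.
 */
bool toy_negotiate(bool read_own_node, bool backend_param,
                   bool frontend_advertises)
{
    struct toy_node backend = { false, 0 };
    struct toy_node frontend = { false, 0 };
    bool own, result;

    if (frontend_advertises)
        toy_write(&frontend, 1);

    /* connect_ring(): decide whether persistent grants are used */
    own = read_own_node ? toy_read_unsigned(&backend, 0) != 0
                        : backend_param;
    result = own && toy_read_unsigned(&frontend, 0) != 0;

    /* connect(): only now does the backend write its own entry */
    toy_write(&backend, backend_param ? 1 : 0);

    return result;
}
```

Reading the backend's own entry in connect_ring() always yields the
default 0 in this model, so the feature would never be enabled unless the
decision moves to connect() or the entry is written earlier.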

> 
> > +   /* Persistent grants feature negotiation result */
> > unsigned intfeature_gnt_persistent:1;
> > unsigned intoverflow_max_grants:1;
> >  };
> > diff --git a/drivers/block/xen-blkback/xenbus.c 
> > b/drivers/block/xen-blkback/xenbus.c
> > index ee7ad2fb432d..c0227dfa4688 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -907,7 +907,7 @@ static void connect(struct backend_info *be)
> > xen_blkbk_barrier(xbt, be, be->blkif->vbd.flush_support);
> > 
> > err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
> > -   be->blkif->vbd.feature_gnt_persistent);
> > +   be->blkif->vbd.feature_gnt_persistent_parm);
> > if (err) {
> > xenbus_dev_fatal(dev, err, "writing %s/feature-persistent",
> >  dev->nodename);
> > @@ -1085,7 +1085,9 @@ static int connect_ring(struct backend_info *be)
> > return -ENOSYS;
> > }
> > 
> > -   blkif->vbd.feature_gnt_persistent = feature_persistent &&
> > +   blkif->vbd.feature_gnt_persistent_parm = feature_persistent;
> > +   blkif->vbd.feature_gnt_persistent =
> > +   blkif->vbd.feature_gnt_persistent_parm &&
> > xenbus_read_unsigned(dev->otherend, "feature-persistent", 
> > 0);
> > 
> > blkif->vbd.overflow_max_grants = 0;
> > --
> > 2.25.1
> > 
> 

Re: [PATCH v2 0/3] xen-blk{front, back}: Fix the broken semantic and flow of feature-persistent

2022-09-02 Thread Maximilian Heyne
On Wed, Aug 31, 2022 at 04:58:21PM +, SeongJae Park wrote:
> Changes from v1
> (https://lore.kernel.org/xen-devel/20220825161511.94922-1...@kernel.org/)
> - Fix the wrong feature_persistent caching position of blkfront
> - Set blkfront's feature_persistent field setting with simple '&&'
>   instead of 'if' (Pratyush Yadav)
> 
> This patchset fixes misuse of the 'feature-persistent' advertisement
> semantic (patches 1 and 2), and the wrong timing of the
> 'feature_persistent' value caching, which made persistent grants feature
> always disabled.
> 
> SeongJae Park (3):
>   xen-blkback: Advertise feature-persistent as user requested
>   xen-blkfront: Advertise feature-persistent as user requested
>   xen-blkfront: Cache feature_persistent value before advertisement
> 
>  drivers/block/xen-blkback/common.h |  3 +++
>  drivers/block/xen-blkback/xenbus.c |  6 --
>  drivers/block/xen-blkfront.c   | 20 
>  3 files changed, 19 insertions(+), 10 deletions(-)
> 
> --
> 2.25.1
> 

I've tested this patch series in the following ways:
* Only applied the blkback patch but not the blkfront patches
* Only applied the blkfront patches but not the blkback patch
* Applied both

All scenarios worked, so

Tested-by: Maximilian Heyne 

Actually I also wanted to test changing feature_persistent and try reconnecting
but I don't know how this is done. If anyone has a pointer here, I could test
that as well.









Re: [PATCH 2/2] xen-blkfront: Advertise feature-persistent as user requested

2022-08-26 Thread Maximilian Heyne
On Thu, Aug 25, 2022 at 04:15:11PM +, SeongJae Park wrote:
> Commit e94c6101e151 ("xen-blkback: Apply 'feature_persistent' parameter
> when connect") made blkback to advertise its support of the persistent
> grants feature only if the user sets the 'feature_persistent' parameter
> of the driver and the frontend advertised its support of the feature.
> However, following commit 402c43ea6b34 ("xen-blkfront: Apply
> 'feature_persistent' parameter when connect") made the blkfront to work
> in the same way.  That is, blkfront also advertises its support of the
> persistent grants feature only if the user sets the 'feature_persistent'
> parameter of the driver and the backend advertised its support of the
> feature.
> 
> Hence blkback and blkfront will never advertise their support of the
> feature but wait until the other advertises the support, even though
> users set the 'feature_persistent' parameters of the drivers.  As a
> result, the persistent grants feature is disabled always regardless of
> the 'feature_persistent' values[1].
> 
> The problem comes from the misuse of the semantic of the advertisement
> of the feature.  The advertisement of the feature should mean only the
> availability of the feature, not the decision to use it.
> However, the current behavior does not follow this.
> 
> This commit fixes the issue by making blkfront advertise its
> support of the feature as the user requested via the 'feature_persistent'
> parameter, regardless of the other end's support of the feature.
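
The mutual wait described in the quoted text can be reproduced with a
minimal user-space sketch. Everything here is a hypothetical model, not
driver code: struct toy_end, the two advertise rules, and the fixed number
of rounds are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one end (blkfront or blkback). */
struct toy_end {
    bool param;       /* user's 'feature_persistent' setting */
    bool advertised;  /* what this end has written to xenbus */
};

/* Old rule: advertise only if the peer has already advertised. */
static void advertise_old(struct toy_end *self, const struct toy_end *peer)
{
    self->advertised = self->param && peer->advertised;
}

/* Fixed rule: advertise exactly what the user requested. */
static void advertise_fixed(struct toy_end *self)
{
    self->advertised = self->param;
}

/*
 * Let both ends apply their rule for a few rounds and report whether
 * persistent grants end up enabled, i.e. both ends advertise.
 */
bool persistent_grants_enabled(bool fixed_rule, bool front_param,
                               bool back_param)
{
    struct toy_end front = { front_param, false };
    struct toy_end back = { back_param, false };

    for (int round = 0; round < 4; round++) {
        if (fixed_rule) {
            advertise_fixed(&front);
            advertise_fixed(&back);
        } else {
            advertise_old(&front, &back);
            advertise_old(&back, &front);
        }
        /* An end never advertises without its own parameter set. */
        assert(!front.advertised || front.param);
        assert(!back.advertised || back.param);
    }

    return front.advertised && back.advertised;
}
```

Under the old rule neither end ever goes first, so the result stays false
even when both parameters are set. Under the fixed rule the result is
simply the AND of the two parameters.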
> 
> [1] 
> https://lore.kernel.org/xen-devel/bd818aba-4857-bc07-dc8a-e9b2f8c5f...@suse.com/
> 
> Fixes: 402c43ea6b34 ("xen-blkfront: Apply 'feature_persistent' parameter when 
> connect")
> Cc:  # 5.10.x
> Reported-by: Marek Marczykowski-Górecki 
> Suggested-by: Juergen Gross 
> Signed-off-by: SeongJae Park 
> ---
>  drivers/block/xen-blkfront.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 8e56e69fb4c4..dfae08115450 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -213,6 +213,9 @@ struct blkfront_info
> unsigned int feature_fua:1;
> unsigned int feature_discard:1;
> unsigned int feature_secdiscard:1;
> +   /* Connect-time cached feature_persistent parameter */
> +   unsigned int feature_persistent_parm:1;
> +   /* Persistent grants feature negotiation result */
> unsigned int feature_persistent:1;
> unsigned int bounce:1;
> unsigned int discard_granularity;
> @@ -1848,7 +1851,7 @@ static int talk_to_blkback(struct xenbus_device *dev,
> goto abort_transaction;
> }
> err = xenbus_printf(xbt, dev->nodename, "feature-persistent", "%u",
> -   info->feature_persistent);
> +   info->feature_persistent_parm);
> if (err)
> dev_warn(>dev,
>  "writing persistent grants feature to xenbus");
> @@ -2281,7 +2284,8 @@ static void blkfront_gather_backend_features(struct 
> blkfront_info *info)
> if (xenbus_read_unsigned(info->xbdev->otherend, "feature-discard", 0))
> blkfront_setup_discard(info);
> 
> -   if (feature_persistent)
> +   info->feature_persistent_parm = feature_persistent;

I think setting this here is too late because "feature-persistent" was already
written to xenstore via talk_to_blkback but with default 0. So during the
connect blkback will not see that the guest supports the feature and falls back
to no persistent grants.

Tested only this patch with some hacky dom0 kernel that doesn't have the patch
from your series yet. Will do more testing next week.

> +   if (info->feature_persistent_parm)
> info->feature_persistent =
> !!xenbus_read_unsigned(info->xbdev->otherend,
>"feature-persistent", 0);
> --
> 2.25.1
> 









Re: [PATCH v4 0/3] xen-blk{back, front}: Fix two bugs in 'feature_persistent'

2022-08-01 Thread Maximilian Heyne
On Fri, Jul 15, 2022 at 10:51:05PM +, SeongJae Park wrote:
> 
> Introduction of 'feature_persistent' made two bugs.  First one is wrong
> overwrite of 'vbd->feature_gnt_persistent' in 'blkback' due to wrong
> parameter value caching position, and the second one is unintended
> behavioral change that could break previous dynamic frontend/backend
> persistent feature support changes.  This patchset fixes the issues.
> 
> Changes from v3
> (https://lore.kernel.org/xen-devel/20220715175521.126649-1...@kernel.org/)
> - Split 'blkback' patch for each of the two issues
> - Add 'Reported-by: Andrii Chepurnyi '
> 
> Changes from v2
> (https://lore.kernel.org/xen-devel/20220714224410.51147-1...@kernel.org/)
> - Keep the behavioral change of v1
> - Update blkfront's counterpart to follow the changed behavior
> - Update documents for the changed behavior
> 
> Changes from v1
> (https://lore.kernel.org/xen-devel/20220106091013.126076-1-mhe...@amazon.de/)
> - Avoid the behavioral change
>   (https://lore.kernel.org/xen-devel/20220121102309.27802-1...@kernel.org/)
> - Rebase on latest xen/tip/linux-next
> - Re-work by SeongJae Park 
> - Cc stable@
> 
> Maximilian Heyne (1):
>   xen-blkback: Apply 'feature_persistent' parameter when connect
> 
> SeongJae Park (2):
>   xen-blkback: fix persistent grants negotiation
>   xen-blkfront: Apply 'feature_persistent' parameter when connect
> 
>  .../ABI/testing/sysfs-driver-xen-blkback  |  2 +-
>  .../ABI/testing/sysfs-driver-xen-blkfront |  2 +-
>  drivers/block/xen-blkback/xenbus.c| 20 ---
>  drivers/block/xen-blkfront.c  |  4 +---
>  4 files changed, 11 insertions(+), 17 deletions(-)
> 
> --
> 2.25.1
> 

Changes look good to me. Thank you for reworking my patch and also fixing the
blkfront driver.

Reviewed-by: Maximilian Heyne 









[PATCH] x86: xen: remove STACK_FRAME_NON_STANDARD from xen_cpuid

2022-05-17 Thread Maximilian Heyne
Since commit 4d65adfcd119 ("x86: xen: insn: Decode Xen and KVM
emulate-prefix signature"), objtool is able to correctly parse the
prefixed instruction in xen_cpuid and emit correct orc unwind
information. Hence, marking the function as STACK_FRAME_NON_STANDARD is
no longer needed.

This commit is basically a revert of commit 983bb6d254c7 ("x86/xen: Mark
xen_cpuid() stack frame as non-standard").

Signed-off-by: Maximilian Heyne 
CC: Josh Poimboeuf 

cr: https://code.amazon.com/reviews/CR-69645080
---
 arch/x86/xen/enlighten_pv.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 5038edb79ad5..ca85d1409917 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -30,7 +30,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -165,7 +164,6 @@ static void xen_cpuid(unsigned int *ax, unsigned int *bx,
 
*bx &= maskebx;
 }
-STACK_FRAME_NON_STANDARD(xen_cpuid); /* XEN_EMULATE_PREFIX */
 
 static bool __init xen_check_mwait(void)
 {
-- 
2.32.0










[PATCH 4/4] x86: kprobes: Prohibit probing on instruction which has emulate prefix

2022-05-12 Thread Maximilian Heyne
From: Masami Hiramatsu 

commit 004e8dce9c5595697951f7cd0e9f66b35c92265e upstream

Prohibit probing on instruction which has XEN_EMULATE_PREFIX
or KVM's emulate prefix. Since that prefix is a marker for Xen
and KVM, if we modify the marker by kprobe's int3, that doesn't
work as expected.

Signed-off-by: Masami Hiramatsu 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Juergen Gross 
Cc: x...@kernel.org
Cc: Boris Ostrovsky 
Cc: Ingo Molnar 
Cc: Stefano Stabellini 
Cc: Andrew Cooper 
Cc: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
Cc: Randy Dunlap 
Cc: Josh Poimboeuf 
Link: 
https://lkml.kernel.org/r/156777566048.25081.6296162369492175325.stgit@devnote2
Signed-off-by: Maximilian Heyne 
Cc: sta...@vger.kernel.org # 5.4.x
---
 arch/x86/kernel/kprobes/core.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index c205d77d57da..3700dc94847c 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -358,6 +358,10 @@ int __copy_instruction(u8 *dest, u8 *src, u8 *real, struct 
insn *insn)
kernel_insn_init(insn, dest, MAX_INSN_SIZE);
insn_get_length(insn);
 
+   /* We can not probe force emulate prefixed instruction */
+   if (insn_has_emulate_prefix(insn))
+   return 0;
+
/* Another subsystem puts a breakpoint, failed to recover */
if (insn->opcode.bytes[0] == BREAKPOINT_INSTRUCTION)
return 0;
-- 
2.32.0










[PATCH 3/4] x86: xen: insn: Decode Xen and KVM emulate-prefix signature

2022-05-12 Thread Maximilian Heyne
From: Masami Hiramatsu 

commit 4d65adfcd1196818659d3bd9b42dccab291e1751 upstream

Decode Xen and KVM's emulate-prefix signature by x86 insn decoder.
It is called "prefix" but actually not x86 instruction prefix, so
this adds insn.emulate_prefix_size field instead of reusing
insn.prefixes.

If the x86 decoder finds a special sequence of instructions,
XEN_EMULATE_PREFIX or 'ud2a; .ascii "kvm"', it just counts the
length, sets insn.emulate_prefix_size and folds it with the next
instruction. In other words, the signature and the next instruction
are treated as a single instruction.
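
A stand-alone sketch of the matching step, for illustration only. It
assumes both signatures are the two ud2a opcode bytes followed by the
ASCII tag ("xen" or "kvm"); the function emulate_prefix_size() and the
sample buffer in the note below are hypothetical and do not use the
decoder's struct insn.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed byte encodings: ud2a (0x0f 0x0b) followed by an ASCII tag. */
static const unsigned char xen_prefix[] = { 0x0f, 0x0b, 0x78, 0x65, 0x6e };
static const unsigned char kvm_prefix[] = { 0x0f, 0x0b, 0x6b, 0x76, 0x6d };

/*
 * Return the length of an emulate-prefix signature at the start of buf,
 * or 0 if the bytes do not start with one. The caller would skip that
 * many bytes and decode the following instruction as part of the same
 * unit.
 */
size_t emulate_prefix_size(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    if (len >= sizeof(xen_prefix) &&
        memcmp(buf, xen_prefix, sizeof(xen_prefix)) == 0)
        return sizeof(xen_prefix);

    if (len >= sizeof(kvm_prefix) &&
        memcmp(buf, kvm_prefix, sizeof(kvm_prefix)) == 0)
        return sizeof(kvm_prefix);

    return 0;
}
```

For a buffer holding the Xen signature followed by cpuid (0x0f 0xa2), the
function reports 5 prefix bytes, so the folded unit spans 7 bytes. A plain
cpuid without the signature reports 0.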

Signed-off-by: Masami Hiramatsu 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Josh Poimboeuf 
Cc: Juergen Gross 
Cc: x...@kernel.org
Cc: Boris Ostrovsky 
Cc: Ingo Molnar 
Cc: Stefano Stabellini 
Cc: Andrew Cooper 
Cc: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
Cc: Randy Dunlap 
Link: 
https://lkml.kernel.org/r/156777564986.25081.4964537658500952557.stgit@devnote2
[mheyne: resolved contextual conflict in tools/objtool/sync-check.sh]
Signed-off-by: Maximilian Heyne 
Cc: sta...@vger.kernel.org # 5.4.x
---
 arch/x86/include/asm/insn.h |  6 
 arch/x86/lib/insn.c | 34 +
 tools/arch/x86/include/asm/emulate_prefix.h | 14 +
 tools/arch/x86/include/asm/insn.h   |  6 
 tools/arch/x86/lib/insn.c   | 34 +
 tools/objtool/sync-check.sh |  3 +-
 tools/perf/check-headers.sh |  3 +-
 7 files changed, 98 insertions(+), 2 deletions(-)
 create mode 100644 tools/arch/x86/include/asm/emulate_prefix.h

diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index a51ffeea6d87..a8c3d284fa46 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -45,6 +45,7 @@ struct insn {
struct insn_field immediate2;   /* for 64bit imm or seg16 */
};
 
+   int emulate_prefix_size;
insn_attr_t attr;
unsigned char opnd_bytes;
unsigned char addr_bytes;
@@ -128,6 +129,11 @@ static inline int insn_is_evex(struct insn *insn)
return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_has_emulate_prefix(struct insn *insn)
+{
+   return !!insn->emulate_prefix_size;
+}
+
 /* Ensure this instruction is decoded completely */
 static inline int insn_complete(struct insn *insn)
 {
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 0b5862ba6a75..404279563891 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 
+#include <asm/emulate_prefix.h>
+
 /* Verify next sizeof(t) bytes can be on the same instruction */
 #define validate_next(t, insn, n)  \
((insn)->next_byte + sizeof(t) + n <= (insn)->end_kaddr)
@@ -58,6 +60,36 @@ void insn_init(struct insn *insn, const void *kaddr, int buf_len, int x86_64)
insn->addr_bytes = 4;
 }
 
+static const insn_byte_t xen_prefix[] = { __XEN_EMULATE_PREFIX };
+static const insn_byte_t kvm_prefix[] = { __KVM_EMULATE_PREFIX };
+
+static int __insn_get_emulate_prefix(struct insn *insn,
+const insn_byte_t *prefix, size_t len)
+{
+   size_t i;
+
+   for (i = 0; i < len; i++) {
+   if (peek_nbyte_next(insn_byte_t, insn, i) != prefix[i])
+   goto err_out;
+   }
+
+   insn->emulate_prefix_size = len;
+   insn->next_byte += len;
+
+   return 1;
+
+err_out:
+   return 0;
+}
+
+static void insn_get_emulate_prefix(struct insn *insn)
+{
+   if (__insn_get_emulate_prefix(insn, xen_prefix, sizeof(xen_prefix)))
+   return;
+
+   __insn_get_emulate_prefix(insn, kvm_prefix, sizeof(kvm_prefix));
+}
+
 /**
  * insn_get_prefixes - scan x86 instruction prefix bytes
  * @insn:   insn containing instruction
@@ -76,6 +108,8 @@ void insn_get_prefixes(struct insn *insn)
if (prefixes->got)
return;
 
+   insn_get_emulate_prefix(insn);
+
nb = 0;
lb = 0;
b = peek_next(insn_byte_t, insn);
diff --git a/tools/arch/x86/include/asm/emulate_prefix.h b/tools/arch/x86/include/asm/emulate_prefix.h
new file mode 100644
index 000000000000..70f5b98a5286
--- /dev/null
+++ b/tools/arch/x86/include/asm/emulate_prefix.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_EMULATE_PREFIX_H
+#define _ASM_X86_EMULATE_PREFIX_H
+
+/*
+ * Virt escape sequences to trigger instruction emulation;
+ * ideally these would decode to 'whole' instruction and not destroy
+ * the instruction stream; sadly this is not true for the 'kvm' one :/
+ */
+
+#define __XEN_EMULATE_PREFIX  0x0f,0x0b,0x78,0x65,0x6e  /* ud2 ; .ascii "xen" */
+#define __KVM_EMULATE_PREFIX  0x0f,0x0b,0x6b,0x76,0x6d /* ud2 ; .ascii "kvm" */
+
+#endif
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index d7f0ae8f3

[PATCH 2/4] x86: xen: kvm: Gather the definition of emulate prefixes

2022-05-12 Thread Maximilian Heyne
From: Masami Hiramatsu 

commit b3dc0695fa40c3b280230fb6fb7fb7a94ce28bf4 upstream

Gather the emulate prefixes, which forcibly make the following
instruction emulated on virtualization, in one place.
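
For illustration only (not part of the patch): a small userspace check,
assuming a local macro KVM_PREFIX_BYTES with the same byte list, that an
array initialised from the shared byte list is identical to the string
literal handle_ud() compared against before. The macro and function names
are hypothetical.

```c
#include <string.h>

/* Same byte list as __KVM_EMULATE_PREFIX in the patch: ud2 ; .ascii "kvm" */
#define KVM_PREFIX_BYTES 0x0f,0x0b,0x6b,0x76,0x6d

/* One definition of the bytes can initialise an array directly. */
static const char kvm_prefix_array[] = { KVM_PREFIX_BYTES };

/* Returns 1 if the array equals the literal used by the old memcmp(). */
static int prefix_matches_old_literal(void)
{
	return sizeof(kvm_prefix_array) == 5 &&
	       memcmp(kvm_prefix_array, "\xf\xbkvm",
		      sizeof(kvm_prefix_array)) == 0;
}
```

The point of the sketch is only that the comma-separated byte list can feed
both an array initialiser and, after stringification, a .byte directive, so
the signature needs to be written down once.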

Suggested-by: Peter Zijlstra 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Juergen Gross 
Cc: x...@kernel.org
Cc: Ingo Molnar 
Cc: Boris Ostrovsky 
Cc: Andrew Cooper 
Cc: Stefano Stabellini 
Cc: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
Cc: Randy Dunlap 
Cc: Josh Poimboeuf 
Link: https://lkml.kernel.org/r/156777563917.25081.7286628561790289995.stgit@devnote2
Signed-off-by: Maximilian Heyne 
Cc: sta...@vger.kernel.org # 5.4.x
---
 arch/x86/include/asm/emulate_prefix.h | 14 ++
 arch/x86/include/asm/xen/interface.h  | 11 ---
 arch/x86/kvm/x86.c|  4 +++-
 3 files changed, 21 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/include/asm/emulate_prefix.h

diff --git a/arch/x86/include/asm/emulate_prefix.h b/arch/x86/include/asm/emulate_prefix.h
new file mode 100644
index 000000000000..70f5b98a5286
--- /dev/null
+++ b/arch/x86/include/asm/emulate_prefix.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_EMULATE_PREFIX_H
+#define _ASM_X86_EMULATE_PREFIX_H
+
+/*
+ * Virt escape sequences to trigger instruction emulation;
+ * ideally these would decode to 'whole' instruction and not destroy
+ * the instruction stream; sadly this is not true for the 'kvm' one :/
+ */
+
+#define __XEN_EMULATE_PREFIX  0x0f,0x0b,0x78,0x65,0x6e  /* ud2 ; .ascii "xen" */
+#define __KVM_EMULATE_PREFIX  0x0f,0x0b,0x6b,0x76,0x6d /* ud2 ; .ascii "kvm" */
+
+#endif
diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h
index 62ca03ef5c65..9139b3e86316 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -379,12 +379,9 @@ struct xen_pmu_arch {
  * Prefix forces emulation of some non-trapping instructions.
  * Currently only CPUID.
  */
-#ifdef __ASSEMBLY__
-#define XEN_EMULATE_PREFIX .byte 0x0f,0x0b,0x78,0x65,0x6e ;
-#define XEN_CPUID  XEN_EMULATE_PREFIX cpuid
-#else
-#define XEN_EMULATE_PREFIX ".byte 0x0f,0x0b,0x78,0x65,0x6e ; "
-#define XEN_CPUID  XEN_EMULATE_PREFIX "cpuid"
-#endif
+#include <asm/emulate_prefix.h>
+
+#define XEN_EMULATE_PREFIX __ASM_FORM(.byte __XEN_EMULATE_PREFIX ;)
+#define XEN_CPUID  XEN_EMULATE_PREFIX __ASM_FORM(cpuid)
 
 #endif /* _ASM_X86_XEN_INTERFACE_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f7dfa5aa42d..6dd77e426889 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -68,6 +68,7 @@
 #include 
 #include 
 #include 
+#include <asm/emulate_prefix.h>
 #include 
 
 #define CREATE_TRACE_POINTS
@@ -5583,6 +5584,7 @@ EXPORT_SYMBOL_GPL(kvm_write_guest_virt_system);
 
 int handle_ud(struct kvm_vcpu *vcpu)
 {
+   static const char kvm_emulate_prefix[] = { __KVM_EMULATE_PREFIX };
int emul_type = EMULTYPE_TRAP_UD;
char sig[5]; /* ud2; .ascii "kvm" */
struct x86_exception e;
@@ -5590,7 +5592,7 @@ int handle_ud(struct kvm_vcpu *vcpu)
if (force_emulation_prefix &&
kvm_read_guest_virt(vcpu, kvm_get_linear_rip(vcpu),
sig, sizeof(sig), &e) == 0 &&
-   memcmp(sig, "\xf\xbkvm", sizeof(sig)) == 0) {
+   memcmp(sig, kvm_emulate_prefix, sizeof(sig)) == 0) {
kvm_rip_write(vcpu, kvm_rip_read(vcpu) + sizeof(sig));
emul_type = EMULTYPE_TRAP_UD_FORCED;
}
-- 
2.32.0





[PATCH 0/4] x86: decode Xen/KVM emulate prefixes

2022-05-12 Thread Maximilian Heyne
This is a backport of a patch series for 5.4.x.

The patch series allows the x86 decoder to decode the Xen and KVM emulate
prefixes.

In particular this solves the following issue that appeared when commit
db6c6a0df840 ("objtool: Fix noreturn detection for ignored functions") was
backported to 5.4.69:

  arch/x86/xen/enlighten_pv.o: warning: objtool: xen_cpuid()+0x25: can't find jump dest instruction at .text+0x9c

Now that this decoding is possible, also backport the commit which prevents
kprobes from probing such prefixed instructions. This was also part of the
original series.

The series applied mostly cleanly on 5.4.192 except for a contextual problem in
the 3rd patch ("x86: xen: insn: Decode Xen and KVM emulate-prefix signature").

Masami Hiramatsu (4):
  x86/asm: Allow to pass macros to __ASM_FORM()
  x86: xen: kvm: Gather the definition of emulate prefixes
  x86: xen: insn: Decode Xen and KVM emulate-prefix signature
  x86: kprobes: Prohibit probing on instruction which has emulate prefix

 arch/x86/include/asm/asm.h  |  8 +++--
 arch/x86/include/asm/emulate_prefix.h   | 14 +
 arch/x86/include/asm/insn.h |  6 
 arch/x86/include/asm/xen/interface.h| 11 +++
 arch/x86/kernel/kprobes/core.c  |  4 +++
 arch/x86/kvm/x86.c  |  4 ++-
 arch/x86/lib/insn.c | 34 +
 tools/arch/x86/include/asm/emulate_prefix.h | 14 +
 tools/arch/x86/include/asm/insn.h   |  6 
 tools/arch/x86/lib/insn.c   | 34 +
 tools/objtool/sync-check.sh |  3 +-
 tools/perf/check-headers.sh |  3 +-
 12 files changed, 128 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/include/asm/emulate_prefix.h
 create mode 100644 tools/arch/x86/include/asm/emulate_prefix.h


base-commit: 1d72b776f6dc973211f5d153453cf8955fb3d70a
-- 
2.32.0





[PATCH 1/4] x86/asm: Allow to pass macros to __ASM_FORM()

2022-05-12 Thread Maximilian Heyne
From: Masami Hiramatsu 

commit f7919fd943abf0c77aed4441ea9897a323d132f5 upstream

Use __stringify() in __ASM_FORM() so that users can pass
code including macros to __ASM_FORM().
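
For illustration only (not part of the patch): a userspace sketch of why
the extra level of stringification matters. STRINGIFY(), OLD_ASM_FORM(),
NEW_ASM_FORM() and EXAMPLE_PREFIX are local stand-ins, not the kernel
macros; STRINGIFY() is assumed to behave like the kernel's two-level
__stringify().

```c
#include <string.h>

/* Two-level stringification: the argument is macro-expanded first. */
#define STRINGIFY_1(...)	#__VA_ARGS__
#define STRINGIFY(...)		STRINGIFY_1(__VA_ARGS__)

/* Old form: '#x' stringifies the argument without expanding macros. */
#define OLD_ASM_FORM(x)		" " #x " "
/* New form: macros inside x are expanded before being stringified. */
#define NEW_ASM_FORM(x)		" " STRINGIFY(x) " "

/* Hypothetical macro standing in for a byte list such as a prefix. */
#define EXAMPLE_PREFIX		0x0f,0x0b,0x78,0x65,0x6e

static const char *old_form(void)
{
	return OLD_ASM_FORM(.byte EXAMPLE_PREFIX ;);
}

static const char *new_form(void)
{
	return NEW_ASM_FORM(.byte EXAMPLE_PREFIX ;);
}
```

old_form() still contains the literal text EXAMPLE_PREFIX, which an
assembler could not resolve, while new_form() contains the expanded byte
list.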

Signed-off-by: Masami Hiramatsu 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Juergen Gross 
Cc: x...@kernel.org
Cc: Boris Ostrovsky 
Cc: Ingo Molnar 
Cc: Stefano Stabellini 
Cc: Andrew Cooper 
Cc: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
Cc: Randy Dunlap 
Cc: Josh Poimboeuf 
Link: https://lkml.kernel.org/r/156777562873.25081.2288083344657460959.stgit@devnote2
Signed-off-by: Maximilian Heyne 
Cc: sta...@vger.kernel.org # 5.4.x
---
 arch/x86/include/asm/asm.h | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 3ff577c0b102..1b563f9167ea 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -7,9 +7,11 @@
 # define __ASM_FORM_RAW(x) x
 # define __ASM_FORM_COMMA(x) x,
 #else
-# define __ASM_FORM(x) " " #x " "
-# define __ASM_FORM_RAW(x) #x
-# define __ASM_FORM_COMMA(x) " " #x ","
+#include <linux/stringify.h>
+
+# define __ASM_FORM(x) " " __stringify(x) " "
+# define __ASM_FORM_RAW(x) __stringify(x)
+# define __ASM_FORM_COMMA(x) " " __stringify(x) ","
 #endif
 
 #ifndef __x86_64__
-- 
2.32.0





[PATCH 4/4] x86: kprobes: Prohibit probing on instruction which has emulate prefix

2022-05-10 Thread Maximilian Heyne
From: Masami Hiramatsu 

commit 004e8dce9c5595697951f7cd0e9f66b35c92265e upstream

Prohibit probing on instructions which have XEN_EMULATE_PREFIX
or KVM's emulate prefix. Since that prefix is a marker for Xen
and KVM, overwriting part of the marker with kprobe's int3 means
it no longer works as expected.

Signed-off-by: Masami Hiramatsu 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Juergen Gross 
Cc: x...@kernel.org
Cc: Boris Ostrovsky 
Cc: Ingo Molnar 
Cc: Stefano Stabellini 
Cc: Andrew Cooper 
Cc: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
Cc: Randy Dunlap 
Cc: Josh Poimboeuf 
Link: https://lkml.kernel.org/r/156777566048.25081.6296162369492175325.stgit@devnote2
Signed-off-by: Maximilian Heyne 
Cc: sta...@vger.kernel.org # 5.4.x
---
 arch/x86/kernel/kprobes/core.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index c205d77d57da..3700dc94847c 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -358,6 +358,10 @@ int __copy_instruction(u8 *dest, u8 *src, u8 *real, struct insn *insn)
kernel_insn_init(insn, dest, MAX_INSN_SIZE);
insn_get_length(insn);
 
+   /* We can not probe force emulate prefixed instruction */
+   if (insn_has_emulate_prefix(insn))
+   return 0;
+
/* Another subsystem puts a breakpoint, failed to recover */
if (insn->opcode.bytes[0] == BREAKPOINT_INSTRUCTION)
return 0;
-- 
2.32.0





[PATCH 2/4] x86: xen: kvm: Gather the definition of emulate prefixes

2022-05-10 Thread Maximilian Heyne
From: Masami Hiramatsu 

commit b3dc0695fa40c3b280230fb6fb7fb7a94ce28bf4 upstream

Gather the emulate prefixes, which forcibly make the following
instruction emulated on virtualization, in one place.

Suggested-by: Peter Zijlstra 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Juergen Gross 
Cc: x...@kernel.org
Cc: Ingo Molnar 
Cc: Boris Ostrovsky 
Cc: Andrew Cooper 
Cc: Stefano Stabellini 
Cc: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
Cc: Randy Dunlap 
Cc: Josh Poimboeuf 
Link: https://lkml.kernel.org/r/156777563917.25081.7286628561790289995.stgit@devnote2
Signed-off-by: Maximilian Heyne 
Cc: sta...@vger.kernel.org # 5.4.x
---
 arch/x86/include/asm/emulate_prefix.h | 14 ++
 arch/x86/include/asm/xen/interface.h  | 11 ---
 arch/x86/kvm/x86.c|  4 +++-
 3 files changed, 21 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/include/asm/emulate_prefix.h

diff --git a/arch/x86/include/asm/emulate_prefix.h b/arch/x86/include/asm/emulate_prefix.h
new file mode 100644
index 000000000000..70f5b98a5286
--- /dev/null
+++ b/arch/x86/include/asm/emulate_prefix.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_EMULATE_PREFIX_H
+#define _ASM_X86_EMULATE_PREFIX_H
+
+/*
+ * Virt escape sequences to trigger instruction emulation;
+ * ideally these would decode to 'whole' instruction and not destroy
+ * the instruction stream; sadly this is not true for the 'kvm' one :/
+ */
+
+#define __XEN_EMULATE_PREFIX  0x0f,0x0b,0x78,0x65,0x6e  /* ud2 ; .ascii "xen" */
+#define __KVM_EMULATE_PREFIX  0x0f,0x0b,0x6b,0x76,0x6d /* ud2 ; .ascii "kvm" */
+
+#endif
diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h
index 62ca03ef5c65..9139b3e86316 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -379,12 +379,9 @@ struct xen_pmu_arch {
  * Prefix forces emulation of some non-trapping instructions.
  * Currently only CPUID.
  */
-#ifdef __ASSEMBLY__
-#define XEN_EMULATE_PREFIX .byte 0x0f,0x0b,0x78,0x65,0x6e ;
-#define XEN_CPUID  XEN_EMULATE_PREFIX cpuid
-#else
-#define XEN_EMULATE_PREFIX ".byte 0x0f,0x0b,0x78,0x65,0x6e ; "
-#define XEN_CPUID  XEN_EMULATE_PREFIX "cpuid"
-#endif
+#include <asm/emulate_prefix.h>
+
+#define XEN_EMULATE_PREFIX __ASM_FORM(.byte __XEN_EMULATE_PREFIX ;)
+#define XEN_CPUID  XEN_EMULATE_PREFIX __ASM_FORM(cpuid)
 
 #endif /* _ASM_X86_XEN_INTERFACE_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1f7dfa5aa42d..6dd77e426889 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -68,6 +68,7 @@
 #include 
 #include 
 #include 
+#include <asm/emulate_prefix.h>
 #include 
 
 #define CREATE_TRACE_POINTS
@@ -5583,6 +5584,7 @@ EXPORT_SYMBOL_GPL(kvm_write_guest_virt_system);
 
 int handle_ud(struct kvm_vcpu *vcpu)
 {
+   static const char kvm_emulate_prefix[] = { __KVM_EMULATE_PREFIX };
int emul_type = EMULTYPE_TRAP_UD;
char sig[5]; /* ud2; .ascii "kvm" */
struct x86_exception e;
@@ -5590,7 +5592,7 @@ int handle_ud(struct kvm_vcpu *vcpu)
if (force_emulation_prefix &&
kvm_read_guest_virt(vcpu, kvm_get_linear_rip(vcpu),
sig, sizeof(sig), &e) == 0 &&
-   memcmp(sig, "\xf\xbkvm", sizeof(sig)) == 0) {
+   memcmp(sig, kvm_emulate_prefix, sizeof(sig)) == 0) {
kvm_rip_write(vcpu, kvm_rip_read(vcpu) + sizeof(sig));
emul_type = EMULTYPE_TRAP_UD_FORCED;
}
-- 
2.32.0





[PATCH 0/4] x86: decode Xen/KVM emulate prefixes

2022-05-10 Thread Maximilian Heyne
This is a backport of a patch series for 5.4.x.

The patch series allows the x86 decoder to decode the Xen and KVM emulate
prefixes.

In particular this solves the following issue that appeared when commit
db6c6a0df840 ("objtool: Fix noreturn detection for ignored functions") was
backported to 5.4.69:

  arch/x86/xen/enlighten_pv.o: warning: objtool: xen_cpuid()+0x25: can't find jump dest instruction at .text+0x9c

Now that this decoding is possible, also backport the commit which prevents
kprobes from probing such prefixed instructions. This was also part of the
original series.

The series applied mostly cleanly on 5.4.192 except for a contextual problem in
the 3rd patch ("x86: xen: insn: Decode Xen and KVM emulate-prefix signature").

Masami Hiramatsu (4):
  x86/asm: Allow to pass macros to __ASM_FORM()
  x86: xen: kvm: Gather the definition of emulate prefixes
  x86: xen: insn: Decode Xen and KVM emulate-prefix signature
  x86: kprobes: Prohibit probing on instruction which has emulate prefix

 arch/x86/include/asm/asm.h  |  8 +++--
 arch/x86/include/asm/emulate_prefix.h   | 14 +
 arch/x86/include/asm/insn.h |  6 
 arch/x86/include/asm/xen/interface.h| 11 +++
 arch/x86/kernel/kprobes/core.c  |  4 +++
 arch/x86/kvm/x86.c  |  4 ++-
 arch/x86/lib/insn.c | 34 +
 tools/arch/x86/include/asm/emulate_prefix.h | 14 +
 tools/arch/x86/include/asm/insn.h   |  6 
 tools/arch/x86/lib/insn.c   | 34 +
 tools/objtool/sync-check.sh |  3 +-
 tools/perf/check-headers.sh |  3 +-
 12 files changed, 128 insertions(+), 13 deletions(-)
 create mode 100644 arch/x86/include/asm/emulate_prefix.h
 create mode 100644 tools/arch/x86/include/asm/emulate_prefix.h


base-commit: 1d72b776f6dc973211f5d153453cf8955fb3d70a
-- 
2.32.0





[PATCH 1/4] x86/asm: Allow to pass macros to __ASM_FORM()

2022-05-10 Thread Maximilian Heyne
From: Masami Hiramatsu 

commit f7919fd943abf0c77aed4441ea9897a323d132f5 upstream

Use __stringify() in __ASM_FORM() so that users can pass
code including macros to __ASM_FORM().

Signed-off-by: Masami Hiramatsu 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Juergen Gross 
Cc: x...@kernel.org
Cc: Boris Ostrovsky 
Cc: Ingo Molnar 
Cc: Stefano Stabellini 
Cc: Andrew Cooper 
Cc: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
Cc: Randy Dunlap 
Cc: Josh Poimboeuf 
Link: https://lkml.kernel.org/r/156777562873.25081.2288083344657460959.stgit@devnote2
Signed-off-by: Maximilian Heyne 
Cc: sta...@vger.kernel.org # 5.4.x
---
 arch/x86/include/asm/asm.h | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
index 3ff577c0b102..1b563f9167ea 100644
--- a/arch/x86/include/asm/asm.h
+++ b/arch/x86/include/asm/asm.h
@@ -7,9 +7,11 @@
 # define __ASM_FORM_RAW(x) x
 # define __ASM_FORM_COMMA(x) x,
 #else
-# define __ASM_FORM(x) " " #x " "
-# define __ASM_FORM_RAW(x) #x
-# define __ASM_FORM_COMMA(x) " " #x ","
+#include <linux/stringify.h>
+
+# define __ASM_FORM(x) " " __stringify(x) " "
+# define __ASM_FORM_RAW(x) __stringify(x)
+# define __ASM_FORM_COMMA(x) " " __stringify(x) ","
 #endif
 
 #ifndef __x86_64__
-- 
2.32.0





[PATCH 3/4] x86: xen: insn: Decode Xen and KVM emulate-prefix signature

2022-05-10 Thread Maximilian Heyne
From: Masami Hiramatsu 

commit 4d65adfcd1196818659d3bd9b42dccab291e1751 upstream

Decode Xen and KVM's emulate-prefix signature by x86 insn decoder.
It is called a "prefix" but is not actually an x86 instruction prefix,
so this adds an insn.emulate_prefix_size field instead of reusing
insn.prefixes.

If the x86 decoder finds the special instruction sequence of
XEN_EMULATE_PREFIX or 'ud2a; .ascii "kvm"', it just counts the
length, sets insn.emulate_prefix_size and folds it into the next
instruction. In other words, the signature and the next instruction
are treated as a single instruction.

Signed-off-by: Masami Hiramatsu 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Josh Poimboeuf 
Cc: Juergen Gross 
Cc: x...@kernel.org
Cc: Boris Ostrovsky 
Cc: Ingo Molnar 
Cc: Stefano Stabellini 
Cc: Andrew Cooper 
Cc: Borislav Petkov 
Cc: xen-devel@lists.xenproject.org
Cc: Randy Dunlap 
Link: https://lkml.kernel.org/r/156777564986.25081.4964537658500952557.stgit@devnote2
[mheyne: resolved contextual conflict in tools/objtool/sync-check.sh]
Signed-off-by: Maximilian Heyne 
Cc: sta...@vger.kernel.org # 5.4.x
---
 arch/x86/include/asm/insn.h |  6 
 arch/x86/lib/insn.c | 34 +
 tools/arch/x86/include/asm/emulate_prefix.h | 14 +
 tools/arch/x86/include/asm/insn.h   |  6 
 tools/arch/x86/lib/insn.c   | 34 +
 tools/objtool/sync-check.sh |  3 +-
 tools/perf/check-headers.sh |  3 +-
 7 files changed, 98 insertions(+), 2 deletions(-)
 create mode 100644 tools/arch/x86/include/asm/emulate_prefix.h

diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index a51ffeea6d87..a8c3d284fa46 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -45,6 +45,7 @@ struct insn {
struct insn_field immediate2;   /* for 64bit imm or seg16 */
};
 
+   int emulate_prefix_size;
insn_attr_t attr;
unsigned char opnd_bytes;
unsigned char addr_bytes;
@@ -128,6 +129,11 @@ static inline int insn_is_evex(struct insn *insn)
return (insn->vex_prefix.nbytes == 4);
 }
 
+static inline int insn_has_emulate_prefix(struct insn *insn)
+{
+   return !!insn->emulate_prefix_size;
+}
+
 /* Ensure this instruction is decoded completely */
 static inline int insn_complete(struct insn *insn)
 {
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 0b5862ba6a75..404279563891 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -13,6 +13,8 @@
 #include 
 #include 
 
+#include <asm/emulate_prefix.h>
+
 /* Verify next sizeof(t) bytes can be on the same instruction */
 #define validate_next(t, insn, n)  \
((insn)->next_byte + sizeof(t) + n <= (insn)->end_kaddr)
@@ -58,6 +60,36 @@ void insn_init(struct insn *insn, const void *kaddr, int buf_len, int x86_64)
insn->addr_bytes = 4;
 }
 
+static const insn_byte_t xen_prefix[] = { __XEN_EMULATE_PREFIX };
+static const insn_byte_t kvm_prefix[] = { __KVM_EMULATE_PREFIX };
+
+static int __insn_get_emulate_prefix(struct insn *insn,
+const insn_byte_t *prefix, size_t len)
+{
+   size_t i;
+
+   for (i = 0; i < len; i++) {
+   if (peek_nbyte_next(insn_byte_t, insn, i) != prefix[i])
+   goto err_out;
+   }
+
+   insn->emulate_prefix_size = len;
+   insn->next_byte += len;
+
+   return 1;
+
+err_out:
+   return 0;
+}
+
+static void insn_get_emulate_prefix(struct insn *insn)
+{
+   if (__insn_get_emulate_prefix(insn, xen_prefix, sizeof(xen_prefix)))
+   return;
+
+   __insn_get_emulate_prefix(insn, kvm_prefix, sizeof(kvm_prefix));
+}
+
 /**
  * insn_get_prefixes - scan x86 instruction prefix bytes
  * @insn:   insn containing instruction
@@ -76,6 +108,8 @@ void insn_get_prefixes(struct insn *insn)
if (prefixes->got)
return;
 
+   insn_get_emulate_prefix(insn);
+
nb = 0;
lb = 0;
b = peek_next(insn_byte_t, insn);
diff --git a/tools/arch/x86/include/asm/emulate_prefix.h b/tools/arch/x86/include/asm/emulate_prefix.h
new file mode 100644
index 000000000000..70f5b98a5286
--- /dev/null
+++ b/tools/arch/x86/include/asm/emulate_prefix.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_EMULATE_PREFIX_H
+#define _ASM_X86_EMULATE_PREFIX_H
+
+/*
+ * Virt escape sequences to trigger instruction emulation;
+ * ideally these would decode to 'whole' instruction and not destroy
+ * the instruction stream; sadly this is not true for the 'kvm' one :/
+ */
+
+#define __XEN_EMULATE_PREFIX  0x0f,0x0b,0x78,0x65,0x6e  /* ud2 ; .ascii "xen" */
+#define __KVM_EMULATE_PREFIX  0x0f,0x0b,0x6b,0x76,0x6d /* ud2 ; .ascii "kvm" */
+
+#endif
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index d7f0ae8f3

[PATCH] xen, blkback: fix persistent grants negotiation

2022-01-06 Thread Maximilian Heyne
Assume dom0 supports persistent grants but the guest does not.
Then, when a block device is attached while the guest is running, dom0
will enable persistent grants for this newly attached block device:

  $ xenstore-ls -f | grep 20674 | grep persistent
  /local/domain/0/backend/vbd/20674/768/feature-persistent = "0"
  /local/domain/0/backend/vbd/20674/51792/feature-persistent = "1"

Here disk 768 was attached during guest creation while 51792 was
attached at runtime. If the guest had advertised the persistent
grant feature, there would be a xenstore entry like:

  /local/domain/20674/device/vbd/51792/feature-persistent = "1"

Persistent grants are also used when the guest tries to access the disk
which can be seen when enabling log stats:

  $ echo 1 > /sys/module/xen_blkback/parameters/log_stats
  $ dmesg
  xen-blkback: (20674.xvdf-0): oo   0  |  rd0  |  wr0  |  f0 |  ds  0 | pg:1/1056

The "pg: 1/1056" shows that one persistent grant is used.

Before commit aac8a70db24b ("xen-blkback: add a parameter for disabling
of persistent grants") vbd->feature_gnt_persistent was set in
connect_ring. After the commit it was intended to be initialized in
xen_vbd_create and then set according to the guest feature availability
in connect_ring. However, with a running guest, connect_ring might be
called before xen_vbd_create and vbd->feature_gnt_persistent will be
incorrectly initialized. xen_vbd_create will overwrite it with the value
of feature_persistent regardless of whether the guest actually supports
persistent grants.

With this commit, vbd->feature_gnt_persistent is set only in
connect_ring and this is the only use of the module parameter
feature_persistent. This avoids races when the module parameter changes
during the block attachment process.

Note that vbd->feature_gnt_persistent doesn't need to be initialized in
xen_vbd_create. Its next use is in connect, which can only be called
once connect_ring has initialized the rings. xen_update_blkif_status
checks for this.
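
For illustration only (not part of the patch): a simplified userspace model
of the ordering problem, assuming the runtime-attach order described above
(connect_ring before xen_vbd_create) and a zero-initialised flag. The
struct and function names are hypothetical stand-ins for the driver code.

```c
#include <stdbool.h>

/* Simplified stand-in for the vbd state. */
struct vbd_model {
	bool feature_gnt_persistent;
};

/* Old behaviour: create unconditionally copies the module parameter. */
static void old_vbd_create(struct vbd_model *v, bool module_param)
{
	v->feature_gnt_persistent = module_param;
}

/* Old behaviour: connect_ring only narrows a flag that is already set. */
static void old_connect_ring(struct vbd_model *v, bool frontend_persistent)
{
	if (v->feature_gnt_persistent)
		v->feature_gnt_persistent = frontend_persistent;
}

/* New behaviour: connect_ring is the only writer of the flag. */
static void new_connect_ring(struct vbd_model *v, bool module_param,
			     bool frontend_persistent)
{
	v->feature_gnt_persistent = module_param && frontend_persistent;
}

/* Runtime attach with the old code: connect_ring runs before create. */
static bool old_result_runtime_attach(bool module_param, bool frontend)
{
	struct vbd_model v = { false };

	old_connect_ring(&v, frontend);
	old_vbd_create(&v, module_param);
	return v.feature_gnt_persistent;
}

/* With the new code the call order no longer matters. */
static bool new_result(bool module_param, bool frontend)
{
	struct vbd_model v = { false };

	new_connect_ring(&v, module_param, frontend);
	return v.feature_gnt_persistent;
}
```

With the module parameter enabled and a frontend that does not advertise
the feature, the old sequence ends with the flag set, while the new one
leaves it clear.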

Fixes: aac8a70db24b ("xen-blkback: add a parameter for disabling of persistent grants")
Signed-off-by: Maximilian Heyne 
---
 drivers/block/xen-blkback/xenbus.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 914587aabca0c..51b6ec0380ca4 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -522,8 +522,6 @@ static int xen_vbd_create(struct xen_blkif *blkif, blkif_vdev_t handle,
if (q && blk_queue_secure_erase(q))
vbd->discard_secure = true;
 
-   vbd->feature_gnt_persistent = feature_persistent;
-
pr_debug("Successful creation of handle=%04x (dom=%u)\n",
handle, blkif->domid);
return 0;
@@ -1090,10 +1088,9 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
-   if (blkif->vbd.feature_gnt_persistent)
-   blkif->vbd.feature_gnt_persistent =
-   xenbus_read_unsigned(dev->otherend,
-   "feature-persistent", 0);
+
+   blkif->vbd.feature_gnt_persistent = feature_persistent &&
+   xenbus_read_unsigned(dev->otherend, "feature-persistent", 0);
 
blkif->vbd.overflow_max_grants = 0;
 
-- 
2.32.0





[PATCH v2] xen/events: Fix race in set_evtchn_to_irq

2021-08-12 Thread Maximilian Heyne
There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq
mapping are lazily allocated in this function. The check whether the row
is already present and the row initialization are not synchronized. Two
threads can at the same time allocate a new row for evtchn_to_irq and
add the irq mapping to their newly allocated row. One thread will
overwrite what the other has set for evtchn_to_irq[row] and therefore
the irq mapping is lost. This will trigger a BUG_ON later in
bind_evtchn_to_cpu:

  INFO: pci 0000:1a:15.4: [1d0f:8061] type 00 class 0x010802
  INFO: nvme 0000:1a:12.1: enabling device (0000 -> 0002)
  INFO: nvme nvme77: 1/0/0 default/read/poll queues
  CRIT: kernel BUG at drivers/xen/events/events_base.c:427!
  WARN: invalid opcode: 0000 [#1] SMP NOPTI
  WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme]
  WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0
  WARN: Call Trace:
  WARN:  set_affinity_irq+0x121/0x150
  WARN:  irq_do_set_affinity+0x37/0xe0
  WARN:  irq_setup_affinity+0xf6/0x170
  WARN:  irq_startup+0x64/0xe0
  WARN:  __setup_irq+0x69e/0x740
  WARN:  ? request_threaded_irq+0xad/0x160
  WARN:  request_threaded_irq+0xf5/0x160
  WARN:  ? nvme_timeout+0x2f0/0x2f0 [nvme]
  WARN:  pci_request_irq+0xa9/0xf0
  WARN:  ? pci_alloc_irq_vectors_affinity+0xbb/0x130
  WARN:  queue_request_irq+0x4c/0x70 [nvme]
  WARN:  nvme_reset_work+0x82d/0x1550 [nvme]
  WARN:  ? check_preempt_wakeup+0x14f/0x230
  WARN:  ? check_preempt_curr+0x29/0x80
  WARN:  ? nvme_irq_check+0x30/0x30 [nvme]
  WARN:  process_one_work+0x18e/0x3c0
  WARN:  worker_thread+0x30/0x3a0
  WARN:  ? process_one_work+0x3c0/0x3c0
  WARN:  kthread+0x113/0x130
  WARN:  ? kthread_park+0x90/0x90
  WARN:  ret_from_fork+0x3a/0x50

This patch sets evtchn_to_irq rows via a cmpxchg operation so that they
will be set only once. The row is now cleared before writing it to
evtchn_to_irq in order to not create a race once the row is visible for
other threads.

While at it, do not require the page to be zeroed, because it will be
overwritten with -1's in clear_evtchn_to_irq_row anyway.
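
For illustration only (not part of the patch): a userspace sketch of the
same publish-once pattern, assuming C11 atomics in place of the kernel's
cmpxchg() and malloc() in place of page allocation. The table dimensions
and the helper names get_row() and set_mapping() are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define ROWS 4
#define COLS 8

/* Lazily populated table of rows; all slots start out NULL. */
static _Atomic(int *) table[ROWS];

/* Return the row, allocating and publishing it at most once. */
static int *get_row(unsigned row)
{
	int *cur = atomic_load(&table[row]);
	int *expected = NULL;
	int *fresh;
	unsigned col;

	if (cur)
		return cur;

	fresh = malloc(COLS * sizeof(*fresh));
	if (!fresh)
		return NULL;

	/* Initialise before publishing so no reader sees stale contents. */
	for (col = 0; col < COLS; col++)
		fresh[col] = -1;

	/* Publish only if the slot is still empty; otherwise drop ours. */
	if (!atomic_compare_exchange_strong(&table[row], &expected, fresh)) {
		free(fresh);
		return expected;	/* row installed by the faster thread */
	}

	return fresh;
}

/* Store a mapping; returns 0 on success, -1 on bad index or no memory. */
static int set_mapping(unsigned row, unsigned col, int irq)
{
	int *r;

	if (row >= ROWS || col >= COLS)
		return -1;

	r = get_row(row);
	if (!r)
		return -1;

	r[col] = irq;
	return 0;
}
```

Because the row is filled with -1 before the compare-and-exchange, a thread
that loses the race can free its copy without any mapping having been
written to it.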

Signed-off-by: Maximilian Heyne 
Fixes: d0b075ffeede ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated")
---
 drivers/xen/events/events_base.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index d7e361fb0548..0e44098f3977 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -198,12 +198,12 @@ static void disable_dynirq(struct irq_data *data);
 
 static DEFINE_PER_CPU(unsigned int, irq_epoch);
 
-static void clear_evtchn_to_irq_row(unsigned row)
+static void clear_evtchn_to_irq_row(int *evtchn_row)
 {
unsigned col;
 
for (col = 0; col < EVTCHN_PER_ROW; col++)
-   WRITE_ONCE(evtchn_to_irq[row][col], -1);
+   WRITE_ONCE(evtchn_row[col], -1);
 }
 
 static void clear_evtchn_to_irq_all(void)
@@ -213,7 +213,7 @@ static void clear_evtchn_to_irq_all(void)
for (row = 0; row < EVTCHN_ROW(xen_evtchn_max_channels()); row++) {
if (evtchn_to_irq[row] == NULL)
continue;
-   clear_evtchn_to_irq_row(row);
+   clear_evtchn_to_irq_row(evtchn_to_irq[row]);
}
 }
 
@@ -221,6 +221,7 @@ static int set_evtchn_to_irq(evtchn_port_t evtchn, unsigned int irq)
 {
unsigned row;
unsigned col;
+   int *evtchn_row;
 
if (evtchn >= xen_evtchn_max_channels())
return -EINVAL;
@@ -233,11 +234,18 @@ static int set_evtchn_to_irq(evtchn_port_t evtchn, unsigned int irq)
if (irq == -1)
return 0;
 
-   evtchn_to_irq[row] = (int *)get_zeroed_page(GFP_KERNEL);
-   if (evtchn_to_irq[row] == NULL)
+   evtchn_row = (int *) __get_free_pages(GFP_KERNEL, 0);
+   if (evtchn_row == NULL)
return -ENOMEM;
 
-   clear_evtchn_to_irq_row(row);
+   clear_evtchn_to_irq_row(evtchn_row);
+
+   /*
+* We've prepared an empty row for the mapping. If a different
+* thread was faster inserting it, we can drop ours.
+*/
+   if (cmpxchg(&evtchn_to_irq[row], NULL, evtchn_row) != NULL)
+   free_page((unsigned long) evtchn_row);
}
 
WRITE_ONCE(evtchn_to_irq[row][col], irq);
-- 
2.32.0





[PATCH] xen/events: Fix race in set_evtchn_to_irq

2021-08-11 Thread Maximilian Heyne
There is a TOCTOU issue in set_evtchn_to_irq. Rows in the evtchn_to_irq
mapping are lazily allocated in this function. The check whether the row
is already present and the row initialization are not synchronized. Two
threads can at the same time allocate a new row for evtchn_to_irq and
add the irq mapping to their newly allocated row. One thread will
overwrite what the other has set for evtchn_to_irq[row] and therefore
the irq mapping is lost. This will trigger a BUG_ON later in
bind_evtchn_to_cpu:

  INFO: pci :1a:15.4: [1d0f:8061] type 00 class 0x010802
  INFO: nvme :1a:12.1: enabling device ( -> 0002)
  INFO: nvme nvme77: 1/0/0 default/read/poll queues
  CRIT: kernel BUG at drivers/xen/events/events_base.c:427!
  WARN: invalid opcode:  [#1] SMP NOPTI
  WARN: Workqueue: nvme-reset-wq nvme_reset_work [nvme]
  WARN: RIP: e030:bind_evtchn_to_cpu+0xc2/0xd0
  WARN: Call Trace:
  WARN:  set_affinity_irq+0x121/0x150
  WARN:  irq_do_set_affinity+0x37/0xe0
  WARN:  irq_setup_affinity+0xf6/0x170
  WARN:  irq_startup+0x64/0xe0
  WARN:  __setup_irq+0x69e/0x740
  WARN:  ? request_threaded_irq+0xad/0x160
  WARN:  request_threaded_irq+0xf5/0x160
  WARN:  ? nvme_timeout+0x2f0/0x2f0 [nvme]
  WARN:  pci_request_irq+0xa9/0xf0
  WARN:  ? pci_alloc_irq_vectors_affinity+0xbb/0x130
  WARN:  queue_request_irq+0x4c/0x70 [nvme]
  WARN:  nvme_reset_work+0x82d/0x1550 [nvme]
  WARN:  ? check_preempt_wakeup+0x14f/0x230
  WARN:  ? check_preempt_curr+0x29/0x80
  WARN:  ? nvme_irq_check+0x30/0x30 [nvme]
  WARN:  process_one_work+0x18e/0x3c0
  WARN:  worker_thread+0x30/0x3a0
  WARN:  ? process_one_work+0x3c0/0x3c0
  WARN:  kthread+0x113/0x130
  WARN:  ? kthread_park+0x90/0x90
  WARN:  ret_from_fork+0x3a/0x50

This patch sets evtchn_to_irq rows via a cmpxchg operation so that they
will be set only once. Clearing the row was moved up before writing the
row to evtchn_to_irq so that no race arises once the row is visible to
other threads. Accesses to the rows are now guarded by READ_ONCE and
WRITE_ONCE just as for the columns in the data structure.
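
For illustration only, the allocate, initialise, then publish order
described above can be sketched in user-space C11. This is not the kernel
code: atomic_compare_exchange_strong() stands in for cmpxchg(), malloc()
for get_zeroed_page(), and the names table, table_set, table_get, ROWS and
COLS_PER_ROW are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define ROWS         4   /* hypothetical number of rows */
#define COLS_PER_ROW 8   /* hypothetical entries per row */

/* Each slot points to a lazily allocated row, or is NULL. */
static _Atomic(atomic_int *) table[ROWS];

/* Mark every entry of a not-yet-published row as "no mapping". */
static void clear_row(atomic_int *row)
{
    for (unsigned col = 0; col < COLS_PER_ROW; col++)
        atomic_init(&row[col], -1);
}

/* Store value at (row, col), allocating the row on first use.
 * Returns 0 on success, -1 on a bad index or allocation failure. */
static int table_set(unsigned row, unsigned col, int value)
{
    if (row >= ROWS || col >= COLS_PER_ROW)
        return -1;

    atomic_int *r = atomic_load(&table[row]);

    if (r == NULL) {
        /* Unallocated rows read back as -1 anyway. */
        if (value == -1)
            return 0;

        atomic_int *fresh = malloc(COLS_PER_ROW * sizeof(*fresh));
        if (fresh == NULL)
            return -1;

        /* Initialise before the row can become visible to others. */
        clear_row(fresh);

        atomic_int *expected = NULL;
        if (atomic_compare_exchange_strong(&table[row], &expected, fresh)) {
            r = fresh;          /* we published our row */
        } else {
            free(fresh);        /* somebody else was faster */
            r = expected;       /* failed CAS stored the winner's row here */
        }
    }

    atomic_store(&r[col], value);
    return 0;
}

/* Look up (row, col); returns -1 if nothing is mapped. */
static int table_get(unsigned row, unsigned col)
{
    if (row >= ROWS || col >= COLS_PER_ROW)
        return -1;

    atomic_int *r = atomic_load(&table[row]);
    return r ? atomic_load(&r[col]) : -1;
}
```

The point of the sketch is that the loser of the compare-and-swap writes
into the winner's row instead of overwriting the row pointer, so no mapping
stored by the other thread can be lost.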

Signed-off-by: Maximilian Heyne 
Fixes: d0b075ffeede ("xen/events: Refactor evtchn_to_irq array to be dynamically allocated")
---
 drivers/xen/events/events_base.c | 35 ++--
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index d7e361fb0548..7582a7f52313 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -198,22 +198,24 @@ static void disable_dynirq(struct irq_data *data);
 
 static DEFINE_PER_CPU(unsigned int, irq_epoch);
 
-static void clear_evtchn_to_irq_row(unsigned row)
+static void clear_evtchn_to_irq_row(int *evtchn_row)
 {
 	unsigned col;
 
 	for (col = 0; col < EVTCHN_PER_ROW; col++)
-		WRITE_ONCE(evtchn_to_irq[row][col], -1);
+		WRITE_ONCE(evtchn_row[col], -1);
 }
 
 static void clear_evtchn_to_irq_all(void)
 {
 	unsigned row;
+	int *evtchn_row;
 
 	for (row = 0; row < EVTCHN_ROW(xen_evtchn_max_channels()); row++) {
-		if (evtchn_to_irq[row] == NULL)
+		evtchn_row = READ_ONCE(evtchn_to_irq[row]);
+		if (evtchn_row == NULL)
 			continue;
-		clear_evtchn_to_irq_row(row);
+		clear_evtchn_to_irq_row(evtchn_row);
 	}
 }
 
@@ -221,36 +223,47 @@ static int set_evtchn_to_irq(evtchn_port_t evtchn, unsigned int irq)
 {
 	unsigned row;
 	unsigned col;
+	int *evtchn_row;
 
 	if (evtchn >= xen_evtchn_max_channels())
 		return -EINVAL;
 
 	row = EVTCHN_ROW(evtchn);
 	col = EVTCHN_COL(evtchn);
+	evtchn_row = READ_ONCE(evtchn_to_irq[row]);
 
-	if (evtchn_to_irq[row] == NULL) {
+	if (evtchn_row == NULL) {
 		/* Unallocated irq entries return -1 anyway */
 		if (irq == -1)
 			return 0;
 
-		evtchn_to_irq[row] = (int *)get_zeroed_page(GFP_KERNEL);
-		if (evtchn_to_irq[row] == NULL)
+		evtchn_row = (int *) get_zeroed_page(GFP_KERNEL);
+		if (evtchn_row == NULL)
 			return -ENOMEM;
 
-		clear_evtchn_to_irq_row(row);
+		clear_evtchn_to_irq_row(evtchn_row);
+
+		if (cmpxchg(&evtchn_to_irq[row], NULL, evtchn_row) != NULL) {
+			free_page((unsigned long) evtchn_row);
+			evtchn_row = READ_ONCE(evtchn_to_irq[row]);
+		}
 	}
 
-	WRITE_ONCE(evtchn_to_irq[row][col], irq);
+	WRITE_ONCE(evtchn_row[col], irq);
 	return 0;
 }
 
 int get_evtchn_to_irq(evtchn_port_t evtchn)
 {
+	int *evtchn_row;
+
 	if (evtchn >= xen_evtchn_max_channels())
 		return -1;
-	if (evtchn_to_irq[EVTCHN_ROW(evtchn)] == NULL)
+
+	evtchn_row = READ_ONCE(evtchn

Re: [PATCH 0/3] Cleanup IOREQ server on exit

2020-04-07 Thread Maximilian Heyne

Could someone please have a look at this patch? It solves an actual issue:
Try soft-reset with qemu-xen-traditional and it will fail.

On 3/13/20 1:33 PM, Maximilian Heyne wrote:

Following up on commit 9c0eed61 ("qemu-trad: stop using the default IOREQ
server"), clean up the IOREQ server on exit. This fixes a bug with soft-reset
that shows up as "bind interdomain ioctl error 22" because the event channels
were not closed at the soft-reset and can't be bound again.

For this I used the exit notifiers from QEMU that I backported together with the
required generic notifier lists.

Anthony Liguori (1):
   Add support for generic notifier lists

Gerd Hoffmann (1):
   Add exit notifiers.

Maximilian Heyne (1):
   xen: cleanup IOREQ server on exit

  Makefile|  1 +
  hw/xen_machine_fv.c | 11 +++
  notify.c| 39 +++
  notify.h| 43 +++
  sys-queue.h |  5 +
  sysemu.h|  5 +
  vl.c| 20 
  7 files changed, 124 insertions(+)
  create mode 100644 notify.c
  create mode 100644 notify.h






[Xen-devel] [PATCH 1/3] Add support for generic notifier lists

2020-03-13 Thread Maximilian Heyne
From: Anthony Liguori 

Notifiers are data-less callbacks and a notifier list is a list of registered
notifiers that all are interested in a particular event.
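
As a rough sketch of the idea (not the code from this patch), a notifier
list can be built without the sys-queue.h macros. All names below
(struct notifier, struct notifier_list, nl_add, nl_remove, nl_notify,
count_call, count_once) are hypothetical. The second callback unlinks
itself while the list is being walked, which is why the walk saves the
next pointer before calling out.

```c
#include <assert.h>
#include <stddef.h>

/* A data-less callback that can be linked into a list. */
struct notifier {
    void (*notify)(struct notifier *n);
    struct notifier *next;    /* next node, or NULL */
    struct notifier **pprev;  /* address of the pointer that points at us */
};

struct notifier_list {
    struct notifier *first;
};

/* Insert at the head; the node is supplied by the caller, so nothing
 * is allocated here. */
static void nl_add(struct notifier_list *list, struct notifier *n)
{
    n->next = list->first;
    if (n->next)
        n->next->pprev = &n->next;
    n->pprev = &list->first;
    list->first = n;
}

/* Unlink a node; the list head is not needed thanks to pprev. */
static void nl_remove(struct notifier *n)
{
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}

/* Call every registered notifier. */
static void nl_notify(struct notifier_list *list)
{
    struct notifier *n, *next;

    for (n = list->first; n; n = next) {
        next = n->next;   /* saved first: the callback may unlink n */
        n->notify(n);
    }
}

static int calls;

/* Stays registered. */
static void count_call(struct notifier *n)
{
    (void)n;
    calls++;
}

/* Fires once, then removes itself from the list. */
static void count_once(struct notifier *n)
{
    calls++;
    nl_remove(n);
}
```

Registering count_call and count_once on the same list and calling
nl_notify() twice runs count_once only the first time.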

We'll use this in a few patches to implement mouse change notification.

Signed-off-by: Anthony Liguori 
---
v1 -> v2
 - Do not do memory allocations by placing list nodes in notifier

[cherry-picked from d1e70c5e6d1472856c52969301247fe8c3c8389d
conflicts: used the sys-queue interface and added the required
LIST_FOREACH_SAFE macro to it]
Signed-off-by: Maximilian Heyne 
---
 Makefile|  1 +
 notify.c| 39 +++
 notify.h| 43 +++
 sys-queue.h |  5 +
 4 files changed, 88 insertions(+)
 create mode 100644 notify.c
 create mode 100644 notify.h

diff --git a/Makefile b/Makefile
index 0fbec990b..d921bcdf8 100644
--- a/Makefile
+++ b/Makefile
@@ -93,6 +93,7 @@ OBJS+=sd.o ssi-sd.o
 OBJS+=bt.o bt-host.o bt-vhci.o bt-l2cap.o bt-sdp.o bt-hci.o bt-hid.o usb-bt.o
 OBJS+=buffered_file.o migration.o migration-tcp.o net.o qemu-sockets.o
 OBJS+=qemu-char.o aio.o net-checksum.o savevm.o cache-utils.o
+OBJS+=notify.o
 
 ifdef CONFIG_BRLAPI
 OBJS+= baum.o
diff --git a/notify.c b/notify.c
new file mode 100644
index 0..59e1e7c7d
--- /dev/null
+++ b/notify.c
@@ -0,0 +1,39 @@
+/*
+ * Notifier lists
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "notify.h"
+
+void notifier_list_init(NotifierList *list)
+{
+    LIST_INIT(&list->notifiers);
+}
+
+void notifier_list_add(NotifierList *list, Notifier *notifier)
+{
+    LIST_INSERT_HEAD(&list->notifiers, notifier, node);
+}
+
+void notifier_list_remove(Notifier *notifier)
+{
+    LIST_REMOVE(notifier, node);
+}
+
+void notifier_list_notify(NotifierList *list)
+{
+    Notifier *notifier, *next;
+
+    LIST_FOREACH_SAFE(notifier, &list->notifiers, node, next) {
+        notifier->notify(notifier);
+    }
+}
diff --git a/notify.h b/notify.h
new file mode 100644
index 0..093c63f19
--- /dev/null
+++ b/notify.h
@@ -0,0 +1,43 @@
+/*
+ * Notifier lists
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *  Anthony Liguori   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_NOTIFY_H
+#define QEMU_NOTIFY_H
+
+#include "sys-queue.h"
+
+typedef struct Notifier Notifier;
+
+struct Notifier
+{
+    void (*notify)(Notifier *notifier);
+    LIST_ENTRY(Notifier) node;
+};
+
+typedef struct NotifierList
+{
+    LIST_HEAD(, Notifier) notifiers;
+} NotifierList;
+
+#define NOTIFIER_LIST_INITIALIZER(head) \
+    { LIST_HEAD_INITIALIZER((head).notifiers) }
+
+void notifier_list_init(NotifierList *list);
+
+void notifier_list_add(NotifierList *list, Notifier *notifier);
+
+void notifier_list_remove(Notifier *notifier);
+
+void notifier_list_notify(NotifierList *list);
+
+#endif
diff --git a/sys-queue.h b/sys-queue.h
index 55c26fe7f..81ab044a8 100644
--- a/sys-queue.h
+++ b/sys-queue.h
@@ -132,6 +132,11 @@ struct { \
                 (var);                                                  \
                 (var) = ((var)->field.le_next))
 
+#define LIST_FOREACH_SAFE(var, head, field, next_var)                   \
+        for ((var) = ((head)->lh_first);                                \
+                (var) && ((next_var) = ((var)->field.le_next), 1);      \
+                (var) = (next_var))
+
 /*
  * List access methods.
  */
-- 
2.16.6




___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 3/3] xen: cleanup IOREQ server on exit

2020-03-13 Thread Maximilian Heyne
Use the backported Notifier interface to register an atexit handler that
cleans up the IOREQ server. This has been required since Xen commit a5a180f9
("x86/domain: don't destroy IOREQ servers on soft reset") was introduced,
which requires QEMU to explicitly close the IOREQ server.

This can be seen as a backport of ba7fdd64 ("xen: cleanup IOREQ
server on exit").

Signed-off-by: Maximilian Heyne 
---
 hw/xen_machine_fv.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/hw/xen_machine_fv.c b/hw/xen_machine_fv.c
index f0989fad4..66eb4a1eb 100644
--- a/hw/xen_machine_fv.c
+++ b/hw/xen_machine_fv.c
@@ -31,6 +31,7 @@
 #include "qemu-aio.h"
 #include "xen_backend.h"
 #include "pci.h"
+#include "sysemu.h"
 
 #include 
 #include 
@@ -67,6 +68,8 @@ TAILQ_HEAD(map_cache_head, map_cache_rev) locked_entries = TAILQ_HEAD_INITIALIZE
 static unsigned long last_address_page = ~0UL;
 static uint8_t  *last_address_vaddr;
 
+static Notifier exit_notifier;
+
 static int qemu_map_cache_init(void)
 {
     unsigned long size;
@@ -283,6 +286,11 @@ void xen_disable_io(void)
     xc_hvm_set_ioreq_server_state(xc_handle, domid, ioservid, 0);
 }
 
+static void xen_exit_notifier(Notifier *n)
+{
+    xc_hvm_destroy_ioreq_server(xc_handle, domid, ioservid);
+}
+
 static void xen_init_fv(ram_addr_t ram_size, int vga_ram_size,
                         const char *boot_device,
                         const char *kernel_filename,const char *kernel_cmdline,
@@ -317,6 +325,9 @@ static void xen_init_fv(ram_addr_t ram_size, int vga_ram_size,
         exit(-1);
     }
 
+    exit_notifier.notify = xen_exit_notifier;
+    qemu_add_exit_notifier(&exit_notifier);
+
     if (xc_hvm_get_ioreq_server_info(xc_handle, domid, ioservid,
                                      _pfn, _pfn,
                                      _evtchn)) {
-- 
2.16.6





[Xen-devel] [PATCH 0/3] Cleanup IOREQ server on exit

2020-03-13 Thread Maximilian Heyne
Following up on commit 9c0eed61 ("qemu-trad: stop using the default IOREQ
server"), clean up the IOREQ server on exit. This fixes a bug with soft-reset
that shows up as "bind interdomain ioctl error 22" because the event channels
were not closed at the soft-reset and can't be bound again.

For this I used the exit notifiers from QEMU that I backported together with the
required generic notifier lists.

Anthony Liguori (1):
  Add support for generic notifier lists

Gerd Hoffmann (1):
  Add exit notifiers.

Maximilian Heyne (1):
  xen: cleanup IOREQ server on exit

 Makefile|  1 +
 hw/xen_machine_fv.c | 11 +++
 notify.c| 39 +++
 notify.h| 43 +++
 sys-queue.h |  5 +
 sysemu.h|  5 +
 vl.c| 20 
 7 files changed, 124 insertions(+)
 create mode 100644 notify.c
 create mode 100644 notify.h

-- 
2.16.6





[Xen-devel] [PATCH 2/3] Add exit notifiers.

2020-03-13 Thread Maximilian Heyne
From: Gerd Hoffmann 

Hook up any cleanup work which needs to be done here.  Advantages over
using atexit(3):

  (1) You get passed in a pointer to the notifier.  If you embed that
      into your state struct you can use container_of() to get your
      state info.
  (2) You can unregister, say when un-plugging a device.
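
To make point (1) concrete, here is a small stand-alone sketch. The types
struct exit_hook and struct device_state and the function device_exit are
hypothetical and only mirror the shape of Notifier; container_of() is
defined locally with offsetof(). An atexit(3) handler, by contrast, has the
signature void (*)(void) and receives no argument from which to find its
state.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for Notifier: a callback that is handed a
 * pointer to itself. */
struct exit_hook {
    void (*notify)(struct exit_hook *hook);
};

/* Hypothetical per-device state with the hook embedded in it. */
struct device_state {
    int id;
    int cleaned_up;
    struct exit_hook on_exit;
};

/* The callback only gets the hook, but can reach the device state
 * because the hook lives inside struct device_state. */
static void device_exit(struct exit_hook *hook)
{
    struct device_state *dev =
        container_of(hook, struct device_state, on_exit);

    dev->cleaned_up = 1;
}
```

Calling dev.on_exit.notify(&dev.on_exit) on a struct device_state whose
hook points at device_exit sets dev.cleaned_up without any global variable.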

[ v2: move code out of #ifndef _WIN32 ]

Signed-off-by: Anthony Liguori 
(cherry picked from commit fd42deeb4cb42f90084046e3ebdb4383953195e3)
Signed-off-by: Maximilian Heyne 
---
 sysemu.h |  5 +
 vl.c | 20 
 2 files changed, 25 insertions(+)

diff --git a/sysemu.h b/sysemu.h
index 968258a84..759d0e9d5 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -2,6 +2,8 @@
 #define SYSEMU_H
 /* Misc. things related to the system emulator.  */
 
+#include "notify.h"
+
 /* vl.c */
 extern const char *bios_name;
 extern const char *bios_dir;
@@ -39,6 +41,9 @@ void qemu_system_powerdown(void);
 #endif
 void qemu_system_reset(void);
 
+void qemu_add_exit_notifier(Notifier *notify);
+void qemu_remove_exit_notifier(Notifier *notify);
+
 void do_savevm(const char *name);
 void do_loadvm(const char *name);
 void do_delvm(const char *name);
diff --git a/vl.c b/vl.c
index c3c5d630e..2163217ec 100644
--- a/vl.c
+++ b/vl.c
@@ -282,6 +282,9 @@ uint8_t qemu_uuid[16];
 
 #include "xen-vl-extra.c"
 
+static NotifierList exit_notifiers =
+    NOTIFIER_LIST_INITIALIZER(exit_notifiers);
+
 /***/
 /* x86 ISA bus support */
 
@@ -4843,6 +4846,21 @@ static void vcpu_hex_str_to_bitmap(const char *optarg)
     }
 }
 
+void qemu_add_exit_notifier(Notifier *notify)
+{
+    notifier_list_add(&exit_notifiers, notify);
+}
+
+void qemu_remove_exit_notifier(Notifier *notify)
+{
+    notifier_list_remove(notify);
+}
+
+static void qemu_run_exit_notifiers(void)
+{
+    notifier_list_notify(&exit_notifiers);
+}
+
 int main(int argc, char **argv, char **envp)
 {
 #ifdef CONFIG_GDBSTUB
@@ -4887,6 +4905,8 @@ int main(int argc, char **argv, char **envp)
     const char *chroot_dir = NULL;
     const char *run_as = NULL;
 
+    atexit(qemu_run_exit_notifiers);
+
     qemu_cache_utils_init(envp);
     logfile = stderr; /* initial value */
 
-- 
2.16.6



