[RFC XEN PATCH v12 7/7] tools: Add new function to do PIRQ (un)map on PVH dom0

2024-07-08 Thread Jiqian Chen
When dom0 is PVH and a device is passed through to domU, xl
uses the gsi number of the device to do a pirq mapping, see
pci_add_dm_done->xc_physdev_map_pirq. But the gsi number is
read from the file /sys/bus/pci/devices//irq, which confuses
irq and gsi: they are in different number spaces and are not
equal, so the mapping fails.
To solve this issue, use xc_physdev_gsi_from_dev to get the
real gsi and then map the pirq.

Besides, a PVH domain doesn't have the PIRQ flag, so it doesn't
do PHYSDEVOP_map_pirq for each gsi. As a result, the grant
callstack pci_add_dm_done->XEN_DOMCTL_irq_permission fails in
function domain_pirq_to_irq. Moreover, the old hypercall
XEN_DOMCTL_irq_permission requires passing in a pirq, so it is
not suitable for a dom0 that doesn't have PIRQs to grant irq
permission.
To solve this issue, use the new hypercall
XEN_DOMCTL_gsi_permission to grant the permission of the irq
(translated from the gsi) to domU when dom0 has no PIRQs.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to 
be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
This patch must be merged after the patch on linux kernel side
---
 tools/include/xenctrl.h   |   5 ++
 tools/libs/ctrl/xc_domain.c   |  15 +
 tools/libs/light/libxl_arch.h |   4 ++
 tools/libs/light/libxl_arm.c  |  10 +++
 tools/libs/light/libxl_pci.c  |  17 ++
 tools/libs/light/libxl_x86.c  | 111 ++
 6 files changed, 162 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 3720e22b399a..9ff5f1810cf8 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ uint8_t access_flag);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..4c89f07e4d6e 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ uint8_t access_flag)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.access_flag = access_flag,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_arch.h b/tools/libs/light/libxl_arch.h
index f88f11d6de1d..11b736067951 100644
--- a/tools/libs/light/libxl_arch.h
+++ b/tools/libs/light/libxl_arch.h
@@ -91,6 +91,10 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
   libxl_domain_config *dst,
   const libxl_domain_config *src);
 
+_hidden
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
+_hidden
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
 #if defined(__i386__) || defined(__x86_64__)
 
 #define LAPIC_BASE_ADDRESS  0xfee0
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index a4029e3ac810..d869bbec769e 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -1774,6 +1774,16 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
 {
 }
 
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+return -1;
+}
+
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..3d25997921cc 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h" /* must come before any other headers */
 
 #include "libxl_internal.h"
+#include "libxl_arch.h"
 
 #define PCI_BDF"%04x:%02x:%02x.%01x"
 #define PCI_BDF_SHORT  "%02x:%02x.%01x"
@@ -1478,6 +1479,16 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
+
+/*
+ * When dom0 is PVH and mapping a x86 gsi to pirq 

[RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev

2024-07-08 Thread Jiqian Chen
When passing through a device to domU, QEMU and the xl tools use
its gsi number to do pirq mapping, see QEMU code
xen_pt_realize->xc_physdev_map_pirq and xl code
pci_add_dm_done->xc_physdev_map_pirq. But the gsi number is read
from the file /sys/bus/pci/devices//irq, which is wrong: irq is
not equal to gsi, they are in different number spaces, so the
pirq mapping fails.

In the current code there is no way for userspace to get the gsi.
For the above purpose, add a new function to get the gsi; the
corresponding ioctl is implemented on the Linux kernel side.
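
As a rough illustration of the sbdf argument that the new function takes, the sketch below packs a segment/bus/device/function tuple into one 32-bit value. The layout used (segment in bits 31-16, bus in bits 15-8, device in bits 7-3, function in bits 2-0) is the conventional PCI SBDF encoding and is an assumption here; the helper name pack_sbdf is hypothetical and not part of this patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of this patch: pack a PCI address into a
 * 32-bit sbdf value, assuming the conventional layout
 *   seg[31:16] | bus[15:8] | dev[7:3] | func[2:0].
 */
static uint32_t pack_sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t func)
{
    return ((uint32_t)seg << 16) |
           ((uint32_t)bus << 8) |
           ((uint32_t)(dev & 0x1f) << 3) |
           (uint32_t)(func & 0x7);
}
```

Under that assumption, device 0000:03:00.0 would be passed as 0x0300.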

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to 
be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
This patch must be merged after the patch on linux kernel side

CC: Anthony PERARD 
Remaining comment @Anthony PERARD:
Do I need to make " opening of /dev/xen/privcmd " as a single function, then 
use it in this
patch and other libraries?
---
 tools/include/xen-sys/Linux/privcmd.h |  7 ++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/ctrl/xc_physdev.c  | 35 +++
 3 files changed, 44 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h 
b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..4cf719102116 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_pcidev {
+   __u32 sbdf;
+   __u32 gsi;
+} privcmd_gsi_from_pcidev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV  \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..3720e22b399a 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index e9fcd755fa62..54edb0f3c0dc 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
 return rc;
 }
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
+{
+int rc = -1;
+
+#if defined(__linux__)
+int fd;
+privcmd_gsi_from_pcidev_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = 0,
+};
+
+fd = open("/dev/xen/privcmd", O_RDWR);
+
+if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) {
+/* Fallback to /proc/xen/privcmd */
+fd = open("/proc/xen/privcmd", O_RDWR);
+}
+
+if (fd < 0) {
+PERROR("Could not obtain handle on privileged command interface");
+return rc;
+}
+
+rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi);
+close(fd);
+
+if (rc) {
+PERROR("Failed to get gsi from dev");
+} else {
+rc = dev_gsi.gsi;
+}
+#endif
+
+return rc;
+}
-- 
2.34.1




[XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-07-08 Thread Jiqian Chen
When running Xen with a PVH dom0 and an HVM domU, a pirq is mapped
for a passthrough device by using its gsi, see QEMU code
xen_pt_realize->xc_physdev_map_pirq and libxl code
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not
allowed because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the device removal path can unmap
the pirq. Also add a new check to prevent (un)map when the
subject domain doesn't have a notion of PIRQ.

This way the interrupt of a passthrough device can be
successfully mapped to a pirq for a domU with a notion of PIRQ
when dom0 is PVH.
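
The check on the subject domain can be sketched in isolation as follows. This is a minimal model, not the hypervisor code: struct demo_domain and demo_pirq_op_allowed are hypothetical names, and the two booleans merely stand in for is_hvm_domain(d) and has_pirq(d).

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified, hypothetical stand-in for Xen's struct domain. */
struct demo_domain {
    bool is_hvm;   /* models is_hvm_domain(d) */
    bool has_pirq; /* models has_pirq(d) */
};

/*
 * Models the condition added in this patch: return 0 when (un)mapping may
 * proceed for the subject domain, -EOPNOTSUPP when it has no notion of PIRQ.
 */
static int demo_pirq_op_allowed(const struct demo_domain *d)
{
    if (!d->is_hvm || d->has_pirq)
        return 0;

    return -EOPNOTSUPP;
}
```

In this model a PV domain and an HVM domain with PIRQs both pass, while an HVM-type domain without PIRQs is rejected with -EOPNOTSUPP.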

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c |  6 ++
 xen/arch/x86/physdev.c   | 12 ++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..03ada3c880bd 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 
 switch ( cmd )
 {
+/*
+* Only being permitted for management of other domains.
+* Further restrictions are enforced in do_physdev_op.
+*/
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index d6dd622952a9..9f30a8c63a06 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 if ( !d )
 break;
 
-ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
+/* Only mapping when the subject domain has a notion of PIRQ */
+if ( !is_hvm_domain(d) || has_pirq(d) )
+ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
+else
+ret = -EOPNOTSUPP;
 
 rcu_unlock_domain(d);
 
@@ -346,7 +350,11 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 if ( !d )
 break;
 
-ret = physdev_unmap_pirq(d, unmap.pirq);
+/* Only unmapping when the subject domain has a notion of PIRQ */
+if ( !is_hvm_domain(d) || has_pirq(d) )
+ret = physdev_unmap_pirq(d, unmap.pirq);
+else
+ret = -EOPNOTSUPP;
 
 rcu_unlock_domain(d);
 
-- 
2.34.1




[XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev

2024-07-08 Thread Jiqian Chen
When a device has been reset on the dom0 side, the Xen hypervisor
doesn't get a notification, so the cached state in vpci is out of
date compared with the real device state.

To solve that problem, add a new hypercall to support the reset
of a pcidev and clear the vpci state of the device, so that once
the state of the device is reset on the dom0 side, dom0 can call
this hypercall to notify the hypervisor.
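
The handling of the reset type can be sketched as below. The four reset-type values are copied from the public header change in this patch; demo_check_reset_type is a hypothetical name, and the sketch only models which values are accepted, not the vpci deassign/assign work done for them.

```c
#include <errno.h>
#include <stdint.h>

/* Values as defined in the public header change of this patch. */
#define PCI_DEVICE_STATE_RESET_COLD 0
#define PCI_DEVICE_STATE_RESET_WARM 1
#define PCI_DEVICE_STATE_RESET_HOT  2
#define PCI_DEVICE_STATE_RESET_FLR  3

/*
 * Hypothetical model of the switch on reset_type: known types are accepted
 * (the real code then resets the vpci state), anything else is rejected.
 */
static int demo_check_reset_type(uint32_t reset_type)
{
    switch (reset_type) {
    case PCI_DEVICE_STATE_RESET_COLD:
    case PCI_DEVICE_STATE_RESET_WARM:
    case PCI_DEVICE_STATE_RESET_HOT:
    case PCI_DEVICE_STATE_RESET_FLR:
        return 0;

    default:
        return -EOPNOTSUPP;
    }
}
```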

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 52 
 xen/drivers/vpci/vpci.c  | 10 +++
 xen/include/public/physdev.h | 16 +++
 xen/include/xen/vpci.h   |  8 ++
 5 files changed, 87 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..c0f47945d955 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset:
+{
+struct pci_device_state_reset dev_reset;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+ret = -EOPNOTSUPP;
+if ( !is_pci_passthrough_enabled() )
+break;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+break;
+
+sbdf = PCI_SBDF(dev_reset.dev.seg,
+dev_reset.dev.bus,
+dev_reset.dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+switch ( dev_reset.reset_type )
+{
+case PCI_DEVICE_STATE_RESET_COLD:
+case PCI_DEVICE_STATE_RESET_WARM:
+case PCI_DEVICE_STATE_RESET_HOT:
+case PCI_DEVICE_STATE_RESET_FLR:
+ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
+break;
+
+default:
+ret = -EOPNOTSUPP;
+break;
+}
+write_unlock(&pdev->domain->pci_lock);
+
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..7e914d1eff9f 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev,
+uint32_t reset_type)
+{
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..3cfde3fd2389 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
@@ -305,6 +312,15 @@ struct physdev_pci_device {
 typedef struct physdev_pci_device physdev_pci_device_t;
 DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
 
+struct pci_device_state_reset {
+physdev_pci_device_t dev;
+#define PCI_DEVICE_STATE_RESET_COLD 0
+#define PCI_DEVICE_STATE_RESET_WARM 1
+#define PCI_DEVICE_STATE_RESET_HOT  2
+#define PCI_DEVICE_STATE_RESET_FLR  3
+uint32_t reset_type;
+};
+
 #define PHYSDEVOP_DBGP_RESET_PREPARE1
 #define PHYSDEVOP_DBGP_RESET_DONE   2
 
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index da8d0f41e6f4..6be812dbc04a 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpc

[XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq

2024-07-08 Thread Jiqian Chen
Hypercall PHYSDEVOP_map_pirq supports mapping a gsi into a specific
pirq or a free pirq, depending on the parameter pirq (>0 or <0).
But the current xc_physdev_map_pirq sets *pirq=index when the
parameter pirq is <0, which forces all cases to be mapped to a
specific pirq. That has two problems: the caller can't get a free
pirq value, and once the specific pirq is already mapped to
another gsi, the mapping fails.

So, change xc_physdev_map_pirq to allow a negative parameter to be
passed in, so that the caller gets a free pirq.

There are four callers of xc_physdev_map_pirq in the original code, so
clarify the effect below (only the pirq<0 case needs clarifying):

First, pci_add_dm_done->xc_physdev_map_pirq passes irq as the pirq
parameter; pirq<0 means irq<0, so it fails at the check
"index < 0" in allocate_and_map_gsi_pirq and gets EINVAL. The logic
is the same as in the original code.

Second, domcreate_launch_dm->libxl__arch_domain_map_irq->
xc_physdev_map_pirq: the passed pirq is always >=0, so no effect.

Third, pyxc_physdev_map_pirq->xc_physdev_map_pirq: not sure, so add
the check logic into pyxc_physdev_map_pirq to keep the same behavior.

Fourth, xen_pt_realize->xc_physdev_map_pirq wants to allocate a
pirq for the gsi, but it isn't necessary to get a pirq whose value
is equal to the value of the gsi. After this patch it gets a free
pirq, which also works.
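
The change in the value requested from the hypervisor can be sketched as three small functions. The function names are hypothetical; they only model how map.pirq is chosen before and after this patch, and how the Python binding keeps the old behavior.

```c
/* Old xc_physdev_map_pirq: a negative pirq was silently replaced by index. */
static int demo_old_requested_pirq(int index, int pirq)
{
    return pirq < 0 ? index : pirq;
}

/* New xc_physdev_map_pirq: the caller's value is passed through unchanged. */
static int demo_new_requested_pirq(int index, int pirq)
{
    (void)index; /* no longer used to pick the pirq */
    return pirq;
}

/* pyxc_physdev_map_pirq after this patch: applies the old fallback itself. */
static int demo_pyxc_requested_pirq(int index, int pirq)
{
    if (pirq < 0)
        pirq = index;

    return demo_new_requested_pirq(index, pirq);
}
```

With index 5 and pirq -1, the old path requests pirq 5, the new path passes -1 through so that a free pirq can be chosen, and the Python wrapper still requests 5.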

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/libs/ctrl/xc_physdev.c  | 2 +-
 tools/python/xen/lowlevel/xc/xc.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..e9fcd755fa62 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch,
 map.domid = domid;
 map.type = MAP_PIRQ_TYPE_GSI;
 map.index = index;
-map.pirq = *pirq < 0 ? index : *pirq;
+map.pirq = *pirq;
 
 rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
 
diff --git a/tools/python/xen/lowlevel/xc/xc.c 
b/tools/python/xen/lowlevel/xc/xc.c
index 9feb12ae2b16..f8c9db7115ee 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -774,6 +774,8 @@ static PyObject *pyxc_physdev_map_pirq(PyObject *self,
 if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iii", kwd_list,
   &dom, &index, &pirq) )
 return NULL;
+if ( pirq < 0 )
+pirq = index;
 ret = xc_physdev_map_pirq(xc->xc_handle, dom, index, &pirq);
 if ( ret != 0 )
   return pyxc_error_to_exception(xc->xc_handle);
-- 
2.34.1




[XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-07-08 Thread Jiqian Chen
The gsi of a passthrough device must be configured for it to be
able to be mapped into an HVM domU.
But when dom0 is PVH, the gsis may not get registered (see the
clarification below). In that case the apic, pin and irq info is
not added to the irq_2_pin list and the handler of irq_desc is not
set, so when passing through a device, setting the ioapic affinity
and vector fails.

To fix the above problem, new code on the Linux kernel side will
need to call PHYSDEVOP_setup_gsi for passthrough devices to
register the gsi when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for the above
purpose.

Clarify two questions:
First, why does the gsi of a device belonging to PVH dom0 work?
Because when a driver is probed for a normal device, it uses the
normal probe function of a pci device; in its callstack it requests
the irq and unmasks the corresponding ioapic pin of the gsi, which
traps into Xen and finally registers the gsi.
The callstack is (on the Linux kernel side) pci_device_probe->
request_threaded_irq-> irq_startup-> __unmask_ioapic->
io_apic_write, then trap into Xen hvmemul_do_io->
hvm_io_intercept-> hvm_process_io_intercept->
vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
So the gsi gets registered.

Second, why doesn't the gsi of a passthrough device work when dom0
is PVH?
Because when a device is assigned for passthrough, it uses the
specific probe function of pciback; in its callstack it doesn't
install a fake irq handler because the ISR is not running. So
mp_register_gsi on the Xen side is never called, and the gsi is
not registered.
The callstack is (on the Linux kernel side) pcistub_probe->
pcistub_seize-> pcistub_init_device-> xen_pcibk_reset_device->
xen_pcibk_control_isr->isr_on==0.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return -ENOSYS;
 break;
 
+case PHYSDEVOP_setup_gsi:
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
-- 
2.34.1




[XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi

2024-07-08 Thread Jiqian Chen
Some types of domains don't have PIRQs, like PVH, which doesn't do
PHYSDEVOP_map_pirq for each gsi. When passing through a device
to a guest on top of a PVH dom0, the callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function
domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
irq on the Xen side.
What's more, the current hypercall XEN_DOMCTL_irq_permission
requires passing in a pirq to set the access of an irq, so it is
not suitable for a dom0 that doesn't have PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
the permission of an irq (translated from an x86 gsi) to domU
when dom0 has no PIRQs.
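
The argument validation performed before the permission change can be sketched as follows. This is a simplified model under stated assumptions: struct demo_gsi_permission is a hypothetical stand-in whose layout (gsi, access_flag, three padding bytes) is assumed rather than taken from the public header, DEMO_GSI_PERMISSION_MASK mirrors the XEN_DOMCTL_GSI_PERMISSION_MASK value of 1 mentioned in the cover letter, and highest_gsi is passed in as a plain parameter instead of calling highest_gsi().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_GSI_PERMISSION_MASK 1 /* only the lowest bit is meaningful */

/* Hypothetical stand-in for the domctl argument; the layout is assumed. */
struct demo_gsi_permission {
    uint32_t gsi;
    uint8_t access_flag;
    uint8_t pad[3];
};

/*
 * Models the early checks: reject unknown flag bits, non-zero padding and
 * out-of-range gsi values with -EINVAL; return 0 if the request is sane.
 */
static int demo_validate_gsi_permission(const struct demo_gsi_permission *p,
                                        uint32_t highest_gsi)
{
    size_t i;

    if (p->access_flag & ~DEMO_GSI_PERMISSION_MASK)
        return -EINVAL;

    for (i = 0; i < sizeof(p->pad) / sizeof(p->pad[0]); i++)
        if (p->pad[i])
            return -EINVAL;

    if (p->gsi > highest_gsi)
        return -EINVAL;

    return 0;
}
```

The gsi-to-irq translation and the irq_access_permitted/XSM checks that follow in the real hypercall are deliberately left out of this sketch.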

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
CC: Daniel P . Smith 
Remaining comment @Daniel P . Smith:
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
+goto gsi_permission_out;
Is it okay to issue the XSM check using the translated value, 
not the one that was originally passed into the hypercall?
---
 xen/arch/x86/domctl.c  | 32 ++
 xen/arch/x86/include/asm/io_apic.h |  2 ++
 xen/arch/x86/io_apic.c | 17 
 xen/arch/x86/mpparse.c |  5 ++---
 xen/include/public/domctl.h|  9 +
 xen/xsm/flask/hooks.c  |  1 +
 6 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9190e11faaa3..4e9e4c4cfed3 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int update_domain_cpu_policy(struct domain *d,
 xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,37 @@ long arch_do_domctl(
 break;
 }
 
+case XEN_DOMCTL_gsi_permission:
+{
+int irq;
+unsigned int gsi = domctl->u.gsi_permission.gsi;
+uint8_t access_flag = domctl->u.gsi_permission.access_flag;
+
+/* Check all bits and pads are zero except lowest bit */
+ret = -EINVAL;
+if ( access_flag & ( ~XEN_DOMCTL_GSI_PERMISSION_MASK ) )
+goto gsi_permission_out;
+for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
+if ( domctl->u.gsi_permission.pad[i] )
+goto gsi_permission_out;
+
+if ( gsi > highest_gsi() || (irq = gsi_2_irq(gsi)) <= 0 )
+goto gsi_permission_out;
+
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
+goto gsi_permission_out;
+
+if ( access_flag )
+ret = irq_permit_access(d, irq);
+else
+ret = irq_deny_access(d, irq);
+
+gsi_permission_out:
+break;
+}
+
 case XEN_DOMCTL_getpageframeinfo3:
 {
 unsigned int num = domctl->u.getpageframeinfo3.num;
diff --git a/xen/arch/x86/include/asm/io_apic.h 
b/xen/arch/x86/include/asm/io_apic.h
index 78268ea8f666..7e86d8337758 100644
--- a/xen/arch/x86/include/asm/io_apic.h
+++ b/xen/arch/x86/include/asm/io_apic.h
@@ -213,5 +213,7 @@ unsigned highest_gsi(void);
 
 int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
 int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
+int mp_find_ioapic(int gsi);
+int gsi_2_irq(int gsi);
 
 #endif
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index d2a313c4ac72..5968c8055671 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
 return irq;
 }
 
+int gsi_2_irq(int gsi)
+{
+int ioapic, pin, irq;
+
+ioapic = mp_find_ioapic(gsi);
+if ( ioapic < 0 )
+return -EINVAL;
+
+pin = gsi - io_apic_gsi_base(ioapic);
+
+irq = apic_pin_2_gsi_irq(ioapic, pin);
+if ( irq <= 0 )
+return -EINVAL;
+
+return irq;
+}
+
 static inline int IO_APIC_irq_trigger(int irq)
 {
 int apic, idx, pin;
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index d8ccab2449c6..7786a3337760 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
 } mp_ioapic_routing[MAX_IO_APICS];
 
 
-static int mp_find_ioapic (
-   int gsi)
+int mp_find_ioapic(int gsi)
 {
unsigned inti;
 
@@ -914,7 +913,7 @@ void __init mp_register_ioapic (
return;
 }
 
-unsigned __init highest_gsi(void)
+unsigned highest_gsi(void)
 {
unsigned x, res = 0;
for (x = 0; x < nr_ioapics; x++)
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 2a49fe46ce25..877e35ab1376 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -464,6 +464,13 @@ struct xen_domctl_irq_permission {
 uint8_t pad[3];
 };
 

[XEN PATCH v12 0/7] Support device passthrough when dom0 is PVH on Xen

2024-07-08 Thread Jiqian Chen
Hi All,
This is the v12 series to support passthrough when dom0 is PVH.
The expected merge order is: the first three patches in this series, then the
patches on the kernel side, then the last four patches in this series.
v11->v12 changes:
* patch#1: Change the title of this patch.
   Remove unnecessary notes, erroneous stamps, and #define.
* patch#2: Avoid using return, set error code instead when (un)map is not 
allowed.
   Due to functional change in v11, remove the Reviewed-by of Stefano.
* patch#3: Add more detailed descriptions into commit message not just 
callstack.

patch#4 in v11: remove from this series and upstream individually.

* patch#4: is patch#5 of v11, change nr_irqs_gsi to highest_gsi() to check gsi 
boundary, then need to
   remove "__init" of highest_gsi function.
   Change the check of irq boundary from <0 to <=0, and remove 
unnecessary space.
   Add #define XEN_DOMCTL_GSI_PERMISSION_MASK 1 to get lowest bit.
* patch#5: Add explanation of whether the caller of xc_physdev_map_pirq is 
affected.


Best regards,
Jiqian Chen



v10->v11 changes:
* patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset" to 
the next line.
   Delete unnecessary local variables "struct physdev_pci_device *dev".
   Downgrade printk to dprintk.
   Moved struct pci_device_state_reset to the public header file.
   Delete enum pci_device_state_reset_type, and use macro definitions 
to represent different
   reset types.
   Delete pci_device_state_reset_method, and add switch cases in 
PHYSDEVOP_pci_device_state_reset
   to handle different reset functions.
   Add reset type as a function parameter for vpci_reset_device_state 
for possible future use
* patch#2: Delete the judgment of "d==currd", so that we can prevent 
physdev_(un)map_pirq from being
   executed when domU has no pirq, instead of just preventing 
self-mapping; and modify the
   description of the commit message accordingly.
* patch#3: Modify the commit message to explain why the gsi of normal devices 
can work in PVH dom0 and why
   the passthrough device does not work in PVH dom0.
* patch#4: New patch, modification of allocate_pirq function, return the 
allocated pirq when there is
   already an allocated pirq and the caller has no specific 
requirements for pirq, and make it
   successful.
* patch#5: Modification on the hypervisor side proposed from patch#5 of v10.
   Add non-zero judgment for other bits of allow_access.
   Delete unnecessary judgment "if ( is_pv_domain(currd) || 
has_pirq(currd) )".
   Change the error exit path identifier "out" to "gsi_permission_out".
   Use ARRAY_SIZE() instead of open coding.
* patch#6: New patch, modification of xc_physdev_map_pirq to support mapping 
gsi to an idle pirq.
* patch#7: Patch#4 of v10, directly open "/dev/xen/privcmd" in the function 
xc_physdev_gsi_from_dev
   instead of adding unnecessary functions to libxencall.
   Change the type of gsi in the structure privcmd_gsi_from_dev from 
int to u32.
* patch#8: Modification of the tools part of patches#4 and #5 of v10, use 
privcmd_gsi_from_dev to get
   gsi, and use XEN_DOMCTL_gsi_permission to grant gsi.
   Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID.
   Add libxl__arch_hvm_map_gsi to distinguish x86 related 
implementations.
   Add a list pcidev_pirq_list to record the relationship between sbdf 
and pirq, which can be
   used to obtain the corresponding pirq when unmap PIRQ.


v9->v10 changes:
* patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the code 
style.
* patch#3: Modified the description in the commit message, changing "it calls" 
to "it will need to call",
   indicating that there will be new codes on the kernel side that will 
call PHYSDEVOP_setup_gsi.
   Also added an explanation of why the interrupt of passthrough device 
does not work if gsi is not
   registered.
* patch#4: Added define for CONFIG_X86 in tools/libs/light/Makefile to isolate 
x86 code in libxl_pci.c.
* patch#5: Modified the commit message to further describe the purpose of 
adding XEN_DOMCTL_gsi_permission.
   Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission 
directly in pci_add_dm_done.
   Added a check for all zeros in the padding field in 
XEN_DOMCTL_gsi_permission, and used currd
   instead of current->domain.
   In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of 
the original new code, and
   error handling for irq0 was added.
   Deleted the extra spaces in the upper and lower lines of the struct 
xen_domctl_gsi_permission
   definition.
All patches have modified s

[PATCH for-4.19 v2] x86/physdev: Return pirq that irq was already mapped to

2024-07-08 Thread Jiqian Chen
Fix bug introduced by 0762e2502f1f ("x86/physdev: factor out the code to
allocate and map a pirq"). After that refactoring, when pirq<0 and
current_pirq>0, which means the caller wants to allocate a free pirq for irq
but irq already has a mapped pirq, allocate_pirq returns the negative pirq, so
the call fails. However, the logic before that refactoring was different: it
should return the current_pirq that irq was already mapped to and make the
call succeed.
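
The intended behavior for the pirq<0 case can be modeled with a small stand-alone function. This is a toy model, not the hypervisor code: demo_allocate_pirq is a hypothetical name, current_pirq stands for the pirq that irq is already mapped to (0 meaning none, negative meaning it cannot be reused), and free_pirq stands for whatever a free-pirq allocator would hand out.

```c
#include <errno.h>

/*
 * Toy model of the GSI branch for a caller that asks for "any free pirq"
 * (pirq < 0). Returns the pirq to use, or -EBUSY.
 */
static int demo_allocate_pirq(int pirq, int current_pirq, int free_pirq)
{
    if (pirq < 0) {
        if (current_pirq) {
            if (current_pirq < 0)
                return -EBUSY;
            pirq = current_pirq; /* the assignment added by this fix */
        } else {
            pirq = free_pirq;    /* nothing mapped yet: take a free pirq */
        }
    }

    return pirq;
}
```

Without the added assignment, the first branch would fall through with the caller's negative pirq, which is the failure described above.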

Fixes: 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a 
pirq")

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Jan Beulich 
---
 xen/arch/x86/irq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 017a94e31155..47477d88171b 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2898,6 +2898,7 @@ static int allocate_pirq(struct domain *d, int index, int 
pirq, int irq,
 d->domain_id, index, pirq, current_pirq);
 if ( current_pirq < 0 )
 return -EBUSY;
+pirq = current_pirq;
 }
 else if ( type == MAP_PIRQ_TYPE_MULTI_MSI )
 {
-- 
2.34.1




[PATCH for-4.19] x86/physdev: Return pirq that irq was already mapped to

2024-07-08 Thread Jiqian Chen
Fix bug introduced by 0762e2502f1f ("x86/physdev: factor out the code to
allocate and map a pirq"). After that refactoring, when pirq<0 and
current_pirq>0, which means the caller wants to allocate a free pirq for irq
but irq already has a mapped pirq, allocate_pirq returns the negative pirq, so
the call fails. However, the logic before that refactoring was different: it
should return the current_pirq that irq was already mapped to and make the
call succeed.

Fixes: 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a 
pirq")

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/irq.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 9a611c79e024..1a827ccc8498 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2897,6 +2897,7 @@ static int allocate_pirq(struct domain *d, int index, int 
pirq, int irq,
 d->domain_id, index, pirq, current_pirq);
 if ( current_pirq < 0 )
 return -EBUSY;
+pirq = current_pirq;
 }
 else if ( type == MAP_PIRQ_TYPE_MULTI_MSI )
 {
-- 
2.34.1




[XEN PATCH v11 6/8] tools/libxc: Allow gsi be mapped into a free pirq

2024-06-30 Thread Jiqian Chen
Hypercall PHYSDEVOP_map_pirq supports mapping a gsi into a specific
pirq or a free pirq, depending on the parameter pirq (>0 or <0).
But the current xc_physdev_map_pirq sets *pirq=index when the
parameter pirq is <0, which forces all cases to be mapped to a
specific pirq. That has two problems: the caller can't get a free
pirq value, and once the specific pirq is already mapped to
another gsi, the mapping fails.

So, change xc_physdev_map_pirq to allow a negative parameter to be
passed in, so that the caller gets a free pirq.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/libs/ctrl/xc_physdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..e9fcd755fa62 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch,
 map.domid = domid;
 map.type = MAP_PIRQ_TYPE_GSI;
 map.index = index;
-map.pirq = *pirq < 0 ? index : *pirq;
+map.pirq = *pirq;
 
 rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
 
-- 
2.34.1
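
The effect of the one-line change above can be illustrated with a small
standalone model. This is only a sketch: old_requested_pirq and
new_requested_pirq are hypothetical helper names that do not exist in libxc,
they just reproduce the expression assigned to map.pirq before and after the
patch.

```c
/*
 * Value placed in map.pirq before the change: a negative request was
 * replaced by the gsi index, so "any free pirq" could never be expressed.
 */
int old_requested_pirq(int index, int pirq)
{
    return pirq < 0 ? index : pirq;
}

/*
 * Value placed in map.pirq after the change: the caller's value is passed
 * through, so a negative pirq reaches the hypervisor unchanged.
 */
int new_requested_pirq(int index, int pirq)
{
    (void)index; /* the gsi index no longer influences the requested pirq */
    return pirq;
}
```

With index 28 and pirq -1, the old expression requests pirq 28 while the new
one forwards -1, which is how the caller asks for a free pirq.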




[RFC XEN PATCH v11 8/8] tools: Add new function to do PIRQ (un)map on PVH dom0

2024-06-30 Thread Jiqian Chen
When dom0 is PVH and a device is passed through to a domU, xl
uses the gsi number of the device to do a pirq mapping, see
pci_add_dm_done->xc_physdev_map_pirq. But the gsi number is
read from the file /sys/bus/pci/devices//irq, which confuses
irq and gsi: they live in different number spaces and are not
equal, so the mapping fails.
To solve this issue, use xc_physdev_gsi_from_dev to get the
real gsi and then map the pirq.

Besides, a PVH domain doesn't have the PIRQ flag, so it doesn't do
PHYSDEVOP_map_pirq for each gsi. As a result, the grant call path
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function
domain_pirq_to_irq. Also, the old hypercall XEN_DOMCTL_irq_permission
requires a pirq to be passed in, so it is not suitable for a dom0
that doesn't have PIRQs to grant irq permission.
To solve this issue, use the new hypercall
XEN_DOMCTL_gsi_permission to grant permission for the irq
(translated from the gsi) to the domU when dom0 has no PIRQs.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
This patch must be merged after the patch on linux kernel side
---
 tools/include/xenctrl.h   |   5 ++
 tools/libs/ctrl/xc_domain.c   |  15 +
 tools/libs/light/libxl_arch.h |   4 ++
 tools/libs/light/libxl_arm.c  |  10 +++
 tools/libs/light/libxl_pci.c  |  17 ++
 tools/libs/light/libxl_x86.c  | 111 ++
 6 files changed, 162 insertions(+)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 3720e22b399a..33810385535e 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_arch.h b/tools/libs/light/libxl_arch.h
index f88f11d6de1d..11b736067951 100644
--- a/tools/libs/light/libxl_arch.h
+++ b/tools/libs/light/libxl_arch.h
@@ -91,6 +91,10 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
   libxl_domain_config *dst,
   const libxl_domain_config *src);
 
+_hidden
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
+_hidden
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid);
 #if defined(__i386__) || defined(__x86_64__)
 
 #define LAPIC_BASE_ADDRESS  0xfee0
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index a4029e3ac810..d869bbec769e 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -1774,6 +1774,16 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
 {
 }
 
+int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+return -1;
+}
+
+int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
+{
+return -1;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..3d25997921cc 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -17,6 +17,7 @@
 #include "libxl_osdeps.h" /* must come before any other headers */
 
 #include "libxl_internal.h"
+#include "libxl_arch.h"
 
 #define PCI_BDF"%04x:%02x:%02x.%01x"
 #define PCI_BDF_SHORT  "%02x:%02x.%01x"
@@ -1478,6 +1479,16 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
+
+/*
+ * When dom0 is PVH and mapping a x86 gsi to pirq 

[XEN PATCH v11 0/8] Support device passthrough when dom0 is PVH on Xen

2024-06-30 Thread Jiqian Chen
Hi All,
This is v11 series to support passthrough when dom0 is PVH
v10->v11 changes:
* patch#1: Move the curly braces of "case PHYSDEVOP_pci_device_state_reset"
           to the next line.
           Delete unnecessary local variables "struct physdev_pci_device *dev".
           Downgrade printk to dprintk.
           Moved struct pci_device_state_reset to the public header file.
           Delete enum pci_device_state_reset_type, and use macro definitions
           to represent different reset types.
           Delete pci_device_state_reset_method, and add switch cases in
           PHYSDEVOP_pci_device_state_reset to handle different reset
           functions.
           Add reset type as a function parameter for vpci_reset_device_state
           for possible future use.
* patch#2: Delete the judgment of "d==currd", so that we can prevent
           physdev_(un)map_pirq from being executed when domU has no pirq,
           instead of just preventing self-mapping; and modify the
           description of the commit message accordingly.
* patch#3: Modify the commit message to explain why the gsi of normal devices
           can work in PVH dom0 and why the passthrough device does not work
           in PVH dom0.
* patch#4: New patch, modification of allocate_pirq function, return the
           allocated pirq when there is already an allocated pirq and the
           caller has no specific requirements for pirq, and make it
           successful.
* patch#5: Modification on the hypervisor side proposed from patch#5 of v10.
           Add non-zero judgment for other bits of allow_access.
           Delete unnecessary judgment "if ( is_pv_domain(currd) ||
           has_pirq(currd) )".
           Change the error exit path identifier "out" to
           "gsi_permission_out".
           Use ARRAY_SIZE() instead of open coding.
* patch#6: New patch, modification of xc_physdev_map_pirq to support mapping
           gsi to an idle pirq.
* patch#7: Patch#4 of v10, directly open "/dev/xen/privcmd" in the function
           xc_physdev_gsi_from_dev instead of adding unnecessary functions to
           libxencall.
           Change the type of gsi in the structure privcmd_gsi_from_dev from
           int to u32.
* patch#8: Modification of the tools part of patches#4 and #5 of v10, use
           privcmd_gsi_from_dev to get gsi, and use XEN_DOMCTL_gsi_permission
           to grant gsi.
           Change the hard-coded 0 to use LIBXL_TOOLSTACK_DOMID.
           Add libxl__arch_hvm_map_gsi to distinguish x86 related
           implementations.
           Add a list pcidev_pirq_list to record the relationship between sbdf
           and pirq, which can be used to obtain the corresponding pirq when
           unmapping the PIRQ.


Best regards,
Jiqian Chen



v9->v10 changes:
* patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the code
           style.
* patch#3: Modified the description in the commit message, changing "it calls"
           to "it will need to call", indicating that there will be new code
           on the kernel side that will call PHYSDEVOP_setup_gsi.
           Also added an explanation of why the interrupt of a passthrough
           device does not work if the gsi is not registered.
* patch#4: Added define for CONFIG_X86 in tools/libs/light/Makefile to isolate
           x86 code in libxl_pci.c.
* patch#5: Modified the commit message to further describe the purpose of
           adding XEN_DOMCTL_gsi_permission.
           Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission
           directly in pci_add_dm_done.
           Added a check for all zeros in the padding field in
           XEN_DOMCTL_gsi_permission, and used currd instead of
           current->domain.
           In the function gsi_2_irq, apic_pin_2_gsi_irq was used instead of
           the original new code, and error handling for irq0 was added.
           Deleted the extra spaces in the upper and lower lines of the struct
           xen_domctl_gsi_permission definition.
All patches have modified signatures as follows:
Signed-off-by: Jiqian Chen  means I am the author.
Signed-off-by: Huang Rui  means Rui first sent them upstream.
Signed-off-by: Jiqian Chen  means I am continuing the upstreaming.


v8->v9 changes:
* patch#1: Move pcidevs_unlock below write_lock, and remove
           "ASSERT(pcidevs_locked());" from vpci_reset_device_state;
           Add pci_device_state_reset_type to distinguish the reset types.
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why this
           hypercall is needed.
           Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid ==
           DOMID_SELF" to "d == current->domain".
* patch#3: Remove the check of PHYSDEVOP_setup_gsi, since the same check
           exists below. Although their return values are different, this
           difference is acceptable for the sake of code consistency
           if ( !is_hardware_domain(currd) )
           return -ENOSYS;
           break;
* patch#5: Change

[XEN PATCH v11 3/8] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-30 Thread Jiqian Chen
The gsi of a passthrough device must be configured for it to be
able to be mapped into a hvm domU.
But when dom0 is PVH, the gsis don't get registered, so the
apic, pin and irq information is not added to the irq_2_pin list
and the handler of the irq_desc is not set. As a result, setting
the ioapic affinity and vector fails when a device is passed through.

To fix the above problem, new code on the Linux kernel side will
need to call PHYSDEVOP_setup_gsi for passthrough devices to
register the gsi when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi to hvm_physdev_op for that
purpose.

Two questions need clarifying:
First, why does the gsi of a device that belongs to PVH dom0 work?
Because when a driver is probed for a normal device, the Linux
kernel calls pci_device_probe-> request_threaded_irq->
irq_startup-> __unmask_ioapic-> io_apic_write, which traps into Xen:
hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept->
vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
So the gsi gets registered.

Second, why doesn't the gsi of a passthrough device work when dom0
is PVH?
Because when a device is assigned for passthrough, pciback
probes the device and calls pcistub_probe->pcistub_seize->
pcistub_init_device-> xen_pcibk_reset_device->
xen_pcibk_control_isr->isr_on, but isr_on is not set, so the
fake IRQ handler is not installed and the gsi is never unmasked.
What's more, on the Xen side,
vioapic_hwdom_map_gsi-> mp_register_gsi is called only when
the gsi is unmasked, so the gsi can't work for a passthrough
device.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return -ENOSYS;
 break;
 
+case PHYSDEVOP_setup_gsi:
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
-- 
2.34.1




[XEN PATCH v11 2/8] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-06-30 Thread Jiqian Chen
When Xen runs with a PVH dom0 and an hvm domU, a pirq is mapped for
a passthrough device by using its gsi, see qemu code
xen_pt_realize->xc_physdev_map_pirq and libxl code
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq
is not allowed, because currd is PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the device removal path can unmap the pirq.
Also add a new check to prevent (un)map when the subject domain
has no X86_EMU_USE_PIRQ flag.

This way the interrupt of a passthrough device can be
successfully mapped to a pirq for a domU with the X86_EMU_USE_PIRQ flag
when dom0 is PVH.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  6 ++
 xen/arch/x86/physdev.c   | 14 ++
 2 files changed, 20 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..03ada3c880bd 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd )
 {
+/*
+* Only being permitted for management of other domains.
+* Further restrictions are enforced in do_physdev_op.
+*/
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index d6dd622952a9..a165f68225c1 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -323,6 +323,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !d )
 break;
 
+/* Prevent mapping when the subject domain has no X86_EMU_USE_PIRQ */
+if ( is_hvm_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+
 ret = physdev_map_pirq(d, map.type, , , );
 
 rcu_unlock_domain(d);
@@ -346,6 +353,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !d )
 break;
 
+/* Prevent unmapping when the subject domain has no X86_EMU_USE_PIRQ */
+if ( is_hvm_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+
 ret = physdev_unmap_pirq(d, unmap.pirq);
 
 rcu_unlock_domain(d);
-- 
2.34.1
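
The guard added to both the map and the unmap path can be modelled outside
of Xen as follows. This is a sketch under stated assumptions: struct
model_domain, MODEL_EMU_USE_PIRQ (including its bit value) and
model_check_pirq_op are hypothetical names used only for illustration, and
the real is_hvm_domain()/has_pirq() helpers are not reproduced here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit value; the real X86_EMU_USE_PIRQ constant lives in Xen. */
#define MODEL_EMU_USE_PIRQ (1u << 0)

struct model_domain {
    bool is_hvm;            /* HVM-container domain (HVM or PVH), not PV */
    unsigned int emu_flags; /* emulation flags of the subject domain */
};

/* Mirrors the "is_hvm_domain(d) && !has_pirq(d)" test from the patch. */
int model_check_pirq_op(const struct model_domain *d)
{
    if ( d->is_hvm && !(d->emu_flags & MODEL_EMU_USE_PIRQ) )
        return -EOPNOTSUPP;

    return 0;
}
```

In this model a PVH-like subject domain (HVM container without the flag) is
rejected, while an HVM domain with the flag and a PV domain both pass the
check.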




[RFC XEN PATCH v11 7/8] tools: Add new function to get gsi from dev

2024-06-30 Thread Jiqian Chen
When a device is passed through to a domU, QEMU and the xl tools use
its gsi number to do the pirq mapping, see QEMU code
xen_pt_realize->xc_physdev_map_pirq and xl code
pci_add_dm_done->xc_physdev_map_pirq. But the gsi number is read
from the file /sys/bus/pci/devices//irq, which is wrong: the irq
is not equal to the gsi, they live in different number spaces, so
the pirq mapping fails.

And in the current code there is no way for userspace to get the gsi.
For that purpose, add a new function to get the gsi; the
corresponding ioctl is implemented on the Linux kernel side.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs to wait for the corresponding third patch on linux kernel side to be merged.
https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
This patch must be merged after the patch on linux kernel side
---
 tools/include/xen-sys/Linux/privcmd.h |  7 ++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/ctrl/xc_physdev.c  | 35 +++
 3 files changed, 44 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..4cf719102116 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_pcidev {
+   __u32 sbdf;
+   __u32 gsi;
+} privcmd_gsi_from_pcidev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV  \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..3720e22b399a 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index e9fcd755fa62..54edb0f3c0dc 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
 return rc;
 }
 
+int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
+{
+int rc = -1;
+
+#if defined(__linux__)
+int fd;
+privcmd_gsi_from_pcidev_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = 0,
+};
+
+fd = open("/dev/xen/privcmd", O_RDWR);
+
+if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) {
+/* Fallback to /proc/xen/privcmd */
+fd = open("/proc/xen/privcmd", O_RDWR);
+}
+
+if (fd < 0) {
+PERROR("Could not obtain handle on privileged command interface");
+return rc;
+}
+
+rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi);
+close(fd);
+
+if (rc) {
+PERROR("Failed to get gsi from dev");
+} else {
+rc = dev_gsi.gsi;
+}
+#endif
+
+return rc;
+}
-- 
2.34.1
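
The new function takes the device as a single 32-bit sbdf value. A caller
that holds separate segment/bus/device/function numbers could build that
value as sketched below. This assumes the ioctl expects the usual PCI SBDF
layout (segment in bits 31..16, bus in 15..8, device in 7..3, function in
2..0); make_sbdf is a hypothetical helper, not part of libxc.

```c
#include <stdint.h>

/*
 * Pack segment/bus/device/function into a 32-bit SBDF value:
 * segment in bits 31..16, bus in 15..8, device in 7..3, function in 2..0.
 * Out-of-range device/function bits are masked off.
 */
uint32_t make_sbdf(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t func)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)(dev & 0x1f) << 3) | (uint32_t)(func & 0x7);
}
```

For device 0000:03:00.0 this yields 0x300; the result would then be passed
as the sbdf argument of xc_physdev_gsi_from_pcidev.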




[XEN PATCH v11 5/8] x86/domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-30 Thread Jiqian Chen
Some types of domain, like PVH, don't have PIRQs and don't do
PHYSDEVOP_map_pirq for each gsi. When a device is passed through
to a guest on top of a PVH dom0, the call path
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in function
domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
irq on the Xen side.
What's more, the current hypercall XEN_DOMCTL_irq_permission requires
a pirq to be passed in, so it is not suitable for a dom0 that doesn't
have PIRQs.

So, add a new hypercall XEN_DOMCTL_gsi_permission to grant
permission for the irq (translated from the gsi) to the domU when
dom0 has no PIRQs.
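
The argument of the new domctl carries a one-byte access flag followed by
padding, and a handler for such a structure would normally reject requests
that set reserved bits or non-zero padding. The following standalone sketch
shows one way to write those two checks; flag_byte_is_valid and pad_is_zero
are hypothetical names, not Xen functions. Note that selecting "all bits
except the lowest" needs bitwise NOT (~mask); logical NOT (!mask) of a
non-zero mask evaluates to 0 and would let every value through.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when at most the lowest bit of the flag byte is set.
 * ~mask selects every other bit; !mask would be 0 and test nothing.
 */
bool flag_byte_is_valid(uint8_t flag)
{
    const uint8_t mask = 1;

    return (flag & (uint8_t)~mask) == 0;
}

/* Returns true when every padding byte is zero. */
bool pad_is_zero(const uint8_t *pad, unsigned int n)
{
    for ( unsigned int i = 0; i < n; i++ )
        if ( pad[i] )
            return false;

    return true;
}
```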

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/domctl.c  | 33 ++
 xen/arch/x86/include/asm/io_apic.h |  2 ++
 xen/arch/x86/io_apic.c | 17 +++
 xen/arch/x86/mpparse.c |  3 +--
 xen/include/public/domctl.h|  8 
 xen/xsm/flask/hooks.c  |  1 +
 6 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9190e11faaa3..5f20febabbf2 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static int update_domain_cpu_policy(struct domain *d,
 xen_domctl_cpu_policy_t *xdpc)
@@ -237,6 +238,38 @@ long arch_do_domctl(
 break;
 }
 
+case XEN_DOMCTL_gsi_permission:
+{
+int irq;
+uint8_t mask = 1;
+unsigned int gsi = domctl->u.gsi_permission.gsi;
+bool allow = domctl->u.gsi_permission.allow_access;
+
+/* Check all bits and pads are zero except lowest bit */
+ret = -EINVAL;
+if ( domctl->u.gsi_permission.allow_access & ~mask )
+goto gsi_permission_out;
+for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
+if ( domctl->u.gsi_permission.pad[i] )
+goto gsi_permission_out;
+
+if ( gsi >= nr_irqs_gsi || ( irq = gsi_2_irq(gsi) ) < 0 )
+goto gsi_permission_out;
+
+ret = -EPERM;
+if ( !irq_access_permitted(currd, irq) ||
+ xsm_irq_permission(XSM_HOOK, d, irq, allow) )
+goto gsi_permission_out;
+
+if ( allow )
+ret = irq_permit_access(d, irq);
+else
+ret = irq_deny_access(d, irq);
+
+gsi_permission_out:
+break;
+}
+
 case XEN_DOMCTL_getpageframeinfo3:
 {
 unsigned int num = domctl->u.getpageframeinfo3.num;
diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
index 78268ea8f666..7e86d8337758 100644
--- a/xen/arch/x86/include/asm/io_apic.h
+++ b/xen/arch/x86/include/asm/io_apic.h
@@ -213,5 +213,7 @@ unsigned highest_gsi(void);
 
 int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
 int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
+int mp_find_ioapic(int gsi);
+int gsi_2_irq(int gsi);
 
 #endif
diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index d73108558e09..d54283955a60 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
 return irq;
 }
 
+int gsi_2_irq(int gsi)
+{
+int ioapic, pin, irq;
+
+ioapic = mp_find_ioapic(gsi);
+if ( ioapic < 0 )
+return -EINVAL;
+
+pin = gsi - io_apic_gsi_base(ioapic);
+
+irq = apic_pin_2_gsi_irq(ioapic, pin);
+if ( irq <= 0 )
+return -EINVAL;
+
+return irq;
+}
+
 static inline int IO_APIC_irq_trigger(int irq)
 {
 int apic, idx, pin;
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index d8ccab2449c6..c95da0de5770 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -841,8 +841,7 @@ static struct mp_ioapic_routing {
 } mp_ioapic_routing[MAX_IO_APICS];
 
 
-static int mp_find_ioapic (
-   int gsi)
+int mp_find_ioapic(int gsi)
 {
 unsigned int i;
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 2a49fe46ce25..f7ae8b19d27d 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -464,6 +464,12 @@ struct xen_domctl_irq_permission {
 uint8_t pad[3];
 };
 
+/* XEN_DOMCTL_gsi_permission */
+struct xen_domctl_gsi_permission {
+uint32_t gsi;
+uint8_t allow_access;/* flag to specify enable/disable of x86 gsi access */
+uint8_t pad[3];
+};
 
 /* XEN_DOMCTL_iomem_permission */
 struct xen_domctl_iomem_permission {
@@ -1306,6 +1312,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_get_paging_mempool_size   85
 #define XEN_DOMCTL_set_paging_mempool_size   86
 #define XEN_DOMCTL_dt_overlay                    87
+#define XEN_DOMCTL_gsi_permission                88
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_g

[XEN PATCH v11 4/8] x86/physdev: Return pirq that irq was already mapped to

2024-06-30 Thread Jiqian Chen
allocate_pirq allocates a pirq for an irq. It can allocate either
a free pirq (pirq parameter < 0) or a specific pirq (pirq
parameter > 0).

The current code has four use cases.

First, pirq>0 and current_pirq>0 (current_pirq>0 means the irq already
has a mapped pirq): if pirq==current_pirq, the irq is already
mapped to the pirq expected by the caller, so it succeeds; if
pirq!=current_pirq, the irq is mapped to a different pirq than the
one the caller expects, so it fails.

Second, pirq>0 and current_pirq<0: the pirq expected by the
caller has not been allocated to any irq, so it can be allocated to
the caller, and it succeeds.

Third, pirq<0 and current_pirq<0: the caller wants to allocate a
free pirq for the irq and the irq has no mapped pirq, so it succeeds.

Fourth, pirq<0 and current_pirq>0: the caller wants to allocate
a free pirq for the irq but the irq already has a mapped pirq, so the
negative pirq is returned and it fails.

The problem is in the fourth case. Since the irq has a mapped pirq
(current_pirq) and the caller doesn't ask for a specific pirq,
current_pirq should be returned directly in this case, indicating
that the allocation is successful. That lets the caller succeed when
it just wants a free pirq but doesn't know whether the irq
already has a mapped pirq or not.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/irq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 9a611c79e024..5ccca1646eb1 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -2897,6 +2897,8 @@ static int allocate_pirq(struct domain *d, int index, int pirq, int irq,
 d->domain_id, index, pirq, current_pirq);
 if ( current_pirq < 0 )
 return -EBUSY;
+else
+return current_pirq;
 }
 else if ( type == MAP_PIRQ_TYPE_MULTI_MSI )
 {
-- 
2.34.1
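
The four cases from the commit message, with the fourth one fixed, can be
written down as a small decision function. This is a toy model and not the
Xen implementation: model_allocate_pirq, the next_free parameter and the
-EEXIST return value are illustrative assumptions, and "current_pirq > 0" is
used to mean "the irq already has a mapped pirq", following the wording of
the message.

```c
#include <errno.h>

/*
 * Toy model of the allocate_pirq() decision table.
 * requested    : pirq asked for by the caller (<0 means "any free pirq")
 * current_pirq : pirq the irq is already mapped to (>0), otherwise none
 * next_free    : hypothetical stand-in for the free pirq that would be picked
 * Returns the pirq to use, or a negative errno value.
 */
int model_allocate_pirq(int requested, int current_pirq, int next_free)
{
    if ( requested > 0 )
    {
        /* Case 1: irq already mapped; only the same pirq is acceptable. */
        if ( current_pirq > 0 )
            return requested == current_pirq ? requested : -EEXIST;

        /* Case 2: irq not mapped yet; the requested pirq can be used. */
        return requested;
    }

    /* Case 4 (fixed behaviour): reuse the existing mapping. */
    if ( current_pirq > 0 )
        return current_pirq;

    /* Case 3: no mapping yet; hand out a free pirq. */
    return next_free;
}
```

In this model a caller that passes a negative pirq for an irq already mapped
to pirq 7 gets 7 back instead of an error.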




[XEN PATCH v11 1/8] xen/vpci: Clear all vpci status of device

2024-06-30 Thread Jiqian Chen
When a device has been reset on the dom0 side, vpci on the Xen
side doesn't get notified, so the state cached in vpci is
out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0 side,
dom0 can call this hypercall to notify vpci.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 58 
 xen/drivers/vpci/vpci.c  | 10 +++
 xen/include/public/physdev.h | 20 +
 xen/include/xen/vpci.h   |  8 +
 5 files changed, 97 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..19a755d1c127 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,63 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset:
+{
+struct pci_device_state_reset dev_reset;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+ret = -EOPNOTSUPP;
+if ( !is_pci_passthrough_enabled() )
+break;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+break;
+
+sbdf = PCI_SBDF(dev_reset.dev.seg,
+dev_reset.dev.bus,
+dev_reset.dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+/* Implement FLR, other reset types may be implemented in future */
+switch ( dev_reset.reset_type )
+{
+case PCI_DEVICE_STATE_RESET_COLD:
+case PCI_DEVICE_STATE_RESET_WARM:
+case PCI_DEVICE_STATE_RESET_HOT:
+case PCI_DEVICE_STATE_RESET_FLR:
+{
+ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
+if ( ret )
+dprintk(XENLOG_ERR,
+"%pp: failed to reset vPCI device state\n", &sbdf);
+break;
+}
+
+default:
+ret = -EOPNOTSUPP;
+break;
+}
+write_unlock(&pdev->domain->pci_lock);
+
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..7e914d1eff9f 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev,
+uint32_t reset_type)
+{
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..ddbcdfb05248 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
@@ -305,6 +312,19 @@ struct physdev_pci_device {
 typedef struct physdev_pci_device physdev_pci_device_t;
 DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
 
+struct pci_device_state_reset {
+physdev_pci_device_t dev;
+#define _PCI_DEVICE_STATE_RESET_COLD 0
+#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
+#define _PCI_DEVICE_STATE_RESET_WARM 1
+#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)

[XEN PATCH v10 4/5] tools: Add new function to get gsi from dev

2024-06-17 Thread Jiqian Chen
In PVH dom0, the Linux local interrupt mechanism is used. When
it allocates an irq for a gsi, the allocation is dynamic and
first come, first served. Irq numbers are handed out in
increasing order, but gsis are not requested in increasing
order (gsi 38 may come before gsi 28), so the irq number is
not equal to the gsi number.
When a device is passed through, QEMU uses its gsi number
to do the pirq mapping, see xen_pt_realize->xc_physdev_map_pirq,
but the gsi number is read from the file
/sys/bus/pci/devices//irq, so the mapping fails.
And in the current code there is no way for userspace to get
the gsi.

For that purpose, add a new function to get the gsi, and call this
function before xc_physdev_(un)map_pirq.
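
Why the irq number drifts away from the gsi number can be shown with a toy
first-come, first-served allocator. This is only an illustration of the
principle described above, not the Linux allocator: struct model_irq_table,
model_irq_for_gsi and the starting irq number are made up for the example.

```c
#define MODEL_MAX_GSI 64

struct model_irq_table {
    int next_irq;                  /* next irq number to hand out (non-zero) */
    int irq_of_gsi[MODEL_MAX_GSI]; /* 0 means "no irq allocated yet" */
};

/* Allocate (or look up) the irq for a gsi in request order. */
int model_irq_for_gsi(struct model_irq_table *t, int gsi)
{
    if ( gsi < 0 || gsi >= MODEL_MAX_GSI )
        return -1;

    if ( t->irq_of_gsi[gsi] == 0 )
        t->irq_of_gsi[gsi] = t->next_irq++;

    return t->irq_of_gsi[gsi];
}
```

Starting from an arbitrary next_irq of 24 and requesting gsi 38 before
gsi 28, the model hands out irq 24 for gsi 38 and irq 25 for gsi 28, so
neither irq equals its gsi.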

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs review and needs to wait for the corresponding third patch on linux kernel side to be merged.
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +
 tools/include/xencall.h   |  2 ++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/call/core.c|  5 
 tools/libs/call/libxencall.map|  2 ++
 tools/libs/call/linux.c   | 15 +++
 tools/libs/call/private.h |  9 +++
 tools/libs/ctrl/xc_physdev.c  |  4 +++
 tools/libs/light/Makefile |  2 +-
 tools/libs/light/libxl_pci.c  | 38 +++
 10 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..977f1a058797 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_dev {
+   __u32 sbdf;
+   int gsi;
+} privcmd_gsi_from_dev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_DEV \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_dev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe58e..750aab070323 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
  uint64_t arg1, uint64_t arg2, uint64_t arg3,
  uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..a0381f74d24b 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1aefa..6dae50c9a6ba 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
 return osdep_hypercall(xcall, &call);
 }
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
+{
+return osdep_oscall(xcall, sbdf);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9dc..b92a0b5dc12c 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
xencall4;
xencall5;
 
+   xen_oscall_gsi_from_dev;
+
xencall_alloc_buffer;
xencall_free_buffer;
xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea8f..92c740e176f2 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall)
 return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
+{
+privcmd_gsi_from_dev_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = -1,
+};
+
+if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) {
+PERROR("failed to get gsi from dev");
+retu

[XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-17 Thread Jiqian Chen
Some domain types, such as PVH, have no PIRQs and do not issue
PHYSDEVOP_map_pirq for each gsi. When a device is passed through
to a guest from a PVH dom0, the call chain
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in
domain_pirq_to_irq, because a PVH domain has no gsi/pirq/irq
mapping on the Xen side.
Moreover, the existing hypercall XEN_DOMCTL_irq_permission takes
a pirq, so it is not suitable for a dom0 that has no PIRQs.

So, add a new hypercall, XEN_DOMCTL_gsi_permission, to grant
permission for the irq (translated from the gsi) to the domU when
dom0 has no PIRQs.
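
The choice between the two permission hypercalls that pci_add_dm_done
makes in this patch can be sketched in isolation as follows. This is
only an illustration: struct dom0_info, the SKETCH_* flag values and
choose_grant_path are hypothetical stand-ins, not the real
xc_domaininfo_t layout or Xen flag values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones live in Xen's public headers. */
#define SKETCH_DOMINF_HVM_GUEST (1u << 1)
#define SKETCH_EMU_USE_PIRQ     (1u << 9)

enum grant_path { GRANT_BY_IRQ, GRANT_BY_GSI };

/* Hypothetical reduction of the dom0 info queried by the toolstack. */
struct dom0_info {
    uint32_t flags;            /* domain type flags */
    uint32_t emulation_flags;  /* emulated-device flags */
};

/*
 * Use the gsi-based grant only when dom0 is an HVM-type domain (PVH is
 * one) without the PIRQ flag and a usable gsi was found; otherwise keep
 * the irq-based grant.
 */
static enum grant_path choose_grant_path(const struct dom0_info *dom0, int gsi)
{
    assert(dom0 != NULL);

    bool is_hvm = (dom0->flags & SKETCH_DOMINF_HVM_GUEST) != 0;
    bool has_pirq = (dom0->emulation_flags & SKETCH_EMU_USE_PIRQ) != 0;

    if (is_hvm && !has_pirq && gsi > 0)
        return GRANT_BY_GSI;

    return GRANT_BY_IRQ;
}
```

Keeping the irq-based grant as the default branch means a PV dom0, or
a dom0 with the PIRQ flag, behaves exactly as before.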

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
RFC: it needs review and needs to wait for the corresponding third patch on 
linux kernel side to be merged.
---
 tools/include/xenctrl.h|  5 +++
 tools/libs/ctrl/xc_domain.c| 15 +++
 tools/libs/light/libxl_pci.c   | 67 +++---
 xen/arch/x86/domctl.c  | 43 +++
 xen/arch/x86/include/asm/io_apic.h |  2 +
 xen/arch/x86/io_apic.c | 17 
 xen/arch/x86/mpparse.c |  3 +-
 xen/include/public/domctl.h|  8 
 xen/xsm/flask/hooks.c  |  1 +
 9 files changed, 153 insertions(+), 8 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index a0381f74d24b..f3feb6848e25 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 376f91759ac6..f027f22c0028 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1431,6 +1431,9 @@ static void pci_add_dm_done(libxl__egc *egc,
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+#ifdef CONFIG_X86
+xc_domaininfo_t info;
+#endif
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
 rc = ERROR_FAIL;
 goto out;
 }
-r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+#ifdef CONFIG_X86
+/* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
+r = xc_domain_getinfo_single(ctx->xch, 0, &info);
 if (r < 0) {
-LOGED(ERROR, domainid,
-  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
+LOGED(ERROR, domainid, "getdomaininfo failed (error=%d)", errno);
 fclose(f);
 rc = ERROR_FAIL;
 goto out;
 }
+if (info.flags & XEN_DOMINF_hvm_guest &&
+!(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ) &&
+gsi > 0) {
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
+if (r < 0) {
+LOGED(ERROR, domainid,
+"xc_domain_gsi_permission gsi=%d (error=%d)", gsi, errno);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
+}
+else
+#endif
+{
+r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+if (r < 0) {
+LOGED(ERROR, domainid,
+"xc_domain_irq_permission irq=%d (error=%d)", irq, errno);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
+}
 }
 fclose(f);
 
@@ -2200,6 +2228,10 @@ static void pci_rem

[XEN PATCH v10 0/5] Support device passthrough when dom0 is PVH on Xen

2024-06-17 Thread Jiqian Chen
Hi All,
This is the v10 series to support passthrough when dom0 is PVH.
v9->v10 changes:
* patch#2: Indent the comments above PHYSDEVOP_map_pirq according to the code
   style.
* patch#3: Modified the description in the commit message, changing "it calls"
   to "it will need to call", indicating that new code on the kernel side will
   call PHYSDEVOP_setup_gsi. Also added an explanation of why the interrupt of
   a passthrough device does not work if the gsi is not registered.
* patch#4: Added a define for CONFIG_X86 in tools/libs/light/Makefile to
   isolate x86 code in libxl_pci.c.
* patch#5: Modified the commit message to further describe the purpose of
   adding XEN_DOMCTL_gsi_permission.
   Deleted pci_device_set_gsi and called XEN_DOMCTL_gsi_permission directly in
   pci_add_dm_done.
   Added a check that the padding field in XEN_DOMCTL_gsi_permission is all
   zeros, and used currd instead of current->domain.
   In the function gsi_2_irq, apic_pin_2_gsi_irq is used instead of the
   original new code, and error handling for irq0 was added.
   Deleted the extra spaces in the lines above and below the struct
   xen_domctl_gsi_permission definition.
All patches carry the following sign-offs:
Signed-off-by: Jiqian Chen  means I am the author.
Signed-off-by: Huang Rui  means Rui first sent them upstream.
Signed-off-by: Jiqian Chen  means I am continuing the upstreaming.


Best regards,
Jiqian Chen



v8->v9 changes:
* patch#1: Move pcidevs_unlock below write_lock, and remove
   "ASSERT(pcidevs_locked());" from vpci_reset_device_state;
   Add pci_device_state_reset_type to distinguish the reset types.
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why this
   hypercall is needed.
   Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid ==
   DOMID_SELF" to "d == current->domain".
* patch#3: Remove the check of PHYSDEVOP_setup_gsi, since the same check
   exists below. Although their return values are different, this
   difference is acceptable for the sake of code consistency:
   if ( !is_hardware_domain(currd) )
   return -ENOSYS;
   break;
* patch#5: Change the commit message to better describe why this new
   hypercall is needed.
   Add a comment above "if ( is_pv_domain(current->domain) ||
   has_pirq(current->domain) )" to explain why this check is needed.
   Add gsi_2_irq to translate gsi to irq, instead of assuming gsi == irq.
   Add explicit padding to struct xen_domctl_gsi_permission.


v7->v8 changes:
* patch#2: Add the domid check (domid == DOMID_SELF) to prevent self-mapping
   when the guest doesn't use pirq. That check was missing in the previous
   version.
* patch#4: Because the kernel-side implementation for obtaining the gsi
   changed, add a new function to get the gsi by passing in the sbdf of the
   pci device.
* patch#5: Remove the parameter "is_gsi"; when a gsi exists, pci_add_dm_done
   uses a new function, pci_device_set_gsi, to do map_pirq and grant
   permission. That makes the code logic more intuitive.


v6->v7 changes:
* patch#4: Because the kernel-side implementation for obtaining the gsi
   changed, add a new function to get the gsi from the irq, instead of from
   the gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.


v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart. Rebase code and change old 
function vpci_remove_device,
   vpci_add_handlers to vpci_deassign_device, vpci_assign_device
* patch#2: Add Reviewed-by Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues below directory tools
* patch#5: Modified some variable names and code logic to make code easier to 
be understood, which to use
   gsi by default and be compatible with older kernel versions to 
continue to use irq


v4->v5 changes:
* patch#1: add pci_lock wrap function vpci_reset_device_state
* patch#2: move the check of self map_pirq to physdev.c, and change to check if 
the caller has PIRQ flag, and
   just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: is the patch#5 in v4 because patch#5 in v5 has some dependency on 
it. And add the handling of errno
   and add the Reviewed-by Stefano
* patch#5: is the patch#4 in v4. New implementation to add new hypercall 
XEN_DOMCTL_gsi_permission to grant gsi


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move 
printings behind pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch, The implementation of adding PHYSDEVOP_setup_gsi 

[XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-06-17 Thread Jiqian Chen
When Xen runs with a PVH dom0 and an HVM domU, a pirq is mapped
for a passthrough device by using its gsi; see the qemu code
xen_pt_realize->xc_physdev_map_pirq and the libxl code
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not
allowed, because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq.
Add a new check to prevent self-mapping when the subject domain
has no PIRQ flag.

With this, a domU that has the PIRQ flag can successfully map a
pirq for passthrough devices even if dom0 has no PIRQ flag.
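
The self-map restriction described above can be sketched as a small
standalone predicate. The struct sketch_domain type and the
check_pirq_op helper are hypothetical names used only for this
illustration; the real check operates on Xen's struct domain through
is_hvm_domain() and has_pirq().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of a domain to the two properties the check uses. */
struct sketch_domain {
    bool is_hvm;    /* HVM-type domain (PVH counts as one) */
    bool has_pirq;  /* domain has the PIRQ emulation flag */
};

/*
 * Reject a (un)map_pirq request only when an HVM-type domain without
 * the PIRQ flag targets itself. Managing another domain stays allowed.
 */
static int check_pirq_op(const struct sketch_domain *caller,
                         const struct sketch_domain *subject)
{
    assert(caller != NULL && subject != NULL);

    if (subject->is_hvm && !subject->has_pirq && subject == caller)
        return -EOPNOTSUPP;

    return 0;
}
```

Under these assumptions a PVH dom0 may map a pirq for an HVM domU, but
not for itself, while a PV dom0 is unaffected.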

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  6 ++
 xen/arch/x86/physdev.c   | 14 ++
 2 files changed, 20 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..03ada3c880bd 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd )
 {
+/*
+* Only being permitted for management of other domains.
+* Further restrictions are enforced in do_physdev_op.
+*/
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index d6dd622952a9..f38cc22c872e 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -323,6 +323,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !d )
 break;
 
+/* Prevent self-map when currd has no X86_EMU_USE_PIRQ flag */
+if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+
 ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
 
 rcu_unlock_domain(d);
@@ -346,6 +353,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( !d )
 break;
 
+/* Prevent self-unmap when currd has no X86_EMU_USE_PIRQ flag */
+if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+
 ret = physdev_unmap_pirq(d, unmap.pirq);
 
 rcu_unlock_domain(d);
-- 
2.34.1




[XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-17 Thread Jiqian Chen
The gsi of a passthrough device must be configured before it can
be mapped into an HVM domU.
But when dom0 is PVH, the gsis are not registered. As a result,
the apic, pin and irq information is not added to the irq_2_pin
list and the handler of the irq_desc is not set, so setting the
ioapic affinity and vector fails when a device is passed through.

To fix this, new code on the Linux kernel side will need to call
PHYSDEVOP_setup_gsi for passthrough devices to register the gsi
when dom0 is PVH.

So, add PHYSDEVOP_setup_gsi to hvm_physdev_op for that purpose.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
The code link that will call this hypercall on linux kernel side is as follows:
https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 03ada3c880bd..cfe82d0f96ed 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return -ENOSYS;
 break;
 
+case PHYSDEVOP_setup_gsi:
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
-- 
2.34.1




[XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device

2024-06-17 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the
Xen side is not notified, so the state cached in vpci is out of
date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0
side, dom0 can call this hypercall to notify vpci.
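
The table-driven dispatch on the reset type used by this patch can be
sketched as below. All names here (struct sketch_dev, reset_methods,
dispatch_reset) are hypothetical. The guard for reset types without a
handler is an addition made for this illustration only and is not
claimed to be part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reset types, mirroring the shape of the enum in the patch. */
enum sketch_reset_type {
    SKETCH_RESET_FLR,
    SKETCH_RESET_COLD,
    SKETCH_RESET_WARM,
    SKETCH_RESET_HOT,
    SKETCH_RESET_TYPE_COUNT
};

/* Hypothetical device with one piece of cached state. */
struct sketch_dev {
    int cached_state;
};

typedef int (*sketch_reset_fn)(struct sketch_dev *dev);

/* Drop the cached state so that it gets regenerated from the device. */
static int sketch_reset_cached_state(struct sketch_dev *dev)
{
    dev->cached_state = 0;
    return 0;
}

/* Only FLR has a handler; the other slots are left NULL. */
static const sketch_reset_fn reset_methods[SKETCH_RESET_TYPE_COUNT] = {
    [SKETCH_RESET_FLR] = sketch_reset_cached_state,
};

/* Look up the handler for the reset type, rejecting unhandled types. */
static int dispatch_reset(struct sketch_dev *dev, unsigned int type)
{
    assert(dev != NULL);

    if (type >= SKETCH_RESET_TYPE_COUNT || reset_methods[type] == NULL)
        return -EOPNOTSUPP;

    return reset_methods[type](dev);
}
```

A table like this keeps the hypercall handler short, at the cost of
having to validate the caller-supplied index before using it.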

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 43 
 xen/drivers/vpci/vpci.c  |  9 
 xen/include/public/physdev.h |  7 ++
 xen/include/xen/pci.h| 16 ++
 xen/include/xen/vpci.h   |  6 +
 6 files changed, 82 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..1cce508a73b1 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,11 +2,17 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
 #endif
 
+static const struct pci_device_state_reset_method
+pci_device_state_reset_methods[] = {
+[ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
+};
+
 ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 ret_t ret;
@@ -67,6 +73,43 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct pci_device_state_reset dev_reset;
+struct physdev_pci_device *dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+break;
+dev = &dev_reset.dev;
+sbdf = PCI_SBDF(dev->seg, dev->bus, dev->devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+ret = pci_device_state_reset_methods[dev_reset.reset_type].reset_fn(pdev);
+write_unlock(&pdev->domain->pci_lock);
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset vPCI device state\n", &sbdf);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..ff67c2550ccb 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..a71da5892e5f 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 63e49f0117e9..376981f9da98 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -156,6 +156,22 @@ struct pci_dev {
 struct vpci *vpci;
 };
 
+struct pci_device_state_reset_method {
+int (*reset_fn)(struct pci_dev *pdev);
+};
+
+enum pci_device_state_reset_type {
+DEVICE_RESET_FLR,
+DEVICE_RESET_COLD,
+DEVICE_RESET_WARM,
+DEVICE_RESET_HOT,
+};
+
+struct pci_device_state_reset {
+struct physdev_pci_device dev;
+enum pci_device_state_reset_type reset_type;
+};
+
 #define for_each_pdev(domain, pdev) \
 list_for_each_entry(pdev, &(domain)-&g

[XEN PATCH v9 0/5] Support device passthrough when dom0 is PVH on Xen

2024-06-07 Thread Jiqian Chen
Hi All,
This is the v9 series to support passthrough when dom0 is PVH.
v8->v9 changes:
* patch#1: Move pcidevs_unlock below write_lock, and remove
   "ASSERT(pcidevs_locked());" from vpci_reset_device_state;
   Add pci_device_state_reset_type to distinguish the reset types.
* patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why this
   hypercall is needed.
   Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid ==
   DOMID_SELF" to "d == current->domain".
* patch#3: Remove the check of PHYSDEVOP_setup_gsi, since the same check
   exists below.
* patch#5: Change the commit message to better describe why this new
   hypercall is needed.
   Add a comment above "if ( is_pv_domain(current->domain) ||
   has_pirq(current->domain) )" to explain why this check is needed.
   Add gsi_2_irq to translate gsi to irq, instead of assuming gsi == irq.
   Add explicit padding to struct xen_domctl_gsi_permission.


Best regards,
Jiqian Chen



v7->v8 changes:
* patch#2: Add the domid check (domid == DOMID_SELF) to prevent self-mapping
   when the guest doesn't use pirq. That check was missing in the previous
   version.
* patch#4: Because the kernel-side implementation for obtaining the gsi
   changed, add a new function to get the gsi by passing in the sbdf of the
   pci device.
* patch#5: Remove the parameter "is_gsi"; when a gsi exists, pci_add_dm_done
   uses a new function, pci_device_set_gsi, to do map_pirq and grant
   permission. That makes the code logic more intuitive.


v6->v7 changes:
* patch#4: Because the kernel-side implementation for obtaining the gsi
   changed, add a new function to get the gsi from the irq, instead of from
   the gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.


v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart. Rebase code and change old 
function vpci_remove_device,
   vpci_add_handlers to vpci_deassign_device, vpci_assign_device
* patch#2: Add Reviewed-by Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues below directory tools
* patch#5: Modified some variable names and code logic to make code easier to 
be understood, which to use
   gsi by default and be compatible with older kernel versions to 
continue to use irq


v4->v5 changes:
* patch#1: add pci_lock wrap function vpci_reset_device_state
* patch#2: move the check of self map_pirq to physdev.c, and change to check if 
the caller has PIRQ flag, and
   just break for PHYSDEVOP_(un)map_pirq in hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: is the patch#5 in v4 because patch#5 in v5 has some dependency on 
it. And add the handling of errno
   and add the Reviewed-by Stefano
* patch#5: is the patch#4 in v4. New implementation to add new hypercall 
XEN_DOMCTL_gsi_permission to grant gsi


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move 
printings behind pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch, The implementation of adding PHYSDEVOP_setup_gsi for PVH 
is treated as a separate patch
* patch#4: new patch to solve the map_pirq problem of PVH dom0. use gsi to 
grant irq permission in
   XEN_DOMCTL_irq_permission.
* patch#5: to be compatible with previous kernel versions, when there is no gsi 
sysfs, still use irq
v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-jiqian.c...@amd.com/T/#t

v2->v3 changes:
* patch#1: move the content out of pci_reset_device_state and delete 
pci_reset_device_state; add
   xsm_resource_setup_pci check for PHYSDEVOP_pci_device_state_reset; 
add description for
   PHYSDEVOP_pci_device_state_reset;
* patch#2: due to changes in the implementation of the second patch on the
   kernel side (it will do setup_gsi and map_pirq when assigning a device to
   passthrough), add PHYSDEVOP_setup_gsi for PVH dom0, and we need to support
   self mapping.
* patch#3: due to changes in the implementation of the second patch on the
   kernel side (it adds a new sysfs for gsi instead of a new syscall), read
   the gsi number from the sysfs of gsi.
v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-jiqian.c...@amd.com/T/#t

v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#t
Below is the description of v2 cover letter:
This series of patches are the v2 of the implementation of passthrough when 
dom0 is PVH on Xen.
We sent the v1 to upstream before, but the v1 had so many problems and we got 
lots of suggestions.
I will introduce all issues that these patches try to fix and the differences 
between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthr

[XEN PATCH v9 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-06-07 Thread Jiqian Chen
When Xen runs with a PVH dom0 and an HVM domU, a pirq is mapped
for a passthrough device by using its gsi; see the qemu code
xen_pt_realize->xc_physdev_map_pirq and the libxl code
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not
allowed, because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq.
Add a new check to prevent self-mapping when the subject domain
has no PIRQ flag.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  6 ++
 xen/arch/x86/physdev.c   | 24 
 2 files changed, 30 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 0fab670a4871..fa5d50a0dd22 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd )
 {
+/*
+ * Only being permitted for management of other domains.
+ * Further restrictions are enforced in do_physdev_op.
+ */
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..61999882f836 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_map_pirq: {
 physdev_map_pirq_t map;
 struct msi_info msi;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&map, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(map.domid);
+if ( d == NULL )
+return -ESRCH;
+/* Prevent self-map when domain has no X86_EMU_USE_PIRQ flag */
+if ( is_hvm_domain(d) && !has_pirq(d) && d == current->domain )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 switch ( map.type )
 {
 case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 case PHYSDEVOP_unmap_pirq: {
 struct physdev_unmap_pirq unmap;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&unmap, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(unmap.domid);
+if ( d == NULL )
+return -ESRCH;
+/* Prevent self-unmap when domain has no X86_EMU_USE_PIRQ flag */
+if ( is_hvm_domain(d) && !has_pirq(d) && d == current->domain )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
 break;
 }
-- 
2.34.1




[RFC XEN PATCH v9 4/5] tools: Add new function to get gsi from dev

2024-06-07 Thread Jiqian Chen
A PVH dom0 uses the Linux local interrupt mechanism. The irq for
a gsi is allocated dynamically, on a first come, first served
basis, and irq numbers are handed out in increasing order. The
gsis, however, are not requested in increasing order (gsi 38 may
be requested before gsi 28), so the irq number is not equal to
the gsi number.
When a device is passed through, QEMU uses its gsi number to do
the pirq mapping, see xen_pt_realize->xc_physdev_map_pirq, but
the number is read from the file /sys/bus/pci/devices//irq,
which holds the irq rather than the gsi, so the mapping fails.
The current code gives userspace no way to get the gsi.

To address this, add a new function to get the gsi, and call it
before xc_physdev_(un)map_pirq.
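
Why the irq number and the gsi number drift apart can be shown with a
toy allocator. This is a sketch under stated assumptions: the
irq_allocator type, its functions and the starting irq number are all
hypothetical and do not model the real Linux irq allocation code. It
only demonstrates the "first requested, first numbered" behaviour
described above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_GSI 64

/* Hypothetical allocator: irqs are handed out in increasing order. */
struct irq_allocator {
    int next_irq;                    /* next free irq number */
    int irq_of_gsi[SKETCH_MAX_GSI];  /* -1 while the gsi has no irq yet */
};

static void irq_allocator_init(struct irq_allocator *a, int first_irq)
{
    assert(a != NULL);

    a->next_irq = first_irq;
    for (size_t i = 0; i < SKETCH_MAX_GSI; i++)
        a->irq_of_gsi[i] = -1;
}

/* Return the irq bound to a gsi, allocating the next free one on first use. */
static int irq_for_gsi(struct irq_allocator *a, int gsi)
{
    assert(a != NULL);

    if (gsi < 0 || gsi >= SKETCH_MAX_GSI)
        return -1;

    if (a->irq_of_gsi[gsi] < 0)
        a->irq_of_gsi[gsi] = a->next_irq++;

    return a->irq_of_gsi[gsi];
}
```

If gsi 38 is requested before gsi 28, gsi 38 gets the lower irq
number, so a value read back as "the irq" cannot be used as a gsi.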

Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
RFC: it needs review and needs to wait for the corresponding third patch on 
linux kernel side to be merged.
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +++
 tools/include/xencall.h   |  2 ++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/call/core.c|  5 +
 tools/libs/call/libxencall.map|  2 ++
 tools/libs/call/linux.c   | 15 +++
 tools/libs/call/private.h |  9 +
 tools/libs/ctrl/xc_physdev.c  |  4 
 tools/libs/light/libxl_pci.c  | 23 +++
 9 files changed, 69 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..977f1a058797 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_dev {
+   __u32 sbdf;
+   int gsi;
+} privcmd_gsi_from_dev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_DEV \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_dev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe58e..750aab070323 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
  uint64_t arg1, uint64_t arg2, uint64_t arg3,
  uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 9ceca0cffc2f..a0381f74d24b 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1aefa..6dae50c9a6ba 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
 return osdep_hypercall(xcall, &call);
 }
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
+{
+return osdep_oscall(xcall, sbdf);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9dc..b92a0b5dc12c 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
xencall4;
xencall5;
 
+   xen_oscall_gsi_from_dev;
+
xencall_alloc_buffer;
xencall_free_buffer;
xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea8f..92c740e176f2 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall)
 return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
+{
+privcmd_gsi_from_dev_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = -1,
+};
+
+if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) {
+PERROR("failed to get gsi from dev");
+return -1;
+}
+
+return dev_gsi.gsi;
+}
+
 static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages)
 {
 

[RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-07 Thread Jiqian Chen
Some domain types, such as PVH, have no PIRQs and do not issue
PHYSDEVOP_map_pirq for each gsi. When a device is passed through
to a guest from a PVH dom0, the call chain
pci_add_dm_done->XEN_DOMCTL_irq_permission fails in
domain_pirq_to_irq, because a PVH domain has no gsi/pirq/irq
mapping on the Xen side.

Moreover, the existing hypercall XEN_DOMCTL_irq_permission takes
a pirq and grants access to the corresponding irq. That is not
suitable for a dom0 that has no PIRQ flag, because passing
through a device needs the gsi, and the corresponding irq must
be granted to the guest. So, add a new hypercall to grant gsi
permission when dom0 is not PV or dom0 has no PIRQ flag.
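
In this version, pci_device_set_gsi tries the gsi-based grant first and
falls back to the irq-based grant when the hypervisor reports
EOPNOTSUPP. A minimal sketch of that fallback follows. The grant_fn
type, grant_with_fallback and the mock_* functions are hypothetical and
stand in for the real libxc calls so that the logic can be read on its
own.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signature standing in for the two permission calls. */
typedef int (*grant_fn)(int domid, int number, bool allow);

/*
 * Try the gsi-based grant; if it fails with EOPNOTSUPP, fall back to
 * the irq-based grant. Any other failure is returned unchanged.
 */
static int grant_with_fallback(grant_fn by_gsi, grant_fn by_irq,
                               int domid, int gsi, int pirq, bool allow)
{
    assert(by_gsi != NULL && by_irq != NULL);

    int r = by_gsi(domid, gsi, allow);

    if (r && errno == EOPNOTSUPP)
        r = by_irq(domid, pirq, allow);

    return r;
}

/* Mocks for illustration only; they perform no hypercall. */
static int mock_irq_calls;

static int mock_gsi_unsupported(int domid, int number, bool allow)
{
    (void)domid; (void)number; (void)allow;
    errno = EOPNOTSUPP;
    return -1;
}

static int mock_gsi_ok(int domid, int number, bool allow)
{
    (void)domid; (void)number; (void)allow;
    return 0;
}

static int mock_irq_ok(int domid, int number, bool allow)
{
    (void)domid; (void)number; (void)allow;
    mock_irq_calls++;
    return 0;
}
```

The fallback keeps the toolstack working where the gsi-based grant is
rejected as unsupported, at the price of one extra failed call.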

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
RFC: it needs review and needs to wait for the corresponding third patch on 
linux kernel side to be merged.
---
 tools/include/xenctrl.h|  5 +++
 tools/libs/ctrl/xc_domain.c| 15 +++
 tools/libs/light/libxl_pci.c   | 72 +++---
 xen/arch/x86/domctl.c  | 38 
 xen/arch/x86/include/asm/io_apic.h |  2 +
 xen/arch/x86/io_apic.c | 21 +
 xen/arch/x86/mpparse.c |  3 +-
 xen/include/public/domctl.h| 10 +
 xen/xsm/flask/hooks.c  |  1 +
 9 files changed, 149 insertions(+), 18 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index a0381f74d24b..f3feb6848e25 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 7e44d4c3ae2b..b8ec37d8d7e3 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1412,6 +1412,37 @@ static bool pci_supp_legacy_irq(void)
 #define PCI_SBDF(seg, bus, devfn) \
 ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
 
+static int pci_device_set_gsi(libxl_ctx *ctx,
+  libxl_domid domid,
+  libxl_device_pci *pci,
+  bool map,
+  int *gsi_back)
+{
+int r, gsi, pirq;
+uint32_t sbdf;
+
+sbdf = PCI_SBDF(pci->domain, pci->bus, (PCI_DEVFN(pci->dev, pci->func)));
+r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
+*gsi_back = r;
+if (r < 0)
+return r;
+
+gsi = r;
+pirq = r;
+if (map)
+r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
+else
+r = xc_physdev_unmap_pirq(ctx->xch, domid, pirq);
+if (r)
+return r;
+
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map);
+if (r && errno == EOPNOTSUPP)
+r = xc_domain_irq_permission(ctx->xch, domid, pirq, map);
+
+return r;
+}
+
 static void pci_add_dm_done(libxl__egc *egc,
 pci_add_state *pas,
 int rc)
@@ -1424,10 +1455,10 @@ static void pci_add_dm_done(libxl__egc *egc,
 unsigned long long start, end, flags, size;
 int irq, i;
 int r;
-uint32_t sbdf;
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+int gsi;
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1485,6 +1516,19 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
+
+r = pci_device_set_gsi(ctx, domid, pci, 1, &gsi);
+if (gsi >= 0) {
+if (r < 0) {
+rc = ERROR_FAIL;
+LOGED(ERROR, domainid,
+  "pci_device_s

[XEN PATCH v9 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-07 Thread Jiqian Chen
On PVH dom0, the gsis don't get registered, but
the gsi of a passthrough device must be configured for it to
be able to be mapped into a hvm domU.
On the Linux kernel side, PHYSDEVOP_setup_gsi is called for
passthrough devices to register the gsi when dom0 is PVH.
So, add PHYSDEVOP_setup_gsi to the operations allowed in
hvm_physdev_op for that purpose.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
The code link that will call this hypercall on linux kernel side is as follows
https://lore.kernel.org/lkml/20240607075109.126277-3-jiqian.c...@amd.com/T/#u
---
 xen/arch/x86/hvm/hypercall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index fa5d50a0dd22..164f4eefa043 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return -ENOSYS;
 break;
 
+case PHYSDEVOP_setup_gsi:
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
-- 
2.34.1




[XEN PATCH v9 1/5] xen/vpci: Clear all vpci status of device

2024-06-07 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the
Xen side is not notified, so the state cached in vpci is
out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0
side, dom0 can call this hypercall to notify vpci.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 43 
 xen/drivers/vpci/vpci.c  |  9 
 xen/include/public/physdev.h |  7 ++
 xen/include/xen/pci.h| 16 ++
 xen/include/xen/vpci.h   |  6 +
 6 files changed, 82 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 7fb3136f0c7c..0fab670a4871 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..1cce508a73b1 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,11 +2,17 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
 #endif
 
+static const struct pci_device_state_reset_method
+pci_device_state_reset_methods[] = {
+[ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
+};
+
 ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 ret_t ret;
@@ -67,6 +73,43 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct pci_device_state_reset dev_reset;
+struct physdev_pci_device *dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+break;
+dev = &dev_reset.dev;
+sbdf = PCI_SBDF(dev->seg, dev->bus, dev->devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+ret = pci_device_state_reset_methods[dev_reset.reset_type].reset_fn(pdev);
+write_unlock(&pdev->domain->pci_lock);
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset vPCI device state\n", &sbdf);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 1e6aa5d799b9..ff67c2550ccb 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..a71da5892e5f 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 63e49f0117e9..376981f9da98 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -156,6 +156,22 @@ struct pci_dev {
 struct vpci *vpci;
 };
 
+struct pci_device_state_reset_method {
+int (*reset_fn)(struct pci_dev *pdev);
+};
+
+enum pci_device_state_reset_type {
+DEVICE_RESET_FLR,
+DEVICE_RESET_COLD,
+DEVICE_RESET_WARM,
+DEVICE_RESET_HOT,
+};
+
+struct pci_device_state_reset {
+struct physdev_pci_device dev;
+enum pci_device_state_reset_type reset_type;
+};
+
 #define for_each_pdev(domain, pdev) \
 list_for_each_entry(pdev, &(domain)->pdev_list, domain_list)

[RFC KERNEL PATCH v8 2/3] xen/pvh: Setup gsi for passthrough device

2024-06-07 Thread Jiqian Chen
In PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.

When assigning a device for passthrough, proactively set up the
gsi of the device during that process.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
RFC: it needs to wait for the corresponding third patch on xen side to be merged.
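
The encoding done by xen_pvh_setup_gsi in this patch can be sketched in plain C as follows. This is a minimal illustration, not the kernel code: the sketch_* enums and struct are hypothetical stand-ins for the ACPI constants and struct physdev_setup_gsi, and the 0/1 meanings are taken from the hunk below (edge and active high map to 0, everything else to 1).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the ACPI trigger and polarity constants. */
enum sketch_trigger  { SKETCH_LEVEL, SKETCH_EDGE };
enum sketch_polarity { SKETCH_ACTIVE_HIGH, SKETCH_ACTIVE_LOW };

/* Hypothetical mirror of the fields the patch fills in. */
struct sketch_setup_gsi {
    int gsi;
    unsigned char triggering; /* 0 = edge, 1 = level */
    unsigned char polarity;   /* 0 = active high, 1 = active low */
};

/* Translate the ACPI-style description into the hypercall encoding. */
static struct sketch_setup_gsi encode_setup_gsi(int gsi,
                                                enum sketch_trigger trigger,
                                                enum sketch_polarity polarity)
{
    struct sketch_setup_gsi s = {
        .gsi = gsi,
        .triggering = (trigger == SKETCH_EDGE) ? 0 : 1,
        .polarity = (polarity == SKETCH_ACTIVE_HIGH) ? 0 : 1,
    };

    assert(gsi >= 0);
    return s;
}

/* A gsi that is already set up is not an error for the caller. */
static int normalize_setup_result(int ret)
{
    return (ret == -EEXIST) ? 0 : ret;
}
```

Treating -EEXIST as success keeps repeated assignment of the same device from failing just because the gsi was registered earlier.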
---
 arch/x86/xen/enlighten_pvh.c   | 23 ++
 drivers/acpi/pci_irq.c |  2 +-
 drivers/xen/acpi.c | 50 ++
 drivers/xen/xen-pciback/pci_stub.c | 21 +
 include/linux/acpi.h   |  1 +
 include/xen/acpi.h | 10 ++
 6 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 27a2a02ef8fb..6caadf9c00ab 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -4,6 +4,7 @@
 #include 
 
 #include 
+#include 
 
 #include 
 #include 
@@ -27,6 +28,28 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+#ifdef CONFIG_XEN_DOM0
+int xen_pvh_setup_gsi(int gsi, int trigger, int polarity)
+{
+   int ret;
+   struct physdev_setup_gsi setup_gsi;
+
+   setup_gsi.gsi = gsi;
+   setup_gsi.triggering = (trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+   setup_gsi.polarity = (polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+   ret = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+   if (ret == -EEXIST) {
+   xen_raw_printk("Already setup the GSI :%d\n", gsi);
+   ret = 0;
+   } else if (ret)
+   xen_raw_printk("Fail to setup GSI (%d)!\n", gsi);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_setup_gsi);
+#endif
+
 void __init xen_pvh_init(struct boot_params *boot_params)
 {
u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
index 6893c79fd2a1..9e2096524fbc 100644
--- a/drivers/xen/acpi.c
+++ b/drivers/xen/acpi.c
@@ -30,6 +30,7 @@
  * IN THE SOFTWARE.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -75,3 +76,52 @@ int xen_acpi_notify_hypervisor_extended_sleep(u8 sleep_state,
return xen_acpi_notify_hypervisor_state(sleep_state, val_a,
val_b, true);
 }
+
+struct acpi_prt_entry {
+   struct acpi_pci_id  id;
+   u8  pin;
+   acpi_handle link;
+   u32 index;
+};
+
+int xen_acpi_get_gsi_info(struct pci_dev *dev,
+ int *gsi_out,
+ int *trigger_out,
+ int *polarity_out)
+{
+   int gsi;
+   u8 pin;
+   struct acpi_prt_entry *entry;
+   int trigger = ACPI_LEVEL_SENSITIVE;
+   int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+   if (!dev || !gsi_out || !trigger_out || !polarity_out)
+   return -EINVAL;
+
+   pin = dev->pin;
+   if (!pin)
+   return -EINVAL;
+
+   entry = acpi_pci_irq_lookup(dev, pin);
+   if (entry) {
+   if (entry->link)
+   gsi = acpi_pci_link_allocate_irq(entry->link,
+entry->index,
+&trigger, &polarity,
+NULL);
+   else
+   gsi = entry->index;
+   } else
+   gsi = -1;
+
+   if (gsi < 0)
+   return -EINVAL;
+
+   *gsi_out = gsi;
+   *trigger_out = trigger;
+   *polarity_out = polarity;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(xen_acpi_get_gsi_info);
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 73062e531c34..6b22e45188f5 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -21,6 +21,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_XEN_ACPI
+#include 
+#endif
 #include 
 #include 
 #include "pciback.h"
@@ -367,6 +370,9 @@ static int pcistub_match(struct pci_dev *dev)
 static int pcistub_init_device(struct pci_dev *dev)
 {
struct xen_pcibk_dev_data *dev_data;
+#ifdef CONFIG_XEN_ACPI
+   int gsi

[RFC KERNEL PATCH v8 3/3] xen/privcmd: Add new syscall to get gsi from dev

2024-06-07 Thread Jiqian Chen
A PVH dom0 uses the native Linux interrupt mechanism, so
the irq for a gsi is allocated dynamically, on a first come,
first served basis. Irq numbers are handed out in ascending
order, but gsis are not requested in ascending order: gsi 38
may be requested before gsi 28, so the irq number does not
equal the gsi number.
When passing a device through, QEMU uses the device's gsi
number to do the pirq mapping, but it reads that number from
the file /sys/bus/pci/devices//irq. Since irq != gsi, the
mapping fails.
The current Linux code offers userspace no way to get the gsi.

To address this, record the gsi of each pcistub device when
pcistub is initialized, and add a new privcmd ioctl
(IOCTL_PRIVCMD_GSI_FROM_DEV) that lets userspace get the gsi
when it needs it.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
RFC: it needs review and needs to wait for the previous patch of this series
to be merged.
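
The irq/gsi mismatch described in the commit message can be reproduced with a toy allocator. This is a simplified model, not the kernel's irq allocation code: struct irq_table, irq_table_init and irq_for_gsi are hypothetical names, and the starting irq number passed by the caller is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_GSI 64

/* Toy model: irq numbers are handed out in ascending order on request. */
struct irq_table {
    int next_irq;                   /* next free irq number */
    int irq_of_gsi[SKETCH_MAX_GSI]; /* -1 means "no irq allocated yet" */
};

static void irq_table_init(struct irq_table *t, int first_irq)
{
    t->next_irq = first_irq;
    for (size_t i = 0; i < SKETCH_MAX_GSI; i++)
        t->irq_of_gsi[i] = -1;
}

/* First come, first served: the first gsi to ask gets the lowest irq. */
static int irq_for_gsi(struct irq_table *t, int gsi)
{
    assert(gsi >= 0 && gsi < SKETCH_MAX_GSI);

    if (t->irq_of_gsi[gsi] < 0)
        t->irq_of_gsi[gsi] = t->next_irq++;

    return t->irq_of_gsi[gsi];
}
```

If gsi 38 is requested before gsi 28, gsi 38 ends up with the smaller irq number, so a value read from the irq sysfs file cannot be used as a gsi.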
---
 drivers/xen/privcmd.c  | 28 ++
 drivers/xen/xen-pciback/pci_stub.c | 38 +++---
 include/uapi/xen/privcmd.h |  7 ++
 include/xen/acpi.h |  9 +++
 4 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 67dfa4778864..5809b3168f25 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -45,6 +45,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_XEN_ACPI
+#include 
+#endif
 
 #include "privcmd.h"
 
@@ -842,6 +845,27 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
return rc;
 }
 
+static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
+{
+#ifdef CONFIG_XEN_ACPI
+   struct privcmd_gsi_from_dev kdata;
+
+   if (copy_from_user(&kdata, udata, sizeof(kdata)))
+   return -EFAULT;
+
+   kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
+   if (kdata.gsi == -1)
+   return -EINVAL;
+
+   if (copy_to_user(udata, &kdata, sizeof(kdata)))
+   return -EFAULT;
+
+   return 0;
+#else
+   return -EINVAL;
+#endif
+}
+
 #ifdef CONFIG_XEN_PRIVCMD_EVENTFD
 /* Irqfd support */
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -1529,6 +1553,10 @@ static long privcmd_ioctl(struct file *file,
ret = privcmd_ioctl_ioeventfd(file, udata);
break;
 
+   case IOCTL_PRIVCMD_GSI_FROM_DEV:
+   ret = privcmd_ioctl_gsi_from_dev(file, udata);
+   break;
+
default:
break;
}
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 6b22e45188f5..9d791d7a8098 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -56,6 +56,9 @@ struct pcistub_device {
 
struct pci_dev *dev;
struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */
+#ifdef CONFIG_XEN_ACPI
+   int gsi;
+#endif
 };
 
 /* Access to pcistub_devices & seized_devices lists and the initialize_devices
@@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
 
kref_init(&psdev->kref);
spin_lock_init(&psdev->lock);
+#ifdef CONFIG_XEN_ACPI
+   psdev->gsi = -1;
+#endif
 
return psdev;
 }
@@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct xen_pcibk_device *pdev,
return pci_dev;
 }
 
+#ifdef CONFIG_XEN_ACPI
+int pcistub_get_gsi_from_sbdf(unsigned int sbdf)
+{
+   struct pcistub_device *psdev;
+   int domain = (sbdf >> 16) & 0xffff;
+   int bus = PCI_BUS_NUM(sbdf);
+   int slot = PCI_SLOT(sbdf);
+   int func = PCI_FUNC(sbdf);
+
+   psdev = pcistub_device_find(domain, bus, slot, func);
+
+   if (!psdev)
+   return -1;
+
+   return psdev->gsi;
+}
+EXPORT_SYMBOL_GPL(pcistub_get_gsi_from_sbdf);
+#endif
+
 struct pci_dev *pcistub_get_pci_dev_by_slot(struct xen_pcibk_device *pdev,
int domain, int bus,
int slot, int func)
@@ -367,14 +392,20 @@ static int pcistub_match(struct pci_dev *dev)
return found;
 }
 
-static int pcistub_init_device(struct pci_dev *dev)
+static int pcistub_init_device(struct pcistub_device *psdev)
 {
struct xen_pcibk_dev_data *dev_data;
+   struct pci_dev *dev;
 #ifdef CONFIG_XEN_ACPI
int gsi, trigger, polarity;
 #endif
int err = 0;
 
+   if (!psdev)
+   return -EINVAL;
+
+   dev = psdev->dev;
+
dev_dbg(&dev->dev, "initializing...\n");
 
/* The PCI backend is not intended to be a module (or to work with
@@ -448,6 +479,7 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Fail to get gsi info!\n");
goto config_release;
}
+   psdev->gsi = gsi;
 
if (xen_initial_domain() && xen_pvh_domain()) {
err

[RFC KERNEL PATCH v8 0/2] Support device passthrough when dom0 is PVH on Xen

2024-06-07 Thread Jiqian Chen
Hi All,
This is v8 series to support passthrough on Xen when dom0 is PVH.
v7->v8 change:
* patch#1: This is patch#1 of v6, re-sent because it was reverted from the
  staging branch due to API changes on the Xen side.
  Add pci_device_state_reset_type_t to distinguish the reset types.
* patch#2: This is patch#1 of v7. Use CONFIG_XEN_ACPI instead of CONFIG_ACPI
  to wrap the code.
* patch#3: This is patch#2 of v7. In privcmd_ioctl_gsi_from_dev, return
  -EINVAL when CONFIG_XEN_ACPI is not configured.
  Use PCI_BUS_NUM, PCI_SLOT and PCI_FUNC instead of open coding.
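
For readers unfamiliar with the sbdf value passed to the new ioctl, here is a rough sketch of how it is packed and taken apart again. The helper names (sbdf_pack, sbdf_seg, sbdf_bus, sbdf_slot, sbdf_func) are hypothetical; the layout assumed is segment in bits 31..16, bus in bits 15..8, device in bits 7..3 and function in bits 2..0, which is what the PCI_SBDF macros and the PCI_BUS_NUM/PCI_SLOT/PCI_FUNC decoding in this series imply.

```c
#include <assert.h>
#include <stdint.h>

/* Pack segment, bus, device and function into one 32-bit sbdf value. */
static uint32_t sbdf_pack(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t func)
{
    assert(dev < 32 && func < 8);

    return ((uint32_t)seg << 16) |
           ((uint32_t)bus << 8) |
           ((uint32_t)dev << 3) |
           (uint32_t)func;
}

/* Segment (PCI domain): upper 16 bits. */
static uint16_t sbdf_seg(uint32_t sbdf)
{
    return (uint16_t)((sbdf >> 16) & 0xffff);
}

/* Bus number: bits 15..8. */
static uint8_t sbdf_bus(uint32_t sbdf)
{
    return (uint8_t)((sbdf >> 8) & 0xff);
}

/* Device (slot) number: bits 7..3. */
static uint8_t sbdf_slot(uint32_t sbdf)
{
    return (uint8_t)((sbdf >> 3) & 0x1f);
}

/* Function number: bits 2..0. */
static uint8_t sbdf_func(uint32_t sbdf)
{
    return (uint8_t)(sbdf & 0x07);
}
```

Packing on the toolstack side and decoding on the kernel side must agree on this layout, otherwise the pcistub lookup would search for the wrong device.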


Best regards,
Jiqian Chen



v6->v7 change:
* the first patch of v6 was already merged into branch linux_next.
* patch#1: This is patch#2 of v6. Move the implementation of
  xen_acpi_get_gsi_info to drivers/xen/acpi.c, which makes it easier for
  the subsequent patch to obtain the gsi.
* patch#2: This is patch#3 of v6. Add a new member "gsi" to struct
  pcistub_device and set it when pcistub initializes the device. Then, when
  userspace asks for the gsi by passing an sbdf, that gsi can be returned.


v5->v6 change:
* patch#3: change to add a new syscall to translate irq to gsi, instead of
  adding a new gsi sysfs.


v4->v5 changes:
* patch#1: Add Reviewed-by Stefano
* patch#2: Add Reviewed-by Stefano
* patch#3: No changes


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; use a new
  function pcistub_reset_device_state to wrap __pci_reset_function_locked and
  xen_reset_device_state, and call pcistub_reset_device_state in pci_stub.c
* patch#2: remove map_pirq from xen_pvh_passthrough_gsi


v2->v3 changes:
* patch#1: add a condition so that xen_reset_device_state is only called for
  non-PV domains in pcistub_init_device.
* patch#2: Abandon the previous implementation that called unmask_irq. Instead,
  set up the gsi and map the pirq for the passthrough device in
  pcistub_init_device.
* patch#3: Abandon the previous implementation that added a new syscall to get
  the gsi from the irq. Instead, add a new sysfs node for the gsi, so that
  userspace can read the gsi number from sysfs.


Below is the description of v2 cover letter:
This series of patches are the v2 of the implementation of passthrough when 
dom0 is PVH on Xen.
We sent the v1 to upstream before, but the v1 had so many problems and we got 
lots of suggestions.
I will introduce all issues that these patches try to fix and the differences 
between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a device, pci_stub
will call "pcistub_init_device() -> pci_restore_state() ->
pci_restore_config_space() -> pci_restore_config_space_range() ->
pci_restore_config_dword() -> pci_write_config_dword()". The pci config write
triggers an io interrupt to bar_write() in Xen, but bar->enabled was set
before, so the write is not allowed, and then when Qemu configures the
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, so the state
cached in pdev->vpci is out of date and differs from the real device state.

Solution: the first kernel patch (xen/pci: Add xen_reset_device_state
function) and the first Xen patch (xen/vpci: Clear all vpci status of device)
add a new hypercall to reset the state stored in vPCI when the state of the
real device has changed.
Thanks to Roger for suggesting this v2 approach. It differs from v1
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/),
which simply allowed domU to write the pci bar and did not comply with the
design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: for an HVM domU, PHYSDEVOP_map_pirq is issued for a passthrough device
by using its gsi. See xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then calls into Xen,
but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is the PVH dom0, and PVH has
no X86_EMU_USE_PIRQ flag, so the call fails the has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0
(at present dom0 is PVH). The second Xen patch (x86/pvh: Open
PHYSDEVOP_map_pirq for PVH dom0) allows a PVH dom0 to do PHYSDEVOP_map_pirq.
This v2 patch is better than v1, which simply removed the has_pirq check
(xen
https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).

3. the gsi of a passthrough device is not unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 callback function pci_add_dm_done() will be called when qemu config a 
passthrough device for domU.
This fun

[RFC KERNEL PATCH v8 1/3] xen/pci: Add xen_reset_device_function_state

2024-06-07 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the
Xen side is not notified, so the state cached in vpci is
out of date with respect to the real device state.
To solve that problem, add a new function to clear all vpci
device state when a device is reset on the dom0 side.

Call that function in pcistub_init_device, because
"pci-assignable-add" resets the device when assigning it for
passthrough in Xen; the vpci state then becomes out of date,
and the device fails to restore its bar state.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
RFC: it needs to wait for the corresponding first patch on xen side to be merged.
---
 drivers/xen/pci.c  | 25 +
 drivers/xen/xen-pciback/pci_stub.c | 18 +++---
 include/xen/interface/physdev.h|  7 +++
 include/xen/pci.h  |  6 ++
 4 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..57093e395982 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,31 @@ static int xen_remove_device(struct device *dev)
return r;
 }
 
+enum pci_device_state_reset_type {
+   DEVICE_RESET_FLR,
+   DEVICE_RESET_COLD,
+   DEVICE_RESET_WARM,
+   DEVICE_RESET_HOT,
+};
+
+struct pci_device_state_reset {
+   struct physdev_pci_device dev;
+   enum pci_device_state_reset_type reset_type;
+};
+
+int xen_reset_device_function_state(const struct pci_dev *dev)
+{
+   struct pci_device_state_reset device = {
+   .dev.seg = pci_domain_nr(dev->bus),
+   .dev.bus = dev->bus->number,
+   .dev.devfn = dev->devfn,
+   .reset_type = DEVICE_RESET_FLR,
+   };
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device_function_state);
+
 static int xen_pci_notifier(struct notifier_block *nb,
unsigned long action, void *data)
 {
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e34b623e4b41..73062e531c34 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
return psdev;
 }
 
+static int pcistub_reset_device_state(struct pci_dev *dev)
+{
+   __pci_reset_function_locked(dev);
+
+   if (!xen_pv_domain())
+   return xen_reset_device_function_state(dev);
+   else
+   return 0;
+}
+
 /* Don't call this directly as it's called by pcistub_device_put */
 static void pcistub_device_release(struct kref *kref)
 {
@@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref)
/* Call the reset function which does not take lock as this
 * is called from "unbind" which takes a device_lock mutex.
 */
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
if (dev_data &&
pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
dev_info(&dev->dev, "Could not reload PCI state\n");
@@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 * (so it's ready for the next domain)
 */
device_lock_assert(&dev->dev);
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
 
dev_data = pci_get_drvdata(dev);
ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
@@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
else {
dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-   __pci_reset_function_locked(dev);
+   err = pcistub_reset_device_state(dev);
+   if (err)
+   goto config_release;
pci_restore_state(dev);
}
/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..b50646c993dd 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,13 @@ struct physdev_pci_device_add {
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/include/xen/pci.h b/include/xen/pci.h
index b8337cf85fd1..7941809ab729 100644
--- a/include/xen/pci.h
+++ b/include/xen/pci.h
@@ -4,10 +4,16 @@
 #d

[RFC QEMU PATCH v7 1/1] xen/pci: get gsi for passthrough devices

2024-05-16 Thread Jiqian Chen
A PVH dom0 uses the native Linux interrupt mechanism, so
the irq for a gsi is allocated dynamically, on a first come,
first served basis. Irq numbers are handed out in ascending
order, but gsis are not requested in ascending order: gsi 38
may be requested before gsi 28, so the irq number does not
equal the gsi number. When passing a device through, qemu
wants to use the gsi to map the pirq
(xen_pt_realize->xc_physdev_map_pirq), but the current code
reads the number from the file
/sys/bus/pci/devices//irq, so the mapping fails.

Get the gsi by using the new function provided by the Xen tools.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 hw/xen/xen-host-pci-device.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716a2..2fe6a60434ba 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -10,6 +10,7 @@
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "xen-host-pci-device.h"
+#include "hw/xen/xen_native.h"
 
 #define XEN_HOST_PCI_MAX_EXT_CAP \
 ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
@@ -329,12 +330,17 @@ int xen_host_pci_find_ext_cap_offset(XenHostPCIDevice *d, uint32_t cap)
 return -1;
 }
 
+#define PCI_SBDF(seg, bus, dev, func) \
+((((uint32_t)(seg)) << 16) | \
+(PCI_BUILD_BDF(bus, PCI_DEVFN(dev, func))))
+
 void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
  uint8_t bus, uint8_t dev, uint8_t func,
  Error **errp)
 {
 ERRP_GUARD();
 unsigned int v;
+uint32_t sdbf;
 
 d->config_fd = -1;
 d->domain = domain;
@@ -364,11 +370,16 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
 }
 d->device_id = v;
 
-xen_host_pci_get_dec_value(d, "irq", &v, errp);
-if (*errp) {
-goto error;
+sdbf = PCI_SBDF(domain, bus, dev, func);
+d->irq = xc_physdev_gsi_from_dev(xen_xc, sdbf);
+/* fail to get gsi, fallback to irq */
+if (d->irq == -1) {
+xen_host_pci_get_dec_value(d, "irq", &v, errp);
+if (*errp) {
+goto error;
+}
+d->irq = v;
 }
-d->irq = v;
 
 xen_host_pci_get_hex_value(d, "class", &v, errp);
 if (*errp) {
-- 
2.34.1




[RFC QEMU PATCH v7 0/1] Support device passthrough when dom0 is PVH on Xen

2024-05-16 Thread Jiqian Chen
Hi All,
This is v7 series to support passthrough on Xen when dom0 is PVH.
v6->v7 changes:
* Due to changes in how the gsi is obtained in the kernel and Xen, change to use
  xc_physdev_gsi_from_dev, which requires passing in the sbdf instead of the irq.
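
The lookup-with-fallback behaviour this version introduces can be sketched as follows. This is a simplified illustration rather than the QEMU code: gsi_lookup_fn, pick_irq_for_mapping and the stub_* functions are hypothetical, and the lookup callback merely stands in for xc_physdev_gsi_from_dev, which the patch treats as having failed when it returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the gsi lookup by sbdf; returns -1 on failure. */
typedef int (*gsi_lookup_fn)(uint32_t sbdf);

/*
 * Prefer the gsi reported for the device. If the lookup fails, fall back
 * to the legacy irq value that was read from sysfs.
 */
static int pick_irq_for_mapping(gsi_lookup_fn lookup, uint32_t sbdf,
                                int sysfs_irq)
{
    int gsi;

    assert(lookup != NULL);

    gsi = lookup(sbdf);
    return (gsi == -1) ? sysfs_irq : gsi;
}

/* Stub: imitates a kernel that knows the gsi of the device. */
static int stub_lookup_ok(uint32_t sbdf)
{
    (void)sbdf;
    return 38;
}

/* Stub: imitates a kernel without the new ioctl. */
static int stub_lookup_failed(uint32_t sbdf)
{
    (void)sbdf;
    return -1;
}
```

The fallback keeps the old behaviour on kernels that lack the ioctl, where the sysfs irq value is still the only number available.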


Best regards,
Jiqian Chen



v5->v6 changes:
* Due to changes in how the gsi is obtained in the kernel and Xen, change to use
  xc_physdev_gsi_from_irq instead of the gsi sysfs.


v4->v5 changes:
* Add review by Stefano


v3->v4 changes:
* Add gsi into struct XenHostPCIDevice and use the gsi number read from the gsi
  sysfs if it exists; if there is no gsi sysfs, still use irq.


v2->v3 changes:
* Due to changes in the implementation of the second patch on the kernel side
  (which adds a new sysfs for gsi instead of a new syscall), read the gsi number
  from the gsi sysfs.


Below is the description of v2 cover letter:
This patch is the v2 of the implementation of passthrough when dom0 is PVH on
Xen.
Issues we encountered:
1. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in xen_pt_realize() to map a
passthrough device's gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi
instead of an irq, but qemu passes it an irq and treats that irq as a gsi. The
irq is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in
xen_host_pci_device_get(), but the gsi number is not actually equal to the irq.
On a PVH dom0, when an irq is allocated for a gsi in
acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first
served. If you debug the kernel code (see __irq_alloc_descs), you will find
that irq numbers are allocated in ascending order, but gsis are not requested
in ascending order: gsi 38 may be requested before gsi 28, so gsi 38 gets a
smaller irq number than gsi 28, and then gsi != irq.

Solution: we can record the relation between gsi and irq, so that when
userspace (qemu) wants a gsi, we can do the translation. The third kernel patch
(xen/privcmd: Add new syscall to get gsi from irq) records all the relations in
acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and provides a
syscall for userspace to get the gsi from the irq. The third Xen patch
(tools: Add new function to get gsi from irq) adds a new function
xc_physdev_gsi_from_irq() that calls the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq()
will succeed.

This v2 on the qemu side is the same as the v1
(qemu
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/),
it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (1):
  xen/pci: get gsi for passthrough devices

 hw/xen/xen-host-pci-device.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

-- 
2.34.1




[RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-05-16 Thread Jiqian Chen
Some domain types, such as PVH, have no PIRQs. When passing
a device through to a guest on a PVH dom0, the call chain
pci_add_dm_done->XEN_DOMCTL_irq_permission fails
at domain_pirq_to_irq.

So, add a new hypercall to grant/revoke gsi permission
when dom0 is not PV or dom0 does not have the PIRQ flag.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/include/xenctrl.h  |  5 +++
 tools/libs/ctrl/xc_domain.c  | 15 
 tools/libs/light/libxl_pci.c | 72 
 xen/arch/x86/domctl.c| 31 
 xen/include/public/domctl.h  |  9 +
 xen/xsm/flask/hooks.c|  1 +
 6 files changed, 117 insertions(+), 16 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 841db41ad7e4..c21a79d74be3 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 7e44d4c3ae2b..1d1b81dd2844 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1412,6 +1412,37 @@ static bool pci_supp_legacy_irq(void)
 #define PCI_SBDF(seg, bus, devfn) \
 ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
 
+static int pci_device_set_gsi(libxl_ctx *ctx,
+  libxl_domid domid,
+  libxl_device_pci *pci,
+  bool map,
+  int *gsi_back)
+{
+int r, gsi, pirq;
+uint32_t sbdf;
+
+sbdf = PCI_SBDF(pci->domain, pci->bus, (PCI_DEVFN(pci->dev, pci->func)));
+r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
+*gsi_back = r;
+if (r < 0)
+return r;
+
+gsi = r;
+pirq = r;
+if (map)
+r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
+else
+r = xc_physdev_unmap_pirq(ctx->xch, domid, pirq);
+if (r)
+return r;
+
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map);
+if (r && errno == EOPNOTSUPP)
+r = xc_domain_irq_permission(ctx->xch, domid, gsi, map);
+
+return r;
+}
+
 static void pci_add_dm_done(libxl__egc *egc,
 pci_add_state *pas,
 int rc)
@@ -1424,10 +1455,10 @@ static void pci_add_dm_done(libxl__egc *egc,
 unsigned long long start, end, flags, size;
 int irq, i;
 int r;
-uint32_t sbdf;
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+int gsi;
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1485,6 +1516,19 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
+
+r = pci_device_set_gsi(ctx, domid, pci, 1, &gsi);
+if (gsi >= 0) {
+if (r < 0) {
+rc = ERROR_FAIL;
+LOGED(ERROR, domainid,
+  "pci_device_set_gsi gsi=%d (error=%d)", gsi, errno);
+goto out;
+} else {
+goto process_permissive;
+}
+}
+/* if gsi < 0, keep using irq */
 sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
 pci->bus, pci->dev, pci->func);
 f = fopen(sysfs_path, "r");
@@ -1493,13 +1537,6 @@ static void pci_add_dm_done(libxl__egc *egc,
 goto out_no_irq;
 }
 if ((fscanf(f, "%u", &irq) == 1) && irq) {
-sbdf = PCI_SBDF(pci->domain, pci->bus,
- 

[RFC XEN PATCH v8 4/5] tools: Add new function to get gsi from dev

2024-05-16 Thread Jiqian Chen
On PVH dom0, the Linux local interrupt mechanism is used.
When it allocates an irq for a gsi, the allocation is dynamic
and first come, first served. irq numbers are allocated in
ascending order, but gsis are not requested in ascending
order (gsi 38 may come before gsi 28), so the irq number is
not equal to the gsi number.
When passing through a device, QEMU uses the device's gsi
number to do the pirq mapping, see
xen_pt_realize->xc_physdev_map_pirq, but the gsi number is
read from the file /sys/bus/pci/devices//irq, so the mapping
fails. And in the current code, there is no way for userspace
to get the gsi.

To address this, add a new function to get the gsi, and call
this function before xc_physdev_(un)map_pirq.

Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
 tools/include/xen-sys/Linux/privcmd.h |  7 +++
 tools/include/xencall.h   |  2 ++
 tools/include/xenctrl.h   |  2 ++
 tools/libs/call/core.c|  5 +
 tools/libs/call/libxencall.map|  2 ++
 tools/libs/call/linux.c   | 15 +++
 tools/libs/call/private.h |  9 +
 tools/libs/ctrl/xc_physdev.c  |  4 
 tools/libs/light/libxl_pci.c  | 23 +++
 9 files changed, 69 insertions(+)

diff --git a/tools/include/xen-sys/Linux/privcmd.h 
b/tools/include/xen-sys/Linux/privcmd.h
index bc60e8fd55eb..977f1a058797 100644
--- a/tools/include/xen-sys/Linux/privcmd.h
+++ b/tools/include/xen-sys/Linux/privcmd.h
@@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
__u64 addr;
 } privcmd_mmap_resource_t;
 
+typedef struct privcmd_gsi_from_dev {
+   __u32 sbdf;
+   int gsi;
+} privcmd_gsi_from_dev_t;
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: _hypercall_t
@@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE\
_IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
+#define IOCTL_PRIVCMD_GSI_FROM_DEV \
+   _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_dev_t))
 #define IOCTL_PRIVCMD_UNIMPLEMENTED\
_IOC(_IOC_NONE, 'P', 0xFF, 0)
 
diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe58e..750aab070323 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
  uint64_t arg1, uint64_t arg2, uint64_t arg3,
  uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 499685594427..841db41ad7e4 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1aefa..6dae50c9a6ba 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
 return osdep_hypercall(xcall, );
 }
 
+int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
+{
+return osdep_oscall(xcall, sbdf);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9dc..b92a0b5dc12c 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
xencall4;
xencall5;
 
+   xen_oscall_gsi_from_dev;
+
xencall_alloc_buffer;
xencall_free_buffer;
xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea8f..92c740e176f2 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, 
privcmd_hypercall_t *hypercall)
 return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
+{
+privcmd_gsi_from_dev_t dev_gsi = {
+.sbdf = sbdf,
+.gsi = -1,
+};
+
+if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) {
+PERROR("failed to get gsi from dev");
+return -1;
+}
+
+return dev_gsi.gsi;
+}
+
 static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages)
 {
 void *p;
diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h
index 9c3aa432efe2..cd6eb5a3e66f 

[XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device

2024-05-16 Thread Jiqian Chen
When a device has been reset on the dom0 side, the vpci on the
Xen side won't get a notification, so the cached state in vpci
is out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0
side, dom0 can call this hypercall to notify vpci.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 36 
 xen/drivers/vpci/vpci.c  | 10 ++
 xen/include/public/physdev.h |  7 +++
 xen/include/xen/vpci.h   |  6 ++
 5 files changed, 60 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 14679dd82971..56fbb69ab201 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..73dc8f058b0e 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct physdev_pci_device dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev, arg, 1) != 0 )
+break;
+sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+ret = vpci_reset_device_state(pdev);
+write_unlock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", 
&sbdf);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 97e115dc5798..424aec2d5c46 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -115,6 +115,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(pcidevs_locked());
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..f5bab1f29779 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 6e4c972f35ed..93b1c1d72c05 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -266,6 +267,11 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
 
 static inline void vpci_deassign_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1




[XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-05-16 Thread Jiqian Chen
On PVH dom0, the gsis don't get registered, but
the gsi of a passthrough device must be configured for it to
be able to be mapped into an HVM domU.
On the Linux kernel side, PHYSDEVOP_setup_gsi is called for
passthrough devices to register the gsi when dom0 is PVH.
So, add PHYSDEVOP_setup_gsi for that purpose.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index d49fb8b548a3..98e3c6b176ff 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_unmap_pirq:
 break;
 
+case PHYSDEVOP_setup_gsi:
+if ( !is_hardware_domain(currd) )
+return -EOPNOTSUPP;
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[XEN PATCH v8 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-05-16 Thread Jiqian Chen
When Xen runs with a PVH dom0 and an HVM domU, a pirq is
mapped for a passthrough device by using its gsi, see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq
then calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq
is not allowed, because currd is PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the
pirq. Also add a new check to prevent self mapping when the
caller has no PIRQ flag.
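
The new self-map check can be read as a small predicate. The sketch below
restates it outside of Xen so it can be compiled on its own; struct
dom_model, MODEL_DOMID_SELF and check_pirq_self_map are hypothetical names
for this illustration, and the target domain is modelled by two booleans
instead of a real struct domain.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative constant standing in for Xen's DOMID_SELF. */
#define MODEL_DOMID_SELF 0x7FF0U

/* Toy model of the two domain properties the check looks at. */
struct dom_model {
    bool is_pv;    /* stands in for is_pv_domain(d) */
    bool has_pirq; /* stands in for has_pirq(d) */
};

/*
 * Refuse (un)map_pirq only when a non-PV domain without the PIRQ flag
 * targets itself; mapping on behalf of another domain is let through.
 */
static int check_pirq_self_map(const struct dom_model *target,
                               unsigned int domid)
{
    if (!target->is_pv && !target->has_pirq && domid == MODEL_DOMID_SELF)
        return -EOPNOTSUPP;
    return 0;
}
```

In this model a PVH dom0 mapping a pirq for a domU passes the check, while
the same dom0 (or a PVH guest) mapping a pirq for itself is rejected.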

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  2 ++
 xen/arch/x86/physdev.c   | 24 
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 56fbb69ab201..d49fb8b548a3 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..1337f95171cd 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_map_pirq: {
 physdev_map_pirq_t map;
 struct msi_info msi;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&map, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(map.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If caller is the same HVM guest as current, check pirq flag */
+if ( !is_pv_domain(d) && !has_pirq(d) && map.domid == DOMID_SELF )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 switch ( map.type )
 {
 case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 
 case PHYSDEVOP_unmap_pirq: {
 struct physdev_unmap_pirq unmap;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&unmap, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(unmap.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If caller is the same HVM guest as current, check pirq flag */
+if ( !is_pv_domain(d) && !has_pirq(d) && unmap.domid == DOMID_SELF )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
 break;
 }
-- 
2.34.1




[XEN PATCH v8 0/5] Support device passthrough when dom0 is PVH on Xen

2024-05-16 Thread Jiqian Chen
Hi All,
This is the v8 series to support passthrough when dom0 is PVH.
v7->v8 changes:
* patch#2: Add the domid check (domid == DOMID_SELF) to prevent self map when
   the guest doesn't use pirq. That check was missed in the previous version.
* patch#4: Due to changes in how the kernel obtains the gsi, add a new function
   to get the gsi by passing in the sbdf of the pci device.
* patch#5: Remove the parameter "is_gsi". When a gsi exists, pci_add_dm_done
   uses a new function pci_device_set_gsi to do map_pirq and grant permission.
   That makes the code logic more intuitive.


Best regards,
Jiqian Chen



v6->v7 changes:
* patch#4: Due to changes in how the kernel obtains the gsi, add a new function
   to get the gsi from the irq, instead of from the gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.


v5->v6 changes:
* patch#1: Add Reviewed-by from Stefano and Stewart. Rebase the code and change the
   old functions vpci_remove_device and vpci_add_handlers to vpci_deassign_device
   and vpci_assign_device
* patch#2: Add Reviewed-by from Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues under the tools directory
* patch#5: Modify some variable names and code logic to make the code easier to
   understand: use the gsi by default, and stay compatible with older kernel
   versions by continuing to use the irq


v4->v5 changes:
* patch#1: wrap the call to vpci_reset_device_state with pci_lock
* patch#2: move the check of self map_pirq to physdev.c, change it to check whether
   the caller has the PIRQ flag, and just break for PHYSDEVOP_(un)map_pirq in
   hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: is patch#5 in v4, because patch#5 in v5 depends on it. Add the handling
   of errno and add the Reviewed-by from Stefano
* patch#5: is patch#4 in v4. New implementation that adds the new hypercall
   XEN_DOMCTL_gsi_permission to grant gsi


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move the
   printing after pcidevs_unlock
* patch#2: add a check to prevent PVH self map
* patch#3: new patch; adding PHYSDEVOP_setup_gsi for PVH is now a separate patch
* patch#4: new patch to solve the map_pirq problem of PVH dom0; use the gsi to
   grant irq permission in XEN_DOMCTL_irq_permission.
* patch#5: to be compatible with previous kernel versions, still use the irq when
   there is no gsi sysfs
v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-jiqian.c...@amd.com/T/#t

v2->v3 changes:
* patch#1: move the content out of pci_reset_device_state and delete
   pci_reset_device_state; add an xsm_resource_setup_pci check for
   PHYSDEVOP_pci_device_state_reset; add a description for
   PHYSDEVOP_pci_device_state_reset;
* patch#2: due to changes in the implementation of the second patch on the kernel
   side (it now does setup_gsi and map_pirq when assigning a device to
   passthrough), add PHYSDEVOP_setup_gsi for PVH dom0; we also need to support
   self mapping.
* patch#3: due to changes in the implementation of the second patch on the kernel
   side (it adds a new sysfs for gsi instead of a new syscall), read the gsi
   number from the gsi sysfs.
v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-jiqian.c...@amd.com/T/#t

v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#t
Below is the description of v2 cover letter:
This patch series is the v2 of the implementation of passthrough when dom0 is
PVH on Xen.
We sent v1 upstream before, but v1 had many problems and we got lots of
suggestions.
I will introduce all the issues that these patches try to fix and the
differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a device, pci_stub
will call "pcistub_init_device() -> pci_restore_state() ->
pci_restore_config_space() -> pci_restore_config_space_range() ->
pci_restore_config_dword() -> pci_write_config_dword()". The pci config write
will trigger an io interrupt to bar_write() in Xen, but bar->enabled was set
before, so the write is not allowed now. Then, when QEMU configures the
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, so the current
cached state in pdev->vpci is out of date and differs from the real device
state.

Solution: to solve this problem, the first patch of kernel(xen/pci: Add
xen_reset_device_state
function) and the first patch of xen(xen/vpci: Clear all vpci status of device)
add a new hypercall to reset the
state stor

[RFC KERNEL PATCH v7 0/2] Support device passthrough when dom0 is PVH on Xen

2024-05-15 Thread Jiqian Chen
Hi All,
This is the v7 series to support passthrough on Xen when dom0 is PVH.
v6->v7 changes:
* the first patch of v6 was already merged into branch linux_next.
* patch#1: is patch#2 of v6. Move the implementation of function
   xen_acpi_get_gsi_info to file drivers/xen/acpi.c; that makes it more
   convenient for the subsequent patch to obtain the gsi.
* patch#2: is patch#3 of v6. Add a new field "gsi" to struct pcistub_device and
   set gsi when pcistub initializes the device. Then, when userspace wants to
   get the gsi by passing the sbdf, we can return that gsi.


Best regards,
Jiqian Chen




v5->v6 change:
* patch#3: change to add a new syscall to translate irq to gsi, instead of
   adding a new gsi sysfs.


v4->v5 changes:
* patch#1: Add Reviewed-by Stefano
* patch#2: Add Reviewed-by Stefano
* patch#3: No changes


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; use a new 
function
   pcistub_reset_device_state to wrap __pci_reset_function_locked and 
xen_reset_device_state,
   and call pcistub_reset_device_state in pci_stub.c
* patch#2: remove map_pirq from xen_pvh_passthrough_gsi


v2->v3 changes:
* patch#1: add a condition so that xen_reset_device_state is only done for
   non-PV domains in pcistub_init_device.
* patch#2: Abandon the previous implementation that calls unmask_irq. Instead,
   set up the gsi and map the pirq for the passthrough device in
   pcistub_init_device.
* patch#3: Abandon the previous implementation that adds a new syscall to get the
   gsi from the irq. Instead, add a new sysfs for gsi, so that userspace can get
   the gsi number from sysfs.


Below is the description of v2 cover letter:
This patch series is the v2 of the implementation of passthrough when dom0 is
PVH on Xen.
We sent v1 upstream before, but v1 had many problems and we got lots of
suggestions.
I will introduce all the issues that these patches try to fix and the
differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a device, pci_stub
will call "pcistub_init_device() -> pci_restore_state() ->
pci_restore_config_space() -> pci_restore_config_space_range() ->
pci_restore_config_dword() -> pci_write_config_dword()". The pci config write
will trigger an io interrupt to bar_write() in Xen, but bar->enabled was set
before, so the write is not allowed now. Then, when QEMU configures the
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, so the current
cached state in pdev->vpci is out of date and differs from the real device
state.

Solution: to solve this problem, the first kernel patch (xen/pci: Add
xen_reset_device_state function) and the first xen patch (xen/vpci: Clear all
vpci status of device) add a new hypercall to reset the state stored in vPCI
when the state of the real device has changed.
Thanks to Roger for the suggestion behind this v2. It differs from v1
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/):
v1 simply allowed domU to write the pci bar, which does not comply with the
design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using
its gsi. See xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then calls into Xen,
but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0
(at present dom0 is PVH). The second xen patch (x86/pvh: Open
PHYSDEVOP_map_pirq for PVH dom0) allows PVH dom0 to do PHYSDEVOP_map_pirq.
This v2 patch is better than v1, which simply removed the has_pirq check
(xen
https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).

3. the gsi of a passthrough device is not unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 the callback function pci_add_dm_done() is called when qemu configures a
passthrough device for domU. This function calls
xc_domain_irq_permission()-> pirq_access_permitted() to check whether the gsi
has a corresponding mapping in dom0. But it didn't, so it failed. See
XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is PVH dom0 and the
returned irq is 0.
3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH
dom0, because the devices of PVH are using MSI(-X) interrupts. However, the
IO-APIC pin must be configured for it to be able to be mapped into a domU.

Re

[RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev

2024-05-15 Thread Jiqian Chen
On PVH dom0, the Linux local interrupt mechanism is used.
When it allocates an irq for a gsi, the allocation is dynamic
and first come, first served. irq numbers are allocated in
ascending order, but gsis are not requested in ascending
order (gsi 38 may come before gsi 28), so the irq number is
not equal to the gsi number.
When passing through a device, QEMU uses the device's gsi
number to do the pirq mapping, but the gsi number is read from
the file /sys/bus/pci/devices//irq, and irq != gsi, so the
mapping fails.
And in the current Linux code, there is no way for userspace
to get the gsi.

To address this, record the gsi of pcistub devices when
pcistub is initialized, and add a new syscall to privcmd so
that userspace can get the gsi when needed.
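
The sbdf value passed through the new ioctl packs segment (domain), bus,
slot and function into one 32-bit number, and pcistub_get_gsi_from_sbdf
below splits it again. A minimal sketch of that layout follows; sbdf_pack,
sbdf_unpack and struct sbdf_parts are hypothetical helper names for
illustration, while the shifts and masks follow the decode in the diff.

```c
#include <stdint.h>

/* Decoded pieces of a 32-bit sbdf value. */
struct sbdf_parts {
    unsigned int domain; /* PCI segment, bits 31..16 */
    unsigned int bus;    /* bits 15..8 */
    unsigned int slot;   /* device number, bits 7..3 */
    unsigned int func;   /* function number, bits 2..0 */
};

/* Pack segment, bus, slot and function into one 32-bit value. */
static uint32_t sbdf_pack(uint16_t seg, uint8_t bus, uint8_t slot, uint8_t func)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)(slot & 0x1f) << 3) | (uint32_t)(func & 0x7);
}

/* Split a 32-bit sbdf value using the same shifts and masks as the patch. */
static struct sbdf_parts sbdf_unpack(uint32_t sbdf)
{
    struct sbdf_parts p;

    p.domain = sbdf >> 16;
    p.bus = (sbdf >> 8) & 0xff;
    p.slot = (sbdf >> 3) & 0x1f;
    p.func = sbdf & 0x7;
    return p;
}
```

For example, device 0000:03:00.0 corresponds to the value 0x0300.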

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 drivers/xen/privcmd.c  | 28 ++
 drivers/xen/xen-pciback/pci_stub.c | 38 +++---
 include/uapi/xen/privcmd.h |  7 ++
 include/xen/acpi.h |  2 ++
 4 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 67dfa4778864..5953a03b5cb0 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -45,6 +45,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_ACPI
+#include 
+#endif
 
 #include "privcmd.h"
 
@@ -842,6 +845,27 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
return rc;
 }
 
+static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
+{
+   struct privcmd_gsi_from_dev kdata;
+
+   if (copy_from_user(&kdata, udata, sizeof(kdata)))
+   return -EFAULT;
+
+#ifdef CONFIG_ACPI
+   kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
+   if (kdata.gsi == -1)
+   return -EINVAL;
+#else
+   kdata.gsi = -1;
+#endif
+
+   if (copy_to_user(udata, &kdata, sizeof(kdata)))
+   return -EFAULT;
+
+   return 0;
+}
+
 #ifdef CONFIG_XEN_PRIVCMD_EVENTFD
 /* Irqfd support */
 static struct workqueue_struct *irqfd_cleanup_wq;
@@ -1529,6 +1553,10 @@ static long privcmd_ioctl(struct file *file,
ret = privcmd_ioctl_ioeventfd(file, udata);
break;
 
+   case IOCTL_PRIVCMD_GSI_FROM_DEV:
+   ret = privcmd_ioctl_gsi_from_dev(file, udata);
+   break;
+
default:
break;
}
diff --git a/drivers/xen/xen-pciback/pci_stub.c 
b/drivers/xen/xen-pciback/pci_stub.c
index 2b90d832d0a7..4b62b4d377a9 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -56,6 +56,9 @@ struct pcistub_device {
 
struct pci_dev *dev;
struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */
+#ifdef CONFIG_ACPI
+   int gsi;
+#endif
 };
 
 /* Access to pcistub_devices & seized_devices lists and the initialize_devices
@@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct 
pci_dev *dev)
 
kref_init(&psdev->kref);
spin_lock_init(&psdev->lock);
+#ifdef CONFIG_ACPI
+   psdev->gsi = -1;
+#endif
 
return psdev;
 }
@@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct 
xen_pcibk_device *pdev,
return pci_dev;
 }
 
+#ifdef CONFIG_ACPI
+int pcistub_get_gsi_from_sbdf(unsigned int sbdf)
+{
+   struct pcistub_device *psdev;
+   int domain = sbdf >> 16;
+   int bus = (sbdf >> 8) & 0xff;
+   int slot = (sbdf >> 3) & 0x1f;
+   int func = sbdf & 0x7;
+
+   psdev = pcistub_device_find(domain, bus, slot, func);
+
+   if (!psdev)
+   return -1;
+
+   return psdev->gsi;
+}
+EXPORT_SYMBOL_GPL(pcistub_get_gsi_from_sbdf);
+#endif
+
 struct pci_dev *pcistub_get_pci_dev_by_slot(struct xen_pcibk_device *pdev,
int domain, int bus,
int slot, int func)
@@ -367,14 +392,20 @@ static int pcistub_match(struct pci_dev *dev)
return found;
 }
 
-static int pcistub_init_device(struct pci_dev *dev)
+static int pcistub_init_device(struct pcistub_device *psdev)
 {
struct xen_pcibk_dev_data *dev_data;
+   struct pci_dev *dev;
 #ifdef CONFIG_ACPI
int gsi, trigger, polarity;
 #endif
int err = 0;
 
+   if (!psdev)
+   return -EINVAL;
+
+   dev = psdev->dev;
+
dev_dbg(&dev->dev, "initializing...\n");
 
/* The PCI backend is not intended to be a module (or to work with
@@ -448,6 +479,7 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Fail to get gsi info!\n");
goto config_release;
}
+   psdev->gsi = gsi;
 
if (xen_initial_domain() && xen_pvh_domain()) {
err = xen_pvh_setup_gsi(gsi, trigger, polarity);
@@ -495,7 +527,7 @@ static int __init pcistub_init_devices_

[RFC KERNEL PATCH v7 1/2] xen/pvh: Setup gsi for passthrough device

2024-05-15 Thread Jiqian Chen
On PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.

When assigning a device to passthrough, proactively set up the
gsi of the device during that process.
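
The argument preparation and result handling in xen_pvh_setup_gsi below can
be sketched without the hypercall. The enums and struct here are
hypothetical stand-ins for the ACPI constants and struct physdev_setup_gsi;
the 0/1 encoding and the "already set up is not an error" rule are taken
from the diff.

```c
#include <errno.h>

/* Hypothetical stand-ins for the ACPI trigger and polarity constants. */
enum trig_model { TRIG_LEVEL, TRIG_EDGE };
enum pol_model { POL_ACTIVE_HIGH, POL_ACTIVE_LOW };

/* Toy mirror of the fields the patch fills in before the hypercall. */
struct setup_gsi_model {
    int gsi;
    unsigned char triggering; /* 0 = edge, 1 = level */
    unsigned char polarity;   /* 0 = active high, 1 = active low */
};

/* Encode trigger and polarity the same way the patch does. */
static struct setup_gsi_model make_setup_gsi(int gsi, enum trig_model t,
                                             enum pol_model p)
{
    struct setup_gsi_model s;

    s.gsi = gsi;
    s.triggering = (t == TRIG_EDGE) ? 0 : 1;
    s.polarity = (p == POL_ACTIVE_HIGH) ? 0 : 1;
    return s;
}

/* Treat "gsi already set up" (-EEXIST) as success; pass other results on. */
static int normalize_setup_result(int ret)
{
    return (ret == -EEXIST) ? 0 : ret;
}
```

Treating -EEXIST as success keeps the setup idempotent, so initializing the
same device twice does not fail.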

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 arch/x86/xen/enlighten_pvh.c   | 21 +
 drivers/acpi/pci_irq.c |  2 +-
 drivers/xen/acpi.c | 50 ++
 drivers/xen/xen-pciback/pci_stub.c | 21 +
 include/linux/acpi.h   |  1 +
 include/xen/acpi.h | 10 ++
 6 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index 27a2a02ef8fb..711cdcbc6916 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -4,6 +4,7 @@
 #include 
 
 #include 
+#include 
 
 #include 
 #include 
@@ -27,6 +28,26 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+int xen_pvh_setup_gsi(int gsi, int trigger, int polarity)
+{
+   int ret;
+   struct physdev_setup_gsi setup_gsi;
+
+   setup_gsi.gsi = gsi;
+   setup_gsi.triggering = (trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+   setup_gsi.polarity = (polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+   ret = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+   if (ret == -EEXIST) {
+   xen_raw_printk("Already setup the GSI :%d\n", gsi);
+   ret = 0;
+   } else if (ret)
+   xen_raw_printk("Fail to setup GSI (%d)!\n", gsi);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_setup_gsi);
+
 void __init xen_pvh_init(struct boot_params *boot_params)
 {
u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/xen/acpi.c b/drivers/xen/acpi.c
index 6893c79fd2a1..9e2096524fbc 100644
--- a/drivers/xen/acpi.c
+++ b/drivers/xen/acpi.c
@@ -30,6 +30,7 @@
  * IN THE SOFTWARE.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -75,3 +76,52 @@ int xen_acpi_notify_hypervisor_extended_sleep(u8 sleep_state,
return xen_acpi_notify_hypervisor_state(sleep_state, val_a,
val_b, true);
 }
+
+struct acpi_prt_entry {
+   struct acpi_pci_id  id;
+   u8  pin;
+   acpi_handle link;
+   u32 index;
+};
+
+int xen_acpi_get_gsi_info(struct pci_dev *dev,
+ int *gsi_out,
+ int *trigger_out,
+ int *polarity_out)
+{
+   int gsi;
+   u8 pin;
+   struct acpi_prt_entry *entry;
+   int trigger = ACPI_LEVEL_SENSITIVE;
+   int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+   if (!dev || !gsi_out || !trigger_out || !polarity_out)
+   return -EINVAL;
+
+   pin = dev->pin;
+   if (!pin)
+   return -EINVAL;
+
+   entry = acpi_pci_irq_lookup(dev, pin);
+   if (entry) {
+   if (entry->link)
+   gsi = acpi_pci_link_allocate_irq(entry->link,
+entry->index,
+, ,
+NULL);
+   else
+   gsi = entry->index;
+   } else
+   gsi = -1;
+
+   if (gsi < 0)
+   return -EINVAL;
+
+   *gsi_out = gsi;
+   *trigger_out = trigger;
+   *polarity_out = polarity;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(xen_acpi_get_gsi_info);
diff --git a/drivers/xen/xen-pciback/pci_stub.c 
b/drivers/xen/xen-pciback/pci_stub.c
index 46c40ec8a18e..2b90d832d0a7 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -21,6 +21,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_ACPI
+#include 
+#endif
 #include 
 #include 
 #include "pciback.h"
@@ -367,6 +370,9 @@ static int pcistub_match(struct pci_dev *dev)
 static int pcistub_init_device(struct pci_dev *dev)
 {
struct xen_pcibk_dev_data *dev_data;
+#ifdef CONFIG_ACPI
+   int gsi, trigger, polarity;
+#endif
int err = 0;
 
dev_dbg(>dev, "initializing...\n");
@@ -435,6 

[RFC QEMU PATCH v6 0/1] Support device passthrough when dom0 is PVH on Xen

2024-04-18 Thread Jiqian Chen
Hi All,
This is v6 series to support passthrough on Xen when dom0 is PVH.
v5->v6 changes:
* Due to changes in the implementation of obtaining the gsi in the kernel and
Xen, use xc_physdev_gsi_from_irq instead of the gsi sysfs.

Best regards,
Jiqian Chen


v4->v5 changes:
* Add review by Stefano


v3->v4 changes:
* Add gsi into struct XenHostPCIDevice and use the gsi number read from the gsi
  sysfs if it exists; if there is no gsi sysfs, still use irq.


v2->v3 changes:
* Due to changes in the implementation of the second patch on the kernel side
  (which adds a new sysfs for gsi instead of a new syscall), read the gsi
  number from the gsi sysfs.


Below is the description of v2 cover letter:
This patch is the v2 of the implementation of passthrough when dom0 is PVH on
Xen.
Issues we encountered:
1. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in function xen_pt_realize() to map
a passthrough device's gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a
gsi rather than an irq, but qemu passes it an irq and treats that irq as a gsi.
The value is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function
xen_host_pci_device_get(). In fact the gsi number is not equal to the irq
number. On PVH dom0, irqs are allocated for gsis dynamically in function
acpi_register_gsi_ioapic(), on a first-come, first-served basis. If you debug
the kernel code (see function __irq_alloc_descs), you will find that irq
numbers are allocated in ascending order, but gsis are not registered in
ascending order: gsi 38 may be registered before gsi 28, so gsi 38 gets a
smaller irq number than gsi 28, and then gsi != irq.
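
The effect described above can be reproduced with a toy allocator. This is a
simplified sketch, not the kernel's __irq_alloc_descs: the struct, the
function names and the starting irq number 16 are all assumptions made for
illustration. The only property it models is that irq numbers are handed out
in ascending order of request, independent of the gsi value.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_GSI 64
#define DEMO_FIRST_DYNAMIC_IRQ 16 /* hypothetical first free irq number */

/* Toy allocator: one irq per gsi, numbered in order of registration. */
struct demo_irq_allocator {
    int next_irq;
    int irq_of_gsi[DEMO_MAX_GSI]; /* -1 means no irq assigned yet */
};

static void demo_irq_allocator_init(struct demo_irq_allocator *a)
{
    a->next_irq = DEMO_FIRST_DYNAMIC_IRQ;
    for (size_t i = 0; i < DEMO_MAX_GSI; i++)
        a->irq_of_gsi[i] = -1;
}

/* Register a gsi; the irq it receives depends only on request order. */
static int demo_register_gsi(struct demo_irq_allocator *a, int gsi)
{
    assert(gsi >= 0 && gsi < DEMO_MAX_GSI);
    if (a->irq_of_gsi[gsi] < 0)
        a->irq_of_gsi[gsi] = a->next_irq++;
    return a->irq_of_gsi[gsi];
}
```

Registering gsi 38 first and gsi 28 second gives gsi 38 the smaller irq
number, so the value read from the irq sysfs file cannot be used as a gsi.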

Solution: we can record the relation between gsi and irq, and translate when
userspace (qemu) wants to use the gsi. The third patch of kernel (xen/privcmd:
Add new syscall to get gsi from irq) records all the relations in
acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and provides a
syscall for userspace to get the gsi from the irq. The third patch of xen
(tools: Add new function to get gsi from irq) adds a new function
xc_physdev_gsi_from_irq() that calls the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and
xc_physdev_map_pirq() will succeed.

This v2 on the qemu side is the same as the v1
(qemu
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/),
it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (1):
  xen/pci: get gsi from irq for passthrough devices

 hw/xen/xen-host-pci-device.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.34.1




[RFC QEMU PATCH v6 1/1] xen/pci: get gsi from irq for passthrough devices

2024-04-18 Thread Jiqian Chen
In PVH dom0, the Linux local interrupt mechanism is used.
Irqs are allocated for gsis dynamically, on a first-come,
first-served basis. Irq numbers are allocated in ascending
order, but gsis are not registered in ascending order: gsi 38
may be registered before gsi 28, so the irq number is not
equal to the gsi number. When passing through a device, qemu
wants to use the gsi to map a pirq, see
xen_pt_realize->xc_physdev_map_pirq, but in the current code
the number is read from the file
/sys/bus/pci/devices//irq, so the mapping fails.

Translate the irq to a gsi by using the new function provided
by the Xen tools.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 hw/xen/xen-host-pci-device.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716a2..5e9aa9679e3e 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -10,6 +10,7 @@
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "xen-host-pci-device.h"
+#include "hw/xen/xen_native.h"
 
 #define XEN_HOST_PCI_MAX_EXT_CAP \
 ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
@@ -368,7 +369,11 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t 
domain,
 if (*errp) {
 goto error;
 }
-d->irq = v;
+d->irq = xc_physdev_gsi_from_irq(xen_xc, v);
+/* if fail to get gsi, fallback to irq */
+if (d->irq == -1) {
+d->irq = v;
+}
 
 xen_host_pci_get_hex_value(d, "class", , errp);
 if (*errp) {
-- 
2.34.1
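
The fallback in the hunk above (use the translated gsi, or keep the irq when
no translation is available) can be sketched as a small helper. The function
pointer below is a hypothetical stand-in for xc_physdev_gsi_from_irq(), which
this series defines as returning -1 on failure; the two demo lookups and
their values are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the irq -> gsi lookup; returns -1 on failure. */
typedef int (*demo_gsi_from_irq_fn)(int irq);

/* Models a kernel without the translation interface. */
static int demo_lookup_unsupported(int irq)
{
    (void)irq;
    return -1;
}

/* Models a kernel that recorded irq 16 -> gsi 38 and irq 17 -> gsi 28. */
static int demo_lookup_recorded(int irq)
{
    if (irq == 16)
        return 38;
    if (irq == 17)
        return 28;
    return -1;
}

/* Prefer the translated gsi; fall back to the raw irq if there is none. */
static int demo_resolve_gsi(demo_gsi_from_irq_fn lookup, int irq)
{
    assert(lookup != NULL);
    int gsi = lookup(irq);
    return gsi == -1 ? irq : gsi;
}
```

The fallback keeps older kernels working, where the irq value is the only
number available to userspace.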




[RFC XEN PATCH v7 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-04-18 Thread Jiqian Chen
Some domain types, such as PVH, don't have PIRQs. When passing
through a device to a guest on a PVH dom0, the call stack
pci_add_dm_done->XEN_DOMCTL_irq_permission fails
at domain_pirq_to_irq.

So, add a new hypercall to grant/revoke gsi permission
when dom0 is not PV or dom0 does not have the PIRQ flag.
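
The intended control flow in the toolstack (try the gsi-based grant first,
and use the irq-based grant only when the gsi call is unsupported) can be
sketched as follows. This is a hedged model, not libxl code: demo_perm_fn
and the demo_* stubs are hypothetical stand-ins for
xc_domain_gsi_permission() and xc_domain_irq_permission(), assumed to return
-1 and set errno on failure. Note that errno holds positive error numbers, so
the comparison is against EOPNOTSUPP rather than -EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical grant call: returns 0 on success, -1 with errno set. */
typedef int (*demo_perm_fn)(int number);

static int demo_perm_ok(int n) { (void)n; return 0; }
static int demo_perm_unsupported(int n) { (void)n; errno = EOPNOTSUPP; return -1; }
static int demo_perm_denied(int n) { (void)n; errno = EPERM; return -1; }

/*
 * Grant access using the gsi when one is known. Fall back to the irq-based
 * call only if there is no gsi or the gsi call reports EOPNOTSUPP.
 */
static int demo_grant(demo_perm_fn grant_gsi, demo_perm_fn grant_irq,
                      int have_gsi, int gsi, int irq)
{
    assert(grant_gsi != NULL && grant_irq != NULL);
    if (have_gsi) {
        if (grant_gsi(gsi) == 0)
            return 0;
        if (errno != EOPNOTSUPP)
            return -1; /* real failure: do not mask it with the fallback */
    }
    return grant_irq(irq);
}
```

Returning right after a successful gsi grant avoids consulting a stale errno
left over from an earlier call.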

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/include/xenctrl.h  |  5 
 tools/libs/ctrl/xc_domain.c  | 15 
 tools/libs/light/libxl_pci.c | 46 
 xen/arch/x86/domctl.c| 31 
 xen/include/public/domctl.h  |  9 +++
 xen/xsm/flask/hooks.c|  1 +
 6 files changed, 97 insertions(+), 10 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2b9d55d2c6d7..adeaab93d0f7 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, );
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, );
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index d4313e196ebd..7e82f31ffc4f 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1421,6 +1421,8 @@ static void pci_add_dm_done(libxl__egc *egc,
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, );
+int gsi;
+bool is_gsi = false;
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1490,6 +1492,8 @@ static void pci_add_dm_done(libxl__egc *egc,
 r = xc_physdev_gsi_from_irq(ctx->xch, irq);
 if (r != -1) {
 irq = r;
+gsi = r;
+is_gsi = true;
 }
 r = xc_physdev_map_pirq(ctx->xch, domid, irq, );
 if (r < 0) {
@@ -1499,13 +1503,25 @@ static void pci_add_dm_done(libxl__egc *egc,
 rc = ERROR_FAIL;
 goto out;
 }
-r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
-if (r < 0) {
-LOGED(ERROR, domainid,
-  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
-fclose(f);
-rc = ERROR_FAIL;
-goto out;
+if (is_gsi) {
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
+if (r < 0 && errno != EOPNOTSUPP) {
+LOGED(ERROR, domainid,
+  "xc_domain_gsi_permission gsi=%d (error=%d)", gsi, errno);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
+}
+if (!is_gsi || errno == EOPNOTSUPP) {
+r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+if (r < 0) {
+LOGED(ERROR, domainid,
+"xc_domain_irq_permission irq=%d (error=%d)", irq, errno);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
 }
 }
 fclose(f);
@@ -2180,6 +2196,7 @@ static void pci_remove_detached(libxl__egc *egc,
 uint32_t domainid = prs->domid;
 bool isstubdom;
 int r;
+bool is_gsi = false;
 
 /* Convenience aliases */
 libxl_device_pci *const pci = >pci;
@@ -2249,6 +2266,7 @@ skip_bar:
 r = xc_physdev_gsi_from_irq(ctx->xch, irq);
 if (r != -1) {
 irq = r;
+is_gsi = true;
 }
 rc = xc_physdev_unmap_pirq(ctx->xch, domid, irq);
 if (rc < 0) {
@@ -2260,9 +2278,17 @@ skip_bar:
  */
 LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
 }
-rc = xc_domain

[RFC XEN PATCH v7 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-04-18 Thread Jiqian Chen
On PVH dom0, the gsis don't get registered, but
the gsi of a passthrough device must be configured before it
can be mapped into an hvm domU.
On the Linux kernel side, PHYSDEVOP_setup_gsi is called for
passthrough devices to register the gsi when dom0 is PVH.
So, add PHYSDEVOP_setup_gsi for that purpose.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index d49fb8b548a3..98e3c6b176ff 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_unmap_pirq:
 break;
 
+case PHYSDEVOP_setup_gsi:
+if ( !is_hardware_domain(currd) )
+return -EOPNOTSUPP;
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[XEN PATCH v7 1/5] xen/vpci: Clear all vpci status of device

2024-04-18 Thread Jiqian Chen
When a device has been reset on the dom0 side, vpci on the Xen
side doesn't get a notification, so the cached state in vpci is
out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0 side,
dom0 can call this hypercall to notify vpci.
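
The stale-cache problem and the deassign-then-assign fix can be illustrated
with a toy model. Everything below is hypothetical (the demo_* types hold a
single cached BAR value and are not Xen's struct vpci); it only mirrors the
shape of vpci_reset_device_state() in this patch, which drops the cached
state and rebuilds it from the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy device with one register, and a toy cache of that register. */
struct demo_device { uint32_t bar; };
struct demo_vpci { bool assigned; uint32_t cached_bar; };

/* Build the cached state from the current device state. */
static void demo_vpci_assign(struct demo_vpci *v, const struct demo_device *dev)
{
    assert(v != NULL && dev != NULL);
    v->cached_bar = dev->bar;
    v->assigned = true;
}

/* Drop all cached state. */
static void demo_vpci_deassign(struct demo_vpci *v)
{
    assert(v != NULL);
    v->assigned = false;
    v->cached_bar = 0;
}

/* Called after a device reset: throw the cache away and rebuild it. */
static void demo_vpci_reset_state(struct demo_vpci *v,
                                  const struct demo_device *dev)
{
    demo_vpci_deassign(v);
    demo_vpci_assign(v, dev);
}
```

Without the explicit reset call, the cache keeps the pre-reset value because
nothing tells it the device changed.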

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 36 
 xen/drivers/vpci/vpci.c  | 10 ++
 xen/include/public/physdev.h |  7 +++
 xen/include/xen/vpci.h   |  6 ++
 5 files changed, 60 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 14679dd82971..56fbb69ab201 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..73dc8f058b0e 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct physdev_pci_device dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(, arg, 1) != 0 )
+break;
+sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(>domain->pci_lock);
+ret = vpci_reset_device_state(pdev);
+write_unlock(>domain->pci_lock);
+pcidevs_unlock();
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", 
);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 97e115dc5798..424aec2d5c46 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -115,6 +115,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(pcidevs_locked());
+ASSERT(rw_is_write_locked(>domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..f5bab1f29779 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index e89c571890b2..ea64d94e818b 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -266,6 +267,11 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
 
 static inline void vpci_deassign_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1




[RFC XEN PATCH v7 4/5] tools: Add new function to get gsi from irq

2024-04-18 Thread Jiqian Chen
In PVH dom0, the Linux local interrupt mechanism is used.
Irqs are allocated for gsis dynamically, on a first-come,
first-served basis. Irq numbers are allocated in ascending
order, but gsis are not registered in ascending order: gsi 38
may be registered before gsi 28, so the irq number is not
equal to the gsi number.
When passing through a device, QEMU uses the gsi number
to do pirq mapping, see xen_pt_realize->xc_physdev_map_pirq,
but the number is read from the file
/sys/bus/pci/devices//irq, so the mapping fails.
The current code has no method for userspace to translate
an irq to a gsi.

For that purpose, add a new function to get that translation,
and call this function before xc_physdev_(un)map_pirq.

Signed-off-by: Huang Rui 
Signed-off-by: Chen Jiqian 
---
 tools/include/xencall.h|  2 ++
 tools/include/xenctrl.h|  2 ++
 tools/libs/call/core.c |  5 +
 tools/libs/call/libxencall.map |  2 ++
 tools/libs/call/linux.c| 15 +++
 tools/libs/call/private.h  |  9 +
 tools/libs/ctrl/xc_physdev.c   |  4 
 tools/libs/light/libxl_pci.c   | 11 +++
 8 files changed, 50 insertions(+)

diff --git a/tools/include/xencall.h b/tools/include/xencall.h
index fc95ed0fe58e..962cb45e1f1b 100644
--- a/tools/include/xencall.h
+++ b/tools/include/xencall.h
@@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
  uint64_t arg1, uint64_t arg2, uint64_t arg3,
  uint64_t arg4, uint64_t arg5);
 
+int xen_oscall_gsi_from_irq(xencall_handle *xcall, int irq);
+
 /* Variant(s) of the above, as needed, returning "long" instead of "int". */
 long xencall2L(xencall_handle *xcall, unsigned int op,
uint64_t arg1, uint64_t arg2);
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2ef8b4e05422..2b9d55d2c6d7 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
   uint32_t domid,
   int pirq);
 
+int xc_physdev_gsi_from_irq(xc_interface *xch, int irq);
+
 /*
  *  LOGGING AND ERROR REPORTING
  */
diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
index 02c4f8e1aefa..6f79f3babd19 100644
--- a/tools/libs/call/core.c
+++ b/tools/libs/call/core.c
@@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
 return osdep_hypercall(xcall, );
 }
 
+int xen_oscall_gsi_from_irq(xencall_handle *xcall, int irq)
+{
+return osdep_oscall(xcall, irq);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
index d18a3174e9dc..6cde8eda05e2 100644
--- a/tools/libs/call/libxencall.map
+++ b/tools/libs/call/libxencall.map
@@ -10,6 +10,8 @@ VERS_1.0 {
xencall4;
xencall5;
 
+   xen_oscall_gsi_from_irq;
+
xencall_alloc_buffer;
xencall_free_buffer;
xencall_alloc_buffer_pages;
diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
index 6d588e6bea8f..32b60c8b403e 100644
--- a/tools/libs/call/linux.c
+++ b/tools/libs/call/linux.c
@@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, 
privcmd_hypercall_t *hypercall)
 return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
 }
 
+long osdep_oscall(xencall_handle *xcall, int irq)
+{
+privcmd_gsi_from_irq_t gsi_irq = {
+.irq = irq,
+.gsi = -1,
+};
+
+if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_IRQ, _irq)) {
+PERROR("failed to get gsi from irq");
+return -1;
+}
+
+return gsi_irq.gsi;
+}
+
 static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages)
 {
 void *p;
diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h
index 9c3aa432efe2..2d86cfb1e099 100644
--- a/tools/libs/call/private.h
+++ b/tools/libs/call/private.h
@@ -57,6 +57,15 @@ int osdep_xencall_close(xencall_handle *xcall);
 
 long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall);
 
+#if defined(__linux__)
+long osdep_oscall(xencall_handle *xcall, int irq);
+#else
+static inline long osdep_oscall(xencall_handle *xcall, int irq)
+{
+return -1;
+}
+#endif
+
 void *osdep_alloc_pages(xencall_handle *xcall, size_t nr_pages);
 void osdep_free_pages(xencall_handle *xcall, void *p, size_t nr_pages);
 
diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
index 460a8e779ce8..4d3b138ebd0e 100644
--- a/tools/libs/ctrl/xc_physdev.c
+++ b/tools/libs/ctrl/xc_physdev.c
@@ -111,3 +111,7 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
 return rc;
 }
 
+int xc_physdev_gsi_from_irq(xc_interface *xch, int irq)
+{
+return xen_oscall_gsi_from_irq(xch->xcall, irq);
+}
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..d4313e196ebd 100644
--- a/tools/libs/light/libxl_pci.c
+++ 

[RFC XEN PATCH v7 0/5] Support device passthrough when dom0 is PVH on Xen

2024-04-18 Thread Jiqian Chen
Hi All,
This is the v7 series to support passthrough when dom0 is PVH.
v6->v7 changes:
* patch#4: Due to changes in the implementation of obtaining the gsi in the
kernel, add a new function to get the gsi from the irq instead of using the
gsi sysfs.
* patch#5: Fix the issue with variable usage, rc->r.

Best regards,
Jiqian Chen


v5->v6 changes:
* patch#1: Add Reviewed-by Stefano and Stewart. Rebase code and change the old
functions vpci_remove_device, vpci_add_handlers to vpci_deassign_device,
vpci_assign_device
* patch#2: Add Reviewed-by Stefano
* patch#3: Remove unnecessary "ASSERT(!has_pirq(currd));"
* patch#4: Fix some coding style issues under the tools directory
* patch#5: Modify some variable names and code logic to make the code easier
to understand: use gsi by default, and stay compatible with older kernel
versions by continuing to use irq


v4->v5 changes:
* patch#1: wrap function vpci_reset_device_state with pci_lock
* patch#2: move the check of self map_pirq to physdev.c, change it to check
whether the caller has the PIRQ flag, and just break for
PHYSDEVOP_(un)map_pirq in hvm_physdev_op
* patch#3: return -EOPNOTSUPP instead, and use ASSERT(!has_pirq(currd));
* patch#4: was patch#5 in v4; moved because patch#5 in v5 has some dependency
on it. Add the handling of errno and add the Reviewed-by Stefano
* patch#5: was patch#4 in v4. New implementation that adds the new hypercall
XEN_DOMCTL_gsi_permission to grant gsi


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; move
printing to after pcidevs_unlock
* patch#2: add check to prevent PVH self map
* patch#3: new patch; the implementation of adding PHYSDEVOP_setup_gsi for PVH
is split out as a separate patch
* patch#4: new patch to solve the map_pirq problem of PVH dom0. Use gsi to
grant irq permission in XEN_DOMCTL_irq_permission.
* patch#5: to be compatible with previous kernel versions, when there is no gsi
sysfs, still use irq
v4 link:
https://lore.kernel.org/xen-devel/20240105070920.350113-1-jiqian.c...@amd.com/T/#t

v2->v3 changes:
* patch#1: move the content out of pci_reset_device_state and delete
pci_reset_device_state; add xsm_resource_setup_pci check for
PHYSDEVOP_pci_device_state_reset; add description for
PHYSDEVOP_pci_device_state_reset;
* patch#2: due to changes in the implementation of the second patch on the
kernel side (it now does setup_gsi and map_pirq when assigning a device to
passthrough), add PHYSDEVOP_setup_gsi for PVH dom0; we also need to support
self mapping.
* patch#3: due to changes in the implementation of the second patch on the
kernel side (which adds a new sysfs for gsi instead of a new syscall), read
the gsi number from the gsi sysfs.
v3 link:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-jiqian.c...@amd.com/T/#t

v2 link:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#t
Below is the description of v2 cover letter:
This series of patches is the v2 of the implementation of passthrough when
dom0 is PVH on Xen.
We sent the v1 upstream before, but the v1 had many problems and we got
lots of suggestions.
I will introduce all the issues that these patches try to fix and the
differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a
device, pci_stub will call "pcistub_init_device() -> pci_restore_state()
-> pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() ->
pci_write_config_dword()". The pci config write will trigger an io
interrupt to bar_write() in Xen, but because bar->enabled was set before,
the write is not allowed, and then when Qemu configures the
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, so the cached
state in pdev->vpci is out of date and differs from the real device state.

Solution: to solve this problem, the first patch of kernel (xen/pci: Add
xen_reset_device_state function) and the first patch of xen (xen/vpci: Clear
all vpci status of device) add a new hypercall to reset the state stored in
vPCI when the state of the real device has changed.
Thanks to Roger for the suggestion behind this v2. It is different from v1
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/):
v1 simply allowed domU to write the pci bar, which does not comply with the
design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using 
gsi. See xen_pt_realize->xc_physdev_map_pirq and 
pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call into 
Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH dom0 a

[XEN PATCH v7 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-04-18 Thread Jiqian Chen
When running Xen with a PVH dom0 and an hvm domU, a pirq is mapped
for a passthrough device by using its gsi, see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
is not allowed because currd is PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq.
Add a new check to prevent self map when the caller has no PIRQ
flag.
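
The check added to do_physdev_op() in this patch can be modelled in
isolation. The sketch below is only an approximation with hypothetical
demo_* names: it tests the domain named in the request (map.domid in the
patch), returns -ESRCH when that domain cannot be found, and returns
-EOPNOTSUPP when the domain is neither PV nor has the PIRQ flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two domain properties the check looks at. */
struct demo_domain {
    bool is_pv;
    bool has_pirq; /* models the X86_EMU_USE_PIRQ flag */
};

/* Mirrors the new check: a non-PV target must have PIRQs. */
static int demo_check_pirq_target(const struct demo_domain *d)
{
    if (d == NULL)
        return -ESRCH;
    if (!d->is_pv && !d->has_pirq)
        return -EOPNOTSUPP;
    return 0;
}

/* Stand-in for the map operation; only the permission check is modelled. */
static int demo_map_pirq(const struct demo_domain *target)
{
    int rc = demo_check_pirq_target(target);

    if (rc)
        return rc;
    assert(target->is_pv || target->has_pirq);
    return 0;
}
```

With this check a PVH dom0 can map a pirq for an HVM domU that has the PIRQ
flag, while a request that targets the PVH dom0 itself is rejected.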

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  2 ++
 xen/arch/x86/physdev.c   | 24 
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 56fbb69ab201..d49fb8b548a3 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..1367abc61e54 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_map_pirq: {
 physdev_map_pirq_t map;
 struct msi_info msi;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(map.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If it is an HVM guest, check if it has PIRQs */
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 switch ( map.type )
 {
 case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 
 case PHYSDEVOP_unmap_pirq: {
 struct physdev_unmap_pirq unmap;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(unmap.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If it is an HVM guest, check if it has PIRQs */
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
 break;
 }
-- 
2.34.1




[RFC KERNEL PATCH v6 0/3] Support device passthrough when dom0 is PVH on Xen

2024-04-18 Thread Jiqian Chen
Hi All,
This is v6 series to support passthrough on Xen when dom0 is PVH.
v5->v6 change:
* patch#3: change to add a new syscall to translate irq to gsi, instead of
adding a new gsi sysfs.


Best regards,
Jiqian Chen


v4->v5 changes:
* patch#1: Add Reviewed-by Stefano
* patch#2: Add Reviewed-by Stefano
* patch#3: No changes


v3->v4 changes:
* patch#1: change the comment of PHYSDEVOP_pci_device_state_reset; use a new 
function pcistub_reset_device_state to wrap __pci_reset_function_locked and 
xen_reset_device_state, and call pcistub_reset_device_state in pci_stub.c
* patch#2: remove map_pirq from xen_pvh_passthrough_gsi


v2->v3 changes:
* patch#1: add a condition so that xen_reset_device_state is only done for
non-PV domains in pcistub_init_device.
* patch#2: Abandon the previous implementation that called unmask_irq.
Instead, set up the gsi and map the pirq for the passthrough device in
pcistub_init_device.
* patch#3: Abandon the previous implementation that added a new syscall to get
the gsi from the irq. Instead, add a new sysfs for gsi, so userspace can get
the gsi number from sysfs.


Below is the description of v2 cover letter:
This series of patches is the v2 of the implementation of passthrough when
dom0 is PVH on Xen.
We sent the v1 upstream before, but the v1 had many problems and we got
lots of suggestions.
I will introduce all the issues that these patches try to fix and the
differences between v1 and v2.

Issues we encountered:
1. pci_stub failed to write bar for a passthrough device.
Problem: when we run "sudo xl pci-assignable-add " to assign a
device, pci_stub will call "pcistub_init_device() -> pci_restore_state()
-> pci_restore_config_space() ->
pci_restore_config_space_range() -> pci_restore_config_dword() ->
pci_write_config_dword()". The pci config write will trigger an io
interrupt to bar_write() in Xen, but because bar->enabled was set before,
the write is not allowed, and then when Qemu configures the
passthrough device in xen_pt_realize(), it gets invalid bar values.

Reason: we don't tell vPCI that the device has been reset, so the cached
state in pdev->vpci is out of date and differs from the real device state.

Solution: to solve this problem, the first patch of kernel (xen/pci: Add
xen_reset_device_state function) and the first patch of xen (xen/vpci: Clear
all vpci status of device) add a new hypercall to reset the state stored in
vPCI when the state of the real device has changed.
Thanks to Roger for the suggestion behind this v2. It is different from v1
(https://lore.kernel.org/xen-devel/20230312075455.450187-3-ray.hu...@amd.com/):
v1 simply allowed domU to write the pci bar, which does not comply with the
design principles of vPCI.

2. failed to do PHYSDEVOP_map_pirq when dom0 is PVH
Problem: HVM domU will do PHYSDEVOP_map_pirq for a passthrough device by using 
gsi. See xen_pt_realize->xc_physdev_map_pirq and 
pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will call into 
Xen, but in hvm_physdev_op(), PHYSDEVOP_map_pirq is not allowed.

Reason: In hvm_physdev_op(), the variable "currd" is PVH dom0 and PVH has no 
X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.

Solution: I think we may need to allow PHYSDEVOP_map_pirq when "currd" is dom0
(at present dom0 is PVH). The second patch of xen (x86/pvh: Open
PHYSDEVOP_map_pirq for PVH dom0) allows PVH dom0 to do PHYSDEVOP_map_pirq.
This v2 patch is better than v1, which simply removed the has_pirq check (xen
https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).

3. the gsi of a passthrough device is not unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 callback function pci_add_dm_done() will be called when qemu configures a
passthrough device for domU.
This function will call xc_domain_irq_permission()-> pirq_access_permitted() to
check if the gsi has corresponding mappings in dom0. But it doesn't, so the call
fails. See XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is PVH
dom0 and the returned irq is 0.
3.2 it's possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH 
dom0, because the devices of PVH are using MSI(-X) interrupts. However, the 
IO-APIC pin must be configured for it to be able to be mapped into a domU.

Reason: After searching the code, I find "map_pirq" and "register_gsi" will be
done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi(aka
ioapic's pin) is unmasked in PVH dom0.
So the two problems come down to the same cause: the gsi of a passthrough device
is not unmasked.

Solution: to solve these problems, the second patch of kernel(xen/pvh: Unmask
irq for passthrough device in PVH dom0) calls unmask_irq() when we assign a
device to be passthrough, so that passthrough devices can have the mapping of
gsi on PVH dom0 and gsi 

[RFC KERNEL PATCH v6 2/3] xen/pvh: Setup gsi for passthrough device

2024-04-18 Thread Jiqian Chen
In PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.

When assigning a device to passthrough, proactively set up the gsi
of the device during that process.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 arch/x86/xen/enlighten_pvh.c   | 92 ++
 drivers/acpi/pci_irq.c |  2 +-
 drivers/xen/xen-pciback/pci_stub.c |  8 +++
 include/linux/acpi.h   |  1 +
 include/xen/acpi.h |  6 ++
 5 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index c28f073c1df5..12be665b27d8 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -26,6 +27,97 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+typedef struct gsi_info {
+   int gsi;
+   int trigger;
+   int polarity;
+} gsi_info_t;
+
+struct acpi_prt_entry {
+   struct acpi_pci_id  id;
+   u8  pin;
+   acpi_handle link;
+   u32 index;  /* GSI, or link _CRS index */
+};
+
+static int xen_pvh_get_gsi_info(struct pci_dev *dev,
+   gsi_info_t *gsi_info)
+{
+   int gsi;
+   u8 pin;
+   struct acpi_prt_entry *entry;
+   int trigger = ACPI_LEVEL_SENSITIVE;
+   int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+   if (!dev || !gsi_info)
+   return -EINVAL;
+
+   pin = dev->pin;
+   if (!pin)
+   return -EINVAL;
+
+   entry = acpi_pci_irq_lookup(dev, pin);
+   if (entry) {
+   if (entry->link)
+   gsi = acpi_pci_link_allocate_irq(entry->link,
+entry->index,
+&trigger, &polarity,
+NULL);
+   else
+   gsi = entry->index;
+   } else
+   gsi = -1;
+
+   if (gsi < 0)
+   return -EINVAL;
+
+   gsi_info->gsi = gsi;
+   gsi_info->trigger = trigger;
+   gsi_info->polarity = polarity;
+
+   return 0;
+}
+
+static int xen_pvh_setup_gsi(gsi_info_t *gsi_info)
+{
+   struct physdev_setup_gsi setup_gsi;
+
+   if (!gsi_info)
+   return -EINVAL;
+
+   setup_gsi.gsi = gsi_info->gsi;
+   setup_gsi.triggering = (gsi_info->trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+   setup_gsi.polarity = (gsi_info->polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+}
+
+int xen_pvh_passthrough_gsi(struct pci_dev *dev)
+{
+   int ret;
+   gsi_info_t gsi_info;
+
+   if (!dev)
+   return -EINVAL;
+
+   ret = xen_pvh_get_gsi_info(dev, &gsi_info);
+   if (ret) {
+   xen_raw_printk("Fail to get gsi info!\n");
+   return ret;
+   }
+
+   ret = xen_pvh_setup_gsi(&gsi_info);
+   if (ret == -EEXIST) {
+   xen_raw_printk("Already setup the GSI :%d\n", gsi_info.gsi);
+   ret = 0;
+   } else if (ret)
+   xen_raw_printk("Fail to setup GSI (%d)!\n", gsi_info.gsi);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_passthrough_gsi);
+
 void __init xen_pvh_init(struct boot_params *boot_params)
 {
u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 46c40ec8a18e..22d4380d2b04 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -435,6 +436,13 @@ static int pcistub_init_device(struct pci_dev *dev)
goto config_release;
pci_restore_state(dev);
}
+
+   if (xen_initial_domain() && xen_pvh_domain()) {
+   err = xen_pvh_passthrough_gsi(dev);
+   if (err)
+   goto config_release;
+   }
+
/* Now disable the device (this also ensures some private device
 

[RFC KERNEL PATCH v6 3/3] xen/privcmd: Add new syscall to get gsi from irq

2024-04-18 Thread Jiqian Chen
In PVH dom0, the linux local interrupt mechanism is used.
When it allocates an irq for a gsi, the allocation is dynamic
and first come, first served. The irq numbers are allocated
from small to large, but the gsis are not requested in
numerical order: gsi 38 may come before gsi 28, so the irq
number is not equal to the gsi number.
When passing through a device, QEMU uses the device's gsi
number to do pirq mapping, but the number is read from
file /sys/bus/pci/devices//irq, and irq != gsi, so the
mapping fails.
In the current linux code, there is no method for userspace
to translate irq to gsi.

For the above purpose, record the relationship of gsi and irq
when PVH dom0 does acpi_register_gsi_ioapic for devices, and
add a new syscall into privcmd so that userspace can get
that translation when needed.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 arch/x86/include/asm/apic.h  |  8 +++
 arch/x86/include/asm/xen/pci.h   |  5 
 arch/x86/kernel/acpi/boot.c  |  2 +-
 arch/x86/pci/xen.c   | 21 +
 drivers/xen/events/events_base.c | 39 
 drivers/xen/privcmd.c| 19 
 include/uapi/xen/privcmd.h   |  7 ++
 include/xen/events.h |  5 
 8 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 9d159b771dc8..dd4139250895 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -169,6 +169,9 @@ extern bool apic_needs_pit(void);
 
 extern void apic_send_IPI_allbutself(unsigned int vector);
 
+extern int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
+   int trigger, int polarity);
+
 #else /* !CONFIG_X86_LOCAL_APIC */
 static inline void lapic_shutdown(void) { }
 #define local_apic_timer_c2_ok 1
@@ -183,6 +186,11 @@ static inline void apic_intr_mode_init(void) { }
 static inline void lapic_assign_system_vectors(void) { }
 static inline void lapic_assign_legacy_vector(unsigned int i, bool r) { }
 static inline bool apic_needs_pit(void) { return true; }
+static inline int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
+   int trigger, int polarity)
+{
+   return (int)gsi;
+}
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
 #ifdef CONFIG_X86_X2APIC
diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
index 9015b888edd6..aa8ded61fc2d 100644
--- a/arch/x86/include/asm/xen/pci.h
+++ b/arch/x86/include/asm/xen/pci.h
@@ -5,6 +5,7 @@
 #if defined(CONFIG_PCI_XEN)
 extern int __init pci_xen_init(void);
 extern int __init pci_xen_hvm_init(void);
+extern int __init pci_xen_pvh_init(void);
 #define pci_xen 1
 #else
 #define pci_xen 0
@@ -13,6 +14,10 @@ static inline int pci_xen_hvm_init(void)
 {
return -1;
 }
+static inline int pci_xen_pvh_init(void)
+{
+   return -1;
+}
 #endif
 #ifdef CONFIG_XEN_PV_DOM0
 int __init pci_xen_initial_domain(void);
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 85a3ce2a3666..72c73458c083 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -749,7 +749,7 @@ static int acpi_register_gsi_pic(struct device *dev, u32 gsi,
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
-static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
+int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
int trigger, int polarity)
 {
int irq = gsi;
diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 652cd53e77f6..f056ab5c0a06 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -114,6 +114,21 @@ static int acpi_register_gsi_xen_hvm(struct device *dev, u32 gsi,
 false /* no mapping of GSI to PIRQ */);
 }
 
+static int acpi_register_gsi_xen_pvh(struct device *dev, u32 gsi,
+   int trigger, int polarity)
+{
+   int irq;
+
+   irq = acpi_register_gsi_ioapic(dev, gsi, trigger, polarity);
+   if (irq < 0)
+   return irq;
+
+   if (xen_pvh_add_gsi_irq_map(gsi, irq) == -EEXIST)
+   printk(KERN_INFO "Already map the GSI :%u and IRQ: %d\n", gsi, irq);
+
+   return irq;
+}
+
 #ifdef CONFIG_XEN_PV_DOM0
 static int xen_register_gsi(u32 gsi, int triggering, int polarity)
 {
@@ -558,6 +573,12 @@ int __init pci_xen_hvm_init(void)
return 0;
 }
 
+int __init pci_xen_pvh_init(void)
+{
+   __acpi_register_gsi = acpi_register_gsi_xen_pvh;
+   return 0;
+}
+
 #ifdef CONFIG_XEN_PV_DOM0
 int __init pci_xen_initial_domain(void)
 {
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 27553673e46b..80d4f7faac64 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -953,6 +953,43 @@ int xen_irq_from_gsi(unsigned gsi)
 }
 EXPORT_SYMBOL_GPL(xen_irq_from_gsi);
 
+int xen_gsi_from_irq

[KERNEL PATCH v6 1/3] xen/pci: Add xen_reset_device_state function

2024-04-18 Thread Jiqian Chen
When a device on the dom0 side has been reset, the vpci on the Xen
side won't get a notification, so the cached state in vpci is
out of date with respect to the real device state.
To solve that problem, add a new function to clear all vpci
device state when the device is reset on the dom0 side.

And call that function in pcistub_init_device, because when
using "pci-assignable-add" to assign a passthrough device in
Xen, it will reset the passthrough device, the vpci state will
be out of date, and then the device will fail to restore bar state.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 drivers/xen/pci.c  | 12 
 drivers/xen/xen-pciback/pci_stub.c | 18 +++---
 include/xen/interface/physdev.h|  7 +++
 include/xen/pci.h  |  6 ++
 4 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..e9b30bc09139 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
return r;
 }
 
+int xen_reset_device_state(const struct pci_dev *dev)
+{
+   struct physdev_pci_device device = {
+   .seg = pci_domain_nr(dev->bus),
+   .bus = dev->bus->number,
+   .devfn = dev->devfn
+   };
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device_state);
+
 static int xen_pci_notifier(struct notifier_block *nb,
unsigned long action, void *data)
 {
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e34b623e4b41..46c40ec8a18e 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
return psdev;
 }
 
+static int pcistub_reset_device_state(struct pci_dev *dev)
+{
+   __pci_reset_function_locked(dev);
+
+   if (!xen_pv_domain())
+   return xen_reset_device_state(dev);
+   else
+   return 0;
+}
+
 /* Don't call this directly as it's called by pcistub_device_put */
 static void pcistub_device_release(struct kref *kref)
 {
@@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref)
/* Call the reset function which does not take lock as this
 * is called from "unbind" which takes a device_lock mutex.
 */
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
if (dev_data &&
pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
dev_info(&dev->dev, "Could not reload PCI state\n");
@@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 * (so it's ready for the next domain)
 */
device_lock_assert(&dev->dev);
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
 
dev_data = pci_get_drvdata(dev);
ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
@@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
else {
dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-   __pci_reset_function_locked(dev);
+   err = pcistub_reset_device_state(dev);
+   if (err)
+   goto config_release;
pci_restore_state(dev);
}
/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..8609770e28f5 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,13 @@ struct physdev_pci_device_add {
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/include/xen/pci.h b/include/xen/pci.h
index b8337cf85fd1..b2e2e856efd6 100644
--- a/include/xen/pci.h
+++ b/include/xen/pci.h
@@ -4,10 +4,16 @@
 #define __XEN_PCI_H__
 
 #if defined(CONFIG_XEN_DOM0)
+int xen_reset_device_state(const struct pci_dev *dev);
 int xen_find_device_domain_owner(struct pci_dev *dev);
 int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
 int xen_unregister_device_domain_owner(struct pci_dev *dev);
 #else
+static inline int xen_reset_device_state(const struct pci_dev *dev)
+{
+   return -1;
+}
+
 static inline int xen_find_device_domain_owner(struct pci_dev *dev)
 {
return -1;
-- 
2.34.1




[KERNEL PATCH v5 1/3] xen/pci: Add xen_reset_device_state function

2024-03-28 Thread Jiqian Chen
When a device on the dom0 side has been reset, the vpci on the Xen
side won't get a notification, so the cached state in vpci is
out of date with respect to the real device state.
To solve that problem, add a new function to clear all vpci
device state when the device is reset on the dom0 side.

And call that function in pcistub_init_device, because when
using "pci-assignable-add" to assign a passthrough device in
Xen, it will reset the passthrough device, the vpci state will
be out of date, and then the device will fail to restore bar state.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 drivers/xen/pci.c  | 12 
 drivers/xen/xen-pciback/pci_stub.c | 18 +++---
 include/xen/interface/physdev.h|  7 +++
 include/xen/pci.h  |  6 ++
 4 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..e9b30bc09139 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
return r;
 }
 
+int xen_reset_device_state(const struct pci_dev *dev)
+{
+   struct physdev_pci_device device = {
+   .seg = pci_domain_nr(dev->bus),
+   .bus = dev->bus->number,
+   .devfn = dev->devfn
+   };
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device_state);
+
 static int xen_pci_notifier(struct notifier_block *nb,
unsigned long action, void *data)
 {
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e34b623e4b41..46c40ec8a18e 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
return psdev;
 }
 
+static int pcistub_reset_device_state(struct pci_dev *dev)
+{
+   __pci_reset_function_locked(dev);
+
+   if (!xen_pv_domain())
+   return xen_reset_device_state(dev);
+   else
+   return 0;
+}
+
 /* Don't call this directly as it's called by pcistub_device_put */
 static void pcistub_device_release(struct kref *kref)
 {
@@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref)
/* Call the reset function which does not take lock as this
 * is called from "unbind" which takes a device_lock mutex.
 */
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
if (dev_data &&
pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
dev_info(&dev->dev, "Could not reload PCI state\n");
@@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 * (so it's ready for the next domain)
 */
device_lock_assert(&dev->dev);
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
 
dev_data = pci_get_drvdata(dev);
ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
@@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
else {
dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-   __pci_reset_function_locked(dev);
+   err = pcistub_reset_device_state(dev);
+   if (err)
+   goto config_release;
pci_restore_state(dev);
}
/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..8609770e28f5 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,13 @@ struct physdev_pci_device_add {
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/include/xen/pci.h b/include/xen/pci.h
index b8337cf85fd1..b2e2e856efd6 100644
--- a/include/xen/pci.h
+++ b/include/xen/pci.h
@@ -4,10 +4,16 @@
 #define __XEN_PCI_H__
 
 #if defined(CONFIG_XEN_DOM0)
+int xen_reset_device_state(const struct pci_dev *dev);
 int xen_find_device_domain_owner(struct pci_dev *dev);
 int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
 int xen_unregister_device_domain_owner(struct pci_dev *dev);
 #else
+static inline int xen_reset_device_state(const struct pci_dev *dev)
+{
+   return -1;
+}
+
 static inline int xen_find_device_domain_owner(struct pci_dev *dev)
 {
return -1;
-- 
2.34.1




[RFC KERNEL PATCH v5 3/3] PCI/sysfs: Add gsi sysfs for pci_dev

2024-03-28 Thread Jiqian Chen
Some scenarios need a gsi sysfs attribute.
For example, when xen passes through a device to domU, it
uses the gsi to map the pirq, but currently userspace can't get
the gsi number.
So, add a gsi sysfs attribute for that and for other potential scenarios.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
RFC: No feasible suggestions were obtained in the discussion of v4.
Discussions are still needed on where/how to expose the gsi.
Looking forward to more comments and suggestions from PCI/ACPI Maintainers.

---
 drivers/acpi/pci_irq.c  |  1 +
 drivers/pci/pci-sysfs.c | 11 +++
 include/linux/pci.h |  2 ++
 3 files changed, 14 insertions(+)

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index 630fe0a34bc6..739a58755df2 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
kfree(entry);
return 0;
}
+   dev->gsi = gsi;
 
rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
if (rc < 0) {
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 2321fdfefd7d..c51df88d079e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(irq);
 
+static ssize_t gsi_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   return sysfs_emit(buf, "%u\n", pdev->gsi);
+}
+static DEVICE_ATTR_RO(gsi);
+
 static ssize_t broken_parity_status_show(struct device *dev,
 struct device_attribute *attr,
 char *buf)
@@ -596,6 +606,7 @@ static struct attribute *pci_dev_attrs[] = {
&dev_attr_revision.attr,
&dev_attr_class.attr,
&dev_attr_irq.attr,
+   &dev_attr_gsi.attr,
&dev_attr_local_cpus.attr,
&dev_attr_local_cpulist.attr,
&dev_attr_modalias.attr,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7ab0d13672da..457043cfdfce 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -529,6 +529,8 @@ struct pci_dev {
 
/* These methods index pci_reset_fn_methods[] */
u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
+
+   unsigned int gsi;
 };
 
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
-- 
2.34.1




[RFC KERNEL PATCH v5 2/3] xen/pvh: Setup gsi for passthrough device

2024-03-28 Thread Jiqian Chen
In PVH dom0, the gsis don't get registered, but the gsi of
a passthrough device must be configured for it to be able to be
mapped into a domU.

When assigning a device to passthrough, proactively set up the gsi
of the device during that process.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
RFC: This patch changes function acpi_pci_irq_lookup from a static function to
non-static; comments from the ACPI Maintainer are needed.

---
 arch/x86/xen/enlighten_pvh.c   | 92 ++
 drivers/acpi/pci_irq.c |  2 +-
 drivers/xen/xen-pciback/pci_stub.c |  8 +++
 include/linux/acpi.h   |  1 +
 include/xen/acpi.h |  6 ++
 5 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index c28f073c1df5..12be665b27d8 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -26,6 +27,97 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+typedef struct gsi_info {
+   int gsi;
+   int trigger;
+   int polarity;
+} gsi_info_t;
+
+struct acpi_prt_entry {
+   struct acpi_pci_id  id;
+   u8  pin;
+   acpi_handle link;
+   u32 index;  /* GSI, or link _CRS index */
+};
+
+static int xen_pvh_get_gsi_info(struct pci_dev *dev,
+   gsi_info_t *gsi_info)
+{
+   int gsi;
+   u8 pin;
+   struct acpi_prt_entry *entry;
+   int trigger = ACPI_LEVEL_SENSITIVE;
+   int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+   if (!dev || !gsi_info)
+   return -EINVAL;
+
+   pin = dev->pin;
+   if (!pin)
+   return -EINVAL;
+
+   entry = acpi_pci_irq_lookup(dev, pin);
+   if (entry) {
+   if (entry->link)
+   gsi = acpi_pci_link_allocate_irq(entry->link,
+entry->index,
+&trigger, &polarity,
+NULL);
+   else
+   gsi = entry->index;
+   } else
+   gsi = -1;
+
+   if (gsi < 0)
+   return -EINVAL;
+
+   gsi_info->gsi = gsi;
+   gsi_info->trigger = trigger;
+   gsi_info->polarity = polarity;
+
+   return 0;
+}
+
+static int xen_pvh_setup_gsi(gsi_info_t *gsi_info)
+{
+   struct physdev_setup_gsi setup_gsi;
+
+   if (!gsi_info)
+   return -EINVAL;
+
+   setup_gsi.gsi = gsi_info->gsi;
+   setup_gsi.triggering = (gsi_info->trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+   setup_gsi.polarity = (gsi_info->polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+}
+
+int xen_pvh_passthrough_gsi(struct pci_dev *dev)
+{
+   int ret;
+   gsi_info_t gsi_info;
+
+   if (!dev)
+   return -EINVAL;
+
+   ret = xen_pvh_get_gsi_info(dev, &gsi_info);
+   if (ret) {
+   xen_raw_printk("Fail to get gsi info!\n");
+   return ret;
+   }
+
+   ret = xen_pvh_setup_gsi(&gsi_info);
+   if (ret == -EEXIST) {
+   xen_raw_printk("Already setup the GSI :%d\n", gsi_info.gsi);
+   ret = 0;
+   } else if (ret)
+   xen_raw_printk("Fail to setup GSI (%d)!\n", gsi_info.gsi);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_passthrough_gsi);
+
 void __init xen_pvh_init(struct boot_params *boot_params)
 {
u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 46c40ec8a18e..22d4380d2b04 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -435,6 +436,13 @@ static int pcistub_init_device(struct pci_dev *dev)
goto config_release;
pci_restore_state(dev);
}
+
+   if (xen_initial_domain() && xen_pvh_domain()) {
+   err = xen_pvh_passthrough_gsi(

[RFC KERNEL PATCH v5 0/3] Support device passthrough when dom0 is PVH on Xen

2024-03-28 Thread Jiqian Chen
atch of kernel(xen/pvh: Unmask 
irq for passthrough device in PVH dom0) call the unmask_irq() when we assign a 
device to be passthrough. So that passthrough devices can have the mapping of 
gsi on PVH dom0 and gsi can be registered. This v2 patch is different from the 
v1( kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/,
kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/),
v1 performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0, 
which is unnecessary and may cause multiple registration.

4. failed to map pirq for gsi
Problem: qemu will call xc_physdev_map_pirq() to map a passthrough device’s gsi
to pirq in function xen_pt_realize(), but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi
instead of an irq, but qemu passes an irq to it and treats the irq as a gsi. The
irq is read from file /sys/bus/pci/devices/:xx:xx.x/irq in function
xen_host_pci_device_get(). But the gsi number is actually not equal to the irq.
On PVH dom0, when it allocates an irq for a gsi in function
acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first
served. If you debug the kernel code (see function __irq_alloc_descs), you will
find that irq numbers are allocated from small to large, but gsis are not
requested in numerical order: gsi 38 may come before gsi 28, so gsi 38 gets a
smaller irq number than gsi 28, and then gsi != irq.
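
The ordering effect described above can be reproduced with a small model. This is only a sketch, not kernel code: the type irq_allocator, the helpers irq_allocator_init() and register_gsi(), and the table size MAX_GSI are hypothetical names invented for illustration, and the allocator is simplified to "hand out the next free number in request order".

```c
#define MAX_GSI 64  /* hypothetical table size, for illustration only */

/* Simplified stand-in for a dynamic irq allocator: irq numbers are handed
 * out from small to large, in the order in which gsis are registered. */
struct irq_allocator {
    int next_irq;             /* next free irq number */
    int irq_of_gsi[MAX_GSI];  /* -1 means "gsi not registered yet" */
};

static void irq_allocator_init(struct irq_allocator *a, int first_free_irq)
{
    a->next_irq = first_free_irq;
    for (int i = 0; i < MAX_GSI; i++)
        a->irq_of_gsi[i] = -1;
}

/* Register a gsi and return its irq. A gsi that was registered before keeps
 * its irq; a new gsi gets the next free irq. Returns -1 for a gsi outside
 * the table. */
static int register_gsi(struct irq_allocator *a, int gsi)
{
    if (gsi < 0 || gsi >= MAX_GSI)
        return -1;
    if (a->irq_of_gsi[gsi] < 0)
        a->irq_of_gsi[gsi] = a->next_irq++;
    return a->irq_of_gsi[gsi];
}
```

With an allocator initialised at an arbitrary first free irq of 24, registering gsi 38 and then gsi 28 gives irq 24 to gsi 38 and irq 25 to gsi 28, so neither irq equals its gsi and the order is inverted.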

Solution: we can record the relation between gsi and irq, so that when
userspace(qemu) wants to use a gsi, we can do a translation. The third patch of
kernel(xen/privcmd: Add new syscall to get gsi from irq) records all the
relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and
provides a syscall for userspace to get the gsi from an irq. The third patch of
xen(tools: Add new function to get gsi from irq) adds a new function
xc_physdev_gsi_from_irq() to call the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq()
will succeed. This v2 patch is the same as v1( kernel
https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/
and xen
https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/)
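
As a rough illustration of "record the relation, then translate", the sketch below keeps gsi/irq pairs in a fixed array. It is not the data structure used by the kernel patch: gsi_irq_map, gsi_irq_map_add(), gsi_irq_map_gsi_from_irq() and MAP_CAPACITY are hypothetical names, and only the -EEXIST return for a gsi that is already recorded mirrors what the patch checks for.

```c
#include <errno.h>
#include <stddef.h>

#define MAP_CAPACITY 32  /* hypothetical fixed capacity for the sketch */

struct gsi_irq_pair {
    unsigned int gsi;
    int irq;
};

struct gsi_irq_map {
    struct gsi_irq_pair entries[MAP_CAPACITY];
    size_t count;
};

/* Record that a gsi was given a particular irq.
 * Returns 0 on success, -EEXIST if the gsi is already recorded,
 * -ENOSPC if the table is full. */
static int gsi_irq_map_add(struct gsi_irq_map *map, unsigned int gsi, int irq)
{
    for (size_t i = 0; i < map->count; i++)
        if (map->entries[i].gsi == gsi)
            return -EEXIST;
    if (map->count == MAP_CAPACITY)
        return -ENOSPC;
    map->entries[map->count].gsi = gsi;
    map->entries[map->count].irq = irq;
    map->count++;
    return 0;
}

/* Translate an irq back to the gsi it was allocated for.
 * Returns the gsi, or -ENOENT if the irq was never recorded. */
static int gsi_irq_map_gsi_from_irq(const struct gsi_irq_map *map, int irq)
{
    for (size_t i = 0; i < map->count; i++)
        if (map->entries[i].irq == irq)
            return (int)map->entries[i].gsi;
    return -ENOENT;
}
```

A caller that only knows the irq read from sysfs would look up the gsi first and pass that gsi, not the irq, to the pirq mapping call.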

About the v2 patch of qemu: it only changes an included header file; the rest is
similar to v1 ( qemu
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/)
and just calls
xc_physdev_gsi_from_irq() to get the gsi from the irq.


Jiqian Chen (3):
  xen/pci: Add xen_reset_device_state function
  xen/pvh: Setup gsi for passthrough device
  PCI/sysfs: Add gsi sysfs for pci_dev

 arch/x86/xen/enlighten_pvh.c   | 92 ++
 drivers/acpi/pci_irq.c |  3 +-
 drivers/pci/pci-sysfs.c| 11 
 drivers/xen/pci.c  | 12 
 drivers/xen/xen-pciback/pci_stub.c | 26 -
 include/linux/acpi.h   |  1 +
 include/linux/pci.h|  2 +
 include/xen/acpi.h |  6 ++
 include/xen/interface/physdev.h|  7 +++
 include/xen/pci.h  |  6 ++
 10 files changed, 162 insertions(+), 4 deletions(-)

-- 
2.34.1




[RFC XEN PATCH v6 4/5] libxl: Use gsi instead of irq for mapping pirq

2024-03-28 Thread Jiqian Chen
In PVH dom0, the linux local interrupt mechanism is used.
When it allocates an irq for a gsi, the allocation is dynamic
and first come, first served. The irq numbers are allocated
from small to large, but the gsis are not requested in
numerical order: gsi 38 may come before gsi 28, so the irq
number is not equal to the gsi number. When passing through
a device, xl wants to use the gsi to map the pirq, see
pci_add_dm_done->xc_physdev_map_pirq, but in the current code
the number is read from file /sys/bus/pci/devices//irq,
so the mapping fails.

So, use the real gsi number read from the gsi sysfs file.
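
The lookup order can be sketched outside libxl as follows. This is an illustration under assumptions, not the libxl code: read_gsi_or_irq() and its parameters are hypothetical, the device directory is passed in by the caller, and unlike the patch the sketch returns early when access() fails for a reason other than ENOENT.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Read a decimal number from <dev_dir>/gsi. If that file does not exist
 * (for example on a kernel without the gsi attribute), fall back to
 * <dev_dir>/irq. *is_gsi is set to 1 when the value came from the gsi
 * file and to 0 when it came from the irq file.
 * Returns 0 on success or a negative errno value. */
static int read_gsi_or_irq(const char *dev_dir, unsigned int *value,
                           int *is_gsi)
{
    char path[512];
    char buf[32];
    char *end;
    unsigned long v;
    FILE *f;
    int n;

    n = snprintf(path, sizeof(path), "%s/gsi", dev_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    *is_gsi = 1;
    if (access(path, F_OK) != 0) {
        if (errno != ENOENT)
            return -errno;
        /* No gsi file: fall back to the irq file (same path length). */
        snprintf(path, sizeof(path), "%s/irq", dev_dir);
        *is_gsi = 0;
    }

    f = fopen(path, "r");
    if (!f)
        return -errno;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -EINVAL;
    }
    fclose(f);

    errno = 0;
    v = strtoul(buf, &end, 10);
    if (end == buf || errno == ERANGE || v > UINT_MAX)
        return -EINVAL;

    *value = (unsigned int)v;
    return 0;
}
```

Returning is_gsi lets the caller know whether the number can be used directly as a gsi or is only the irq from an older kernel.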

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 

---
RFC: discussions ongoing on the Linux side where/how to expose the gsi

---
 tools/libs/light/libxl_pci.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..2cec83e0b734 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1478,8 +1478,14 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
 pci->bus, pci->dev, pci->func);
+r = access(sysfs_path, F_OK);
+if (r && errno == ENOENT) {
+/* To compitable with old version of kernel, still need to use irq */
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+   pci->bus, pci->dev, pci->func);
+}
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
 LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
@@ -2229,9 +2235,15 @@ skip_bar:
 if (!pci_supp_legacy_irq())
 goto skip_legacy_irq;
 
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
pci->bus, pci->dev, pci->func);
 
+rc = access(sysfs_path, F_OK);
+if (rc && errno == ENOENT) {
+/* To compitable with old version of kernel, still need to use irq */
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+   pci->bus, pci->dev, pci->func);
+}
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
 LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
-- 
2.34.1




[RFC XEN PATCH v6 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-03-28 Thread Jiqian Chen
Some types of domain don't have PIRQs, like PVH. When
passing through a device to a guest on PVH dom0, the callstack
pci_add_dm_done->XEN_DOMCTL_irq_permission will fail
at domain_pirq_to_irq.

So, add a new hypercall to grant/revoke gsi permission
when dom0 is not PV or dom0 does not have the PIRQ flag.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/include/xenctrl.h  |  5 
 tools/libs/ctrl/xc_domain.c  | 15 +++
 tools/libs/light/libxl_pci.c | 52 +---
 xen/arch/x86/domctl.c| 31 +
 xen/include/public/domctl.h  |  9 +++
 xen/xsm/flask/hooks.c|  1 +
 6 files changed, 103 insertions(+), 10 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2ef8b4e05422..519c860a00d5 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..8540e84fda93 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_gsi_permission,
+.domain = domid,
+.u.gsi_permission.gsi = gsi,
+.u.gsi_permission.allow_access = allow_access,
+};
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 2cec83e0b734..debf6ec6ddc7 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1421,6 +1421,8 @@ static void pci_add_dm_done(libxl__egc *egc,
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+int gsi;
+bool is_gsi = true;
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1485,6 +1487,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 /* To be compatible with old kernel versions, still need to use irq */
 sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
pci->bus, pci->dev, pci->func);
+is_gsi = false;
 }
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
@@ -1492,6 +1495,13 @@ static void pci_add_dm_done(libxl__egc *egc,
 goto out_no_irq;
 }
 if ((fscanf(f, "%u", &irq) == 1) && irq) {
+/*
+ * If gsi is in use, save its value, because irq
+ * is overwritten by xc_physdev_map_pirq
+ */
+if (is_gsi) {
+gsi = irq;
+}
 r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
 if (r < 0) {
 LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
@@ -1500,13 +1510,25 @@ static void pci_add_dm_done(libxl__egc *egc,
 rc = ERROR_FAIL;
 goto out;
 }
-r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
-if (r < 0) {
-LOGED(ERROR, domainid,
-  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
-fclose(f);
-rc = ERROR_FAIL;
-goto out;
+if (is_gsi) {
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
+if (r < 0 && r != -EOPNOTSUPP) {
+LOGED(ERROR, domainid,
+  "xc_domain_gsi_permission gsi=%d (error=%d)", gsi, r);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
+}
+if (!is_gsi || r == -EOPNOTSUPP) {
+r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+if (r < 0) {
+LOGED(ERROR, domainid,
+"xc_domain_irq_permission irq=%d (error=%d)", irq, r);
+fclose(f);
+rc = ERROR_FAIL;
+goto out;
+}
 }
 }
 fclose(f);
@@ -2180,6 +2202,7 @@ static void pci_remove_detached(libxl

[XEN PATCH v6 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-03-28 Thread Jiqian Chen
When running Xen with a PVH dom0 and an hvm domU, a pirq is mapped
for a passthrough device by using its gsi, see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq
is not allowed, because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq.
Also add a new check to prevent a self map when the caller has
no PIRQ flag.
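
The new target-domain check can be sketched outside of Xen as follows. This
is a simplified model under stated assumptions: struct toy_domain and
check_pirq_target() are hypothetical stand-ins for struct domain and for the
is_pv_domain()/has_pirq() test in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for Xen's struct domain. */
struct toy_domain {
    bool is_pv;     /* PV guest */
    bool has_pirq;  /* HVM guest with the PIRQ emulation flag */
};

/*
 * Model of the check added to PHYSDEVOP_(un)map_pirq: the operation is
 * rejected when the target domain is neither PV nor an HVM domain with
 * PIRQ support. A PVH dom0 targeting itself therefore gets -EOPNOTSUPP,
 * while a PVH dom0 targeting an hvm domU with PIRQs is allowed.
 */
int check_pirq_target(const struct toy_domain *target)
{
    if (target == NULL)
        return -ESRCH;          /* no such domain */
    if (!target->is_pv && !target->has_pirq)
        return -EOPNOTSUPP;     /* target cannot use PIRQs */
    return 0;
}
```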

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  2 ++
 xen/arch/x86/physdev.c   | 24 
 2 files changed, 26 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 6ad5b4d5f11f..493998b42ec5 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 7efa17cf4c1e..1367abc61e54 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_map_pirq: {
 physdev_map_pirq_t map;
 struct msi_info msi;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&map, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(map.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If it is an HVM guest, check if it has PIRQs */
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 switch ( map.type )
 {
 case MAP_PIRQ_TYPE_MSI_SEG:
@@ -343,11 +355,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 
 case PHYSDEVOP_unmap_pirq: {
 struct physdev_unmap_pirq unmap;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&unmap, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(unmap.domid);
+if ( d == NULL )
+return -ESRCH;
+/* If it is an HVM guest, check if it has PIRQs */
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
 break;
 }
-- 
2.34.1




[RFC XEN PATCH v6 0/5] Support device passthrough when dom0 is PVH on Xen

2024-03-28 Thread Jiqian Chen
o PHYSDEVOP_map_pirq. This v2 
patch is better than v1, v1 simply remove the has_pirq check(xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-4-ray.hu...@amd.com/).

3. the gsi of a passthrough device is not unmasked
 3.1 failed to check the permission of pirq
 3.2 the gsi of passthrough device was not registered in PVH dom0

Problem:
3.1 The callback function pci_add_dm_done() is called when qemu configures a 
passthrough device for domU.
This function calls xc_domain_irq_permission()->pirq_access_permitted() to 
check whether the gsi has a corresponding mapping in dom0. It does not, so the 
call fails. See XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is 
the PVH dom0 and the returned irq is 0.
3.2 It is possible for a gsi (iow: vIO-APIC pin) to never get registered on PVH 
dom0, because the devices of PVH are using MSI(-X) interrupts. However, the 
IO-APIC pin must be configured for it to be able to be mapped into a domU.

Reason: After searching the code, I found that "map_pirq" and "register_gsi" 
are done in vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi (aka 
ioapic's pin) is unmasked in PVH dom0.
So both problems come down to the gsi of a passthrough device not being 
unmasked.

Solution: to solve these problems, the second kernel patch (xen/pvh: Unmask 
irq for passthrough device in PVH dom0) calls unmask_irq() when a device is 
assigned for passthrough, so that passthrough devices get a gsi mapping on 
PVH dom0 and the gsi can be registered. This v2 patch is different from 
v1 (kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/): 
v1 performed "map_pirq" and "register_gsi" on all pci devices on PVH dom0, 
which is unnecessary and may cause multiple registrations.

4. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in function xen_pt_realize() to map a 
passthrough device’s gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a 
gsi rather than an irq, but qemu passes an irq and treats it as a gsi. The irq 
is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function 
xen_host_pci_device_get(). The gsi number is not necessarily equal to the irq. 
On PVH dom0, irqs are allocated for gsis dynamically in function 
acpi_register_gsi_ioapic(), on a first-come, first-served basis. If you debug 
the kernel code (see function __irq_alloc_descs), you will find that irq 
numbers are handed out in ascending order, but gsis are not registered in 
ascending order: gsi 38 may be registered before gsi 28, so gsi 38 gets a 
smaller irq number than gsi 28, and then gsi != irq.
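
The effect of registration order on the resulting irq numbers can be
reproduced with a toy allocator. This is only an illustrative model, not the
kernel's __irq_alloc_descs(): all toy_* names are hypothetical, and the
starting irq number 24 is an arbitrary value chosen for the example.

```c
#include <stddef.h>

#define TOY_MAX_GSI 64
#define TOY_FIRST_DYNAMIC_IRQ 24   /* arbitrary illustrative start value */

struct toy_irq_alloc {
    int next_irq;                  /* next free irq number */
    int irq_of_gsi[TOY_MAX_GSI];   /* -1 means "gsi not registered" */
};

void toy_irq_alloc_init(struct toy_irq_alloc *a)
{
    a->next_irq = TOY_FIRST_DYNAMIC_IRQ;
    for (size_t i = 0; i < TOY_MAX_GSI; i++)
        a->irq_of_gsi[i] = -1;
}

/*
 * Hand out the lowest free irq to whichever gsi asks first. Registering
 * the same gsi again returns the irq it already owns.
 * Returns -1 for a gsi outside the toy table.
 */
int toy_register_gsi(struct toy_irq_alloc *a, unsigned int gsi)
{
    if (gsi >= TOY_MAX_GSI)
        return -1;
    if (a->irq_of_gsi[gsi] < 0)
        a->irq_of_gsi[gsi] = a->next_irq++;
    return a->irq_of_gsi[gsi];
}
```

Registering gsi 38 and then gsi 28 with this model gives gsi 38 the smaller
irq number, which is the situation described above.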

Solution: record the relation between gsi and irq, so that when 
userspace (qemu) wants to use the gsi, a translation can be done. The third 
kernel patch (xen/privcmd: Add new syscall to get gsi from irq) records all the 
relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and 
provides a syscall for userspace to get the gsi from the irq. The third 
xen patch (tools: Add new function to get gsi from irq) adds a new function, 
xc_physdev_gsi_from_irq(), to call the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() 
will succeed. This v2 patch is the same as v1 (kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/).
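
The record-then-translate idea can be sketched as a small lookup table. This
is a hedged illustration only: struct toy_gsi_map, toy_gsi_map_record() and
toy_gsi_from_irq() are hypothetical names and do not reflect the data
structure used by the kernel patch.

```c
#include <stddef.h>

#define TOY_MAP_CAP 32

/* Hypothetical table of (irq, gsi) pairs recorded at registration time. */
struct toy_gsi_map {
    struct { int irq; int gsi; } e[TOY_MAP_CAP];
    size_t n;
};

/* Record one relation; returns -1 when the table is full. */
int toy_gsi_map_record(struct toy_gsi_map *m, int irq, int gsi)
{
    if (m->n >= TOY_MAP_CAP)
        return -1;
    m->e[m->n].irq = irq;
    m->e[m->n].gsi = gsi;
    m->n++;
    return 0;
}

/* Translate an irq back to its gsi; returns -1 if it was never recorded. */
int toy_gsi_from_irq(const struct toy_gsi_map *m, int irq)
{
    for (size_t i = 0; i < m->n; i++)
        if (m->e[i].irq == irq)
            return m->e[i].gsi;
    return -1;
}
```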

The v2 qemu patch only changes an included header file; otherwise it is 
similar to v1 (qemu 
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/): 
it just calls
xc_physdev_gsi_from_irq() to get the gsi from the irq.


Jiqian Chen (5):
  xen/vpci: Clear all vpci status of device
  x86/pvh: Allow (un)map_pirq when dom0 is PVH
  x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  libxl: Use gsi instead of irq for mapping pirq
  domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

 tools/include/xenctrl.h  |  5 +++
 tools/libs/ctrl/xc_domain.c  | 15 
 tools/libs/light/libxl_pci.c | 68 +---
 xen/arch/x86/domctl.c| 31 
 xen/arch/x86/hvm/hypercall.c |  8 +
 xen/arch/x86/physdev.c   | 24 +
 xen/drivers/pci/physdev.c| 36 +++
 xen/drivers/vpci/vpci.c  | 10 ++
 xen/include/public/domctl.h  |  9 +
 xen/include/public/physdev.h |  7 
 xen/include/xen/vpci.h   |  6 
 xen/xsm/flask/hooks.c|  1 +
 12 files changed, 208 insertions(+), 12 deletions(-)

-- 
2.34.1




[RFC XEN PATCH v6 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-03-28 Thread Jiqian Chen
On PVH dom0, the gsis don't get registered, but
the gsi of a passthrough device must be configured for it to
be able to be mapped into an hvm domU.
On the Linux kernel side, PHYSDEVOP_setup_gsi is called for
passthrough devices to register the gsi when dom0 is PVH.
So, add PHYSDEVOP_setup_gsi for the above purpose.

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 493998b42ec5..7d4e41f66885 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_unmap_pirq:
 break;
 
+case PHYSDEVOP_setup_gsi:
+if ( !is_hardware_domain(currd) )
+return -EOPNOTSUPP;
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[XEN PATCH v6 1/5] xen/vpci: Clear all vpci status of device

2024-03-28 Thread Jiqian Chen
When a device has been reset on the dom0 side, vpci on the Xen
side doesn't get notified, so the cached state in vpci is
out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0 side,
dom0 can call this hypercall to notify vpci.
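
Why a stale cache needs an explicit notification can be shown with a toy
model. This is a sketch under stated assumptions, not vpci itself: struct
toy_device, struct toy_vpci and the toy_vpci_* functions are hypothetical,
and a single cached command register stands in for all vpci state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "real" hardware state. */
struct toy_device { uint16_t command; };

/* Hypothetical cached view, standing in for the vpci state. */
struct toy_vpci { bool valid; uint16_t cached_command; };

/* Drop all cached state. */
void toy_vpci_deassign(struct toy_vpci *v)
{
    v->valid = false;
    v->cached_command = 0;
}

/* Rebuild the cached state from the device. */
int toy_vpci_assign(struct toy_vpci *v, const struct toy_device *dev)
{
    v->cached_command = dev->command;
    v->valid = true;
    return 0;
}

/* Reset handling as in the patch: tear down, then set up again. */
int toy_vpci_reset_state(struct toy_vpci *v, const struct toy_device *dev)
{
    toy_vpci_deassign(v);
    return toy_vpci_assign(v, dev);
}
```

Until toy_vpci_reset_state() is called, the cache keeps the value it saw
before the device was reset.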

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stewart Hildebrand 
Reviewed-by: Stefano Stabellini 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 36 
 xen/drivers/vpci/vpci.c  | 10 ++
 xen/include/public/physdev.h |  7 +++
 xen/include/xen/vpci.h   |  6 ++
 5 files changed, 60 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index eeb73e1aa5d0..6ad5b4d5f11f 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..73dc8f058b0e 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct physdev_pci_device dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev, arg, 1) != 0 )
+break;
+sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+ret = vpci_reset_device_state(pdev);
+write_unlock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", 
);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 260b72875ee1..310700c1e775 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -117,6 +117,16 @@ int vpci_assign_device(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(pcidevs_locked());
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_deassign_device(pdev);
+return vpci_assign_device(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..f5bab1f29779 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index e89c571890b2..ea64d94e818b 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_deassign_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -266,6 +267,11 @@ static inline int vpci_assign_device(struct pci_dev *pdev)
 
 static inline void vpci_deassign_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1




[RFC QEMU PATCH v5 1/1] xen: Use gsi instead of irq for mapping pirq

2024-03-28 Thread Jiqian Chen
A PVH dom0 uses the Linux local interrupt mechanism:
when it allocates an irq for a gsi, the allocation is dynamic
and first-come, first-served. The irq numbers are handed out
in ascending order, but the gsis are not registered in
ascending order (gsi 38 may come before gsi 28), so the
irq number is not equal to the gsi number. When passing
through a device, qemu wants to use the gsi to map a pirq,
see xen_pt_realize->xc_physdev_map_pirq, but
the gsi number is read from the file
/sys/bus/pci/devices//irq in the current code, so the
mapping fails.

Add gsi into XenHostPCIDevice and use the gsi number
read from the gsi sysfs file if it exists.
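
The selection rule can be stated as a one-line helper. This is a minimal
sketch: pick_machine_irq() is a hypothetical name, and a negative gsi is
assumed to mean that no gsi sysfs file was available.

```c
/*
 * Prefer the gsi when one was read successfully (gsi >= 0); otherwise
 * fall back to the irq value read from sysfs.
 */
int pick_machine_irq(int gsi, int irq)
{
    return gsi < 0 ? irq : gsi;
}
```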

Signed-off-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 

---
RFC: discussions ongoing on the Linux side where/how to expose the gsi

---
 hw/xen/xen-host-pci-device.c | 7 +++
 hw/xen/xen-host-pci-device.h | 1 +
 hw/xen/xen_pt.c  | 6 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716a2..5be3279aa25b 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -370,6 +370,13 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t 
domain,
 }
 d->irq = v;
 
+xen_host_pci_get_dec_value(d, "gsi", &v, errp);
+if (*errp) {
+d->gsi = -1;
+} else {
+d->gsi = v;
+}
+
 xen_host_pci_get_hex_value(d, "class", &v, errp);
 if (*errp) {
 goto error;
diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h
index 4d8d34ecb024..74c552bb5548 100644
--- a/hw/xen/xen-host-pci-device.h
+++ b/hw/xen/xen-host-pci-device.h
@@ -27,6 +27,7 @@ typedef struct XenHostPCIDevice {
 uint16_t device_id;
 uint32_t class_code;
 int irq;
+int gsi;
 
 XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1];
 XenHostPCIIORegion rom;
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 3635d1b39f79..d34a7a8764ab 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -840,7 +840,11 @@ static void xen_pt_realize(PCIDevice *d, Error **errp)
 goto out;
 }
 
-machine_irq = s->real_device.irq;
+if (s->real_device.gsi < 0) {
+machine_irq = s->real_device.irq;
+} else {
+machine_irq = s->real_device.gsi;
+}
 if (machine_irq == 0) {
 XEN_PT_LOG(d, "machine irq is 0\n");
 cmd |= PCI_COMMAND_INTX_DISABLE;
-- 
2.34.1




[QEMU PATCH v5 0/1] Support device passthrough when dom0 is PVH on Xen

2024-03-28 Thread Jiqian Chen
Hi All,
This is the v5 series to support passthrough on Xen when dom0 is PVH.
v4->v5 changes:
* Add review by Stefano

v3->v4 changes:
* Add gsi into struct XenHostPCIDevice and use the gsi number read from the gsi 
sysfs
  if it exists; if there is no gsi sysfs, still use irq.

v2->v3 changes:
* Due to changes in the implementation of the second patch on the kernel side 
(which adds
  a new sysfs for gsi instead of a new syscall), read the gsi number from the 
sysfs of gsi.

Below is the description of v2 cover letter:
This patch is the v2 of the implementation of passthrough when dom0 is PVH on 
Xen.
Issues we encountered:
1. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in function xen_pt_realize() to map a 
passthrough device’s gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a 
gsi rather than an irq, but qemu passes an irq and treats it as a gsi. The irq 
is read from the file
/sys/bus/pci/devices/:xx:xx.x/irq in function xen_host_pci_device_get(). 
The gsi number is not necessarily equal to the irq. On PVH dom0, irqs are 
allocated for gsis dynamically in function acpi_register_gsi_ioapic(), on a 
first-come, first-served basis. If you debug the kernel code
(see function __irq_alloc_descs), you will find that irq numbers are handed out 
in ascending order, but gsis are not registered in ascending order: gsi 38 may 
be registered before gsi 28, so gsi 38 gets a smaller irq number than gsi 28, 
and then gsi != irq.

Solution: record the relation between gsi and irq, so that when 
userspace (qemu) wants to use the gsi, a translation can be done. The third 
kernel patch (xen/privcmd: Add new syscall to get gsi from irq) records all the 
relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and 
provides a syscall for userspace to get the gsi from the irq. The third xen 
patch (tools: Add new function to get gsi from irq) adds a new function, 
xc_physdev_gsi_from_irq(), to call the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() 
will succeed.

This v2 on the qemu side is the same as v1 
(qemu 
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/): 
it just calls
xc_physdev_gsi_from_irq() to get the gsi from the irq.


Jiqian Chen (1):
  xen: Use gsi instead of irq for mapping pirq

 hw/xen/xen-host-pci-device.c | 7 +++
 hw/xen/xen-host-pci-device.h | 1 +
 hw/xen/xen_pt.c  | 6 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

-- 
2.34.1




[RFC XEN PATCH v5 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-01-11 Thread Jiqian Chen
Some domain types, such as PVH, don't have PIRQs. When
passing through a device to a guest on a PVH dom0, the call chain
pci_add_dm_done->XEN_DOMCTL_irq_permission fails
at domain_pirq_to_irq.

So, add a new hypercall to grant/revoke gsi permission
when dom0 is not PV or dom0 does not have the PIRQ flag.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/include/xenctrl.h  |  5 +
 tools/libs/ctrl/xc_domain.c  | 15 +++
 tools/libs/light/libxl_pci.c | 16 ++--
 xen/arch/x86/domctl.c| 31 +++
 xen/include/public/domctl.h  |  9 +
 xen/xsm/flask/hooks.c|  1 +
 6 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 2ef8b4e05422..519c860a00d5 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
  uint32_t pirq,
  bool allow_access);
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access);
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
index f2d9d14b4d9f..448ba2c59ae1 100644
--- a/tools/libs/ctrl/xc_domain.c
+++ b/tools/libs/ctrl/xc_domain.c
@@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
 return do_domctl(xch, &domctl);
 }
 
+int xc_domain_gsi_permission(xc_interface *xch,
+ uint32_t domid,
+ uint32_t gsi,
+ bool allow_access)
+{
+struct xen_domctl domctl = {};
+
+domctl.cmd = XEN_DOMCTL_gsi_permission;
+domctl.domain = domid;
+domctl.u.gsi_permission.gsi = gsi;
+domctl.u.gsi_permission.allow_access = allow_access;
+
+return do_domctl(xch, &domctl);
+}
+
 int xc_domain_iomem_permission(xc_interface *xch,
uint32_t domid,
unsigned long first_mfn,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index a1c6e82631e9..4136a860a048 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1421,6 +1421,8 @@ static void pci_add_dm_done(libxl__egc *egc,
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+int gsi;
+bool has_gsi = true;
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1482,6 +1484,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 pci->bus, pci->dev, pci->func);
 
 if ( access(sysfs_path, F_OK) != 0 ) {
+has_gsi = false;
 if ( errno == ENOENT )
 sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
 pci->bus, pci->dev, pci->func);
@@ -1497,6 +1500,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 goto out_no_irq;
 }
 if ((fscanf(f, "%u", &irq) == 1) && irq) {
+gsi = irq;
 r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
 if (r < 0) {
 LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
@@ -1505,7 +1509,10 @@ static void pci_add_dm_done(libxl__egc *egc,
 rc = ERROR_FAIL;
 goto out;
 }
-r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+if ( has_gsi )
+r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
+if ( !has_gsi || r == -EOPNOTSUPP )
+r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
 if (r < 0) {
 LOGED(ERROR, domainid,
   "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
@@ -2185,6 +2192,7 @@ static void pci_remove_detached(libxl__egc *egc,
 FILE *f;
 uint32_t domainid = prs->domid;
 bool isstubdom;
+bool has_gsi = true;
 
 /* Convenience aliases */
 libxl_device_pci *const pci = >pci;
@@ -2244,6 +2252,7 @@ skip_bar:
pci->bus, pci->dev, pci->func);
 
 if ( access(sysfs_path, F_OK) != 0 ) {
+has_gsi = false;
 if ( errno == ENOENT )
 sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
 pci->bus, pci->dev, pci->func);
@@ -2270,7 +2279,10 @@ skip_bar:
  */
 LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
 }
-rc = xc_domain_irq_permission(ctx->xch, domid, irq, 0);
+if ( has_gsi )
+rc = xc_domain_gsi_permissi

[RFC XEN PATCH v5 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-01-11 Thread Jiqian Chen
On PVH dom0, the gsis don't get registered, but
the gsi of a passthrough device must be configured for it to
be able to be mapped into an hvm domU.
On the Linux kernel side, PHYSDEVOP_setup_gsi is called for
passthrough devices to register the gsi when dom0 is PVH.
So, add PHYSDEVOP_setup_gsi for the above purpose.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 493998b42ec5..46f51ee459f6 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -76,6 +76,12 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_unmap_pirq:
 break;
 
+case PHYSDEVOP_setup_gsi:
+if ( !is_hardware_domain(currd) )
+return -EOPNOTSUPP;
+ASSERT(!has_pirq(currd));
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[RFC XEN PATCH v5 4/5] libxl: Use gsi instead of irq for mapping pirq

2024-01-11 Thread Jiqian Chen
A PVH dom0 uses the Linux local interrupt mechanism:
when it allocates an irq for a gsi, the allocation is dynamic
and first-come, first-served. The irq numbers are handed out
in ascending order, but the gsis are not registered in
ascending order (gsi 38 may come before gsi 28), so the
irq number is not equal to the gsi number. When passing
through a device, xl wants to use the gsi to map a pirq,
see pci_add_dm_done->xc_physdev_map_pirq,
but the gsi number is read from the file
/sys/bus/pci/devices//irq in the current code, so the
mapping fails.

So, use the real gsi number read from the gsi sysfs file.
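
The lookup order (gsi first, irq only when the gsi file does not exist) can
be sketched as a standalone helper. This is an illustration under stated
assumptions, not the libxl code: read_gsi_or_irq() is a hypothetical name and
the caller is assumed to pass the device's sysfs directory.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Read the legacy interrupt number from a PCI device's sysfs directory.
 * Prefers the "gsi" attribute and falls back to "irq" only when "gsi"
 * does not exist (older kernels). *is_gsi tells which one was used.
 * Returns 0 on success and -1 on failure.
 */
int read_gsi_or_irq(const char *dev_dir, unsigned int *value, bool *is_gsi)
{
    char path[4096];
    FILE *f;
    int n;

    *is_gsi = true;
    n = snprintf(path, sizeof(path), "%s/gsi", dev_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    if (access(path, F_OK) != 0) {
        if (errno != ENOENT)
            return -1;              /* unexpected error: do not guess */
        *is_gsi = false;
        n = snprintf(path, sizeof(path), "%s/irq", dev_dir);
        if (n < 0 || (size_t)n >= sizeof(path))
            return -1;
    }

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fscanf(f, "%u", value);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

Returning which attribute was read lets the caller decide later between the
gsi based and the irq based permission call.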

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
Reviewed-by: Stefano Stabellini 
---
 tools/libs/light/libxl_pci.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..a1c6e82631e9 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1478,8 +1478,19 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
 pci->bus, pci->dev, pci->func);
+
+if ( access(sysfs_path, F_OK) != 0 ) {
+if ( errno == ENOENT )
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+pci->bus, pci->dev, pci->func);
+else {
+LOGED(ERROR, domainid, "Can't access %s", sysfs_path);
+goto out_no_irq;
+}
+}
+
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
 LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
@@ -2229,9 +2240,19 @@ skip_bar:
 if (!pci_supp_legacy_irq())
 goto skip_legacy_irq;
 
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
pci->bus, pci->dev, pci->func);
 
+if ( access(sysfs_path, F_OK) != 0 ) {
+if ( errno == ENOENT )
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+pci->bus, pci->dev, pci->func);
+else {
+LOGED(ERROR, domid, "Can't access %s", sysfs_path);
+goto skip_legacy_irq;
+}
+}
+
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
 LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
-- 
2.34.1




[RFC XEN PATCH v5 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-01-11 Thread Jiqian Chen
When running Xen with a PVH dom0 and an hvm domU, a pirq is mapped
for a passthrough device by using its gsi, see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq
is not allowed, because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.

So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the pirq.
Also add a new check to prevent a self map when the caller has
no PIRQ flag.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c |  2 ++
 xen/arch/x86/physdev.c   | 22 ++
 2 files changed, 24 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 6ad5b4d5f11f..493998b42ec5 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+break;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 47c4da0af7e1..7f2422c2a483 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -303,11 +303,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 case PHYSDEVOP_map_pirq: {
 physdev_map_pirq_t map;
 struct msi_info msi;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&map, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(map.domid);
+if ( d == NULL )
+return -ESRCH;
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 switch ( map.type )
 {
 case MAP_PIRQ_TYPE_MSI_SEG:
@@ -341,11 +352,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
arg)
 
 case PHYSDEVOP_unmap_pirq: {
 struct physdev_unmap_pirq unmap;
+struct domain *d;
 
 ret = -EFAULT;
 if ( copy_from_guest(&unmap, arg, 1) != 0 )
 break;
 
+d = rcu_lock_domain_by_any_id(unmap.domid);
+if ( d == NULL )
+return -ESRCH;
+if ( !is_pv_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}
+rcu_unlock_domain(d);
+
 ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
 break;
 }
-- 
2.34.1




[RFC XEN PATCH v5 1/5] xen/vpci: Clear all vpci status of device

2024-01-11 Thread Jiqian Chen
When a device has been reset on the dom0 side, vpci on the Xen
side doesn't get notified, so the cached state in vpci is
out of date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0 side,
dom0 can call this hypercall to notify vpci.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 36 
 xen/drivers/vpci/vpci.c  | 10 ++
 xen/include/public/physdev.h |  7 +++
 xen/include/xen/vpci.h   |  6 ++
 5 files changed, 60 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index eeb73e1aa5d0..6ad5b4d5f11f 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..73dc8f058b0e 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct physdev_pci_device dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev, arg, 1) != 0 )
+break;
+sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+write_lock(&pdev->domain->pci_lock);
+ret = vpci_reset_device_state(pdev);
+write_unlock(&pdev->domain->pci_lock);
+pcidevs_unlock();
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 72ef277c4f8e..c6df2c6a9561 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -107,6 +107,16 @@ int vpci_add_handlers(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(pcidevs_locked());
+ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
+
+vpci_remove_device(pdev);
+return vpci_add_handlers(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..f5bab1f29779 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index d20c301a3db3..6ec83ce9ae13 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_add_handlers(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_remove_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -262,6 +263,11 @@ static inline int vpci_add_handlers(struct pci_dev *pdev)
 
 static inline void vpci_remove_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1




[RFC XEN PATCH v5 0/5] Support device passthrough when dom0 is PVH on Xen

2024-01-11 Thread Jiqian Chen
k if the gsi has corresponding mappings in dom0. But it does not, so the call fails. 
See XEN_DOMCTL_irq_permission->pirq_access_permitted: "current" is PVH dom0 and 
the irq it returns is 0.
3.2 A gsi (iow: vIO-APIC pin) may never get registered on PVH 
dom0, because devices on PVH use MSI(-X) interrupts. However, the 
IO-APIC pin must be configured before it can be mapped into a domU.

Reason: After searching the code, I found that "map_pirq" and "register_gsi" are 
done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when the gsi (aka 
the ioapic pin) is unmasked in PVH dom0.
So both problems come down to the gsi of a passthrough device never being 
unmasked.

Solution: to solve these problems, the second kernel patch (xen/pvh: Unmask 
irq for passthrough device in PVH dom0) calls unmask_irq() when a device is 
assigned for passthrough, so that the gsi of a passthrough device is mapped 
and registered on PVH dom0. This v2 patch is different from 
v1 (kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/,
kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/):
v1 performed "map_pirq" and "register_gsi" for all pci devices on PVH dom0, 
which is unnecessary and may cause multiple registrations.

4. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in xen_pt_realize() to map a 
passthrough device’s gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi 
rather than an irq, but qemu passes the irq and treats it as the gsi. The irq is 
read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function 
xen_host_pci_device_get(). The gsi number is not necessarily equal to the irq. 
On PVH dom0, when an irq is allocated for a gsi in function 
acpi_register_gsi_ioapic(), allocation is dynamic and first come, first served. 
If you debug the kernel code (see function __irq_alloc_descs), you will find 
that irq numbers are handed out in increasing order, but gsis are not 
registered in increasing order: gsi 38 may be registered before gsi 28, so 
gsi 38 gets a smaller irq number than gsi 28, and 
then gsi != irq.

Solution: record the relation between gsi and irq, so that when 
userspace (qemu) needs a gsi, the irq can be translated. The third kernel 
patch (xen/privcmd: Add new syscall to get gsi from irq) records all the 
relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and 
provides a syscall for userspace to get the gsi from an irq. The third xen 
patch (tools: Add new function to get gsi from irq) adds a new function, 
xc_physdev_gsi_from_irq(), that calls the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() 
succeeds. This v2 patch is the same as v1 (kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/)
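
The allocation order and the gsi/irq translation described above can be
sketched with a small toy model. Everything here is an assumption made for
illustration: the toy_* names, the table size and the first dynamic irq number
are hypothetical, and this is not the kernel's allocator nor the real
translation interface.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
#define TOY_MAX_GSI   64
#define TOY_FIRST_IRQ 24   /* arbitrary starting point for dynamic irqs */

struct toy_irq_table {
    int next_irq;                  /* next free irq, handed out in increasing order */
    int irq_of_gsi[TOY_MAX_GSI];   /* 0 means "gsi not registered yet" */
};

static void toy_init(struct toy_irq_table *t)
{
    t->next_irq = TOY_FIRST_IRQ;
    for (size_t i = 0; i < TOY_MAX_GSI; i++)
        t->irq_of_gsi[i] = 0;
}

/* Register a gsi: first come, first served; returns the irq it received. */
static int toy_register_gsi(struct toy_irq_table *t, int gsi)
{
    assert(gsi >= 0 && gsi < TOY_MAX_GSI);
    if (t->irq_of_gsi[gsi] == 0)
        t->irq_of_gsi[gsi] = t->next_irq++;
    return t->irq_of_gsi[gsi];
}

/* Reverse lookup, the translation userspace needs; -1 if the irq is unknown. */
static int toy_gsi_from_irq(const struct toy_irq_table *t, int irq)
{
    if (irq <= 0)
        return -1;
    for (int gsi = 0; gsi < TOY_MAX_GSI; gsi++)
        if (t->irq_of_gsi[gsi] == irq)
            return gsi;
    return -1;
}
```

Registering gsi 38 before gsi 28 in this model gives gsi 38 the smaller irq
number, which is why the irq cannot be passed where a gsi is expected without
a recorded mapping.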

The v2 qemu patch only changes an included header file; otherwise it is 
similar to v1 (qemu 
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/), 
it just calls
xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (5):
  xen/vpci: Clear all vpci status of device
  x86/pvh: Allow (un)map_pirq when dom0 is PVH
  x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  libxl: Use gsi instead of irq for mapping pirq
  domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

 tools/include/xenctrl.h  |  5 +
 tools/libs/ctrl/xc_domain.c  | 15 +
 tools/libs/light/libxl_pci.c | 41 
 xen/arch/x86/domctl.c| 31 +++
 xen/arch/x86/hvm/hypercall.c |  9 
 xen/arch/x86/physdev.c   | 22 +++
 xen/drivers/pci/physdev.c| 36 +++
 xen/drivers/vpci/vpci.c  | 10 +
 xen/include/public/domctl.h  |  9 
 xen/include/public/physdev.h |  7 ++
 xen/include/xen/vpci.h   |  6 ++
 xen/xsm/flask/hooks.c|  1 +
 12 files changed, 188 insertions(+), 4 deletions(-)

-- 
2.34.1




[RFC QEMU PATCH v4 1/1] xen: Use gsi instead of irq for mapping pirq

2024-01-04 Thread Jiqian Chen
PVH dom0 uses the Linux local interrupt mechanism. When it
allocates an irq for a gsi, allocation is dynamic and first
come, first served. Irq numbers are handed out in increasing
order, but gsis are not registered in increasing order: gsi 38
may be registered before gsi 28, so the irq number is not
equal to the gsi number. When passing through a device, qemu
wants to use the gsi to map a pirq
(xen_pt_realize->xc_physdev_map_pirq), but in the current code
the number is read from the file
/sys/bus/pci/devices//irq, so the mapping
fails.

Add gsi to XenHostPCIDevice and use the gsi number read
from the gsi sysfs file if it exists.
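
The selection rule used by this patch can be written as a small standalone
helper. This is a minimal sketch, assuming (as the diff below does) that a
negative gsi means no gsi sysfs entry was found; pick_machine_irq is a
hypothetical name and is not part of qemu.

```c
#include <assert.h>

/*
 * Hypothetical helper mirroring the selection logic of the patch:
 * gsi < 0 means "no gsi sysfs entry was found", so fall back to irq.
 */
static int pick_machine_irq(int gsi, int irq)
{
    assert(irq >= 0);   /* the irq value is read from sysfs as a non-negative number */
    return gsi < 0 ? irq : gsi;
}
```

With this rule, a kernel that does not expose the gsi attribute keeps the old
behaviour, while a kernel that does expose it makes qemu map the pirq by gsi.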

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 hw/xen/xen-host-pci-device.c | 7 +++
 hw/xen/xen-host-pci-device.h | 1 +
 hw/xen/xen_pt.c  | 6 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716a2..5be3279aa25b 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -370,6 +370,13 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
 }
 d->irq = v;
 
+xen_host_pci_get_dec_value(d, "gsi", &v, errp);
+if (*errp) {
+d->gsi = -1;
+} else {
+d->gsi = v;
+}
+
 xen_host_pci_get_hex_value(d, "class", &v, errp);
 if (*errp) {
 goto error;
diff --git a/hw/xen/xen-host-pci-device.h b/hw/xen/xen-host-pci-device.h
index 4d8d34ecb024..74c552bb5548 100644
--- a/hw/xen/xen-host-pci-device.h
+++ b/hw/xen/xen-host-pci-device.h
@@ -27,6 +27,7 @@ typedef struct XenHostPCIDevice {
 uint16_t device_id;
 uint32_t class_code;
 int irq;
+int gsi;
 
 XenHostPCIIORegion io_regions[PCI_NUM_REGIONS - 1];
 XenHostPCIIORegion rom;
diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 36e6f93c372f..d448f3a17306 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -839,7 +839,11 @@ static void xen_pt_realize(PCIDevice *d, Error **errp)
 goto out;
 }
 
-machine_irq = s->real_device.irq;
+if (s->real_device.gsi < 0) {
+machine_irq = s->real_device.irq;
+} else {
+machine_irq = s->real_device.gsi;
+}
 if (machine_irq == 0) {
 XEN_PT_LOG(d, "machine irq is 0\n");
 cmd |= PCI_COMMAND_INTX_DISABLE;
-- 
2.34.1




[RFC QEMU PATCH v4 0/1] Support device passthrough when dom0 is PVH on Xen

2024-01-04 Thread Jiqian Chen
Hi All,
This is the v4 series to support passthrough on Xen when dom0 is PVH.
v3->v4 changes:
* Add gsi to struct XenHostPCIDevice and use the gsi number read from the gsi 
sysfs file if it exists; if there is no gsi sysfs file, still use irq.

v2->v3 changes:
* Due to changes in the implementation of the second patch on the kernel side 
(which adds a new sysfs file for gsi instead of a new syscall), read the gsi 
number from the gsi sysfs file.

Below is the description from the v2 cover letter:
This patch is the v2 of the implementation of passthrough when dom0 is PVH on 
Xen.
Issues we encountered:
1. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in xen_pt_realize() to map a 
passthrough device’s gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi 
rather than an irq, but qemu passes the irq and treats it as the gsi. The irq is 
read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function 
xen_host_pci_device_get(). The gsi number is not necessarily equal to the irq. 
On PVH dom0, when an irq is allocated for a gsi in function 
acpi_register_gsi_ioapic(), allocation is dynamic and first come, first served. 
If you debug the kernel code (see function __irq_alloc_descs), you will find 
that irq numbers are handed out in increasing order, but gsis are not 
registered in increasing order: gsi 38 may be registered before gsi 28, so 
gsi 38 gets a smaller irq number than gsi 28, and 
then gsi != irq.

Solution: record the relation between gsi and irq, so that when 
userspace (qemu) needs a gsi, the irq can be translated. The third kernel 
patch (xen/privcmd: Add new syscall to get gsi from irq) records all the 
relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and 
provides a syscall for userspace to get the gsi from an irq. The third xen 
patch (tools: Add new function to get gsi from irq) adds a new function, 
xc_physdev_gsi_from_irq(), that calls the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() 
succeeds.

This v2 on the qemu side is the same as v1 (qemu 
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/), 
it just calls
xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (1):
  xen: Use gsi instead of irq for mapping pirq

 hw/xen/xen-host-pci-device.c | 7 +++
 hw/xen/xen-host-pci-device.h | 1 +
 hw/xen/xen_pt.c  | 6 +-
 3 files changed, 13 insertions(+), 1 deletion(-)

-- 
2.34.1




[RFC XEN PATCH v4 0/5] Support device passthrough when dom0 is PVH on Xen

2024-01-04 Thread Jiqian Chen
 come down to the gsi of a passthrough device never being 
unmasked.

Solution: to solve these problems, the second kernel patch (xen/pvh: Unmask 
irq for passthrough device in PVH dom0) calls unmask_irq() when a device is 
assigned for passthrough, so that the gsi of a passthrough device is mapped 
and registered on PVH dom0. This v2 patch is different from 
v1 (kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/,
kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/):
v1 performed "map_pirq" and "register_gsi" for all pci devices on PVH dom0, 
which is unnecessary and may cause multiple registrations.

4. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in xen_pt_realize() to map a 
passthrough device’s gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a gsi 
rather than an irq, but qemu passes the irq and treats it as the gsi. The irq is 
read from the file /sys/bus/pci/devices/:xx:xx.x/irq in function 
xen_host_pci_device_get(). The gsi number is not necessarily equal to the irq. 
On PVH dom0, when an irq is allocated for a gsi in function 
acpi_register_gsi_ioapic(), allocation is dynamic and first come, first served. 
If you debug the kernel code (see function __irq_alloc_descs), you will find 
that irq numbers are handed out in increasing order, but gsis are not 
registered in increasing order: gsi 38 may be registered before gsi 28, so 
gsi 38 gets a smaller irq number than gsi 28, and 
then gsi != irq.

Solution: record the relation between gsi and irq, so that when 
userspace (qemu) needs a gsi, the irq can be translated. The third kernel 
patch (xen/privcmd: Add new syscall to get gsi from irq) records all the 
relations in acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and 
provides a syscall for userspace to get the gsi from an irq. The third xen 
patch (tools: Add new function to get gsi from irq) adds a new function, 
xc_physdev_gsi_from_irq(), that calls the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq() 
succeeds. This v2 patch is the same as v1 (kernel 
https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/ 
and xen 
https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/)

The v2 qemu patch only changes an included header file; otherwise it is 
similar to v1 (qemu 
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/), 
it just calls
xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (5):
  xen/vpci: Clear all vpci status of device
  x86/pvh: Allow (un)map_pirq when caller isn't DOMID_SELF
  x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0
  domctl: Use gsi to grant/revoke irq permission
  libxl: Use gsi instead of irq for mapping pirq

 tools/libs/light/libxl_pci.c | 21 +
 tools/libs/light/libxl_x86.c |  5 -
 xen/arch/x86/hvm/hypercall.c | 34 --
 xen/common/domctl.c  | 12 ++--
 xen/drivers/pci/physdev.c| 34 ++
 xen/drivers/vpci/vpci.c  |  9 +
 xen/include/public/physdev.h |  7 +++
 xen/include/xen/vpci.h   |  6 ++
 8 files changed, 119 insertions(+), 9 deletions(-)

-- 
2.34.1




[RFC XEN PATCH v4 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-01-04 Thread Jiqian Chen
On PVH dom0, gsis do not get registered, but the gsi of a
passthrough device must be configured before it can be
mapped into an hvm domU.
When dom0 is PVH, the Linux kernel calls PHYSDEVOP_setup_gsi
for passthrough devices to register their gsi.
So, allow PHYSDEVOP_setup_gsi for that purpose.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 632a68be3cc4..e27d3ca15185 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -97,6 +97,12 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_setup_gsi:
+if ( is_hardware_domain(currd) && !has_pirq(currd) )
+break;
+else
+return -ENOSYS;
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[RFC XEN PATCH v4 5/5] libxl: Use gsi instead of irq for mapping pirq

2024-01-04 Thread Jiqian Chen
PVH dom0 uses the Linux local interrupt mechanism. When it
allocates an irq for a gsi, allocation is dynamic and first
come, first served. Irq numbers are handed out in increasing
order, but gsis are not registered in increasing order: gsi 38
may be registered before gsi 28, so the irq number is not
equal to the gsi number. When passing through a device, xl
wants to use the gsi to map a pirq, see
pci_add_dm_done->xc_physdev_map_pirq, but in the current code
the number is read from the file
/sys/bus/pci/devices//irq, so the mapping
fails.

So, use the real gsi number read from the gsi sysfs file.
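
The diff below prefers the gsi attribute and falls back to the irq attribute
when the gsi file is missing. A standalone sketch of that probe follows. It is
written under assumptions: pick_irq_attr is a hypothetical helper, it takes
the device directory as a plain string instead of using libxl's GCSPRINTF, and
it only relies on the POSIX access() call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write into buf the path of the "gsi" attribute
 * under dev_dir if it exists, otherwise the path of the "irq" attribute.
 * Returns 0 on success, -1 if the path does not fit in buf.
 */
static int pick_irq_attr(const char *dev_dir, char *buf, size_t len)
{
    int n;

    assert(dev_dir && buf && len > 0);

    n = snprintf(buf, len, "%s/gsi", dev_dir);
    if (n < 0 || (size_t)n >= len)
        return -1;
    if (access(buf, F_OK) == 0)
        return 0;               /* gsi attribute present: use it */

    n = snprintf(buf, len, "%s/irq", dev_dir);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;                   /* fall back to the legacy irq attribute */
}
```

Probing for the file first keeps the tools working on kernels that do not
expose a gsi attribute, at the cost of one extra access() call per device.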

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/libs/light/libxl_pci.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index e1341d1e9850..ab51dc099725 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1479,8 +1479,14 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
 pci->bus, pci->dev, pci->func);
+
+if (access(sysfs_path, F_OK) != 0) {
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+pci->bus, pci->dev, pci->func);
+}
+
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
 LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
@@ -2231,9 +2237,14 @@ skip_bar:
 if (!pci_supp_legacy_irq())
 goto skip_legacy_irq;
 
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
pci->bus, pci->dev, pci->func);
 
+if (access(sysfs_path, F_OK) != 0) {
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+   pci->bus, pci->dev, pci->func);
+}
+
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
 LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
-- 
2.34.1




[RFC XEN PATCH v4 4/5] domctl: Use gsi to grant/revoke irq permission

2024-01-04 Thread Jiqian Chen
Some domain types, such as PVH, don't have PIRQs, and the
current implementation is not suitable for those domains.

When passing through a device to a guest on PVH dom0,
pci_add_dm_done->XEN_DOMCTL_irq_permission fails
at domain_pirq_to_irq.

So, change it to use the gsi to grant/revoke irq permission,
and change the caller of XEN_DOMCTL_irq_permission to
pass the gsi instead of the pirq.
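
The order of checks that the patched domctl applies can be sketched as
follows. This is a toy model under stated assumptions: toy_domain,
TOY_NR_IRQS_GSI and toy_gsi_to_grantable_irq are hypothetical, the access
table stands in for the real irq_access_permitted() lookup, and the XSM check
of the real code is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NR_IRQS_GSI 48   /* hypothetical number of gsis on the host */

/* Hypothetical stand-in for the calling domain's irq access rights. */
struct toy_domain {
    bool irq_allowed[TOY_NR_IRQS_GSI];
};

/*
 * Sketch of the check order: range-check the gsi, then require that
 * the calling domain itself has access before it may grant or revoke it.
 * Returns the irq (equal to the gsi here) or a negative errno value.
 */
static int toy_gsi_to_grantable_irq(const struct toy_domain *caller,
                                    unsigned int gsi)
{
    if (gsi >= TOY_NR_IRQS_GSI)
        return -EINVAL;
    /* As in the patched code, irq 0 is never granted. */
    if (gsi == 0 || !caller->irq_allowed[gsi])
        return -EPERM;
    return (int)gsi;
}
```

The point of the change is that the lookup no longer goes through the caller's
pirq table, which a PVH dom0 does not have.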

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/libs/light/libxl_pci.c |  6 --
 tools/libs/light/libxl_x86.c |  5 -
 xen/common/domctl.c  | 12 ++--
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da0794e..e1341d1e9850 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1418,6 +1418,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 unsigned long long start, end, flags, size;
 int irq, i;
 int r;
+int gsi;
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
 bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
@@ -1486,6 +1487,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 goto out_no_irq;
 }
 if ((fscanf(f, "%u", &irq) == 1) && irq) {
+gsi = irq;
 r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
 if (r < 0) {
 LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
@@ -1494,10 +1496,10 @@ static void pci_add_dm_done(libxl__egc *egc,
 rc = ERROR_FAIL;
 goto out;
 }
-r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+r = xc_domain_irq_permission(ctx->xch, domid, gsi, 1);
 if (r < 0) {
 LOGED(ERROR, domainid,
-  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
+  "xc_domain_irq_permission irq=%d (error=%d)", gsi, r);
 fclose(f);
 rc = ERROR_FAIL;
 goto out;
diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
index d16573e72cd4..88ad50cf6360 100644
--- a/tools/libs/light/libxl_x86.c
+++ b/tools/libs/light/libxl_x86.c
@@ -654,12 +654,15 @@ out:
 int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
 {
 int ret;
+int gsi;
+
+gsi = irq;
 
 ret = xc_physdev_map_pirq(CTX->xch, domid, irq, &irq);
 if (ret)
 return ret;
 
-ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
+ret = xc_domain_irq_permission(CTX->xch, domid, gsi, 1);
 
 return ret;
 }
diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index f5a71ee5f78d..eeb975bd0194 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -653,12 +653,20 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
 unsigned int pirq = op->u.irq_permission.pirq, irq;
 int allow = op->u.irq_permission.allow_access;
 
-if ( pirq >= current->domain->nr_pirqs )
+if ( pirq >= nr_irqs_gsi )
 {
 ret = -EINVAL;
 break;
 }
-irq = pirq_access_permitted(current->domain, pirq);
+
+if ( irq_access_permitted(current->domain, pirq) )
+irq = pirq;
+else
+{
+ret = -EPERM;
+break;
+}
+
 if ( !irq || xsm_irq_permission(XSM_HOOK, d, irq, allow) )
 ret = -EPERM;
 else if ( allow )
-- 
2.34.1




[RFC XEN PATCH v4 2/5] x86/pvh: Allow (un)map_pirq when caller isn't DOMID_SELF

2024-01-04 Thread Jiqian Chen
When running Xen with a PVH dom0 and an hvm domU, a pirq is mapped
for a passthrough device by using its gsi, see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq
is not allowed because currd is PVH dom0 and PVH does not have
the X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.

So, allow PHYSDEVOP_map_pirq when the domid passed by the caller
is not DOMID_SELF, regardless of whether currd has the
X86_EMU_USE_PIRQ flag, and also allow PHYSDEVOP_unmap_pirq so
that the failure path can unmap the pirq.
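
The rule added by this patch reduces to one predicate, sketched below as a
standalone function. The names are hypothetical, and TOY_DOMID_SELF repeats
the value that, to my knowledge, Xen's public headers use for DOMID_SELF
(0x7FF0); it is redefined here only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of DOMID_SELF, redefined so the sketch is standalone. */
#define TOY_DOMID_SELF 0x7FF0U

/* Hypothetical predicate: may the caller proceed with (un)map_pirq? */
static bool toy_pirq_op_allowed(bool caller_has_pirq, uint16_t target_domid)
{
    /* Refuse only when a caller without PIRQ support targets itself. */
    return caller_has_pirq || target_domid != TOY_DOMID_SELF;
}
```

A PVH dom0 (no PIRQ support) mapping a pirq for a domU passes the check, while
the same dom0 asking for a pirq for itself is still refused.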

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 6ad5b4d5f11f..632a68be3cc4 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -72,8 +73,30 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd )
 {
-case PHYSDEVOP_map_pirq:
-case PHYSDEVOP_unmap_pirq:
+case PHYSDEVOP_map_pirq: {
+physdev_map_pirq_t map;
+
+if ( copy_from_guest(&map, arg, 1) != 0 )
+return -EFAULT;
+
+if ( !has_pirq(currd) && map.domid == DOMID_SELF )
+return -ENOSYS;
+
+break;
+}
+
+case PHYSDEVOP_unmap_pirq: {
+physdev_unmap_pirq_t unmap;
+
+if ( copy_from_guest(&unmap, arg, 1) != 0 )
+return -EFAULT;
+
+if ( !has_pirq(currd) && unmap.domid == DOMID_SELF )
+return -ENOSYS;
+
+break;
+}
+
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[RFC XEN PATCH v4 1/5] xen/vpci: Clear all vpci status of device

2024-01-04 Thread Jiqian Chen
When a device is reset on the dom0 side, vpci on the Xen
side is not notified, so the state cached in vpci is out of
date compared with the real device state.
To solve this, add a new hypercall that clears all vpci
state for a device. After dom0 resets a device, it can call
this hypercall to notify vpci.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 34 ++
 xen/drivers/vpci/vpci.c  |  9 +
 xen/include/public/physdev.h |  7 +++
 xen/include/xen/vpci.h   |  6 ++
 5 files changed, 57 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index eeb73e1aa5d0..6ad5b4d5f11f 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d133c..552ccbf747cb 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,39 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct physdev_pci_device dev;
+struct pci_dev *pdev;
+pci_sbdf_t sbdf;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev, arg, 1) != 0 )
+break;
+sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
+
+ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, sbdf);
+if ( !pdev )
+{
+pcidevs_unlock();
+ret = -ENODEV;
+break;
+}
+
+ret = vpci_reset_device_state(pdev);
+pcidevs_unlock();
+if ( ret )
+printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 72ef277c4f8e..3c64cb10ccbb 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -107,6 +107,15 @@ int vpci_add_handlers(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(pcidevs_locked());
+
+vpci_remove_device(pdev);
+return vpci_add_handlers(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c0b..f5bab1f29779 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index d20c301a3db3..6ec83ce9ae13 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_add_handlers(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_remove_device(struct pci_dev *pdev);
+int __must_check vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -262,6 +263,11 @@ static inline int vpci_add_handlers(struct pci_dev *pdev)
 
 static inline void vpci_remove_device(struct pci_dev *pdev) { }
 
+static inline int __must_check vpci_reset_device_state(struct pci_dev *pdev)
+{
+return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1




[RFC KERNEL PATCH v4 1/3] xen/pci: Add xen_reset_device_state function

2024-01-04 Thread Jiqian Chen
When a device is reset on the dom0 side, vpci on the Xen side
is not notified, so the state cached in vpci is out of date
compared with the real device state.
To solve this, add a new function that clears all vpci
state for a device when it is reset on the dom0 side.

Call that function in pcistub_init_device, because
"pci-assignable-add" resets the device when assigning it for
passthrough in Xen; the vpci state then goes out of date
and the device fails to restore its BAR state.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 drivers/xen/pci.c  | 12 
 drivers/xen/xen-pciback/pci_stub.c | 18 +++---
 include/xen/interface/physdev.h|  7 +++
 include/xen/pci.h  |  6 ++
 4 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..e9b30bc09139 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
return r;
 }
 
+int xen_reset_device_state(const struct pci_dev *dev)
+{
+   struct physdev_pci_device device = {
+   .seg = pci_domain_nr(dev->bus),
+   .bus = dev->bus->number,
+   .devfn = dev->devfn
+   };
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device_state);
+
 static int xen_pci_notifier(struct notifier_block *nb,
unsigned long action, void *data)
 {
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e34b623e4b41..46c40ec8a18e 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -89,6 +89,16 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
return psdev;
 }
 
+static int pcistub_reset_device_state(struct pci_dev *dev)
+{
+   __pci_reset_function_locked(dev);
+
+   if (!xen_pv_domain())
+   return xen_reset_device_state(dev);
+   else
+   return 0;
+}
+
 /* Don't call this directly as it's called by pcistub_device_put */
 static void pcistub_device_release(struct kref *kref)
 {
@@ -107,7 +117,7 @@ static void pcistub_device_release(struct kref *kref)
/* Call the reset function which does not take lock as this
 * is called from "unbind" which takes a device_lock mutex.
 */
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
if (dev_data &&
pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state))
dev_info(&dev->dev, "Could not reload PCI state\n");
@@ -284,7 +294,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev)
 * (so it's ready for the next domain)
 */
device_lock_assert(&dev->dev);
-   __pci_reset_function_locked(dev);
+   pcistub_reset_device_state(dev);
 
dev_data = pci_get_drvdata(dev);
ret = pci_load_saved_state(dev, dev_data->pci_saved_state);
@@ -420,7 +430,9 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_err(&dev->dev, "Could not store PCI conf saved state!\n");
else {
dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
-   __pci_reset_function_locked(dev);
+   err = pcistub_reset_device_state(dev);
+   if (err)
+   goto config_release;
pci_restore_state(dev);
}
/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..8609770e28f5 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,13 @@ struct physdev_pci_device_add {
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * Notify the hypervisor that a PCI device has been reset, so that any
+ * internally cached state is regenerated.  Should be called after any
+ * device reset performed by the hardware domain.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/include/xen/pci.h b/include/xen/pci.h
index b8337cf85fd1..b2e2e856efd6 100644
--- a/include/xen/pci.h
+++ b/include/xen/pci.h
@@ -4,10 +4,16 @@
 #define __XEN_PCI_H__
 
 #if defined(CONFIG_XEN_DOM0)
+int xen_reset_device_state(const struct pci_dev *dev);
 int xen_find_device_domain_owner(struct pci_dev *dev);
 int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
 int xen_unregister_device_domain_owner(struct pci_dev *dev);
 #else
+static inline int xen_reset_device_state(const struct pci_dev *dev)
+{
+   return -1;
+}
+
 static inline int xen_find_device_domain_owner(struct pci_dev *dev)
 {
return -1;
-- 
2.34.1




[RFC KERNEL PATCH v4 2/3] xen/pvh: Setup gsi for passthrough device

2024-01-04 Thread Jiqian Chen
In a PVH dom0, GSIs are not registered, but the GSI of a
passthrough device must be configured before it can be
mapped into a domU.

When assigning a device for passthrough, proactively set up
the GSI of the device during that process.
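
The setup step boils down to translating the ACPI trigger and polarity into the two 0/1 flags passed with the GSI. Below is a minimal userspace sketch of that encoding. The enum names and `struct setup_gsi_args` are hypothetical stand-ins, not the kernel's ACPI constants or the real `struct physdev_setup_gsi`.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel's ACPI constants; the values here are illustrative. */
enum trigger  { TRIGGER_LEVEL, TRIGGER_EDGE };
enum polarity { POLARITY_HIGH, POLARITY_LOW };

/* Hypothetical mirror of the fields filled in for the setup_gsi call. */
struct setup_gsi_args {
    int gsi;
    uint8_t triggering; /* 0 = edge, 1 = level */
    uint8_t polarity;   /* 0 = active high, 1 = active low */
};

/* Encode trigger/polarity into the 0/1 flags, following the mapping in the patch. */
static struct setup_gsi_args encode_setup_gsi(int gsi, enum trigger t,
                                              enum polarity p)
{
    struct setup_gsi_args args;

    assert(gsi >= 0);
    args.gsi = gsi;
    args.triggering = (t == TRIGGER_EDGE) ? 0 : 1;
    args.polarity = (p == POLARITY_HIGH) ? 0 : 1;
    return args;
}
```

With the defaults used in the patch (level triggered, active low on x86), both flags come out as 1.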

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 arch/x86/xen/enlighten_pvh.c   | 90 ++
 drivers/acpi/pci_irq.c |  2 +-
 drivers/xen/xen-pciback/pci_stub.c |  8 +++
 include/linux/acpi.h   |  1 +
 include/xen/acpi.h |  6 ++
 5 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index ada3868c02c2..ecadd966c684 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
 #include 
+#include 
 
 #include 
 
@@ -25,6 +26,95 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+typedef struct gsi_info {
+   int gsi;
+   int trigger;
+   int polarity;
+} gsi_info_t;
+
+struct acpi_prt_entry {
+   struct acpi_pci_id  id;
+   u8  pin;
+   acpi_handle link;
+   u32 index;  /* GSI, or link _CRS index */
+};
+
+static int xen_pvh_get_gsi_info(struct pci_dev *dev,
+   gsi_info_t *gsi_info)
+{
+   int gsi;
+   u8 pin = 0;
+   struct acpi_prt_entry *entry;
+   int trigger = ACPI_LEVEL_SENSITIVE;
+   int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+   if (dev)
+   pin = dev->pin;
+   if (!dev || !pin || !gsi_info)
+   return -EINVAL;
+
+   entry = acpi_pci_irq_lookup(dev, pin);
+   if (entry) {
+   if (entry->link)
+   gsi = acpi_pci_link_allocate_irq(entry->link,
+entry->index,
+&trigger, &polarity,
+NULL);
+   else
+   gsi = entry->index;
+   } else
+   gsi = -1;
+
+   if (gsi < 0)
+   return -EINVAL;
+
+   gsi_info->gsi = gsi;
+   gsi_info->trigger = trigger;
+   gsi_info->polarity = polarity;
+
+   return 0;
+}
+
+static int xen_pvh_setup_gsi(gsi_info_t *gsi_info)
+{
+   struct physdev_setup_gsi setup_gsi;
+
+   if (!gsi_info)
+   return -EINVAL;
+
+   setup_gsi.gsi = gsi_info->gsi;
+   setup_gsi.triggering = (gsi_info->trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+   setup_gsi.polarity = (gsi_info->polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+}
+
+int xen_pvh_passthrough_gsi(struct pci_dev *dev)
+{
+   int ret;
+   gsi_info_t gsi_info;
+
+   if (!dev)
+   return -EINVAL;
+
+   ret = xen_pvh_get_gsi_info(dev, &gsi_info);
+   if (ret) {
+   xen_raw_printk("Fail to get gsi info!\n");
+   return ret;
+   }
+
+   ret = xen_pvh_setup_gsi(&gsi_info);
+   if (ret == -EEXIST) {
+   xen_raw_printk("Already setup the GSI :%d\n", gsi_info.gsi);
+   ret = 0;
+   } else if (ret)
+   xen_raw_printk("Fail to setup GSI (%d)!\n", gsi_info.gsi);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_passthrough_gsi);
+
 void __init xen_pvh_init(struct boot_params *boot_params)
 {
u32 msr;
diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index ff30ceca2203..630fe0a34bc6 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
 }
 #endif /* CONFIG_X86_IO_APIC */
 
-static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
+struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
 {
struct acpi_prt_entry *entry = NULL;
struct pci_dev *bridge;
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 46c40ec8a18e..22d4380d2b04 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -435,6 +436,13 @@ static int pcistub_init_device(struct pci_dev *dev)
goto config_release;
pci_restore_state(dev);
}
+
+   if (xen_initial_domain() && xen_pvh_domain()) {
+   err = xen_pvh_passthrough_gsi(dev);
+   if (err)
+   goto config_release;
+   }
+
/* Now disable the device (this also ensures some private device
 * data is setup

[RFC KERNEL PATCH v4 3/3] PCI/sysfs: Add gsi sysfs for pci_dev

2024-01-04 Thread Jiqian Chen
Some scenarios need the GSI to be exposed through sysfs.
For example, when Xen passes a device through to a domU, it
uses the GSI to map a PIRQ, but userspace currently has no
way to obtain the GSI number.
So, add a gsi sysfs attribute for that and for other
potential scenarios.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 drivers/acpi/pci_irq.c  |  1 +
 drivers/pci/pci-sysfs.c | 11 +++
 include/linux/pci.h |  2 ++
 3 files changed, 14 insertions(+)

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index 630fe0a34bc6..739a58755df2 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
kfree(entry);
return 0;
}
+   dev->gsi = gsi;
 
rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
if (rc < 0) {
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 2321fdfefd7d..c51df88d079e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(irq);
 
+static ssize_t gsi_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   return sysfs_emit(buf, "%u\n", pdev->gsi);
+}
+static DEVICE_ATTR_RO(gsi);
+
 static ssize_t broken_parity_status_show(struct device *dev,
 struct device_attribute *attr,
 char *buf)
@@ -596,6 +606,7 @@ static struct attribute *pci_dev_attrs[] = {
&dev_attr_revision.attr,
&dev_attr_class.attr,
&dev_attr_irq.attr,
+   &dev_attr_gsi.attr,
&dev_attr_local_cpus.attr,
&dev_attr_local_cpulist.attr,
&dev_attr_modalias.attr,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index dea043bc1e38..0618d4a87a50 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -529,6 +529,8 @@ struct pci_dev {
 
/* These methods index pci_reset_fn_methods[] */
u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
+
+   unsigned int gsi;
 };
 
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
-- 
2.34.1




[RFC KERNEL PATCH v4 0/3] Support device passthrough when dom0 is PVH on Xen

2024-01-04 Thread Jiqian Chen
xen-devel/20230312120157.452859-5-ray.hu...@amd.com/,
kernel https://lore.kernel.org/xen-devel/20230312120157.452859-5-ray.hu...@amd.com/
and xen https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/),
v1 performed "map_pirq" and "register_gsi" on all PCI devices on PVH dom0,
which is unnecessary and may cause multiple registrations.

4. failed to map pirq for gsi
Problem: QEMU calls xc_physdev_map_pirq() in xen_pt_realize() to map a
passthrough device's GSI to a PIRQ, but the call fails.

Reason: xc_physdev_map_pirq() expects a GSI, but QEMU passes it an IRQ and
treats that IRQ as a GSI. The value is read from the file
/sys/bus/pci/devices/:xx:xx.x/irq in xen_host_pci_device_get(), and the GSI
number is not necessarily equal to the IRQ number. On PVH dom0, IRQs are
allocated for GSIs dynamically in acpi_register_gsi_ioapic(), on a
first-come, first-served basis. Debugging the kernel (see
__irq_alloc_descs()) shows that IRQ numbers are handed out in ascending
order, whereas GSIs are not registered in ascending order: GSI 38 may be
registered before GSI 28, so GSI 38 gets a smaller IRQ number than GSI 28,
and therefore gsi != irq.
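
To make the ordering argument concrete, here is a toy model of a first-come, first-served allocator. It is only a sketch: `struct irq_allocator`, `register_gsi()` and the starting IRQ number are made up for illustration and do not reflect the kernel's real allocator.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GSI 64

/* Toy allocator: hands out IRQ numbers in ascending order, first come first served. */
struct irq_allocator {
    int next_irq;
    int irq_of_gsi[MAX_GSI]; /* 0 means "not registered yet" */
};

static void irq_allocator_init(struct irq_allocator *a, int first_irq)
{
    assert(a != NULL && first_irq > 0);
    a->next_irq = first_irq;
    for (size_t i = 0; i < MAX_GSI; i++)
        a->irq_of_gsi[i] = 0;
}

/* Register a GSI and return its IRQ; registering it again returns the same IRQ. */
static int register_gsi(struct irq_allocator *a, int gsi)
{
    assert(a != NULL && gsi >= 0 && gsi < MAX_GSI);
    if (a->irq_of_gsi[gsi] == 0)
        a->irq_of_gsi[gsi] = a->next_irq++;
    return a->irq_of_gsi[gsi];
}
```

If GSI 38 is registered before GSI 28, GSI 38 receives the lower IRQ number, and neither IRQ equals its GSI.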

Solution: record the relation between GSI and IRQ, so that when userspace
(QEMU) needs the GSI, it can be translated from the IRQ. The third kernel
patch (xen/privcmd: Add new syscall to get gsi from irq) records all the
relations in acpi_register_gsi_xen_pvh() when dom0 initializes PCI devices,
and provides a syscall for userspace to get the GSI from the IRQ. The third
Xen patch (tools: Add new function to get gsi from irq) adds a new function,
xc_physdev_gsi_from_irq(), which calls the new syscall added on the kernel
side. Userspace can then use that function to get the GSI, and
xc_physdev_map_pirq() will succeed. This v2 patch is the same as v1 (kernel
https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/
and xen
https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/).

The v2 QEMU patch only changes an included header file; otherwise it is
similar to v1 (qemu
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/)
and just calls xc_physdev_gsi_from_irq() to get the GSI from the IRQ.


Jiqian Chen (3):
  xen/pci: Add xen_reset_device_state function
  xen/pvh: Setup gsi for passthrough device
  PCI/sysfs: Add gsi sysfs for pci_dev

 arch/x86/xen/enlighten_pvh.c   | 90 ++
 drivers/acpi/pci_irq.c |  3 +-
 drivers/pci/pci-sysfs.c| 11 
 drivers/xen/pci.c  | 12 
 drivers/xen/xen-pciback/pci_stub.c | 26 -
 include/linux/acpi.h   |  1 +
 include/linux/pci.h|  2 +
 include/xen/acpi.h |  6 ++
 include/xen/interface/physdev.h|  7 +++
 include/xen/pci.h  |  6 ++
 10 files changed, 160 insertions(+), 4 deletions(-)

-- 
2.34.1




[RFC QEMU PATCH v3 1/1] xen: Use gsi instead of irq for mapping pirq

2023-12-10 Thread Jiqian Chen
PVH dom0 uses the Linux local interrupt mechanism. When it
allocates an IRQ for a GSI, the allocation is dynamic and
first-come, first-served. IRQ numbers are handed out in
ascending order, but GSIs are not registered in ascending
order: GSI 38 may be registered before GSI 28, so the IRQ
number is not equal to the GSI number. When passing through
a device, QEMU needs the GSI to map a PIRQ (see
xen_pt_realize->xc_physdev_map_pirq), but the current code
reads the number from the file
/sys/bus/pci/devices//irq, so the mapping fails.

Use the real GSI number read from the gsi sysfs attribute.
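
As a rough illustration of what reading such an attribute involves, the sketch below parses one unsigned decimal value from an already opened file. `read_decimal_attr()` is a hypothetical helper written for this note; it is not QEMU's xen_host_pci_get_dec_value().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one unsigned decimal value from an already opened attribute file.
 * Returns 0 on success and -1 if no number could be parsed.
 */
static int read_decimal_attr(FILE *f, unsigned int *value)
{
    assert(f != NULL && value != NULL);
    return fscanf(f, "%u", value) == 1 ? 0 : -1;
}
```

A caller would open the device's gsi attribute, pass the stream in, and treat a failed parse as "no GSI available".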

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 hw/xen/xen-host-pci-device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716..e270ac2631 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -364,7 +364,7 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
 }
 d->device_id = v;
 
-xen_host_pci_get_dec_value(d, "irq", &v, errp);
+xen_host_pci_get_dec_value(d, "gsi", &v, errp);
 if (*errp) {
 goto error;
 }
-- 
2.34.1




[RFC QEMU PATCH v3 0/1] Support device passthrough when dom0 is PVH on Xen

2023-12-10 Thread Jiqian Chen
Hi All,
v2->v3 changes:
* Due to changes in the implementation of the second patch on the kernel side
(which adds a new sysfs attribute for the GSI instead of a new syscall), read
the GSI number from the gsi sysfs attribute.


v3 patch on kernel side:
https://lore.kernel.org/lkml/20231210161519.1550860-1-jiqian.c...@amd.com/T/#t
v3 patch on Xen side:
https://lore.kernel.org/xen-devel/20231210164009.1551147-1-jiqian.c...@amd.com/T/#t


Below is the description of v2 cover letter:
This patch is the v2 of the implementation of passthrough when dom0 is PVH on 
Xen.
Issues we encountered:
1. failed to map pirq for gsi
Problem: QEMU calls xc_physdev_map_pirq() in xen_pt_realize() to map a
passthrough device's GSI to a PIRQ, but the call fails.

Reason: xc_physdev_map_pirq() expects a GSI, but QEMU passes it an IRQ and
treats that IRQ as a GSI. The value is read from the file
/sys/bus/pci/devices/:xx:xx.x/irq in xen_host_pci_device_get(), and the GSI
number is not necessarily equal to the IRQ number. On PVH dom0, IRQs are
allocated for GSIs dynamically in acpi_register_gsi_ioapic(), on a
first-come, first-served basis. Debugging the kernel (see
__irq_alloc_descs()) shows that IRQ numbers are handed out in ascending
order, whereas GSIs are not registered in ascending order: GSI 38 may be
registered before GSI 28, so GSI 38 gets a smaller IRQ number than GSI 28,
and therefore gsi != irq.

Solution: record the relation between GSI and IRQ, so that when userspace
(QEMU) needs the GSI, it can be translated from the IRQ. The third kernel
patch (xen/privcmd: Add new syscall to get gsi from irq) records all the
relations in acpi_register_gsi_xen_pvh() when dom0 initializes PCI devices,
and provides a syscall for userspace to get the GSI from the IRQ. The third
Xen patch (tools: Add new function to get gsi from irq) adds a new function,
xc_physdev_gsi_from_irq(), which calls the new syscall added on the kernel
side. Userspace can then use that function to get the GSI, and
xc_physdev_map_pirq() will succeed.
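
The record-then-translate idea can be sketched as a small table with a reverse lookup. This is an illustration only: `struct relation_table`, `record_relation()` and `gsi_from_irq()` are hypothetical names, and the real kernel and libxc code are not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RELATIONS 32

struct gsi_irq_relation {
    int gsi;
    int irq;
};

struct relation_table {
    struct gsi_irq_relation entries[MAX_RELATIONS];
    size_t count;
};

/* Record one (gsi, irq) pair; returns 0 on success, -1 if the table is full. */
static int record_relation(struct relation_table *t, int gsi, int irq)
{
    assert(t != NULL);
    if (t->count >= MAX_RELATIONS)
        return -1;
    t->entries[t->count].gsi = gsi;
    t->entries[t->count].irq = irq;
    t->count++;
    return 0;
}

/* Translate an IRQ back to its GSI; returns -1 if the IRQ was never recorded. */
static int gsi_from_irq(const struct relation_table *t, int irq)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].irq == irq)
            return t->entries[i].gsi;
    return -1;
}
```

A linear scan is enough for a sketch because the number of GSIs on a machine is small; a real implementation may choose a different structure.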

This v2 on the QEMU side is the same as v1 (qemu
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/)
and just calls xc_physdev_gsi_from_irq() to get the GSI from the IRQ.

v2 on kernel side:
https://lore.kernel.org/lkml/20231124103123.3263471-1-jiqian.c...@amd.com/T/#t

v2 on Xen side:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#t


Jiqian Chen (1):
  xen: Use gsi instead of irq for mapping pirq

 hw/xen/xen-host-pci-device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.34.1




[RFC XEN PATCH v3 3/3] libxl: Use gsi instead of irq for mapping pirq

2023-12-10 Thread Jiqian Chen
PVH dom0 uses the Linux local interrupt mechanism. When it
allocates an IRQ for a GSI, the allocation is dynamic and
first-come, first-served. IRQ numbers are handed out in
ascending order, but GSIs are not registered in ascending
order: GSI 38 may be registered before GSI 28, so the IRQ
number is not equal to the GSI number. When passing through
a device, xl needs the GSI to map a PIRQ (see
pci_add_dm_done->xc_physdev_map_pirq), but the current code
reads the number from the file
/sys/bus/pci/devices//irq, so the mapping fails.

So, use the real GSI number read from the gsi sysfs attribute.
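
For reference, the attribute path for a device can be built as sketched below. The directory and the segment:bus:device.function layout are assumptions believed to match what the patch builds from SYSFS_PCI_DEV and PCI_BDF; `format_gsi_path()` itself is a hypothetical helper, not a libxl function.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed to match the layout the patch builds from SYSFS_PCI_DEV and PCI_BDF. */
#define SYSFS_PCI_DEVICES "/sys/bus/pci/devices"

/*
 * Format the path of a PCI device's "gsi" attribute into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int format_gsi_path(char *buf, size_t len, unsigned int domain,
                           unsigned int bus, unsigned int dev,
                           unsigned int func)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "%s/%04x:%02x:%02x.%01x/gsi",
                 SYSFS_PCI_DEVICES, domain, bus, dev, func);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Checking the snprintf() return value against the buffer size avoids silently opening a truncated path.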

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 tools/libs/light/libxl_pci.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 96cb4da079..9e75f0c263 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1416,7 +1416,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 char *sysfs_path;
 FILE *f;
 unsigned long long start, end, flags, size;
-int irq, i;
+int gsi, i;
 int r;
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 uint32_t domainid = domid;
@@ -1439,7 +1439,7 @@ static void pci_add_dm_done(libxl__egc *egc,
pci->bus, pci->dev, pci->func);
 f = fopen(sysfs_path, "r");
 start = end = flags = size = 0;
-irq = 0;
+gsi = 0;
 
 if (f == NULL) {
 LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
@@ -1478,26 +1478,26 @@ static void pci_add_dm_done(libxl__egc *egc,
 fclose(f);
 if (!pci_supp_legacy_irq())
 goto out_no_irq;
-sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
+sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
 pci->bus, pci->dev, pci->func);
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
 LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
 goto out_no_irq;
 }
-if ((fscanf(f, "%u", &irq) == 1) && irq) {
-r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
+if ((fscanf(f, "%u", &gsi) == 1) && gsi) {
+r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &gsi);
 if (r < 0) {
-LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
-  irq, r);
+LOGED(ERROR, domainid, "xc_physdev_map_pirq gsi=%d (error=%d)",
+  gsi, r);
 fclose(f);
 rc = ERROR_FAIL;
 goto out;
 }
-r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
+r = xc_domain_irq_permission(ctx->xch, domid, gsi, 1);
 if (r < 0) {
 LOGED(ERROR, domainid,
-  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
+  "xc_domain_irq_permission gsi=%d (error=%d)", gsi, r);
 fclose(f);
 rc = ERROR_FAIL;
 goto out;
-- 
2.34.1




[RFC XEN PATCH v3 2/3] x86/pvh: Add (un)map_pirq and setup_gsi for PVH dom0

2023-12-10 Thread Jiqian Chen
When running Xen with a PVH dom0 and an HVM domU, a PIRQ is
mapped for a passthrough device by using its GSI, see
xen_pt_realize->xc_physdev_map_pirq and
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq
then calls into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
is not allowed because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails the has_pirq check.
So, allow PHYSDEVOP_map_pirq when currd is dom0 regardless of
whether dom0 has the X86_EMU_USE_PIRQ flag, and also allow
PHYSDEVOP_unmap_pirq so that the failure path can unmap the PIRQ.

In addition, in a PVH dom0 the GSIs are not registered, but
the GSI of a passthrough device must be configured before it
can be mapped into an HVM domU.
So, add PHYSDEVOP_setup_gsi for PVH dom0, because PVH dom0
sets up the GSI when a device is assigned for passthrough.
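
The intended gate can be modelled in a few lines. This is a simplified sketch, not the Xen code: the enum, `physdev_gate()` and `ERR_NOSYS` are stand-ins invented here, and only four operations are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of physdev operations; the values are arbitrary here. */
enum physdev_cmd { CMD_SETUP_GSI, CMD_MAP_PIRQ, CMD_UNMAP_PIRQ, CMD_EOI };

#define ERR_NOSYS (-38) /* stand-in for -ENOSYS */

/*
 * Model of the gate described above: the hardware domain may always use
 * setup_gsi/map_pirq/unmap_pirq; every other case needs the PIRQ feature.
 */
static int physdev_gate(enum physdev_cmd cmd, bool is_hwdom, bool has_pirq)
{
    switch (cmd) {
    case CMD_SETUP_GSI:
    case CMD_MAP_PIRQ:
    case CMD_UNMAP_PIRQ:
        if (is_hwdom)
            return 0;
        /* fall through */
    case CMD_EOI:
        return has_pirq ? 0 : ERR_NOSYS;
    }
    return ERR_NOSYS;
}
```

Note that in this model a domain that is not the hardware domain still falls through to the PIRQ feature check, as in the patch's switch statement.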

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 6ad5b4d5f1..621d789bd3 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -72,8 +72,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd )
 {
+case PHYSDEVOP_setup_gsi:
 case PHYSDEVOP_map_pirq:
 case PHYSDEVOP_unmap_pirq:
+if ( is_hardware_domain(currd) )
+break;
 case PHYSDEVOP_eoi:
 case PHYSDEVOP_irq_status_query:
 case PHYSDEVOP_get_free_pirq:
-- 
2.34.1




[RFC XEN PATCH v3 0/3] Support device passthrough when dom0 is PVH on Xen

2023-12-10 Thread Jiqian Chen
xen-devel/20230312120157.452859-5-ray.hu...@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-5-ray.hu...@amd.com/),
v1 performed "map_pirq" and "register_gsi" on all PCI devices on PVH dom0,
which is unnecessary and may cause multiple registrations.

4. failed to map pirq for gsi
Problem: QEMU calls xc_physdev_map_pirq() in xen_pt_realize() to map a
passthrough device's GSI to a PIRQ, but the call fails.

Reason: xc_physdev_map_pirq() expects a GSI, but QEMU passes it an IRQ and
treats that IRQ as a GSI. The value is read from the file
/sys/bus/pci/devices/:xx:xx.x/irq in xen_host_pci_device_get(), and the GSI
number is not necessarily equal to the IRQ number. On PVH dom0, IRQs are
allocated for GSIs dynamically in acpi_register_gsi_ioapic(), on a
first-come, first-served basis. Debugging the kernel (see
__irq_alloc_descs()) shows that IRQ numbers are handed out in ascending
order, whereas GSIs are not registered in ascending order: GSI 38 may be
registered before GSI 28, so GSI 38 gets a smaller IRQ number than GSI 28,
and therefore gsi != irq.

Solution: record the relation between GSI and IRQ, so that when userspace
(QEMU) needs the GSI, it can be translated from the IRQ. The third kernel
patch (xen/privcmd: Add new syscall to get gsi from irq) records all the
relations in acpi_register_gsi_xen_pvh() when dom0 initializes PCI devices,
and provides a syscall for userspace to get the GSI from the IRQ. The third
Xen patch (tools: Add new function to get gsi from irq) adds a new function,
xc_physdev_gsi_from_irq(), which calls the new syscall added on the kernel
side. Userspace can then use that function to get the GSI, and
xc_physdev_map_pirq() will succeed. This v2 patch is the same as v1 (kernel
https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/
and xen
https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/).

The v2 QEMU patch only changes an included header file; otherwise it is
similar to v1 (qemu
https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/)
and just calls xc_physdev_gsi_from_irq() to get the GSI from the IRQ.

Jiqian Chen (3):
  xen/vpci: Clear all vpci status of device
  x86/pvh: Add (un)map_pirq and setup_gsi for PVH dom0
  libxl: Use gsi instead of irq for mapping pirq

 tools/libs/light/libxl_pci.c | 18 +-
 xen/arch/x86/hvm/hypercall.c |  4 
 xen/drivers/pci/physdev.c| 35 +++
 xen/drivers/vpci/vpci.c  |  9 +
 xen/include/public/physdev.h |  8 
 xen/include/xen/vpci.h   |  6 ++
 6 files changed, 71 insertions(+), 9 deletions(-)

-- 
2.34.1




[RFC XEN PATCH v3 1/3] xen/vpci: Clear all vpci status of device

2023-12-10 Thread Jiqian Chen
When a device has been reset on the dom0 side, vpci on the Xen
side is not notified, so the state cached in vpci is out of
date compared with the real device state.
To solve that problem, add a new hypercall to clear all vpci
device state. When the state of a device is reset on the dom0
side, dom0 can call this hypercall to notify vpci.
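
The stale-cache problem can be shown with a toy model: one register on the device and one cached copy on the hypervisor side. Everything here (`struct toy_device` and the three helpers) is made up for illustration and has no relation to the real vpci data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy device: one "hardware" register plus the copy a hypervisor might cache. */
struct toy_device {
    uint32_t hw_bar;     /* value currently held by the device */
    uint32_t cached_bar; /* value remembered on the hypervisor side */
};

/* A write that goes through the hypervisor updates both copies. */
static void toy_write_bar(struct toy_device *d, uint32_t value)
{
    assert(d != NULL);
    d->hw_bar = value;
    d->cached_bar = value;
}

/* A reset done behind the hypervisor's back only changes the hardware. */
static void toy_reset_hardware(struct toy_device *d)
{
    assert(d != NULL);
    d->hw_bar = 0;
}

/* The notification: drop the cached state and rebuild it from the device. */
static void toy_state_reset_notify(struct toy_device *d)
{
    assert(d != NULL);
    d->cached_bar = d->hw_bar;
}
```

Between the reset and the notification the two copies disagree, which is the window the new hypercall is meant to close.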

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 xen/arch/x86/hvm/hypercall.c |  1 +
 xen/drivers/pci/physdev.c| 35 +++
 xen/drivers/vpci/vpci.c  |  9 +
 xen/include/public/physdev.h |  8 
 xen/include/xen/vpci.h   |  6 ++
 5 files changed, 59 insertions(+)

diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index eeb73e1aa5..6ad5b4d5f1 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 case PHYSDEVOP_pci_mmcfg_reserved:
 case PHYSDEVOP_pci_device_add:
 case PHYSDEVOP_pci_device_remove:
+case PHYSDEVOP_pci_device_state_reset:
 case PHYSDEVOP_dbgp_op:
 if ( !is_hardware_domain(currd) )
 return -ENOSYS;
diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
index 42db3e6d13..6ee2edb86a 100644
--- a/xen/drivers/pci/physdev.c
+++ b/xen/drivers/pci/physdev.c
@@ -2,6 +2,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifndef COMPAT
 typedef long ret_t;
@@ -67,6 +68,40 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case PHYSDEVOP_pci_device_state_reset: {
+struct physdev_pci_device dev;
+struct pci_dev *pdev;
+
+if ( !is_pci_passthrough_enabled() )
+return -EOPNOTSUPP;
+
+ret = -EFAULT;
+if ( copy_from_guest(&dev, arg, 1) != 0 )
+break;
+
+ret = xsm_resource_setup_pci(XSM_PRIV,
+ (dev.seg << 16) | (dev.bus << 8) |
+ dev.devfn);
+if ( ret )
+break;
+
+pcidevs_lock();
+pdev = pci_get_pdev(NULL, PCI_SBDF(dev.seg, dev.bus, dev.devfn));
+if ( !pdev )
+{
+ret = -ENODEV;
+pcidevs_unlock();
+break;
+}
+
+ret = vpci_reset_device_state(pdev);
+if ( ret )
+printk(XENLOG_ERR "PCI reset device %pp state failed\n",
+   &pdev->sbdf);
+pcidevs_unlock();
+break;
+}
+
 default:
 ret = -ENOSYS;
 break;
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index 72ef277c4f..3c64cb10cc 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -107,6 +107,15 @@ int vpci_add_handlers(struct pci_dev *pdev)
 
 return rc;
 }
+
+int vpci_reset_device_state(struct pci_dev *pdev)
+{
+ASSERT(pcidevs_locked());
+
+vpci_remove_device(pdev);
+return vpci_add_handlers(pdev);
+}
+
 #endif /* __XEN__ */
 
 static int vpci_register_cmp(const struct vpci_register *r1,
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index f0c0d4727c..92c2f28bca 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -296,6 +296,14 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * On PVH dom0, when device is reset, the vpci on Xen side
+ * won't get notification, so that the cached state in vpci is
+ * all out of date with the real device state. Use this to reset
+ * the vpci state of device.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index d20c301a3d..d6377424f0 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -30,6 +30,7 @@ int __must_check vpci_add_handlers(struct pci_dev *pdev);
 
 /* Remove all handlers and free vpci related structures. */
 void vpci_remove_device(struct pci_dev *pdev);
+int vpci_reset_device_state(struct pci_dev *pdev);
 
 /* Add/remove a register handler. */
 int __must_check vpci_add_register_mask(struct vpci *vpci,
@@ -262,6 +263,11 @@ static inline int vpci_add_handlers(struct pci_dev *pdev)
 
 static inline void vpci_remove_device(struct pci_dev *pdev) { }
 
+static inline int vpci_reset_device_state(struct pci_dev *pdev)
+{
+return 0;
+}
+
 static inline void vpci_dump_msi(void) { }
 
 static inline uint32_t vpci_read(pci_sbdf_t sbdf, unsigned int reg,
-- 
2.34.1




[RFC KERNEL PATCH v3 3/3] PCI/sysfs: Add gsi sysfs for pci_dev

2023-12-10 Thread Jiqian Chen
Some scenarios need the GSI to be exposed through sysfs.
For example, when Xen passes a device through to a domU, it
uses the GSI to map a PIRQ, but userspace currently has no
way to obtain the GSI number.
So, add a gsi sysfs attribute for that and for other
potential scenarios.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 drivers/acpi/pci_irq.c  |  1 +
 drivers/pci/pci-sysfs.c | 11 +++
 include/linux/pci.h |  2 ++
 3 files changed, 14 insertions(+)

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index 630fe0a34bc6..739a58755df2 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
kfree(entry);
return 0;
}
+   dev->gsi = gsi;
 
rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
if (rc < 0) {
diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 2321fdfefd7d..c51df88d079e 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(irq);
 
+static ssize_t gsi_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buf)
+{
+   struct pci_dev *pdev = to_pci_dev(dev);
+
+   return sysfs_emit(buf, "%u\n", pdev->gsi);
+}
+static DEVICE_ATTR_RO(gsi);
+
 static ssize_t broken_parity_status_show(struct device *dev,
 struct device_attribute *attr,
 char *buf)
@@ -596,6 +606,7 @@ static struct attribute *pci_dev_attrs[] = {
&dev_attr_revision.attr,
&dev_attr_class.attr,
&dev_attr_irq.attr,
+   &dev_attr_gsi.attr,
&dev_attr_local_cpus.attr,
&dev_attr_local_cpulist.attr,
&dev_attr_modalias.attr,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60ca768bc867..7ef9060b239c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -529,6 +529,8 @@ struct pci_dev {
 
/* These methods index pci_reset_fn_methods[] */
u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
+
+   unsigned int gsi;
 };
 
 static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
-- 
2.34.1




[RFC KERNEL PATCH v3 2/3] xen/pvh: Setup gsi and map pirq for passthrough device

2023-12-10 Thread Jiqian Chen
When dom0 is PVH, the GSI is not unmasked, which causes two
problems.

First, in a PVH dom0, GSIs are not registered, but the GSI of
a passthrough device must be configured before it can be
mapped into a domU.

When assigning a device for passthrough, proactively set up
the GSI of the device during that process.

Second, an HVM guest allocates a PIRQ and an IRQ for a
passthrough device by using the GSI. Before that, the GSI must
already have a mapping in dom0: the Xen code path
pci_add_dm_done->xc_domain_irq_permission calls into Xen and
checks whether dom0 has the mapping. But a PVH dom0 currently
uses the kernel local interrupt mechanism instead of PIRQs, so
passing a device through to a guest on a PVH dom0 fails the
permission check.

When assigning a device for passthrough, proactively map a PIRQ
for the GSI of the device during that process.
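
The two-step flow, with "already set up" tolerated in the first step, can be sketched as follows. The callbacks are hypothetical stand-ins for the two hypercalls so that the control flow can be exercised in userspace; this is not the kernel code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type standing in for one hypercall on a GSI. */
typedef int (*gsi_op_fn)(unsigned int gsi);

/*
 * Set up the GSI, treating "already set up" (-EEXIST) as success,
 * then map a PIRQ for it. Returns 0 or a negative errno-style value.
 */
static int passthrough_gsi(unsigned int gsi, gsi_op_fn setup, gsi_op_fn map)
{
    int ret;

    assert(setup != NULL && map != NULL);
    ret = setup(gsi);
    if (ret != 0 && ret != -EEXIST)
        return ret;
    return map(gsi);
}

/* Stub operations used to drive the flow in a test or demo. */
static int stub_ok(unsigned int gsi) { (void)gsi; return 0; }
static int stub_exists(unsigned int gsi) { (void)gsi; return -EEXIST; }
static int stub_fail(unsigned int gsi) { (void)gsi; return -EINVAL; }
```

Treating -EEXIST as success makes the setup step safe to repeat, for example when several devices share one GSI.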

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 arch/x86/xen/enlighten_pvh.c   | 116 +
 drivers/acpi/pci_irq.c |   2 +-
 drivers/xen/xen-pciback/pci_stub.c |   8 ++
 include/linux/acpi.h   |   1 +
 include/xen/acpi.h |   1 +
 5 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
index ada3868c02c2..d74a221bfb81 100644
--- a/arch/x86/xen/enlighten_pvh.c
+++ b/arch/x86/xen/enlighten_pvh.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 #include 
 #include 
+#include 
 
 #include 
 
@@ -25,6 +26,121 @@
 bool __ro_after_init xen_pvh;
 EXPORT_SYMBOL_GPL(xen_pvh);
 
+typedef struct gsi_info {
+   u32 gsi;
+   int trigger;
+   int polarity;
+   int pirq;
+} gsi_info_t;
+
+struct acpi_prt_entry {
+   struct acpi_pci_id  id;
+   u8  pin;
+   acpi_handle link;
+   u32 index;  /* GSI, or link _CRS index */
+};
+
+static int xen_pvh_get_gsi_info(struct pci_dev *dev,
+   gsi_info_t *gsi_info)
+{
+   int gsi;
+   u8 pin = 0;
+   struct acpi_prt_entry *entry;
+   int trigger = ACPI_LEVEL_SENSITIVE;
+   int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
+ ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
+
+   if (dev)
+   pin = dev->pin;
+   if (!dev || !pin || !gsi_info)
+   return -EINVAL;
+
+   entry = acpi_pci_irq_lookup(dev, pin);
+   if (entry) {
+   if (entry->link)
+   gsi = acpi_pci_link_allocate_irq(entry->link,
+entry->index,
+&trigger, &polarity,
+NULL);
+   else
+   gsi = entry->index;
+   } else
+   return -EINVAL;
+
+   if (gsi < 0)
+   return -EINVAL;
+
+   gsi_info->gsi = gsi;
+   gsi_info->trigger = trigger;
+   gsi_info->polarity = polarity;
+
+   return 0;
+}
+
+static int xen_pvh_setup_gsi(gsi_info_t *gsi_info)
+{
+   struct physdev_setup_gsi setup_gsi;
+
+   if (!gsi_info)
+   return -EINVAL;
+
+   setup_gsi.gsi = gsi_info->gsi;
+   setup_gsi.triggering = (gsi_info->trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+   setup_gsi.polarity = (gsi_info->polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+}
+
+static int xen_pvh_map_pirq(gsi_info_t *gsi_info)
+{
+   struct physdev_map_pirq map_irq;
+   int ret;
+
+   if (!gsi_info)
+   return -EINVAL;
+
+   map_irq.domid = DOMID_SELF;
+   map_irq.type = MAP_PIRQ_TYPE_GSI;
+   map_irq.index = gsi_info->gsi;
+   map_irq.pirq = gsi_info->gsi;
+
+   ret = HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq);
+   gsi_info->pirq = map_irq.pirq;
+
+   return ret;
+}
+
+int xen_pvh_passthrough_gsi(struct pci_dev *dev)
+{
+   int ret;
+   gsi_info_t gsi_info;
+
+   if (!dev)
+   return -EINVAL;
+
+   ret = xen_pvh_get_gsi_info(dev, &gsi_info);
+   if (ret) {
+   xen_raw_printk("Fail to get gsi info!\n");
+   return ret;
+   }
+
+   ret = xen_pvh_setup_gsi(&gsi_info);
+   if (ret == -EEXIST) {
+   ret = 0;
+   xen_raw_printk("Already setup the GSI :%u\n", gsi_info.gsi);
+   } else if (ret) {
+   xen_raw_printk("Fail to setup gsi (%d)!\n", gsi_info.gsi);
+   return ret;
+   }
+
+   ret = xen_pvh_map_pirq(&gsi_info);
+   if (ret)
+   xen_raw_printk("Fail to map pirq for gsi (%d)!\n", gsi_info.gsi);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(xen_pvh_passthrough_gsi);
+
 void __init xen_pvh_init(struct boot_params *boot_

[RFC KERNEL PATCH v3 1/3] xen/pci: Add xen_reset_device_state function

2023-12-10 Thread Jiqian Chen
When a device has been reset on the dom0 side, vpci on the Xen
side is not notified, so the state cached in vpci is out of
date with the real device state.
To solve that problem, add a new function to clear all vpci
device state when a device is reset on the dom0 side.

Call that function in pcistub_init_device, because using
"pci-assignable-add" to assign a passthrough device in Xen
resets the device, which leaves the vpci state out of date,
and the device then fails to restore its BAR state.

Co-developed-by: Huang Rui 
Signed-off-by: Jiqian Chen 
---
 drivers/xen/pci.c  | 12 
 drivers/xen/xen-pciback/pci_stub.c |  4 
 include/xen/interface/physdev.h|  8 
 include/xen/pci.h  |  6 ++
 4 files changed, 30 insertions(+)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 72d4e3f193af..e9b30bc09139 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
return r;
 }
 
+int xen_reset_device_state(const struct pci_dev *dev)
+{
+   struct physdev_pci_device device = {
+   .seg = pci_domain_nr(dev->bus),
+   .bus = dev->bus->number,
+   .devfn = dev->devfn
+   };
+
+   return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, &device);
+}
+EXPORT_SYMBOL_GPL(xen_reset_device_state);
+
 static int xen_pci_notifier(struct notifier_block *nb,
unsigned long action, void *data)
 {
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index e34b623e4b41..24f599eaec14 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -421,6 +421,10 @@ static int pcistub_init_device(struct pci_dev *dev)
else {
dev_dbg(&dev->dev, "resetting (FLR, D3, etc) the device\n");
__pci_reset_function_locked(dev);
+   if (!xen_pv_domain())
+   err = xen_reset_device_state(dev);
+   if (err)
+   goto config_release;
pci_restore_state(dev);
}
/* Now disable the device (this also ensures some private device
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index a237af867873..bed53afc4c52 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -256,6 +256,14 @@ struct physdev_pci_device_add {
  */
 #define PHYSDEVOP_prepare_msix  30
 #define PHYSDEVOP_release_msix  31
+/*
+ * On PVH dom0, when device is reset, the vpci on Xen side
+ * won't get notification, so that the cached state in vpci is
+ * all out of date with the real device state. Use this to reset
+ * the vpci state of device.
+ */
+#define PHYSDEVOP_pci_device_state_reset 32
+
 struct physdev_pci_device {
 /* IN */
 uint16_t seg;
diff --git a/include/xen/pci.h b/include/xen/pci.h
index b8337cf85fd1..b2e2e856efd6 100644
--- a/include/xen/pci.h
+++ b/include/xen/pci.h
@@ -4,10 +4,16 @@
 #define __XEN_PCI_H__
 
 #if defined(CONFIG_XEN_DOM0)
+int xen_reset_device_state(const struct pci_dev *dev);
 int xen_find_device_domain_owner(struct pci_dev *dev);
 int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
 int xen_unregister_device_domain_owner(struct pci_dev *dev);
 #else
+static inline int xen_reset_device_state(const struct pci_dev *dev)
+{
+   return -1;
+}
+
 static inline int xen_find_device_domain_owner(struct pci_dev *dev)
 {
return -1;
-- 
2.34.1




[RFC KERNEL PATCH v3 0/3] Support device passthrough when dom0 is PVH on Xen

2023-12-10 Thread Jiqian Chen
Problem: qemu calls xc_physdev_map_pirq() in xen_pt_realize() to map a
passthrough device’s gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a
gsi rather than an irq, but qemu passes it an irq and treats that irq as a gsi.
The value is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in
xen_host_pci_device_get(). The gsi number is, however, not equal to the irq
number. On PVH dom0, when an irq is allocated for a gsi in
acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first
served. If you debug the kernel code (see __irq_alloc_descs), you will find
that irq numbers are allocated in ascending order, but gsis are not requested
in ascending order: gsi 38 may be requested before gsi 28, so gsi 38 gets a
smaller irq number than gsi 28, and then gsi != irq.
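
To make the reason above concrete, here is a minimal userspace sketch in C. It
is only a toy model, not the kernel's allocator: toy_irq_allocator,
toy_alloc_irq, toy_register_gsis and the starting value FIRST_DYNAMIC_IRQ are
all hypothetical names and numbers chosen for illustration. It hands out irq
numbers in ascending order to whichever gsi asks first.

```c
#include <stddef.h>

/* Hypothetical first dynamically allocated irq; the real value is system specific. */
#define FIRST_DYNAMIC_IRQ 24

/* Toy allocator state: the next unused irq number. */
struct toy_irq_allocator {
    int next_irq;
};

/* Hand out the lowest unused irq number, regardless of which gsi asks. */
static int toy_alloc_irq(struct toy_irq_allocator *a)
{
    return a->next_irq++;
}

/*
 * Assign irqs to gsis in request order: irqs[i] receives the irq
 * allocated for gsis[i]. The gsi value never influences the irq value.
 */
static void toy_register_gsis(struct toy_irq_allocator *a,
                              const unsigned int *gsis, int *irqs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        (void)gsis[i];              /* only the request order matters */
        irqs[i] = toy_alloc_irq(a);
    }
}
```

With a request order of { 38, 28 } and the assumed starting value, gsi 38 is
given irq 24 and gsi 28 is given irq 25, so neither irq equals its gsi and the
larger gsi holds the smaller irq.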

Solution: we can record the relation between gsi and irq, and translate when
userspace (qemu) wants to use the gsi. The third kernel patch (xen/privcmd: Add
new syscall to get gsi from irq) records all the relations in
acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and provides a
syscall for userspace to get the gsi from an irq. The third xen patch (tools:
Add new function to get gsi from irq) adds a new function,
xc_physdev_gsi_from_irq(), that calls the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq()
will succeed. This v2 patch is the same as v1 (
kernel https://lore.kernel.org/xen-devel/20230312120157.452859-6-ray.hu...@amd.com/ and
xen https://lore.kernel.org/xen-devel/20230312075455.450187-6-ray.hu...@amd.com/)

The v2 qemu patch only changes an included header file; otherwise it is
similar to v1 (
qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/):
it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

Jiqian Chen (3):
  xen/pci: Add xen_reset_device_state function
  xen/pvh: Setup gsi and map pirq for passthrough device
  PCI/sysfs: Add gsi sysfs for pci_dev

 arch/x86/xen/enlighten_pvh.c   | 116 +
 drivers/acpi/pci_irq.c |   3 +-
 drivers/pci/pci-sysfs.c|  11 +++
 drivers/xen/pci.c  |  12 +++
 drivers/xen/xen-pciback/pci_stub.c |  12 +++
 include/linux/acpi.h   |   1 +
 include/linux/pci.h|   2 +
 include/xen/acpi.h |   1 +
 include/xen/interface/physdev.h|   8 ++
 include/xen/pci.h  |   6 ++
 10 files changed, 171 insertions(+), 1 deletion(-)

-- 
2.34.1




[RFC QEMU PATCH v2 1/1] xen/pci: get gsi from irq for passthrough devices

2023-11-24 Thread Jiqian Chen
PVH dom0 uses the Linux local interrupt mechanism: when it
allocates an irq for a gsi, the allocation is dynamic and
first come, first served. If you debug the kernel code, you
will find that irq numbers are allocated in ascending order,
but gsis are not requested in ascending order (gsi 38 may
come before gsi 28), so the irq number is not equal to the
gsi number.
When we pass through a device, QEMU uses its gsi number for
the mapping, see xen_pt_realize->xc_physdev_map_pirq, but
the current code gets that number from the file
/sys/bus/pci/devices/:xx:xx.x/irq, which is the irq, not the
gsi, so the mapping fails.

For that reason, a new function is added on the Xen side to
translate irq to gsi. Here, we call that function to get the
correct gsi number.

Signed-off-by: Jiqian Chen 
Signed-off-by: Huang Rui 
---
 hw/xen/xen-host-pci-device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/xen/xen-host-pci-device.c b/hw/xen/xen-host-pci-device.c
index 8c6e9a1716..00218f9080 100644
--- a/hw/xen/xen-host-pci-device.c
+++ b/hw/xen/xen-host-pci-device.c
@@ -10,6 +10,7 @@
 #include "qapi/error.h"
 #include "qemu/cutils.h"
 #include "xen-host-pci-device.h"
+#include "hw/xen/xen_native.h"
 
 #define XEN_HOST_PCI_MAX_EXT_CAP \
 ((PCIE_CONFIG_SPACE_SIZE - PCI_CONFIG_SPACE_SIZE) / (PCI_CAP_SIZEOF + 4))
@@ -368,7 +369,7 @@ void xen_host_pci_device_get(XenHostPCIDevice *d, uint16_t domain,
 if (*errp) {
 goto error;
 }
-d->irq = v;
+d->irq = xc_physdev_gsi_from_irq(xen_xc, v);
 
 xen_host_pci_get_hex_value(d, "class", &v, errp);
 if (*errp) {
-- 
2.34.1




[RFC QEMU PATCH v2 0/1] Support device passthrough when dom0 is PVH on Xen

2023-11-24 Thread Jiqian Chen
Hi All,

This patch is the v2 of the implementation of passthrough when dom0 is PVH on 
Xen.

Issues we encountered:
1. failed to map pirq for gsi
Problem: qemu calls xc_physdev_map_pirq() in xen_pt_realize() to map a
passthrough device’s gsi to a pirq, but the call fails.

Reason: According to the implementation of xc_physdev_map_pirq(), it needs a
gsi rather than an irq, but qemu passes it an irq and treats that irq as a gsi.
The value is read from the file /sys/bus/pci/devices/:xx:xx.x/irq in
xen_host_pci_device_get(). The gsi number is, however, not equal to the irq
number. On PVH dom0, when an irq is allocated for a gsi in
acpi_register_gsi_ioapic(), the allocation is dynamic and first come, first
served. If you debug the kernel code (see __irq_alloc_descs), you will find
that irq numbers are allocated in ascending order, but gsis are not requested
in ascending order: gsi 38 may be requested before gsi 28, so gsi 38 gets a
smaller irq number than gsi 28, and then gsi != irq.

Solution: we can record the relation between gsi and irq, and translate when
userspace (qemu) wants to use the gsi. The third kernel patch (xen/privcmd: Add
new syscall to get gsi from irq) records all the relations in
acpi_register_gsi_xen_pvh() when dom0 initializes pci devices, and provides a
syscall for userspace to get the gsi from an irq. The third xen patch (tools:
Add new function to get gsi from irq) adds a new function,
xc_physdev_gsi_from_irq(), that calls the new syscall added on the kernel side.
Userspace can then use that function to get the gsi, and xc_physdev_map_pirq()
will succeed.
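
The record-then-translate idea in the solution can be sketched in plain C as
follows. This is a hedged illustration only: it is neither the kernel code in
acpi_register_gsi_xen_pvh() nor the implementation of
xc_physdev_gsi_from_irq(), and the names toy_gsi_map, toy_record_relation,
toy_gsi_from_irq and TOY_MAX_RELATIONS are hypothetical.

```c
#include <stddef.h>

/* Hypothetical capacity of the toy table. */
#define TOY_MAX_RELATIONS 64

/* One recorded gsi/irq pair. */
struct toy_gsi_irq {
    unsigned int gsi;
    int irq;
};

/* Flat table of every relation recorded so far. */
struct toy_gsi_map {
    struct toy_gsi_irq entries[TOY_MAX_RELATIONS];
    size_t count;
};

/* Record one relation when a gsi is registered; -1 if the table is full. */
static int toy_record_relation(struct toy_gsi_map *map,
                               unsigned int gsi, int irq)
{
    if (map->count >= TOY_MAX_RELATIONS)
        return -1;
    map->entries[map->count].gsi = gsi;
    map->entries[map->count].irq = irq;
    map->count++;
    return 0;
}

/* Translate an irq back to its gsi; -1 if the irq was never recorded. */
static int toy_gsi_from_irq(const struct toy_gsi_map *map, int irq)
{
    for (size_t i = 0; i < map->count; i++) {
        if (map->entries[i].irq == irq)
            return (int)map->entries[i].gsi;
    }
    return -1;
}
```

A caller that has read an irq from sysfs would look it up with
toy_gsi_from_irq() and pass the returned gsi, not the irq, to the mapping
call. A linear scan is enough for a sketch; a real implementation could use
any lookup structure.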

The v2 on the qemu side is the same as v1 (
qemu https://lore.kernel.org/xen-devel/20230312092244.451465-19-ray.hu...@amd.com/):
it just calls xc_physdev_gsi_from_irq() to get the gsi from the irq.

v2 on kernel side:
https://lore.kernel.org/lkml/20231124103123.3263471-1-jiqian.c...@amd.com/T/#t

v2 on Xen side:
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#t

Jiqian Chen (1):
  xen/pci: get gsi from irq for passthrough devices

 hw/xen/xen-host-pci-device.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.34.1



