from:"Chen, Jiqian"

Re: [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev

2024-08-01 Thread Chen, Jiqian

On 2024/8/1 21:01, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:23PM +0800, Jiqian Chen wrote:
>> When passthrough a device to domU, QEMU and xl tools use its gsi
>> number to do pirq mapping, see QEMU code
>> xen_pt_realize->xc_physdev_map_pirq, and xl code
>> pci_add_dm_done->xc_physdev_map_pirq, but the gsi number is got
>> from file /sys/bus/pci/devices//irq, that is wrong, because
>> irq is not equal with gsi, they are in different spaces, so pirq
>> mapping fails.
>>
>> And in current codes, there is no method to get gsi for userspace.
>> For above purpose, add new function to get gsi, and the
>> corresponding ioctl is implemented on linux kernel side.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Chen Jiqian 
>> ---
>> RFC: it needs to wait for the corresponding third patch on linux kernel side 
>> to be merged.
>> https://lore.kernel.org/xen-devel/20240607075109.126277-4-jiqian.c...@amd.com/
>> This patch must be merged after the patch on linux kernel side
>>
>> CC: Anthony PERARD 
>> Remaining comment @Anthony PERARD:
>> Do I need to make " opening of /dev/xen/privcmd " as a single function, then 
>> use it in this
>> patch and other libraries?
>> ---
>>  tools/include/xen-sys/Linux/privcmd.h |  7 ++
>>  tools/include/xenctrl.h   |  2 ++
>>  tools/libs/ctrl/xc_physdev.c  | 35 +++
>>  3 files changed, 44 insertions(+)
>>
>> diff --git a/tools/include/xen-sys/Linux/privcmd.h 
>> b/tools/include/xen-sys/Linux/privcmd.h
>> index bc60e8fd55eb..4cf719102116 100644
>> --- a/tools/include/xen-sys/Linux/privcmd.h
>> +++ b/tools/include/xen-sys/Linux/privcmd.h
>> @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
>>  __u64 addr;
>>  } privcmd_mmap_resource_t;
>>  
>> +typedef struct privcmd_gsi_from_pcidev {
>> +__u32 sbdf;
>> +__u32 gsi;
>> +} privcmd_gsi_from_pcidev_t;
>> +
>>  /*
>>   * @cmd: IOCTL_PRIVCMD_HYPERCALL
>>   * @arg: _hypercall_t
>> @@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource {
>>  _IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
>>  #define IOCTL_PRIVCMD_MMAP_RESOURCE \
>>  _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t))
>> +#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV   \
>> +_IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t))
>>  #define IOCTL_PRIVCMD_UNIMPLEMENTED \
>>  _IOC(_IOC_NONE, 'P', 0xFF, 0)
>>  
>> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
>> index 9ceca0cffc2f..3720e22b399a 100644
>> --- a/tools/include/xenctrl.h
>> +++ b/tools/include/xenctrl.h
>> @@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>uint32_t domid,
>>int pirq);
>>  
>> +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf);
>> +
>>  /*
>>   *  LOGGING AND ERROR REPORTING
>>   */
>> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
>> index e9fcd755fa62..54edb0f3c0dc 100644
>> --- a/tools/libs/ctrl/xc_physdev.c
>> +++ b/tools/libs/ctrl/xc_physdev.c
>> @@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>  return rc;
>>  }
>>  
>> +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
> 
> FWIW, I'm not sure it's fine to use the xc_physdev prefix here, as
> this is not a PHYSDEVOP hypercall.
> 
> As Anthony suggested, it would be better placed in xc_linux.c, and
> possibly named xc_pcidev_get_gsi() or similar, to avoid polluting the
> xc_physdev namespace.
Thanks, I will change it in the next version.

> 
> Thanks, Roger.
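For reference, here is a minimal userspace sketch of querying the GSI through the new ioctl. It assumes three things: the `privcmd_gsi_from_pcidev` layout and ioctl number 10 from the privcmd.h hunk above, a `/dev/xen/privcmd` descriptor that is already open, and the usual SBDF packing (segment in bits 31:16, bus in 15:8, devfn in 7:0). The helpers `sbdf_pack()` and `pcidev_get_gsi_sketch()` are hypothetical and are not libxenctrl API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>   /* on Linux/glibc this also provides _IOC() and _IOC_NONE */

/* Mirrors the struct added to tools/include/xen-sys/Linux/privcmd.h. */
typedef struct privcmd_gsi_from_pcidev {
    uint32_t sbdf;   /* in: segment/bus/device/function of the device */
    uint32_t gsi;    /* out: GSI filled in by the kernel */
} privcmd_gsi_from_pcidev_t;

#define IOCTL_PRIVCMD_GSI_FROM_PCIDEV \
    _IOC(_IOC_NONE, 'P', 10, sizeof(privcmd_gsi_from_pcidev_t))

/* Hypothetical helper: pack seg:bus:dev.fn into a 32-bit SBDF value. */
static uint32_t sbdf_pack(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn)
{
    assert(dev < 32 && fn < 8);   /* PCI allows 32 devices x 8 functions */

    uint8_t devfn = (uint8_t)((dev << 3) | fn);

    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) | devfn;
}

/*
 * Hypothetical helper: returns the GSI on success, or -1 with errno set
 * by ioctl() (e.g. EBADF for a bad descriptor, ENOTTY if the running
 * kernel does not implement this privcmd ioctl).
 */
static int pcidev_get_gsi_sketch(int privcmd_fd, uint32_t sbdf)
{
    privcmd_gsi_from_pcidev_t arg = { .sbdf = sbdf, .gsi = 0 };

    if ( ioctl(privcmd_fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &arg) < 0 )
        return -1;

    return (int)arg.gsi;
}
```

Passing the packed SBDF keeps the ioctl argument a fixed 8-byte struct, which matches the `sizeof()` encoded in the ioctl number.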

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi

2024-08-01 Thread Chen, Jiqian

On 2024/8/1 19:06, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:21PM +0800, Jiqian Chen wrote:
>> Some type of domains don't have PIRQs, like PVH, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest base on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
>> irq on Xen side.
>> What's more, current hypercall XEN_DOMCTL_irq_permission requires
>> passing in pirq to set the access of irq, it is not suitable for
>> dom0 that doesn't have PIRQs.
>>
>> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
>> the permission of irq(translate from x86 gsi) to dumU when dom0
>^ missing space, and s/translate/translated/
> 
>> has no PIRQs.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>> CC: Daniel P . Smith 
>> Remaining comment @Daniel P . Smith:
>> +ret = -EPERM;
>> +if ( !irq_access_permitted(currd, irq) ||
>> + xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>> +goto gsi_permission_out;
>> Is it okay to issue the XSM check using the translated value, 
>> not the one that was originally passed into the hypercall?
> 
> FWIW, I don't see the GSI -> IRQ translation much different from the
> pIRQ -> IRQ translation done by pirq_access_permitted(), which is also
> ahead of the xsm check.
> 
>> ---
>>  xen/arch/x86/domctl.c  | 32 ++
>>  xen/arch/x86/include/asm/io_apic.h |  2 ++
>>  xen/arch/x86/io_apic.c | 17 
>>  xen/arch/x86/mpparse.c |  5 ++---
>>  xen/include/public/domctl.h|  9 +
>>  xen/xsm/flask/hooks.c  |  1 +
>>  6 files changed, 63 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
>> index 9190e11faaa3..4e9e4c4cfed3 100644
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -36,6 +36,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  static int update_domain_cpu_policy(struct domain *d,
>>  xen_domctl_cpu_policy_t *xdpc)
>> @@ -237,6 +238,37 @@ long arch_do_domctl(
>>  break;
>>  }
>>  
>> +case XEN_DOMCTL_gsi_permission:
>> +{
>> +int irq;
>> +unsigned int gsi = domctl->u.gsi_permission.gsi;
>> +uint8_t access_flag = domctl->u.gsi_permission.access_flag;
>> +
>> +/* Check all bits and pads are zero except lowest bit */
>> +ret = -EINVAL;
>> +if ( access_flag & ( ~XEN_DOMCTL_GSI_PERMISSION_MASK ) )
>   ^ unneeded parentheses and spaces.
>> +goto gsi_permission_out;
>> +for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
>> +if ( domctl->u.gsi_permission.pad[i] )
>> +goto gsi_permission_out;
>> +
>> +if ( gsi > highest_gsi() || (irq = gsi_2_irq(gsi)) <= 0 )
> 
> FWIW, I would place the gsi > highest_gsi() check inside gsi_2_irq().
> There's no reason to open-code it here, and it could help other
> users of gsi_2_irq().  The error code could also be ERANGE here
> instead of EINVAL IMO.
> 
>> +goto gsi_permission_out;
>> +
>> +ret = -EPERM;
>> +if ( !irq_access_permitted(currd, irq) ||
>> + xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>> +goto gsi_permission_out;
>> +
>> +if ( access_flag )
>> +ret = irq_permit_access(d, irq);
>> +else
>> +ret = irq_deny_access(d, irq);
>> +
>> +gsi_permission_out:
>> +break;
> 
> Why do you need a label when it just contains a break?  Instead of the
> goto gsi_permission_out just use break directly.
> 
>> +}
>> +
>>  case XEN_DOMCTL_getpageframeinfo3:
>>  {
>>  unsigned int num = domctl->u.getpageframeinfo3.num;
>> diff --git a/xen/arch/x86/include/asm/io_apic.h 
>> b/xen/arch/x86/include/asm/io_apic.h
>> index 78268ea8f666..7e86d8337758 100644
>> --- a/xen/arch/x86/include/asm/io_apic.h
>> +++ b/xen/arch/x86/include/asm/io_apic.h
>> @@ -213,5 +213,7 @@ unsigned highest_gsi(void);
>>  
>>  int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
>>  int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
>> +int mp_find_ioapic(int gsi);
>> +int gsi_2_irq(int gsi);
>>  
>>  #endif
>> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
>> index d2a313c4ac72..5968c8055671 100644
>> --- a/xen/arch/x86/io_apic.c
>> +++ b/xen/arch/x86/io_apic.c
>> @@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
>>  return irq;
>>  }
>>  
>> +int gsi_2_irq(int gsi)
> 
> unsigned int for gsi.
> 
>> +{
>> +int ioapic, pin, irq;
> 
> pin would be better as unsigned int too.
> 
>> +
>> +ioapic = mp_find_ioapic(gsi);
>> +if (
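
Below is a standalone sketch of the handler shape suggested in the review above. It moves the range check into `gsi_2_irq()` and returns `-ERANGE` from there, uses `unsigned int` for the GSI, and uses early returns in place of the `gsi_permission_out` label. Everything in it is mocked and is not Xen code. The GSI-to-IRQ table, `MOCK_HIGHEST_GSI`, the pad size and the mask value are assumptions made for illustration. The `irq_access_permitted()` and XSM checks are reduced to a single boolean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed: only bit 0 of access_flag is meaningful (1 = grant, 0 = deny). */
#define GSI_PERMISSION_MASK 0x1u

/* Assumed layout, mirroring the gsi/access_flag/pad fields used above. */
struct gsi_permission {
    uint32_t gsi;
    uint8_t access_flag;
    uint8_t pad[3];
};

/* Mock GSI -> IRQ table: the index is the GSI, 0 means "no IRQ". */
static const int mock_irq_of_gsi[] = { 0, 1, 0, 3, 4, 5, 6, 7, 8, 9 };
#define MOCK_HIGHEST_GSI (ARRAY_SIZE(mock_irq_of_gsi) - 1)

/* The range check lives inside the translation helper, as suggested. */
static int gsi_2_irq_sketch(unsigned int gsi)
{
    if ( gsi > MOCK_HIGHEST_GSI )
        return -ERANGE;

    return mock_irq_of_gsi[gsi];
}

/* Returns the IRQ to grant/deny on success, or a negative errno value. */
static int gsi_permission_check(const struct gsi_permission *op,
                                bool caller_may_access)
{
    int irq;

    /* All bits except the lowest one, and all pad bytes, must be zero. */
    if ( op->access_flag & ~GSI_PERMISSION_MASK )
        return -EINVAL;
    for ( size_t i = 0; i < ARRAY_SIZE(op->pad); ++i )
        if ( op->pad[i] )
            return -EINVAL;

    irq = gsi_2_irq_sketch(op->gsi);
    if ( irq < 0 )
        return irq;        /* -ERANGE propagated from the helper */
    if ( irq == 0 )
        return -EINVAL;    /* GSI in range, but no IRQ behind it */

    /* Stand-in for irq_access_permitted() plus the XSM hook. */
    if ( !caller_may_access )
        return -EPERM;

    return irq;
}
```

Inside `arch_do_domctl()` the early returns would map to setting `ret` and using `break`, as Roger suggested.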

Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev

2024-08-01 Thread Chen, Jiqian

On 2024/8/1 14:49, Jan Beulich wrote:
> On 31.07.2024 18:13, Roger Pau Monné wrote:
>> On Wed, Jul 31, 2024 at 05:58:54PM +0200, Jan Beulich wrote:
>>> On 31.07.2024 17:55, Roger Pau Monné wrote:
 On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>  
>  return rc;
>  }
> +
> +int vpci_reset_device_state(struct pci_dev *pdev,
> +uint32_t reset_type)

 There's probably no use in passing reset_type to
 vpci_reset_device_state() if it's ignored?
>>>
>>> I consider this forward-looking. It seems rather unlikely that in the
>>> longer run the reset type doesn't matter.
>>
>> I'm fine with having it in the hypercall interface, but passing it to
>> vpci_reset_device_state() can be done once there's a purpose for it,
>> and it won't change any public facing interface.
> 
> Jiqian, just to clarify: I'm okay either way.
Thank you very much! That dispels my concerns.
I will remove reset_type in the next version.

> 
> Jan
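After `reset_type` is dropped from `vpci_reset_device_state()`, the hypercall handler still validates the type and simply stops forwarding it. The following mock sketches that split. The enum values and `struct pdev_mock` are placeholders and are not the definitions from public/physdev.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder values; the real ones live in xen/include/public/physdev.h. */
enum {
    RESET_COLD = 0,
    RESET_WARM = 1,
    RESET_HOT  = 2,
    RESET_FLR  = 3,
};

struct pdev_mock {
    unsigned int vpci_resets;   /* how many times vPCI state was rebuilt */
};

/* After the change: no reset_type parameter, since vPCI ignores it anyway. */
static int vpci_reset_device_state_mock(struct pdev_mock *pdev)
{
    pdev->vpci_resets++;        /* stands in for deassign + assign */
    return 0;
}

/* The public interface keeps reset_type and still rejects unknown values. */
static int handle_state_reset(struct pdev_mock *pdev, uint32_t reset_type)
{
    switch ( reset_type )
    {
    case RESET_COLD:
    case RESET_WARM:
    case RESET_HOT:
    case RESET_FLR:
        return vpci_reset_device_state_mock(pdev);

    default:
        return -EOPNOTSUPP;
    }
}
```

If some reset type needs different vPCI handling later, the parameter can be added back internally without touching the public interface.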

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev

2024-08-01 Thread Chen, Jiqian

On 2024/7/31 23:55, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:18PM +0800, Jiqian Chen wrote:
>> When a device has been reset on dom0 side, the Xen hypervisor
>> doesn't get notification, so the cached state in vpci is all
>> out of date compare with the real device state.
>>
>> To solve that problem, add a new hypercall to support the reset
>> of pcidev and clear the vpci state of device. So that once the
>> state of device is reset on dom0 side, dom0 can call this
>> hypercall to notify hypervisor.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> Reviewed-by: Stewart Hildebrand 
>> Reviewed-by: Stefano Stabellini 
> 
> Thanks, just a couple of nits.
> 
> This is missing a changelog between versions, and I haven't been
> following all the versions, so some of my questions might have been
> answered in previous revisions.
Sorry, I will add a changelog here in the next version.

> 
>> ---
>>  xen/arch/x86/hvm/hypercall.c |  1 +
>>  xen/drivers/pci/physdev.c| 52 
>>  xen/drivers/vpci/vpci.c  | 10 +++
>>  xen/include/public/physdev.h | 16 +++
>>  xen/include/xen/vpci.h   |  8 ++
>>  5 files changed, 87 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 7fb3136f0c7c..0fab670a4871 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -83,6 +83,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
>> arg)
>>  case PHYSDEVOP_pci_mmcfg_reserved:
>>  case PHYSDEVOP_pci_device_add:
>>  case PHYSDEVOP_pci_device_remove:
>> +case PHYSDEVOP_pci_device_state_reset:
>>  case PHYSDEVOP_dbgp_op:
>>  if ( !is_hardware_domain(currd) )
>>  return -ENOSYS;
>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
>> index 42db3e6d133c..c0f47945d955 100644
>> --- a/xen/drivers/pci/physdev.c
>> +++ b/xen/drivers/pci/physdev.c
>> @@ -2,6 +2,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #ifndef COMPAT
>>  typedef long ret_t;
>> @@ -67,6 +68,57 @@ ret_t pci_physdev_op(int cmd, 
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>  break;
>>  }
>>  
>> +case PHYSDEVOP_pci_device_state_reset:
>> +{
>> +struct pci_device_state_reset dev_reset;
>> +struct pci_dev *pdev;
>> +pci_sbdf_t sbdf;
>> +
>> +ret = -EOPNOTSUPP;
>> +if ( !is_pci_passthrough_enabled() )
>> +break;
>> +
>> +ret = -EFAULT;
>> +if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
>> +break;
>> +
>> +sbdf = PCI_SBDF(dev_reset.dev.seg,
>> +dev_reset.dev.bus,
>> +dev_reset.dev.devfn);
>> +
>> +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>> +if ( ret )
>> +break;
>> +
>> +pcidevs_lock();
>> +pdev = pci_get_pdev(NULL, sbdf);
>> +if ( !pdev )
>> +{
>> +pcidevs_unlock();
>> +ret = -ENODEV;
>> +break;
>> +}
>> +
>> +write_lock(&pdev->domain->pci_lock);
>> +pcidevs_unlock();
>> +switch ( dev_reset.reset_type )
>> +{
>> +case PCI_DEVICE_STATE_RESET_COLD:
>> +case PCI_DEVICE_STATE_RESET_WARM:
>> +case PCI_DEVICE_STATE_RESET_HOT:
>> +case PCI_DEVICE_STATE_RESET_FLR:
>> +ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
>> +break;
>> +
>> +default:
>> +ret = -EOPNOTSUPP;
>> +break;
>> +}
>> +write_unlock(&pdev->domain->pci_lock);
>> +
>> +break;
>> +}
>> +
>>  default:
>>  ret = -ENOSYS;
>>  break;
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 1e6aa5d799b9..7e914d1eff9f 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -172,6 +172,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>  
>>  return rc;
>>  }
>> +
>> +int vpci_reset_device_state(struct pci_dev *pdev,
>> +uint32_t reset_type)
> 
> There's probably no use in passing reset_type to
> vpci_reset_device_state() if it's ignored?
> 
>> +{
>> +ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>> +vpci_deassign_device(pdev);
>> +return vpci_assign_device(pdev);
>> +}
>> +
>>  #endif /* __XEN__ */
>>  
>>  static int vpci_register_cmp(const struct vpci_register *r1,
>> diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
>> index f0c0d4727c0b..3cfde3fd2389 100644
>> --- a/xen/include/public/physdev.h
>> +++ b/xen/include/public/physdev.h
>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>>   */
>>  #define PHYSDEVOP_prepare_msix  30
>>  #define PHYSDEVOP_release_msix  31
>> +/*
>> + * Notify the hypervisor that a PCI device has been reset, so that any
>> + * internally cached
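
The reset handler above takes the per-domain `pci_lock` for writing before it drops the global `pcidevs` lock. As a result, at no point between the device lookup and the vPCI reset are both locks released. The following mock sketches that hand-over. Booleans stand in for the locks, so no real locking happens. `struct lock_state` and the function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Flags standing in for pcidevs_lock() and d->pci_lock held for writing. */
struct lock_state {
    bool pcidevs_held;
    bool domain_write_held;
    bool device_present;      /* whether pci_get_pdev() finds the device */
};

/* Hand-over invariant: at least one of the two locks is held. */
static void check_handover(const struct lock_state *ls)
{
    assert(ls->pcidevs_held || ls->domain_write_held);
}

/* Mirrors ASSERT(rw_is_write_locked(&pdev->domain->pci_lock)) in vPCI. */
static void vpci_reset_mock(const struct lock_state *ls)
{
    assert(ls->domain_write_held);
}

static int state_reset_sketch(struct lock_state *ls)
{
    ls->pcidevs_held = true;              /* pcidevs_lock() */
    if ( !ls->device_present )
    {
        ls->pcidevs_held = false;         /* pcidevs_unlock() */
        return -ENODEV;
    }

    ls->domain_write_held = true;         /* write_lock(&pdev->domain->pci_lock) */
    check_handover(ls);
    ls->pcidevs_held = false;             /* pcidevs_unlock(), only after the above */
    check_handover(ls);

    vpci_reset_mock(ls);

    ls->domain_write_held = false;        /* write_unlock(&pdev->domain->pci_lock) */
    return 0;
}
```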

Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-08-01 Thread Chen, Jiqian

On 2024/7/31 21:03, Roger Pau Monné wrote:
> On Wed, Jul 31, 2024 at 01:39:40PM +0200, Jan Beulich wrote:
>> On 31.07.2024 13:29, Roger Pau Monné wrote:
>>> On Wed, Jul 31, 2024 at 11:55:35AM +0200, Jan Beulich wrote:
 On 31.07.2024 11:37, Roger Pau Monné wrote:
> On Wed, Jul 31, 2024 at 11:02:01AM +0200, Jan Beulich wrote:
>> On 31.07.2024 10:51, Roger Pau Monné wrote:
>>> I agree with (a), but I don't think enabling PVH dom0 usage of the
>>> hypercalls should be gated on this.  As said a PV dom0 is already
>>> capable of issuing PHYSDEVOP_{,un}map_pirq operations against a PVH
>>> domU.
>>
>> Okay, I can accept that as an intermediate position. We ought to deny
>> such requests for PVH domains at some point, though, at the latest in the
>> course of making vPCI work there.
>
> Hm, once physdev_map_pirq() works as intended against PVH domains, I
> don't see why we would prevent the usage of PHYSDEVOP_{,un}map_pirq
> against such domains.

 Well. If it can be made to work as intended, then I certainly agree. However,
 without even the concept of pIRQ in PVH I'm having a hard time seeing how
 it can be made to work. IIRC you were advocating for us to not introduce pIRQ
 into PVH.
>>>
>>> From what I'm seeing here the intention is to expose
>>> PHYSDEVOP_{,un}map_pirq to PVH dom0, so there must be some notion of
>>> pIRQs or akin in a PVH dom0?  Even if only for passthrough needs.
>>
>> Only in so far as it is an abstract, handle-like value pertaining solely
>> to the target domain.
>>
 Maybe you're thinking of re-using the sub-ops, requiring PVH domains to
 pass in GSIs?
>>>
>>> I think that was one of my proposals, to either introduce a new
>>> hypercall that takes a GSI, or to modify the PHYSDEVOP_{,un}map_pirq
>>> in an ABI compatible way so that semantically the field could be a GSI
>>> rather than a pIRQ.  We however would also need a way to reference an
>>> MSI entry.
>>
>> Of course.
>>
>>> My main concern is not with pIRQs themselves, pIRQs are just an
>>> abstract way to reference interrupts, my concern and what I wanted to
>>> avoid on PVH is being able to route pIRQs over event channels.  IOW:
>>> have interrupts from physical devices delivered over event channels.
>>
>> Oh, I might have slightly misunderstood your intentions then.
> 
> My intention would be to not even use pIRQs at all, in order to avoid
> the temptation of the guest itself managing interrupts using
> hypercalls, hence I would have preferred that abstract interface to be
> something else.
> 
> Maybe we could even expose the Xen IRQ space directly, and just use
> that as interrupt handles, but since I'm not the one doing the work
> I'm not sure it's fair to ask for something that would require more
> changes internally to Xen.
> 
 I think I suggested something along these lines also to
 Jiqian, yet with the now intended exposure to !has_pirq() domains I'm
 not sure this could be made work reliably.
>>>
>>> I'm afraid I've been lagging behind on reviewing those series.
>>>
 Which reminds me of another question I had: What meaning does the pirq
 field have right now, if Dom0 would issue the request against a PVH DomU?
 What meaning will it have for a !has_pirq() HVM domain?
>>>
>>> The pirq field could be a way to reference an interrupt.  It doesn't
>>> need to be exposed to the PVH domU at all, but it's a way for the
>>> device model to identify which interrupt should be mapped to which
>>> domain.
>>
>> Since pIRQ-s are per-domain, _that_ kind of association won't be
>> helped. But yes, as per above it could serve as an abstract handle-
>> like value.
> 
> I would be fine with doing the interrupt bindings based on IRQs
> instead of pIRQs, but I'm afraid that would require more changes to
> hypercalls and Xen internals.
> 
> At some point I need to work on a new interface to do passthrough, so
> that we can remove the usage of domctls from QEMU.  That might be a
> good opportunity to switch from using pIRQs.

Thanks for your input, but I'm not familiar enough with this area to follow
the whole discussion.
How should I address this in the next version?
Should I add a new hypercall specifically for passthrough?
Or, if the goal is to prevent (un)map from being used against PVH guests, can
I just add a new function, such as is_pvh_domain(), to check whether the
subject domain is PVH?

> 
> Thanks, Roger.

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-07-31 Thread Chen, Jiqian

On 2024/7/31 15:50, Roger Pau Monné wrote:
> On Mon, Jul 08, 2024 at 07:41:19PM +0800, Jiqian Chen wrote:
>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>> a passthrough device by using gsi, see qemu code
>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>> is not allowed because currd is PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq.
>> And add a new check to prevent (un)map when the subject domain
>> doesn't have a notion of PIRQ.
>>
>> So that the interrupt of a passthrough device can be
>> successfully mapped to pirq for domU with a notion of PIRQ
>> when dom0 is PVH
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>>  xen/arch/x86/hvm/hypercall.c |  6 ++
>>  xen/arch/x86/physdev.c   | 12 ++--
>>  2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 0fab670a4871..03ada3c880bd 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
>> arg)
>>  
>>  switch ( cmd )
>>  {
>> +/*
>> +* Only being permitted for management of other domains.
>> +* Further restrictions are enforced in do_physdev_op.
>> +*/
>>  case PHYSDEVOP_map_pirq:
>>  case PHYSDEVOP_unmap_pirq:
>> +break;
>> +
>>  case PHYSDEVOP_eoi:
>>  case PHYSDEVOP_irq_status_query:
>>  case PHYSDEVOP_get_free_pirq:
>> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
>> index d6dd622952a9..9f30a8c63a06 100644
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, 
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>  if ( !d )
>>  break;
>>  
>> -ret = physdev_map_pirq(d, map.type, , , );
>> +/* Only mapping when the subject domain has a notion of PIRQ */
>> +if ( !is_hvm_domain(d) || has_pirq(d) )
> 
> I'm afraid this is not true.  It's fine to map interrupts to HVM
> domains that don't have XENFEAT_hvm_pirqs enabled.  has_pirq() simply
> allow HVM domains to route interrupts from devices (either emulated or
> passed through) over event channels.
> 
> It might have worked in the past (when using a version of Xen < 4.19)
> because XENFEAT_hvm_pirqs was enabled by default for HVM guests.
> 
> physdev_map_pirq() will work fine when used against domains that don't
> have XENFEAT_hvm_pirqs enabled, and it needs to be kept this way.
> 
> I think you want to allow PHYSDEVOP_{,un}map_pirq for HVM domains, but
> keep the code in do_physdev_op() as-is.  You will have to check
> whether the current paths in do_physdev_op() are not making
> assumptions about XENFEAT_hvm_pirqs being enabled when the calling
> domain is of HVM type.  I don't think that's the case, but better
> check.
If I understand correctly, you also talked about preventing self-mapping when
the domain is of HVM type and doesn't have XENFEAT_hvm_pirqs enabled.
Should I change it to this?
if ( !is_hvm_domain(d) || has_pirq(d) || d != currd )
ret = physdev_map_pirq(d, map.type, , , );
else
ret = -EOPNOTSUPP;

> 
> Thanks, Roger.
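The condition proposed above is the logical complement of the V10 self-map guard (`is_hvm_domain(d) && !has_pirq(d) && d == currd`). By De Morgan's law, it rejects only the case where an HVM domain without PIRQs maps into itself. The following small exhaustive check confirms that equivalence. As a simplifying assumption, the three predicates are reduced to booleans; the real checks take a `struct domain *`.

```c
#include <assert.h>
#include <stdbool.h>

/* Proposed: allow unless an HVM domain without PIRQs maps into itself. */
static bool proposed_allows(bool is_hvm, bool has_pirq, bool is_self)
{
    return !is_hvm || has_pirq || !is_self;
}

/* V10 guard: reject only a self-map by an HVM domain lacking PIRQs. */
static bool v10_rejects(bool is_hvm, bool has_pirq, bool is_self)
{
    return is_hvm && !has_pirq && is_self;
}

/* True if "allowed" is exactly "not rejected" for all 8 combinations. */
static bool conditions_equivalent(void)
{
    for ( unsigned int bits = 0; bits < 8; ++bits )
    {
        bool is_hvm = bits & 1, has_pirq = bits & 2, is_self = bits & 4;

        if ( proposed_allows(is_hvm, has_pirq, is_self) ==
             v10_rejects(is_hvm, has_pirq, is_self) )
            return false;
    }
    return true;
}
```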

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-07-31 Thread Chen, Jiqian

On 2024/7/30 21:09, Andrew Cooper wrote:
> On 08/07/2024 12:41 pm, Jiqian Chen wrote:
>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>> a passthrough device by using gsi, see qemu code
>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>> is not allowed because currd is PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq.
>> And add a new check to prevent (un)map when the subject domain
>> doesn't have a notion of PIRQ.
>>
>> So that the interrupt of a passthrough device can be
>> successfully mapped to pirq for domU with a notion of PIRQ
>> when dom0 is PVH
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>>  xen/arch/x86/hvm/hypercall.c |  6 ++
>>  xen/arch/x86/physdev.c   | 12 ++--
>>  2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 0fab670a4871..03ada3c880bd 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
>> arg)
>>  
>>  switch ( cmd )
>>  {
>> +/*
>> +* Only being permitted for management of other domains.
>> +* Further restrictions are enforced in do_physdev_op.
>> +*/
>>  case PHYSDEVOP_map_pirq:
>>  case PHYSDEVOP_unmap_pirq:
>> +break;
>> +
>>  case PHYSDEVOP_eoi:
>>  case PHYSDEVOP_irq_status_query:
>>  case PHYSDEVOP_get_free_pirq:
>> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
>> index d6dd622952a9..9f30a8c63a06 100644
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, 
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>  if ( !d )
>>  break;
>>  
>> -ret = physdev_map_pirq(d, map.type, , , );
>> +/* Only mapping when the subject domain has a notion of PIRQ */
>> +if ( !is_hvm_domain(d) || has_pirq(d) )
>> +ret = physdev_map_pirq(d, map.type, , , 
>> );
>> +else
>> +ret = -EOPNOTSUPP;
>>  
>>  rcu_unlock_domain(d);
>>  
>> @@ -346,7 +350,11 @@ ret_t do_physdev_op(int cmd, 
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>  if ( !d )
>>  break;
>>  
>> -ret = physdev_unmap_pirq(d, unmap.pirq);
>> +/* Only unmapping when the subject domain has a notion of PIRQ */
>> +if ( !is_hvm_domain(d) || has_pirq(d) )
>> +ret = physdev_unmap_pirq(d, unmap.pirq);
>> +else
>> +ret = -EOPNOTSUPP;
>>  
>>  rcu_unlock_domain(d);
>>  
> 
> Gitlab is displeased with your offering.
> 
> https://gitlab.com/xen-project/xen/-/pipelines/1393459622
> 
> This breaks both {adl,zen3p}-pci-hvm-x86-64-gcc-debug, and given the:
> 
> (XEN) [    8.150305] HVM restore d1: CPU 0
> libxl: error: libxl_pci.c:1491:pci_add_dm_done: Domain
> 1:xc_physdev_map_pirq irq=18 (error=-1): Not supported
> libxl: error: libxl_pci.c:1809:device_pci_add_done: Domain
> 1:libxl__device_pci_add failed for PCI device 0:3:0.0 (rc -3)
> libxl: error: libxl_create.c:1962:domcreate_attach_devices: Domain
> 1:unable to add pci devices
> libxl: error: libxl_xshelp.c:206:libxl__xs_read_mandatory: xenstore read
> failed: `/libxl/1/type': No such file or directory
> libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain
> type for domid=1, assuming HVM
> libxl: error: libxl_domain.c:1616:domain_destroy_domid_cb: Domain
> 1:xc_domain_destroy failed: No such process

Sorry, I forgot to validate the "hvm_pirq=0" scenario for HVM guests after the
V10->V11 change (which removed the "d == currd" self-check).

V10 version:
+/* Prevent self-map when currd has no X86_EMU_USE_PIRQ flag */
+if ( is_hvm_domain(d) && !has_pirq(d) && d == currd )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}

V11 version:
+/* Prevent mapping when the subject domain has no X86_EMU_USE_PIRQ */
+if ( is_hvm_domain(d) && !has_pirq(d) )
+{
+rcu_unlock_domain(d);
+return -EOPNOTSUPP;
+}

V10 works whether hvm_pirq is enabled or disabled.
The issue was introduced in V11: when "hvm_pirq=0" is passed to an HVM guest,
has_pirq() is false, but the toolstack still uses a pirq to route the interrupts
of passthrough devices.
So it still calls xc_physdev_(un)map_pirq, which then fails at the has_pirq() check.
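To make the failure mode concrete: in the CI job, dom0 is PV and the domU is HVM with `hvm_pirq=0`. For that request, `d` is HVM, `has_pirq(d)` is false, and `d != currd`. The following mock compares the two guards on that input. Booleans stand in for the real predicates, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct map_request {
    bool target_is_hvm;     /* is_hvm_domain(d) */
    bool target_has_pirq;   /* has_pirq(d); false with hvm_pirq=0 */
    bool self_map;          /* d == currd */
};

/* V11: reject whenever the target HVM domain lacks PIRQs. */
static int v11_guard(const struct map_request *r)
{
    return (r->target_is_hvm && !r->target_has_pirq) ? -EOPNOTSUPP : 0;
}

/* V10: reject only when such a domain maps into itself. */
static int v10_guard(const struct map_request *r)
{
    return (r->target_is_hvm && !r->target_has_pirq && r->self_map)
           ? -EOPNOTSUPP : 0;
}
```

With the CI input, `v11_guard()` yields `-EOPNOTSUPP` while `v10_guard()` lets the foreign mapping through.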

Hi Jan,
Should I go back to the V10 check and only prevent self-mapping when the
subject domain has no PIRQs?
That would allow PHYSDEVOP_map_pirq for foreign mappings, no matter whether dom0
or the domU has PIRQs or

Re: [XEN PATCH v12 2/7] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-07-30 Thread Chen, Jiqian

Hi Andrew,

On 2024/7/30 21:09, Andrew Cooper wrote:
> On 08/07/2024 12:41 pm, Jiqian Chen wrote:
>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>> a passthrough device by using gsi, see qemu code
>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>> is not allowed because currd is PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>> PHYSDEVOP_unmap_pirq for the removal device path to unmap pirq.
>> And add a new check to prevent (un)map when the subject domain
>> doesn't have a notion of PIRQ.
>>
>> So that the interrupt of a passthrough device can be
>> successfully mapped to pirq for domU with a notion of PIRQ
>> when dom0 is PVH
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>>  xen/arch/x86/hvm/hypercall.c |  6 ++
>>  xen/arch/x86/physdev.c   | 12 ++--
>>  2 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 0fab670a4871..03ada3c880bd 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) 
>> arg)
>>  
>>  switch ( cmd )
>>  {
>> +/*
>> +* Only being permitted for management of other domains.
>> +* Further restrictions are enforced in do_physdev_op.
>> +*/
>>  case PHYSDEVOP_map_pirq:
>>  case PHYSDEVOP_unmap_pirq:
>> +break;
>> +
>>  case PHYSDEVOP_eoi:
>>  case PHYSDEVOP_irq_status_query:
>>  case PHYSDEVOP_get_free_pirq:
>> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
>> index d6dd622952a9..9f30a8c63a06 100644
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -323,7 +323,11 @@ ret_t do_physdev_op(int cmd, 
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>  if ( !d )
>>  break;
>>  
>> -ret = physdev_map_pirq(d, map.type, , , );
>> +/* Only mapping when the subject domain has a notion of PIRQ */
>> +if ( !is_hvm_domain(d) || has_pirq(d) )
>> +ret = physdev_map_pirq(d, map.type, , , 
>> );
>> +else
>> +ret = -EOPNOTSUPP;
>>  
>>  rcu_unlock_domain(d);
>>  
>> @@ -346,7 +350,11 @@ ret_t do_physdev_op(int cmd, 
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>  if ( !d )
>>  break;
>>  
>> -ret = physdev_unmap_pirq(d, unmap.pirq);
>> +/* Only unmapping when the subject domain has a notion of PIRQ */
>> +if ( !is_hvm_domain(d) || has_pirq(d) )
>> +ret = physdev_unmap_pirq(d, unmap.pirq);
>> +else
>> +ret = -EOPNOTSUPP;
>>  
>>  rcu_unlock_domain(d);
>>  
> 
> Gitlab is displeased with your offering.
> 
> https://gitlab.com/xen-project/xen/-/pipelines/1393459622
> 
> This breaks both {adl,zen3p}-pci-hvm-x86-64-gcc-debug, and given the:
> 
> (XEN) [    8.150305] HVM restore d1: CPU 0
> libxl: error: libxl_pci.c:1491:pci_add_dm_done: Domain
> 1:xc_physdev_map_pirq irq=18 (error=-1): Not supported
> libxl: error: libxl_pci.c:1809:device_pci_add_done: Domain
> 1:libxl__device_pci_add failed for PCI device 0:3:0.0 (rc -3)
> libxl: error: libxl_create.c:1962:domcreate_attach_devices: Domain
> 1:unable to add pci devices
> libxl: error: libxl_xshelp.c:206:libxl__xs_read_mandatory: xenstore read
> failed: `/libxl/1/type': No such file or directory
> libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain
> type for domid=1, assuming HVM
> libxl: error: libxl_domain.c:1616:domain_destroy_domid_cb: Domain
> 1:xc_domain_destroy failed: No such process
> 
> I'd say that we're hitting the newly introduced -EOPNOTSUPP path.
> 
> In the test scenario, dom0 is PV, and it's an HVM domU which is breaking.
> 
> The sibling *-pci-pv-* tests (a PV domU) are working fine.
> 
> Either way, I'm going to revert this for now because clearly "the
> subject domain has a notion of PIRQ" hasn't been reasoned about
> correctly, and it's important to keep Gitlab CI green across the board.

OK, I will try to reproduce and investigate this issue, thanks.

> 
> ~Andrew

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi

2024-07-26 Thread Chen, Jiqian

Hi Daniel,

On 2024/7/9 21:08, Jan Beulich wrote:
> On 08.07.2024 13:41, Jiqian Chen wrote:
>> Some type of domains don't have PIRQs, like PVH, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest base on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
>> irq on Xen side.
>> What's more, current hypercall XEN_DOMCTL_irq_permission requires
>> passing in pirq to set the access of irq, it is not suitable for
>> dom0 that doesn't have PIRQs.
>>
>> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
>> the permission of irq(translate from x86 gsi) to dumU when dom0
>> has no PIRQs.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>> CC: Daniel P . Smith 
>> Remaining comment @Daniel P . Smith:
>> +ret = -EPERM;
>> +if ( !irq_access_permitted(currd, irq) ||
>> + xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>> +goto gsi_permission_out;
>> Is it okay to issue the XSM check using the translated value, 
>> not the one that was originally passed into the hypercall?

Need your input.

> 
> As long as the answer to this is going to be "Yes":
> Reviewed-by: Jan Beulich 
> 
> Daniel, awaiting your input.
> 
> Jan


Re: [XEN PATCH v12 4/7] x86/domctl: Add hypercall to set the access of x86 gsi

2024-07-26 Thread Chen, Jiqian

On 2024/7/23 06:10, Stefano Stabellini wrote:
> On Mon, 8 Jul 2024, Jiqian Chen wrote:
>> Some type of domains don't have PIRQs, like PVH, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest based on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
>> irq on Xen side.
>> What's more, current hypercall XEN_DOMCTL_irq_permission requires
>> passing in pirq to set the access of irq, it is not suitable for
>> dom0 that doesn't have PIRQs.
>>
>> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant/deny
>> the permission of irq (translated from x86 gsi) to domU when dom0
>> has no PIRQs.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>> CC: Daniel P . Smith 
>> Remaining comment @Daniel P . Smith:
>> +ret = -EPERM;
>> +if ( !irq_access_permitted(currd, irq) ||
>> + xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>> +goto gsi_permission_out;
>> Is it okay to issue the XSM check using the translated value, 
>> not the one that was originally passed into the hypercall?
>> ---
>>  xen/arch/x86/domctl.c  | 32 ++
>>  xen/arch/x86/include/asm/io_apic.h |  2 ++
>>  xen/arch/x86/io_apic.c | 17 
>>  xen/arch/x86/mpparse.c |  5 ++---
>>  xen/include/public/domctl.h|  9 +
>>  xen/xsm/flask/hooks.c  |  1 +
>>  6 files changed, 63 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
>> index 9190e11faaa3..4e9e4c4cfed3 100644
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -36,6 +36,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  static int update_domain_cpu_policy(struct domain *d,
>>  xen_domctl_cpu_policy_t *xdpc)
>> @@ -237,6 +238,37 @@ long arch_do_domctl(
>>  break;
>>  }
>>  
>> +case XEN_DOMCTL_gsi_permission:
>> +{
>> +int irq;
>> +unsigned int gsi = domctl->u.gsi_permission.gsi;
>> +uint8_t access_flag = domctl->u.gsi_permission.access_flag;
>> +
>> +/* Check all bits and pads are zero except lowest bit */
>> +ret = -EINVAL;
>> +if ( access_flag & ( ~XEN_DOMCTL_GSI_PERMISSION_MASK ) )
>> +goto gsi_permission_out;
>> +for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
>> +if ( domctl->u.gsi_permission.pad[i] )
>> +goto gsi_permission_out;
>> +
>> +if ( gsi > highest_gsi() || (irq = gsi_2_irq(gsi)) <= 0 )
> 
> gsi is unsigned int but it is passed to gsi_2_irq which takes an int as
> parameter. If gsi >= INT32_MAX we have a problem. I think we should
> explicitly check for the possible overflow and return error in that
> case.
But the code already checks "gsi > highest_gsi()". Can highest_gsi() return a
gsi >= INT32_MAX?
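To make the question concrete, here is a sketch. Assume the only goal is to guarantee that the unsigned GSI from the hypercall is representable as an int before it reaches an int-taking helper such as gsi_2_irq(). An explicit guard could then look like the code below. gsi_fits_int() and its "highest" parameter are hypothetical, and "highest" stands in for whatever highest_gsi() returns. If highest_gsi() can never exceed INT_MAX, the first test is redundant, which is exactly what is being asked above.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical guard: accept 'gsi' only if it is representable as a
 * signed int AND does not exceed 'highest' (a stand-in for the value
 * highest_gsi() would return).  Passing an accepted value to an
 * int-taking helper such as gsi_2_irq() then cannot overflow.
 */
static bool gsi_fits_int(unsigned int gsi, unsigned int highest)
{
    if ( gsi > (unsigned int)INT_MAX )
        return false;      /* conversion to int would not preserve the value */

    return gsi <= highest; /* reject GSIs beyond the known range */
}
```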

> 
> 
>> +goto gsi_permission_out;
>> +
>> +ret = -EPERM;
>> +if ( !irq_access_permitted(currd, irq) ||
>> + xsm_irq_permission(XSM_HOOK, d, irq, access_flag) )
>> +goto gsi_permission_out;
>> +
>> +if ( access_flag )
>> +ret = irq_permit_access(d, irq);
>> +else
>> +ret = irq_deny_access(d, irq);
>> +
>> +gsi_permission_out:
>> +break;
>> +}
>> +
>>  case XEN_DOMCTL_getpageframeinfo3:
>>  {
>>  unsigned int num = domctl->u.getpageframeinfo3.num;
>> diff --git a/xen/arch/x86/include/asm/io_apic.h b/xen/arch/x86/include/asm/io_apic.h
>> index 78268ea8f666..7e86d8337758 100644
>> --- a/xen/arch/x86/include/asm/io_apic.h
>> +++ b/xen/arch/x86/include/asm/io_apic.h
>> @@ -213,5 +213,7 @@ unsigned highest_gsi(void);
>>  
>>  int ioapic_guest_read( unsigned long physbase, unsigned int reg, u32 *pval);
>>  int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val);
>> +int mp_find_ioapic(int gsi);
>> +int gsi_2_irq(int gsi);
>>  
>>  #endif
>> diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
>> index d2a313c4ac72..5968c8055671 100644
>> --- a/xen/arch/x86/io_apic.c
>> +++ b/xen/arch/x86/io_apic.c
>> @@ -955,6 +955,23 @@ static int pin_2_irq(int idx, int apic, int pin)
>>  return irq;
>>  }
>>  
>> +int gsi_2_irq(int gsi)
>> +{
>> +int ioapic, pin, irq;
>> +
>> +ioapic = mp_find_ioapic(gsi);
>> +if ( ioapic < 0 )
>> +return -EINVAL;
>> +
>> +pin = gsi - io_apic_gsi_base(ioapic);
>> +
>> +irq = apic_pin_2_gsi_irq(ioapic, pin);
>> +if ( irq <= 0 )
>> +return -EINVAL;
>> +
>> +return irq;
>> +}
>> +
>>  static inline int IO_APIC_irq_trigger(int irq)
>>  {
>>  int apic, idx, pin;
>> diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
>> index d8ccab2449c6..7786a3337760 100644
>> --- a/xen/arch/x86/mpparse.c
>>

Re: [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-07-11 Thread Chen, Jiqian

Hi all,

On 2024/7/8 19:41, Jiqian Chen wrote:
> The gsi of a passthrough device must be configured for it to be
> able to be mapped into a hvm domU.
> But When dom0 is PVH, the gsis may not get registered(see below
> clarification), it causes the info of apic, pin and irq not be
> added into irq_2_pin list, and the handler of irq_desc is not set,
> then when passthrough a device, setting ioapic affinity and vector
> will fail.
> 
> To fix above problem, on Linux kernel side, a new code will
> need to call PHYSDEVOP_setup_gsi for passthrough devices to
> register gsi when dom0 is PVH.
> 
> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
> purpose.
> 
> Clarify two questions:
> First, why the gsi of devices belong to PVH dom0 can work?
> Because when probe a driver to a normal device, it uses the normal
> probe function of pci device, in its callstack, it requests irq
> and unmask corresponding ioapic of gsi, then trap into xen and
> register gsi finally.
> Callstack is(on linux kernel side) pci_device_probe->
> request_threaded_irq-> irq_startup-> __unmask_ioapic->
> io_apic_write, then trap into xen hvmemul_do_io->
> hvm_io_intercept-> hvm_process_io_intercept->
> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
> So that the gsi can be registered.
> 
> Second, why the gsi of passthrough device can't work when dom0
> is PVH?
> Because when assign a device to passthrough, it uses the specific
> probe function of pciback, in its callstack, it doesn't install a
> fake irq handler due to the ISR is not running. So that
> mp_register_gsi on Xen side is never called, then the gsi is not
> registered.
> Callstack is(on linux kernel side) pcistub_probe->pcistub_seize->
> pcistub_init_device-> xen_pcibk_reset_device->
> xen_pcibk_control_isr->isr_on==0.
> 
> Signed-off-by: Jiqian Chen 
> Signed-off-by: Huang Rui 
> Signed-off-by: Jiqian Chen 
> ---
>  xen/arch/x86/hvm/hypercall.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> index 03ada3c880bd..cfe82d0f96ed 100644
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  return -ENOSYS;
>  break;
>  
> +case PHYSDEVOP_setup_gsi:
>  case PHYSDEVOP_pci_mmcfg_reserved:
>  case PHYSDEVOP_pci_device_add:
>  case PHYSDEVOP_pci_device_remove:

If you still have concerns about this implementation, which allows
PHYSDEVOP_setup_gsi for PVH on the Xen side and calls PHYSDEVOP_setup_gsi
when pciback probes the passthrough device, I have another way to solve the
problem of the gsi not being registered: adjust the pciback code on the Linux
kernel side.
See:
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
index 51b3002b085b..db94529e65f9 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -445,6 +445,10 @@ static int pcistub_init_device(struct pcistub_device *psdev)
 	err = pci_enable_device(dev);
 	if (err)
 		goto config_release;
+	else {
+		dev_data->enable_intx = 1;
+		xen_pcibk_control_isr(dev, 0);
+	}

In pcistub_init_device, once the pcidev is enabled (through pci_enable_device),
I enable the ISR for pciback so that the fake irq handler gets installed and
the gsi gets registered.
At the end of pcistub_init_device, the original code calls
xen_pcibk_reset_device to disable the ISR and the pcidev, so the fake irq
handler is freed again and the device is left as if nothing had happened.
Do you think this method is feasible?
If so, we don't need this patch anymore.

Looking forward to getting your input.


Re: [XEN PATCH v12 3/7] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-07-10 Thread Chen, Jiqian

Hi,

On 2024/7/8 19:41, Jiqian Chen wrote:
> The gsi of a passthrough device must be configured for it to be
> able to be mapped into a hvm domU.
> But When dom0 is PVH, the gsis may not get registered(see below
> clarification), it causes the info of apic, pin and irq not be
> added into irq_2_pin list, and the handler of irq_desc is not set,
> then when passthrough a device, setting ioapic affinity and vector
> will fail.
> 
> To fix above problem, on Linux kernel side, a new code will
> need to call PHYSDEVOP_setup_gsi for passthrough devices to
> register gsi when dom0 is PVH.
> 
> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
> purpose.
> 
> Clarify two questions:
> First, why the gsi of devices belong to PVH dom0 can work?
> Because when probe a driver to a normal device, it uses the normal
> probe function of pci device, in its callstack, it requests irq
> and unmask corresponding ioapic of gsi, then trap into xen and
> register gsi finally.
> Callstack is(on linux kernel side) pci_device_probe->
> request_threaded_irq-> irq_startup-> __unmask_ioapic->
> io_apic_write, then trap into xen hvmemul_do_io->
> hvm_io_intercept-> hvm_process_io_intercept->
> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
> So that the gsi can be registered.
> 
> Second, why the gsi of passthrough device can't work when dom0
> is PVH?
> Because when assign a device to passthrough, it uses the specific
> probe function of pciback, in its callstack, it doesn't install a
> fake irq handler due to the ISR is not running. So that
> mp_register_gsi on Xen side is never called, then the gsi is not
> registered.
> Callstack is(on linux kernel side) pcistub_probe->pcistub_seize->
> pcistub_init_device-> xen_pcibk_reset_device->
> xen_pcibk_control_isr->isr_on==0.
> 
> Signed-off-by: Jiqian Chen 
> Signed-off-by: Huang Rui 
> Signed-off-by: Jiqian Chen 
> ---
>  xen/arch/x86/hvm/hypercall.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
> index 03ada3c880bd..cfe82d0f96ed 100644
> --- a/xen/arch/x86/hvm/hypercall.c
> +++ b/xen/arch/x86/hvm/hypercall.c
> @@ -86,6 +86,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  return -ENOSYS;
>  break;
>  
> +case PHYSDEVOP_setup_gsi:
>  case PHYSDEVOP_pci_mmcfg_reserved:
>  case PHYSDEVOP_pci_device_add:
>  case PHYSDEVOP_pci_device_remove:

Do you have any other concerns about this patch?
If not, may I get your Reviewed-by?
Then the first three patches of this series could be considered for merging
once I send the next version, so that I can continue upstreaming the kernel
patches that depend on them.


Re: [XEN PATCH v12 5/7] tools/libxc: Allow gsi be mapped into a free pirq

2024-07-10 Thread Chen, Jiqian

On 2024/7/9 21:26, Jan Beulich wrote:
> On 08.07.2024 13:41, Jiqian Chen wrote:
>> Hypercall PHYSDEVOP_map_pirq support to map a gsi into a specific
>> pirq or a free pirq, it depends on the parameter pirq(>0 or <0).
>> But in current xc_physdev_map_pirq, it set *pirq=index when
>> parameter pirq is <0, it causes to force all cases to be mapped
>> to a specific pirq. That has some problems, one is caller can't
>> get a free pirq value, another is that once the specific pirq was
>> already mapped to other gsi, then it will fail.
>>
>> So, change xc_physdev_map_pirq to allow to pass negative parameter
>> in and then get a free pirq.
>>
>> There are four caller of xc_physdev_map_pirq in original codes, so
>> clarify the affect below(just need to clarify the pirq<0 case):
>>
>> First, pci_add_dm_done->xc_physdev_map_pirq, it pass irq to pirq
>> parameter, if pirq<0 means irq<0, then it will fail at check
>> "index < 0" in allocate_and_map_gsi_pirq and get EINVAL, logic is
>> the same as original code.
> 
> There we have
> 
> int pirq = XEN_PT_UNASSIGNED_PIRQ;
> 
> (with XEN_PT_UNASSIGNED_PIRQ being -1) and then
> 
> rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, &pirq);
> 
> Therefore ...
> 
>> --- a/tools/libs/ctrl/xc_physdev.c
>> +++ b/tools/libs/ctrl/xc_physdev.c
>> @@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch,
>>  map.domid = domid;
>>  map.type = MAP_PIRQ_TYPE_GSI;
>>  map.index = index;
>> -map.pirq = *pirq < 0 ? index : *pirq;
>> +map.pirq = *pirq;
>>  
>>  rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
> 
> ... this very much looks like a change in behavior to me: *pirq is
> negative, and hence index would have been put in map.pirq instead. While
> with your change we'd then pass -1, i.e. requesting to obtain a new
> pIRQ.
> 
> I also consider it questionable to go by in-tree users. I think proof of
> no functional change needs to also consider possible out-of-tree users,
> not the least seeing the Python binding below (even if right there you
> indeed attempt to retain prior behavior). The one aspect in your favor
> is that libxc isn't considered to have a stable ABI.
> 
> Overall I see little room to avoid introducing a new function with this
> improved behavior (maybe xc_physdev_map_pirq_gsi()). Ideally existing
> callers would then be switched, to eventually allow removing the old
> function (thus cleanly and noticeably breaking any out-of-tree users
> that there may be, indicating to their developers that they need to
> adjust their code).
Makes sense. Adding a new function xc_physdev_map_pirq_gsi is much better,
and it has the least impact.
Thank you very much!
I will add xc_physdev_map_pirq_gsi in the next version.
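The sketch below shows the behavior difference Jan points out. It models only how the value placed in map.pirq would be derived, and both helpers are hypothetical rather than libxc code. The existing xc_physdev_map_pirq() turns a negative request into index, so it always asks for a specific pIRQ. The proposed xc_physdev_map_pirq_gsi() (the name is Jan's suggestion; its body is not written yet) would pass a negative value through, so that Xen allocates a free pIRQ.

```c
/*
 * Hypothetical models of the map.pirq value each libxc entry point would
 * send with PHYSDEVOP_map_pirq.  'index' is the GSI (map.index) and
 * 'pirq' is the caller's *pirq on entry.
 */

/* Existing xc_physdev_map_pirq(): a negative request becomes 'index'. */
static int legacy_requested_pirq(int index, int pirq)
{
    return pirq < 0 ? index : pirq;
}

/* Proposed xc_physdev_map_pirq_gsi(): keep a negative request as-is,
 * meaning "allocate any free pIRQ". */
static int gsi_requested_pirq(int index, int pirq)
{
    (void)index;   /* still sent in map.index, not used for map.pirq */
    return pirq;
}
```

Keeping the old function unchanged means existing callers that pass -1 (such as QEMU with XEN_PT_UNASSIGNED_PIRQ) keep today's behavior.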

> 
>> --- a/tools/python/xen/lowlevel/xc/xc.c
>> +++ b/tools/python/xen/lowlevel/xc/xc.c
>> @@ -774,6 +774,8 @@ static PyObject *pyxc_physdev_map_pirq(PyObject *self,
>>  if ( !PyArg_ParseTupleAndKeywords(args, kwds, "iii", kwd_list,
>>   &dom, &index, &pirq) )
>>  return NULL;
>> +if ( pirq < 0 )
>> +pirq = index;
>>  ret = xc_physdev_map_pirq(xc->xc_handle, dom, index, &pirq);
>>  if ( ret != 0 )
>>return pyxc_error_to_exception(xc->xc_handle);
> 
> I question this change, yet without Cc-ing the maintainer (now added)
> you're not very likely to get a comment (let alone an ack) on this.
> 
> Jan


Re: [RFC XEN PATCH v12 7/7] tools: Add new function to do PIRQ (un)map on PVH dom0

2024-07-09 Thread Chen, Jiqian

On 2024/7/8 22:57, Anthony PERARD wrote:
> On Mon, Jul 08, 2024 at 07:41:24PM +0800, Jiqian Chen wrote:
>> diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
>> index a4029e3ac810..d869bbec769e 100644
>> --- a/tools/libs/light/libxl_arm.c
>> +++ b/tools/libs/light/libxl_arm.c
>> @@ -1774,6 +1774,16 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
>>  {
>>  }
>>  
>> +int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
>> +{
>> +return -1;
> 
> It's best to return an ERROR_* for libxl error code instead of -1.
> ERROR_NI seems to be the one, it probably means not-implemented. Or
> maybe ERROR_INVAL would do to.
ERROR_INVAL seems more suitable. Will change it in the next version.

> 
>> +}
>> +
>> +int libxl__arch_hvm_unmap_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
>> +{
>> +return -1;
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 96cb4da0794e..3d25997921cc 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -17,6 +17,7 @@
>>  #include "libxl_osdeps.h" /* must come before any other headers */
>>  
>>  #include "libxl_internal.h"
>> +#include "libxl_arch.h"
>>  
>>  #define PCI_BDF"%04x:%02x:%02x.%01x"
>>  #define PCI_BDF_SHORT  "%02x:%02x.%01x"
>> @@ -1478,6 +1479,16 @@ static void pci_add_dm_done(libxl__egc *egc,
>>  fclose(f);
>>  if (!pci_supp_legacy_irq())
>>  goto out_no_irq;
>> +
>> +/*
>> + * When dom0 is PVH and mapping a x86 gsi to pirq for domU,
>> + * should use gsi to grant irq permission.
>> + */
>> +if (!libxl__arch_hvm_map_gsi(gc, pci_encode_bdf(pci), domid))
> 
> Could you store the result of libxl__arch_hvm_map_gsi() in `rc', then
> test that in the condition?
Will change in next version.
> 
>> +goto pci_permissive;
> 
> Why do you skip part of the function on success?
Because libxl__arch_hvm_map_gsi does the same thing for PVH dom0 that the
following part does for PV dom0.
If libxl__arch_hvm_map_gsi succeeds, the following part should be skipped.

> But also, please avoid the "goto" coding style, in libxl, it's tolerated
> for error handling when used to skip to the end of function to have a
> single path (or error path) out of a function.
Maybe I should split the "xc_domain_getinfo_single(ctx->xch,
LIBXL_TOOLSTACK_DOMID, &info);" part of libxl__arch_hvm_map_gsi out into a
separate function.
Then I can distinguish PVH dom0 from PV dom0 and do different things for each.
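Here is a minimal sketch of that restructuring. It assumes the dom0-type query is split out and Anthony's comments are applied: the result is kept in rc, errors are returned instead of only being logged, and no goto is used to skip the PV-only part. Every function name below is a hypothetical placeholder, not a libxl API. The two mapping stubs just return fixed values for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PVH dom0 path (GSI-based permission). */
static int map_via_gsi(unsigned int sbdf, unsigned int domid)
{
    (void)sbdf; (void)domid;
    return 0;           /* pretend the GSI path succeeded */
}

/* Hypothetical stand-in for the PV dom0 path (sysfs irq + pIRQ mapping). */
static int map_via_sysfs_irq(unsigned int sbdf, unsigned int domid)
{
    (void)sbdf; (void)domid;
    return -ENOENT;     /* pretend the sysfs irq file was missing */
}

/*
 * Choose exactly one path from the dom0 type, keep its result in rc and
 * return it to the caller, so failures are not silently ignored.
 */
static int map_legacy_irq(bool dom0_is_pvh, unsigned int sbdf,
                          unsigned int domid)
{
    int rc;

    if (dom0_is_pvh)
        rc = map_via_gsi(sbdf, domid);
    else
        rc = map_via_sysfs_irq(sbdf, domid);

    return rc;
}
```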

> 
>> +else
>> +LOGED(WARN, domid, "libxl__arch_hvm_map_gsi failed (err=%d)", errno);
> 
> No one reads logs unless there's a failure or something doesn't work. So
> here we just ignore failure returned by libxl__arch_hvm_map_gsi(), is it
> the right things to do? Usually, just ignoring error is wrong.
Will change in next version.
> 
> FYI: LOGE* already logs errno.
> 
>> +
>>  sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>>  pci->bus, pci->dev, pci->func);
>>  f = fopen(sysfs_path, "r");
>> @@ -1505,6 +1516,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>>  }
>>  fclose(f);
>>  
>> +pci_permissive:
>>  /* Don't restrict writes to the PCI config space from this VM */
>>  if (pci->permissive) {
>>  if ( sysfs_write_bdf(gc, SYSFS_PCIBACK_DRIVER"/permissive",
>> @@ -2229,6 +2241,11 @@ skip_bar:
>>  if (!pci_supp_legacy_irq())
>>  goto skip_legacy_irq;
>>  
>> +if (!libxl__arch_hvm_unmap_gsi(gc, pci_encode_bdf(pci), domid))
>> +goto skip_legacy_irq;
>> +else
>> +LOGED(WARN, domid, "libxl__arch_hvm_unmap_gsi failed (err=%d)", errno);
>> +
>>  sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>> pci->bus, pci->dev, pci->func);
>>  
>> diff --git a/tools/libs/light/libxl_x86.c b/tools/libs/light/libxl_x86.c
>> index 60643d6f5376..e7756d323cb6 100644
>> --- a/tools/libs/light/libxl_x86.c
>> +++ b/tools/libs/light/libxl_x86.c
>> @@ -879,6 +879,117 @@ void libxl__arch_update_domain_config(libxl__gc *gc,
>>   libxl_defbool_val(src->b_info.u.hvm.pirq));
>>  }
>>  
>> +struct pcidev_map_pirq {
>> +uint32_t sbdf;
>> +uint32_t pirq;
>> +XEN_LIST_ENTRY(struct pcidev_map_pirq) entry;
>> +};
>> +
>> +static pthread_mutex_t pcidev_pirq_mutex = PTHREAD_MUTEX_INITIALIZER;
>> +static XEN_LIST_HEAD(, struct pcidev_map_pirq) pcidev_pirq_list =
>> +XEN_LIST_HEAD_INITIALIZER(pcidev_pirq_list);
>> +
>> +int libxl__arch_hvm_map_gsi(libxl__gc *gc, uint32_t sbdf, uint32_t domid)
>> +{
>> +int pirq = -1, gsi, r;
>> +xc_domaininfo_t info;
>> +struct pcidev_map_pirq *pcidev_pirq;
>> +libxl_ctx *ctx = libxl__gc_owner(gc);
> 
> Instead of declaring "ctx", you can use the macro "CTX" when you need
> "ctx".
Will change in next version.

> 
>> +
>> +r =

Re: [RFC XEN PATCH v12 6/7] tools: Add new function to get gsi from dev

2024-07-08 Thread Chen, Jiqian

On 2024/7/8 21:27, Anthony PERARD wrote:
> On Mon, Jul 08, 2024 at 07:41:23PM +0800, Jiqian Chen wrote:
>> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
>> index e9fcd755fa62..54edb0f3c0dc 100644
>> --- a/tools/libs/ctrl/xc_physdev.c
>> +++ b/tools/libs/ctrl/xc_physdev.c
>> @@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>  return rc;
>>  }
>>  
>> +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
>> +{
>> +int rc = -1;
>> +
>> +#if defined(__linux__)
>> +int fd;
>> +privcmd_gsi_from_pcidev_t dev_gsi = {
>> +.sbdf = sbdf,
>> +.gsi = 0,
>> +};
>> +
>> +fd = open("/dev/xen/privcmd", O_RDWR);
> 
> 
> You could reuse the already opened fd from libxencall:
> xencall_fd(xch->xcall)
Do I need to check whether this fd is < 0?

> 
>> +
>> +if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) {
>> +/* Fallback to /proc/xen/privcmd */
>> +fd = open("/proc/xen/privcmd", O_RDWR);
>> +}
>> +
>> +if (fd < 0) {
>> +PERROR("Could not obtain handle on privileged command interface");
>> +return rc;
>> +}
>> +
>> +rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi);
> 
> I think this would be better implemented in Linux only C file instead of
> using #define. There's already "xc_linux.c" which is probably good
> enough to be used here.
> 
> Implementation for other OS would just set errno to ENOSYS and
> return -1.
Thanks, will change in next version.
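This is one possible shape for the OS split Anthony suggests, sketched under assumptions. The Linux build would keep the privcmd ioctl in xc_linux.c, while other OSes get a stub that sets errno to ENOSYS and returns -1. The function names here are hypothetical. encode_sbdf() assumes the usual segment << 16 | bus << 8 | device << 3 | function layout for the sbdf field of privcmd_gsi_from_pcidev_t.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed SBDF layout: segment[31:16] bus[15:8] device[7:3] function[2:0]. */
static uint32_t encode_sbdf(uint16_t seg, uint8_t bus, uint8_t dev,
                            uint8_t func)
{
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)(dev & 0x1f) << 3) | (uint32_t)(func & 0x7);
}

/*
 * Hypothetical non-Linux fallback for xc_physdev_gsi_from_pcidev():
 * there is no privcmd GSI ioctl, so report "not implemented".
 */
static int gsi_from_pcidev_stub(uint32_t sbdf)
{
    (void)sbdf;
    errno = ENOSYS;
    return -1;
}
```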

> 
> 


Re: [XEN PATCH v12 1/7] xen/pci: Add hypercall to support reset of pcidev

2024-07-08 Thread Chen, Jiqian

On 2024/7/8 22:56, Jan Beulich wrote:
> On 08.07.2024 13:41, Jiqian Chen wrote:
>> When a device has been reset on dom0 side, the Xen hypervisor
>> doesn't get notification, so the cached state in vpci is all
>> out of date compare with the real device state.
>>
>> To solve that problem, add a new hypercall to support the reset
>> of pcidev and clear the vpci state of device. So that once the
>> state of device is reset on dom0 side, dom0 can call this
>> hypercall to notify hypervisor.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> Reviewed-by: Stewart Hildebrand 
>> Reviewed-by: Stefano Stabellini 
> 
> Reviewed-by: Jan Beulich 
Thank you very much!

> 
> Just to double check: You're sure the other two R-b are still applicable,
> despite the various changes that have been made?
Will remove in next version.

> 
> As a purely cosmetic remark: I think I would have preferred if the new
> identifiers didn't have "state" as a part; I simply don't think this adds
> much value, while at the same time making these pretty long.
Do you mean removing "state" from all the new identifiers?

> 
> Jan


Re: [PATCH for-4.19 v2] x86/physdev: Return pirq that irq was already mapped to

2024-07-08 Thread Chen, Jiqian

On 2024/7/8 18:32, Andrew Cooper wrote:
> On 08/07/2024 9:04 am, Jiqian Chen wrote:
>> Fix bug introduced by 0762e2502f1f ("x86/physdev: factor out the code to
>> allocate and map a pirq"). After that re-factoring, when pirq<0 and
>> current_pirq>0, it means caller want to allocate a free pirq for irq but
>> irq already has a mapped pirq, then it returns the negative pirq, so it
>> fails. However, the logic before that re-factoring is different, it
>> should return the current_pirq that irq was already mapped to and make
>> the call success.
>>
>> Fixes: 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a pirq")
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> Reviewed-by: Jan Beulich 
> 
> As a minor note, we treat Fixes: as a tag like all the others, so tend
> not to have a blank line between it an the SoB.
Learned it, thank you!

> 
> Can be fixed on commit - no need to resend.
> 
> ~Andrew


Re: [XEN PATCH v11 5/8] x86/domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-07-07 Thread Chen, Jiqian

On 2024/7/4 21:33, Jan Beulich wrote:
> On 30.06.2024 14:33, Jiqian Chen wrote:
>> @@ -237,6 +238,38 @@ long arch_do_domctl(
>>  break;
>>  }
>>  
>> +case XEN_DOMCTL_gsi_permission:
>> +{
>> +int irq;
>> +uint8_t mask = 1;
>> +unsigned int gsi = domctl->u.gsi_permission.gsi;
>> +bool allow = domctl->u.gsi_permission.allow_access;
>> +
>> +/* Check all bits and pads are zero except lowest bit */
>> +ret = -EINVAL;
>> +if ( domctl->u.gsi_permission.allow_access & ( !mask ) )
>> +goto gsi_permission_out;
> 
> I'm pretty sure that if you had, as would have been expected, added a
> #define to the public header for just the low bit you assign meaning to,
> you wouldn't have caused yourself problems here. For one, the
> initializer for "allow" will be easy to miss if meaning is assigned to
> another of the bits. It sadly _still_ takes the full 8 bits and
> converts those to a boolean. And then the check here won't work either.
> I don't see what use the local variable "mask" is, but at the very
> least I expect in place of ! you mean ~ really.
> 
>> +for ( i = 0; i < ARRAY_SIZE(domctl->u.gsi_permission.pad); ++i )
>> +if ( domctl->u.gsi_permission.pad[i] )
>> +goto gsi_permission_out;
>> +
>> +if ( gsi >= nr_irqs_gsi || ( irq = gsi_2_irq(gsi) ) < 0 )
> 
> nr_irqs_gsi is the upper bound on IRQs representing a GSI; as said before,
> GSIs and IRQs are different number spaces, and hence you can't compare
> gsi against nr_irqs_gsi. The (inclusive) upper bound on (valid) GSIs is
> mp_ioapic_routing[nr_ioapics - 1].gsi_end, or the return value of
> highest_gsi().
Will change to highest_gsi in next version.

> 
> Also, style nit: Blanks belong immediately inside parentheses only for the
> outer pair of control statements; no inner expressions should have them this
> way.
> 
> Finally I'd like to ask that you use "<= 0", as we do in various places
> elsewhere. IRQ0 is the timer interrupt; we never want to have that used by
> any entity outside of Xen itself.
> 
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -464,6 +464,12 @@ struct xen_domctl_irq_permission {
>>  uint8_t pad[3];
>>  };
>>  
>> +/* XEN_DOMCTL_gsi_permission */
>> +struct xen_domctl_gsi_permission {
>> +uint32_t gsi;
>> +uint8_t allow_access; /* flag to specify enable/disable of x86 gsi access */
> 
> See above. It's not the field that serves as a flag for the purpose you
> state, but just the low bit. You want to rename the field (flags?) and
> #define a suitable constant.

Do you mean something like this?

struct xen_domctl_gsi_permission {
    uint32_t gsi;
#define GSI_PERMISSION_MASK    1
#define GSI_PERMISSION_DISABLE 0
#define GSI_PERMISSION_ENABLE  1
    uint8_t access_flag; /* flag to specify enable/disable of x86 gsi access */
    uint8_t pad[3];
};
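This sketch shows how the hypervisor-side check could use such constants, assuming the layout above. The struct and helper names are hypothetical. It tests the reserved bits with ~ rather than !, because !GSI_PERMISSION_MASK evaluates to 0 and would make the check always pass. It also derives the allow/deny decision from the defined bit only, not from the whole byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GSI_PERMISSION_MASK    1
#define GSI_PERMISSION_DISABLE 0
#define GSI_PERMISSION_ENABLE  1

/* Hypothetical mirror of the proposed xen_domctl_gsi_permission layout. */
struct gsi_permission_req {
    uint32_t gsi;
    uint8_t access_flag;
    uint8_t pad[3];
};

/* Reject undefined flag bits and any non-zero padding. */
static bool gsi_permission_req_valid(const struct gsi_permission_req *req)
{
    size_t i;

    if ( req->access_flag & ~GSI_PERMISSION_MASK )
        return false;
    for ( i = 0; i < sizeof(req->pad); ++i )
        if ( req->pad[i] )
            return false;
    return true;
}

/* Grant (true) or deny (false), looking only at the defined bit. */
static bool gsi_permission_allow(const struct gsi_permission_req *req)
{
    return (req->access_flag & GSI_PERMISSION_MASK) == GSI_PERMISSION_ENABLE;
}
```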

> 
> Jan


Re: [XEN PATCH v11 4/8] x86/physdev: Return pirq that irq was already mapped to

2024-07-07 Thread Chen, Jiqian

On 2024/7/4 20:47, Jan Beulich wrote:
> On 30.06.2024 14:33, Jiqian Chen wrote:
>> allocate_pirq is to allocate a pirq for a irq, and it supports to
>> allocate a free pirq(pirq parameter is <0) or a specific pirq (pirq
>> parameter is > 0).
>>
>> For current code, it has four usecases.
>>
>> First, pirq>0 and current_pirq>0, (current_pirq means if irq already
>> has a mapped pirq), if pirq==current_pirq means the irq already has
>> mapped to the pirq expected by the caller, it successes, if
>> pirq!=current_pirq means the pirq expected by the caller has been
>> mapped into other irq, it fails.
>>
>> Second, pirq>0 and current_pirq<0, it means pirq expected by the
>> caller has not been allocated to any irqs, so it can be allocated to
>> caller, it successes.
>>
>> Third, pirq<0 and current_pirq<0, it means caller want to allocate a
>> free pirq for irq and irq has no mapped pirq, it successes.
>>
>> Fourth, pirq<0 and current_pirq>0, it means caller want to allocate
>> a free pirq for irq but irq has a mapped pirq, then it returns the
>> negative pirq, so it fails.
>>
>> The problem is in Fourth, since the irq has a mapped pirq(current_pirq),
>> and the caller doesn't want to allocate a specified pirq to the irq, so
>> the current_pirq should be returned directly in this case, indicating
>> that the allocation is successful. That can help caller to success when
>> caller just want to allocate a free pirq but doesn't know if the irq
>> already has a mapped pirq or not.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
> 
> I think the change is correct, and actually fixes a regression. You want
> 
> Fixes: 0762e2502f1f ("x86/physdev: factor out the code to allocate and map a pirq")
> 
> which would also have helped reviewing quite a bit. And it likely would
> also have helped you write a description which is easier to follow.
> Enumerating all the cases isn't really needed here; what is needed is
> an explanation of what went wrong in that re-factoring.
> 
>> --- a/xen/arch/x86/irq.c
>> +++ b/xen/arch/x86/irq.c
>> @@ -2897,6 +2897,8 @@ static int allocate_pirq(struct domain *d, int index, int pirq, int irq,
>>  d->domain_id, index, pirq, current_pirq);
>>  if ( current_pirq < 0 )
>>  return -EBUSY;
>> +else
>> +return current_pirq;
> 
> Please can this be simply
> 
> pirq = current_pirq;
> 
> without any "else", and then taking the normal return path. That again is
> (imo) closer to what was there before.
> 
> I would further suggest that you split this fix out of this series and
> re-submit soon with a for-4.19 tag and with Oleksii Cc-ed, so that this
> can be considered for inclusion in 4.19.
Thanks, will split and send today.
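For reference, here is a simplified and hypothetical model of the selection logic in allocate_pirq() once current_pirq is reused as Jan suggests. It is not Xen code. In this model, "current" is 0 when the IRQ has no mapping and negative when the existing mapping must not be reused. "free_pirq" stands in for get_free_pirq(). The use of -EEXIST for the "specific pIRQ conflicts" case is an assumption about the existing error code.

```c
#include <errno.h>

/*
 * Hypothetical model: 'requested' < 0 asks for any free pIRQ,
 * 'current' is the IRQ's existing pIRQ (0 = none), and 'free_pirq'
 * is what a free-pIRQ allocator would hand out.
 */
static int select_pirq(int requested, int current, int free_pirq)
{
    int pirq = requested;

    if ( pirq < 0 )
    {
        if ( current )
        {
            if ( current < 0 )
                return -EBUSY;
            pirq = current;      /* the fix: reuse the existing mapping */
        }
        else
            pirq = free_pirq;
    }
    else if ( current && pirq != current )
        return -EEXIST;          /* assumed error for a conflicting request */

    return pirq;
}
```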

> 
> Jan


Re: [XEN PATCH v11 2/8] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-07-04 Thread Chen, Jiqian

On 2024/7/4 14:38, Jan Beulich wrote:
> On 04.07.2024 04:56, Chen, Jiqian wrote:
>> On 2024/7/2 16:44, Jan Beulich wrote:
>>> On 02.07.2024 05:15, Chen, Jiqian wrote:
>>>> On 2024/7/1 15:44, Jan Beulich wrote:
>>>>> On 30.06.2024 14:33, Jiqian Chen wrote:
>>>>>> --- a/xen/arch/x86/physdev.c
>>>>>> +++ b/xen/arch/x86/physdev.c
>>>>>> @@ -323,6 +323,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>  if ( !d )
>>>>>>  break;
>>>>>>  
>>>>>> +/* Prevent mapping when the subject domain has no X86_EMU_USE_PIRQ */
>>>>>> +if ( is_hvm_domain(d) && !has_pirq(d) )
>>>>>> +{
>>>>>> +rcu_unlock_domain(d);
>>>>>> +return -EOPNOTSUPP;
>>>>>> +}
>>>>>> +
>>>>>>  ret = physdev_map_pirq(d, map.type, &map.index, &map.pirq, &msi);
>>>>>>  
>>>>>>  rcu_unlock_domain(d);
>>>>>> @@ -346,6 +353,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>>>  if ( !d )
>>>>>>  break;
>>>>>>  
>>>>>> +/* Prevent unmapping when the subject domain has no X86_EMU_USE_PIRQ */
>>>>>> +if ( is_hvm_domain(d) && !has_pirq(d) )
>>>>>> +{
>>>>>> +rcu_unlock_domain(d);
>>>>>> +return -EOPNOTSUPP;
>>>>>> +}
>>>>>> +
>>>>>>  ret = physdev_unmap_pirq(d, unmap.pirq);
>>>>>>  
>>>>>>  rcu_unlock_domain(d);
>>>>>
>>>>> If you did go look, you will have noticed that we use "return" in the
>>>>> middle of this function only very sparingly (when alternatives would
>>>>> result in more complicated code elsewhere). I think you want to avoid
>>>>> "return" here, too, and probably go even further and avoid the extra
>>>>> rcu_unlock_domain() as well.
>>>>> That's easily possible to arrange for (taking the latter case as example):
>>>>>
>>>>> /* Prevent unmapping when the subject domain has no X86_EMU_USE_PIRQ */
>>>>> if ( !is_hvm_domain(d) || has_pirq(d) )
>>>>> ret = physdev_unmap_pirq(d, unmap.pirq);
>>>>> else
>>>>> ret = -EOPNOTSUPP;
>>>>>
>>>>> rcu_unlock_domain(d);
>>>>>
>>>>> Personally I would even use a conditional operator here, but I believe
>>>>> others might dislike its use in situations like this one.
>>>>>
>>>>> The re-arrangement make a little more noticeable though that the comment
>>>>> isn't quite right either: PV domains necessarily have no
>>>>> X86_EMU_USE_PIRQ. Maybe "... has no notion of pIRQ"?
>>>>
>>>> Or just like below?
>>>>
>>>> /*
>>>>  * Prevent unmapping when the subject hvm domain has no
>>>>  * X86_EMU_USE_PIRQ
>>>>  */
>>>> if ( is_hvm_domain(d) && !has_pirq(d) )
>>>> ret = -EOPNOTSUPP;
>>>> else
>>>> ret = physdev_unmap_pirq(d, unmap.pirq);
>>>
>>> No objection to the slightly changed comment. The code alternative you
>>> present is of course functionally identical, yet personally I prefer to
>>> have the "good" case on the "if" branch and the "bad" one following
>>> "else". I wouldn't insist, though.
>> OK, will put the "good" case on the "if" branch.
>> Should I change "!is_hvm_domain(d)" to "is_pv_domain(d)"?
>> And then have:
>>
>> /* Only unmap when the subject domain has a notion of PIRQ */
>> if ( is_pv_domain(d) || has_pirq(d) )
>> ret = physdev_unmap_pirq(d, unmap.pirq);
>> else
>> ret = -EOPNOTSUPP;
> 
> I for one would prefer if you kept using is_hvm_domain(), for being more
> precise in this situation.
OK, thanks. Will change it in the next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.
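As a standalone illustration of the structure agreed on in this thread (the "good" case on the "if" branch, a single exit path so that rcu_unlock_domain() runs exactly once, and is_hvm_domain() kept for precision), here is a minimal model. It is a sketch, not Xen code: struct mock_domain, physdev_unmap_pirq_stub() and unmap_checked() are hypothetical names, and the two predicates only mimic what is_hvm_domain() and has_pirq() report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for Xen's struct domain: only the two properties
 * the check looks at are modelled. */
struct mock_domain {
    bool hvm;   /* what is_hvm_domain(d) would report */
    bool pirq;  /* what has_pirq(d) would report (X86_EMU_USE_PIRQ set) */
};

static bool is_hvm_domain(const struct mock_domain *d) { return d->hvm; }
static bool has_pirq(const struct mock_domain *d) { return d->pirq; }

/* Hypothetical stub for the real unmapping work; always succeeds here. */
static int physdev_unmap_pirq_stub(struct mock_domain *d, int pirq)
{
    (void)d;
    (void)pirq;
    return 0;
}

/* "Good" case on the if branch, no early return, so a caller can do a
 * single rcu_unlock_domain() afterwards. */
static int unmap_checked(struct mock_domain *d, int pirq)
{
    int ret;

    /* Only unmap when the subject domain has a notion of pIRQ. */
    if ( !is_hvm_domain(d) || has_pirq(d) )
        ret = physdev_unmap_pirq_stub(d, pirq);
    else
        ret = -EOPNOTSUPP;

    return ret;
}
```

In this model a PV domain and an HVM domain with X86_EMU_USE_PIRQ both reach the unmap path, while an HVM domain without pIRQ support (such as a PVH guest) gets -EOPNOTSUPP.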

Re: [RFC XEN PATCH v11 7/8] tools: Add new function to get gsi from dev

2024-07-03 Thread Chen, Jiqian

Hi Anthony,

On 2024/7/2 11:47, Chen, Jiqian wrote:
> On 2024/7/1 15:32, Jan Beulich wrote:
>> On 30.06.2024 14:33, Jiqian Chen wrote:
>>> --- a/tools/libs/ctrl/xc_physdev.c
>>> +++ b/tools/libs/ctrl/xc_physdev.c
>>> @@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>>  return rc;
>>>  }
>>>  
>>> +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
>>> +{
>>> +int rc = -1;
>>> +
>>> +#if defined(__linux__)
>>> +int fd;
>>> +privcmd_gsi_from_pcidev_t dev_gsi = {
>>> +.sbdf = sbdf,
>>> +.gsi = 0,
>>> +};
>>> +
>>> +fd = open("/dev/xen/privcmd", O_RDWR);
>>> +
>>> +if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) {
>>> +/* Fallback to /proc/xen/privcmd */
>>> +fd = open("/proc/xen/privcmd", O_RDWR);
>>> +}
>>> +
>>> +if (fd < 0) {
>>> +PERROR("Could not obtain handle on privileged command interface");
>>> +return rc;
>>> +}
>>> +
>>> +rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi);
>>> +close(fd);
>>> +
>>> +if (rc) {
>>> +PERROR("Failed to get gsi from dev");
>>> +} else {
>>> +rc = dev_gsi.gsi;
>>> +}
>>> +#endif
>>> +
>>> +return rc;
>>> +}
>>
>> I realize Anthony had asked to move this out of libxencall, yet doing it like
>> this (without really abstracting away the OS specifics) doesn't look quite
>> right either. In particular the opening of /dev/xen/privcmd looks questionable
>> to now have yet another instance in yet another library. Couldn't we split
>> osdep_xencall_open(), making available its former half for use here and in the
>> other two libraries? 
> Hi Anthony, what is your opinion?
What's your opinion on splitting the opening of /dev/xen/privcmd out into a
separate function, and then using it in this patch and in the other libraries?

> 
>> Of course that'll still leave the ioctl() invocation, which necessarily is 
>> OS-specific, too.
>>
>> Jan
> 

-- 
Best regards,
Jiqian Chen.
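Jan's suggestion of splitting osdep_xencall_open() could look roughly like the helper below, which factors out only the device-opening half so that several libraries can share it. This is a hypothetical sketch: xen_privcmd_open() is not an existing Xen library function, and its fallback logic simply mirrors the patch quoted above. The ioctl() invocation would remain OS-specific in each caller.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical shared helper (not an existing Xen API): open the privcmd
 * interface, falling back to the legacy /proc path when /dev/xen/privcmd
 * is unavailable. Returns a file descriptor, or -1 with errno set.
 */
static int xen_privcmd_open(void)
{
    int fd = open("/dev/xen/privcmd", O_RDWR);

    if ( fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV) )
        /* Fall back to /proc/xen/privcmd, as the quoted patch does. */
        fd = open("/proc/xen/privcmd", O_RDWR);

    return fd;
}
```

A caller such as xc_physdev_gsi_from_pcidev() would then call the helper, issue its own ioctl(), and close() the descriptor, instead of duplicating the path handling. On a host without Xen, both open() calls fail and the helper returns -1 with errno set.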

Re: [XEN PATCH v11 2/8] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-07-03 Thread Chen, Jiqian

On 2024/7/2 16:44, Jan Beulich wrote:
> On 02.07.2024 05:15, Chen, Jiqian wrote:
>> On 2024/7/1 15:44, Jan Beulich wrote:
>>> On 30.06.2024 14:33, Jiqian Chen wrote:
>>>> When running Xen with a PVH dom0 and an HVM domU, a pirq is mapped
>>>> for a passthrough device using its gsi; see QEMU code
>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>>>> pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
>>>> calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not
>>>> allowed, because currd is the PVH dom0 and PVH has no
>>>> X86_EMU_USE_PIRQ flag, so the call fails at the has_pirq check.
>>>>
>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
>>>> PHYSDEVOP_unmap_pirq for the device removal path to unmap the pirq.
>>>> Also add a new check to prevent (un)mapping when the subject domain
>>>> has no X86_EMU_USE_PIRQ flag.
>>>>
>>>> With this, the interrupt of a passthrough device can be
>>>> successfully mapped to a pirq for a domU with the X86_EMU_USE_PIRQ
>>>> flag when dom0 is PVH.
>>>>
>>>> Signed-off-by: Jiqian Chen 
>>>> Signed-off-by: Huang Rui 
>>>> Signed-off-by: Jiqian Chen 
>>>> Reviewed-by: Stefano Stabellini 
>>>
>>> You keep carrying this R-b, despite making functional changes. This can't be
>>> quite right.
>> Will remove it in the next version.
>>
>>>
>>> While functionally I'm now okay with the change, I still have a code structure
>>> concern:
>>>
>>>> --- a/xen/arch/x86/physdev.c
>>>> +++ b/xen/arch/x86/physdev.c
>>>> @@ -323,6 +323,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>  if ( !d )
>>>>  break;
>>>>  
>>>> +/* Prevent mapping when the subject domain has no X86_EMU_USE_PIRQ */
>>>> +if ( is_hvm_domain(d) && !has_pirq(d) )
>>>> +{
>>>> +rcu_unlock_domain(d);
>>>> +return -EOPNOTSUPP;
>>>> +}
>>>> +
>>>>  ret = physdev_map_pirq(d, map.type, , , );
>>>>  
>>>>  rcu_unlock_domain(d);
>>>> @@ -346,6 +353,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>  if ( !d )
>>>>  break;
>>>>  
>>>> +/* Prevent unmapping when the subject domain has no X86_EMU_USE_PIRQ */
>>>> +if ( is_hvm_domain(d) && !has_pirq(d) )
>>>> +{
>>>> +rcu_unlock_domain(d);
>>>> +return -EOPNOTSUPP;
>>>> +}
>>>> +
>>>>  ret = physdev_unmap_pirq(d, unmap.pirq);
>>>>  
>>>>  rcu_unlock_domain(d);
>>>
>>> If you did go look, you will have noticed that we use "return" in the middle
>>> of this function only very sparingly (when alternatives would result in more
>>> complicated code elsewhere). I think you want to avoid "return" here, too,
>>> and probably go even further and avoid the extra rcu_unlock_domain() as well.
>>> That's easily possible to arrange for (taking the latter case as example):
>>>
>>> /* Prevent unmapping when the subject domain has no X86_EMU_USE_PIRQ */
>>> if ( !is_hvm_domain(d) || has_pirq(d) )
>>> ret = physdev_unmap_pirq(d, unmap.pirq);
>>> else
>>> ret = -EOPNOTSUPP;
>>>
>>> rcu_unlock_domain(d);
>>>
>>> Personally I would even use a conditional operator here, but I believe
>>> others might dislike its use in situations like this one.
>>>
>>> The re-arrangement makes a little more noticeable though that the comment
>>> isn't quite right either: PV domains necessarily have no
>>> X86_EMU_USE_PIRQ. Maybe "... has no notion of pIRQ"?
>>
>> Or just like below?
>>
>> /*
>>  * Prevent unmapping when the subject hvm domain has no
>>  * X86_EMU_USE_PIRQ
>>  */
>> if ( is_hvm_domain(d) && !has_pirq(d) )
>> ret = -EOPNOTSUPP;
>> else
>> ret = physdev_unmap_pirq(d, unmap.pirq);
> 
> No objection to the slightly changed comment. The code alternative you
> present is of course functionally identical, yet personally I prefer to
> have the "good" case on the "if" branch and the "bad" one following
> "else". I wouldn't insist, though.
OK, will put the "good" case on the "if" branch.
Should I change "!is_hvm_domain(d)" to "is_pv_domain(d)"?
And then have:

/* Only unmap when the subject domain has a notion of PIRQ */
if ( is_pv_domain(d) || has_pirq(d) )
ret = physdev_unmap_pirq(d, unmap.pirq);
else
ret = -EOPNOTSUPP;

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v11 7/8] tools: Add new function to get gsi from dev

2024-07-01 Thread Chen, Jiqian

On 2024/7/1 15:32, Jan Beulich wrote:
> On 30.06.2024 14:33, Jiqian Chen wrote:
>> --- a/tools/libs/ctrl/xc_physdev.c
>> +++ b/tools/libs/ctrl/xc_physdev.c
>> @@ -111,3 +111,38 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>  return rc;
>>  }
>>  
>> +int xc_physdev_gsi_from_pcidev(xc_interface *xch, uint32_t sbdf)
>> +{
>> +int rc = -1;
>> +
>> +#if defined(__linux__)
>> +int fd;
>> +privcmd_gsi_from_pcidev_t dev_gsi = {
>> +.sbdf = sbdf,
>> +.gsi = 0,
>> +};
>> +
>> +fd = open("/dev/xen/privcmd", O_RDWR);
>> +
>> +if (fd < 0 && (errno == ENOENT || errno == ENXIO || errno == ENODEV)) {
>> +/* Fallback to /proc/xen/privcmd */
>> +fd = open("/proc/xen/privcmd", O_RDWR);
>> +}
>> +
>> +if (fd < 0) {
>> +PERROR("Could not obtain handle on privileged command interface");
>> +return rc;
>> +}
>> +
>> +rc = ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_PCIDEV, &dev_gsi);
>> +close(fd);
>> +
>> +if (rc) {
>> +PERROR("Failed to get gsi from dev");
>> +} else {
>> +rc = dev_gsi.gsi;
>> +}
>> +#endif
>> +
>> +return rc;
>> +}
> 
> I realize Anthony had asked to move this out of libxencall, yet doing it like
> this (without really abstracting away the OS specifics) doesn't look quite
> right either. In particular the opening of /dev/xen/privcmd looks questionable
> to now have yet another instance in yet another library. Couldn't we split
> osdep_xencall_open(), making available its former half for use here and in the
> other two libraries? 
Hi Anthony, what is your opinion?

> Of course that'll still leave the ioctl() invocation, which necessarily is 
> OS-specific, too.
> 
> Jan

-- 
Best regards,
Jiqian Chen.
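For reference, the sbdf argument packs segment, bus and devfn into one 32-bit value. The sketch below uses the parenthesis-light macro forms Jan proposed for libxl_pci.c during the v10 review, together with the conventional PCI_DEVFN() encoding (5-bit device, 3-bit function); that PCI_DEVFN() definition is an assumption here, not copied from the Xen tree.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional devfn encoding (assumed): device in bits 7..3, function
 * in bits 2..0. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Bus in bits 15..8, devfn in bits 7..0 (form suggested in review). */
#define PCI_DEVID(bus, devfn) \
    (((uint16_t)(bus) << 8) | ((devfn) & 0xff))

/* Segment in bits 31..16 on top of the 16-bit device ID. */
#define PCI_SBDF(seg, bus, devfn) \
    (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))
```

For example, device 0000:03:00.0 encodes as 0x00000300 and device 0001:03:1f.7 as 0x000103ff; a value of this form is what gets passed as sbdf to xc_physdev_gsi_from_pcidev().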

Re: [XEN PATCH v11 6/8] tools/libxc: Allow gsi be mapped into a free pirq

2024-07-01 Thread Chen, Jiqian

On 2024/7/1 15:54, Jan Beulich wrote:
> On 30.06.2024 14:33, Jiqian Chen wrote:
>> The PHYSDEVOP_map_pirq hypercall supports mapping a gsi either into a
>> specific pirq or into a free pirq, depending on the pirq parameter
>> (>0 or <0). But the current xc_physdev_map_pirq sets *pirq=index when
>> the pirq parameter is <0, which forces every case to be mapped to a
>> specific pirq. That causes problems: first, the caller can't get a
>> free pirq value; second, if that specific pirq is already mapped to
>> another gsi, the mapping fails.
>>
>> So, change xc_physdev_map_pirq to allow a negative parameter to be
>> passed in, so that a free pirq is obtained.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>>  tools/libs/ctrl/xc_physdev.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
>> index 460a8e779ce8..e9fcd755fa62 100644
>> --- a/tools/libs/ctrl/xc_physdev.c
>> +++ b/tools/libs/ctrl/xc_physdev.c
>> @@ -50,7 +50,7 @@ int xc_physdev_map_pirq(xc_interface *xch,
>>  map.domid = domid;
>>  map.type = MAP_PIRQ_TYPE_GSI;
>>  map.index = index;
>> -map.pirq = *pirq < 0 ? index : *pirq;
>> +map.pirq = *pirq;
>>  
>>  rc = do_physdev_op(xch, PHYSDEVOP_map_pirq, &map, sizeof(map));
>>  
> 
> This is a functional change to existing callers, without any kind of
> clarification whether this changed behavior is actually okay for them.
Makes sense.
There are three callers: pci_add_dm_done, libxl__arch_domain_map_irq and
pyxc_physdev_map_pirq. I know how to justify the change for the first two,
but I have no idea about the last one.

Hi Marek,
Would this patch break the existing behavior of pyxc_physdev_map_pirq? Why or why not?

> 
> Jan

-- 
Best regards,
Jiqian Chen.
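The behavioral difference Jan asks about comes down to how map.pirq is filled in before the hypercall. The two functions below are a hypothetical model of that single line, before and after the patch; they do not call Xen. Callers passing a non-negative *pirq see no change, while callers passing a negative value previously always requested pirq == GSI index and now ask Xen to pick a free pirq, which need not equal the GSI.

```c
#include <assert.h>

/* Before the patch: a negative *pirq was replaced by the GSI index, so Xen
 * was always asked for that specific pirq. */
static int requested_pirq_old(int index, int pirq_in)
{
    return pirq_in < 0 ? index : pirq_in;
}

/* After the patch: the caller's value is passed through unchanged, so a
 * negative value lets the hypervisor allocate a free pirq. */
static int requested_pirq_new(int index, int pirq_in)
{
    (void)index;
    return pirq_in;
}
```

So the audit of existing callers only needs to look at those that pass a negative *pirq and rely on getting back pirq == gsi.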

Re: [XEN PATCH v11 3/8] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-07-01 Thread Chen, Jiqian

On 2024/7/1 15:52, Jan Beulich wrote:
> On 30.06.2024 14:33, Jiqian Chen wrote:
>> The gsi of a passthrough device must be configured for it to be
>> able to be mapped into a hvm domU.
>> But when dom0 is PVH, the gsis don't get registered, which causes
> 
> As per below, it's not "don't" but "may not". As the details don't
> follow right away, you may also want to add something like "(see
> below)".
OK, will change in next version.

> 
>> the apic, pin and irq info not to be added into the irq_2_pin list,
>> and the irq_desc handler not to be set; then, when passing through a
>> device, setting the ioapic affinity and vector fails.
>>
>> To fix the above problem, new code on the Linux kernel side will
>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>> register the gsi when dom0 is PVH.
>>
>> So, add PHYSDEVOP_setup_gsi to hvm_physdev_op for that
>> purpose.
>>
>> Clarify two questions:
>> First, why does the gsi of a device belonging to PVH dom0 work?
>> Because when a driver is probed for a normal device, it calls (on
>> the Linux kernel side) pci_device_probe-> request_threaded_irq->
>> irq_startup-> __unmask_ioapic-> io_apic_write, which then traps into
>> Xen: hvmemul_do_io-> hvm_io_intercept-> hvm_process_io_intercept->
>> vioapic_write_indirect-> vioapic_hwdom_map_gsi-> mp_register_gsi.
>> So the gsi gets registered.
>>
>> Second, why doesn't the gsi of a passthrough device work when dom0
>> is PVH?
>> Because when a device is assigned for passthrough, pciback probes
>> it, calling pcistub_probe->pcistub_seize->
>> pcistub_init_device-> xen_pcibk_reset_device->
>> xen_pcibk_control_isr->isr_on, but isr_on is not set, so the
>> fake IRQ handler is not installed and the gsi isn't unmasked.
>> Moreover, on the Xen side, vioapic_hwdom_map_gsi-> mp_register_gsi
>> is called only when the gsi is unmasked, so the gsi can't work for
>> a passthrough device.
> 
> While this provides the requested detail (thanks), personally I find
> this pretty hard to follow. It would likely be easier if it was
> written to a larger part in English, rather than in call chain
> terminology. But I'm not going to insist, unless others would agree
> with that view of mine.
I will add a plain-English description in the next version, and keep the call
chains as well unless they need to be removed.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v11 2/8] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-07-01 Thread Chen, Jiqian

On 2024/7/1 15:44, Jan Beulich wrote:
> On 30.06.2024 14:33, Jiqian Chen wrote:
>> When running Xen with a PVH dom0 and an HVM domU, a pirq is mapped
>> for a passthrough device using its gsi; see QEMU code
>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>> pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
>> calls into Xen, but in hvm_physdev_op PHYSDEVOP_map_pirq is not
>> allowed, because currd is the PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, so the call fails at the has_pirq check.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
>> PHYSDEVOP_unmap_pirq for the device removal path to unmap the pirq.
>> Also add a new check to prevent (un)mapping when the subject domain
>> has no X86_EMU_USE_PIRQ flag.
>>
>> With this, the interrupt of a passthrough device can be
>> successfully mapped to a pirq for a domU with the X86_EMU_USE_PIRQ
>> flag when dom0 is PVH.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> Reviewed-by: Stefano Stabellini 
> 
> You keep carrying this R-b, despite making functional changes. This can't be
> quite right.
Will remove it in the next version.

> 
> While functionally I'm now okay with the change, I still have a code structure
> concern:
> 
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -323,6 +323,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  if ( !d )
>>  break;
>>  
>> +/* Prevent mapping when the subject domain has no X86_EMU_USE_PIRQ */
>> +if ( is_hvm_domain(d) && !has_pirq(d) )
>> +{
>> +rcu_unlock_domain(d);
>> +return -EOPNOTSUPP;
>> +}
>> +
>>  ret = physdev_map_pirq(d, map.type, , , );
>>  
>>  rcu_unlock_domain(d);
>> @@ -346,6 +353,13 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  if ( !d )
>>  break;
>>  
>> +/* Prevent unmapping when the subject domain has no X86_EMU_USE_PIRQ */
>> +if ( is_hvm_domain(d) && !has_pirq(d) )
>> +{
>> +rcu_unlock_domain(d);
>> +return -EOPNOTSUPP;
>> +}
>> +
>>  ret = physdev_unmap_pirq(d, unmap.pirq);
>>  
>>  rcu_unlock_domain(d);
> 
> If you did go look, you will have noticed that we use "return" in the middle
> of this function only very sparingly (when alternatives would result in more
> complicated code elsewhere). I think you want to avoid "return" here, too,
> and probably go even further and avoid the extra rcu_unlock_domain() as well.
> That's easily possible to arrange for (taking the latter case as example):
> 
> /* Prevent unmapping when the subject domain has no X86_EMU_USE_PIRQ */
> if ( !is_hvm_domain(d) || has_pirq(d) )
> ret = physdev_unmap_pirq(d, unmap.pirq);
> else
> ret = -EOPNOTSUPP;
> 
> rcu_unlock_domain(d);
> 
> Personally I would even use a conditional operator here, but I believe
> others might dislike its use in situations like this one.
> 
> The re-arrangement makes a little more noticeable though that the comment
> isn't quite right either: PV domains necessarily have no
> X86_EMU_USE_PIRQ. Maybe "... has no notion of pIRQ"?

Or just like below?

/*
 * Prevent unmapping when the subject hvm domain has no
 * X86_EMU_USE_PIRQ
 */
if ( is_hvm_domain(d) && !has_pirq(d) )
ret = -EOPNOTSUPP;
else
ret = physdev_unmap_pirq(d, unmap.pirq);

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v11 1/8] xen/vpci: Clear all vpci status of device

2024-07-01 Thread Chen, Jiqian

On 2024/7/1 15:18, Jan Beulich wrote:
> On 30.06.2024 14:33, Jiqian Chen wrote:
>> When a device has been reset on the dom0 side, vPCI on the Xen
>> side doesn't get notified, so the state cached in vPCI is out of
>> date compared with the real device state.
>> To solve that problem, add a new hypercall to clear all vPCI
>> device state. When a device is reset on the dom0 side, dom0 can
>> call this hypercall to notify vPCI.
> 
> While the description properly talks about all of this being about device
> reset, the title suggests otherwise (leaving open what the context is, thus
> - to me at least - suggesting it's during vPCI init for a particular
> device).
Change the title to "xen/pci: Add hypercall to support reset of pcidev"?

> 
>> @@ -67,6 +68,63 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  break;
>>  }
>>  
>> +case PHYSDEVOP_pci_device_state_reset:
>> +{
>> +struct pci_device_state_reset dev_reset;
>> +struct pci_dev *pdev;
>> +pci_sbdf_t sbdf;
>> +
>> +ret = -EOPNOTSUPP;
>> +if ( !is_pci_passthrough_enabled() )
>> +break;
>> +
>> +ret = -EFAULT;
>> +if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
>> +break;
>> +
>> +sbdf = PCI_SBDF(dev_reset.dev.seg,
>> +dev_reset.dev.bus,
>> +dev_reset.dev.devfn);
>> +
>> +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>> +if ( ret )
>> +break;
>> +
>> +pcidevs_lock();
>> +pdev = pci_get_pdev(NULL, sbdf);
>> +if ( !pdev )
>> +{
>> +pcidevs_unlock();
>> +ret = -ENODEV;
>> +break;
>> +}
>> +
>> +write_lock(&pdev->domain->pci_lock);
>> +pcidevs_unlock();
>> +/* Implement FLR, other reset types may be implemented in future */
> 
> The comment isn't in sync with the code anymore.
Change it to "/* vpci_reset_device_state() is called for all reset types by default; other specific operations can be added later as needed */"?

> 
>> +switch ( dev_reset.reset_type )
>> +{
>> +case PCI_DEVICE_STATE_RESET_COLD:
>> +case PCI_DEVICE_STATE_RESET_WARM:
>> +case PCI_DEVICE_STATE_RESET_HOT:
>> +case PCI_DEVICE_STATE_RESET_FLR:
>> +{
> 
> This brace isn't needed while at the same time it is confusing.
> 
>> +ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
>> +if ( ret )
>> +dprintk(XENLOG_ERR,
>> +"%pp: failed to reset vPCI device state\n", );
> 
> I question the need for a log message here.
OK, will delete it in next version.

> 
>> --- a/xen/include/public/physdev.h
>> +++ b/xen/include/public/physdev.h
>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>>   */
>>  #define PHYSDEVOP_prepare_msix  30
>>  #define PHYSDEVOP_release_msix  31
>> +/*
>> + * Notify the hypervisor that a PCI device has been reset, so that any
>> + * internally cached state is regenerated.  Should be called after any
>> + * device reset performed by the hardware domain.
>> + */
>> +#define PHYSDEVOP_pci_device_state_reset 32
>> +
>>  struct physdev_pci_device {
>>  /* IN */
>>  uint16_t seg;
>> @@ -305,6 +312,19 @@ struct physdev_pci_device {
>>  typedef struct physdev_pci_device physdev_pci_device_t;
>>  DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_t);
>>  
>> +struct pci_device_state_reset {
>> +physdev_pci_device_t dev;
>> +#define _PCI_DEVICE_STATE_RESET_COLD 0
>> +#define PCI_DEVICE_STATE_RESET_COLD  (1U<<_PCI_DEVICE_STATE_RESET_COLD)
>> +#define _PCI_DEVICE_STATE_RESET_WARM 1
>> +#define PCI_DEVICE_STATE_RESET_WARM  (1U<<_PCI_DEVICE_STATE_RESET_WARM)
>> +#define _PCI_DEVICE_STATE_RESET_HOT  2
>> +#define PCI_DEVICE_STATE_RESET_HOT   (1U<<_PCI_DEVICE_STATE_RESET_HOT)
>> +#define _PCI_DEVICE_STATE_RESET_FLR  3
>> +#define PCI_DEVICE_STATE_RESET_FLR   (1U<<_PCI_DEVICE_STATE_RESET_FLR)
>> +uint32_t reset_type;
>> +};
> 
> Do we really need the _PCI_DEVICE_STATE_RESET_* bit positions as separate
> #define-s? I can't spot any use anywhere.
I thought it was the coding style.
I will delete them in the next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.
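The review points on the reset handler (flag values defined directly without the _PCI_DEVICE_STATE_RESET_* bit positions, no extra brace around the case body, no log message) can be seen together in this standalone sketch. vpci_reset_stub() is a hypothetical stand-in for vpci_reset_device_state(), and the default branch returning -EOPNOTSUPP is an assumption, since the quoted hunk is cut off before that point.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag values as in the proposed public header, defined directly. */
#define PCI_DEVICE_STATE_RESET_COLD (1U << 0)
#define PCI_DEVICE_STATE_RESET_WARM (1U << 1)
#define PCI_DEVICE_STATE_RESET_HOT  (1U << 2)
#define PCI_DEVICE_STATE_RESET_FLR  (1U << 3)

/* Hypothetical stand-in for vpci_reset_device_state(); always succeeds. */
static int vpci_reset_stub(uint32_t reset_type)
{
    (void)reset_type;
    return 0;
}

static int handle_reset(uint32_t reset_type)
{
    int ret;

    switch ( reset_type )
    {
    case PCI_DEVICE_STATE_RESET_COLD:
    case PCI_DEVICE_STATE_RESET_WARM:
    case PCI_DEVICE_STATE_RESET_HOT:
    case PCI_DEVICE_STATE_RESET_FLR:
        /* Same handling for every known type, no extra braces needed. */
        ret = vpci_reset_stub(reset_type);
        break;

    default:
        /* Assumed: unknown or combined flags are rejected. */
        ret = -EOPNOTSUPP;
        break;
    }

    return ret;
}
```

Because reset_type is matched with case labels, a value with more than one flag set falls through to default rather than being treated as a combination of reset types.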

Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-25 Thread Chen, Jiqian

On 2024/6/24 20:33, Anthony PERARD wrote:
> On Fri, Jun 21, 2024 at 08:20:55AM +0000, Chen, Jiqian wrote:
>> On 2024/6/20 18:42, Jan Beulich wrote:
>>> On 20.06.2024 11:40, Chen, Jiqian wrote:
>>>> On 2024/6/18 17:23, Jan Beulich wrote:
>>>>> On 18.06.2024 10:23, Chen, Jiqian wrote:
>>>>>> On 2024/6/17 23:32, Jan Beulich wrote:
>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>>  rc = ERROR_FAIL;
>>>>>>>>  goto out;
>>>>>>>>  }
>>>>>>>> -r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>> +/* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
>>>>>>>> +r = xc_domain_getinfo_single(ctx->xch, 0, );
>>>>>>>
>>>>>>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
>>>>>>> I didn't check if that can be used with the underlying hypercall(s). Otherwise
>>>> Since commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf, DOMID_SELF is not allowed for XEN_DOMCTL_getdomaininfo.
>>>> And XEN_DOMCTL_getdomaininfo now gets the domain through rcu_lock_domain_by_id.
>>>>
>>>>>>> you want to pass the actual domid of the local domain here.
>>>> What is the local domain here?
>>>
>>> The domain your code is running in.
>>>
>>>> What method can I use to get its domid?
>>>
>>> I hope there's an available function in one of the libraries to do that.
>> I didn't find a related function.
>> Hi Anthony, do you know?
> 
> Yes, I managed to find:
> LIBXL_TOOLSTACK_DOMID
> That's the value you can use instead of "0" to designate dom0.
> (That was harder than necessary to find.)
Thank you very much! I will use LIBXL_TOOLSTACK_DOMID in the next version.

> 
> Cheers,
> 

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-25 Thread Chen, Jiqian

On 2024/6/24 16:17, Jan Beulich wrote:
> On 21.06.2024 10:20, Chen, Jiqian wrote:
>> On 2024/6/20 18:42, Jan Beulich wrote:
>>> On 20.06.2024 11:40, Chen, Jiqian wrote:
>>>> On 2024/6/18 17:23, Jan Beulich wrote:
>>>>> On 18.06.2024 10:23, Chen, Jiqian wrote:
>>>>>> On 2024/6/17 23:32, Jan Beulich wrote:
>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>>  rc = ERROR_FAIL;
>>>>>>>>  goto out;
>>>>>>>>  }
>>>>>>>> -r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>> +/* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
>>>>>>>> +r = xc_domain_getinfo_single(ctx->xch, 0, );
>>>>>>>
>>>>>>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
>>>>>>> I didn't check if that can be used with the underlying hypercall(s). Otherwise
>>>> Since commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf, DOMID_SELF is not allowed for XEN_DOMCTL_getdomaininfo.
>>>> And XEN_DOMCTL_getdomaininfo now gets the domain through rcu_lock_domain_by_id.
>>>>
>>>>>>> you want to pass the actual domid of the local domain here.
>>>> What is the local domain here?
>>>
>>> The domain your code is running in.
>>>
>>>> What method can I use to get its domid?
>>>
>>> I hope there's an available function in one of the libraries to do that.
>> I didn't find a related function.
>> Hi Anthony, do you know?
>>
>>> But I wouldn't even know what to look for; that's a question to (primarily)
>>> Anthony then, who sadly continues to be our only tool stack maintainer.
>>>
>>> Alternatively we could maybe enable XEN_DOMCTL_getdomaininfo to permit
>>> DOMID_SELF.
>> It hasn't permitted DOMID_SELF since the commit below. Would the same
>> problem still exist if DOMID_SELF were permitted?
> 
> To answer this, all respective callers would need auditing. However, ...
> 
>> commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf
>> Author: kfraser@localhost.localdomain 
>> Date:   Tue Aug 14 09:56:46 2007 +0100
>>
>> xen: Do not accept DOMID_SELF as input to DOMCTL_getdomaininfo.
>> This was screwing up callers that loop on getdomaininfo(), if there
>> was a domain with domid DOMID_FIRST_RESERVED-1 (== DOMID_SELF-1).
>> They would see DOMID_SELF-1, then look up DOMID_SELF, which has domid
>> 0 of course, and then start their domain-finding loop all over again!
>> Found by Kouya Shimura . Thanks!
>> Signed-off-by: Keir Fraser 
> 
> ... I view this as a pretty odd justification for the change, when imo the
> bogus loops should instead have been adjusted.
Yes, you are right.
And Anthony suggested using LIBXL_TOOLSTACK_DOMID in place of domid 0.
So it seems there is no need to change the XEN_DOMCTL_getdomaininfo hypercall for now?

> 
> Jan
> 
>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>> index 09a1e84d98e0..5d29667b7c3d 100644
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -463,19 +463,13 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domctl_t) u_domctl)
>>  case XEN_DOMCTL_getdomaininfo:
>>  {
>>  struct domain *d;
>> -domid_t dom;
>> -
>> -dom = op->domain;
>> -if ( dom == DOMID_SELF )
>> -dom = current->domain->domain_id;
>> +domid_t dom = op->domain;
>>
>>  rcu_read_lock(&domlist_read_lock);
>>
>>  for_each_domain ( d )
>> -{
>>  if ( d->domain_id >= dom )
>>  break;
>> -}
>>
>>  if ( d == NULL )
>>  {
> 

-- 
Best regards,
Jiqian Chen.
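The failure mode described in commit 10ef7a91 can be reproduced with a toy model. Everything below is hypothetical except the constants: DOMID_FIRST_RESERVED and DOMID_SELF are both 0x7FF0 in Xen's public xen.h. getdomaininfo() mimics the "first domain with domid >= requested" lookup, and count_domains() is a naive enumeration loop; its stop_at_reserved flag shows the kind of caller-side fix Jan suggests instead of rejecting DOMID_SELF in the hypervisor.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;

#define DOMID_FIRST_RESERVED 0x7FF0U
#define DOMID_SELF           0x7FF0U

/* Toy domain list, sorted by domid, with one domain just below the
 * reserved range. */
static const domid_t domains[] = { 0, 1, DOMID_FIRST_RESERVED - 1 };
#define NR_DOMAINS (sizeof(domains) / sizeof(domains[0]))

/* Return the first domid >= dom, or -1 if none. With translate_self set,
 * DOMID_SELF is first mapped to the caller's domid (dom0 here), as before
 * commit 10ef7a91. */
static int getdomaininfo(domid_t dom, int translate_self)
{
    if ( translate_self && dom == DOMID_SELF )
        dom = 0;
    for ( unsigned int i = 0; i < NR_DOMAINS; i++ )
        if ( domains[i] >= dom )
            return domains[i];
    return -1;
}

/* Naive enumeration; returns the number of lookups that found a domain,
 * capped at 100 so a livelock is observable instead of hanging. */
static unsigned int count_domains(int translate_self, int stop_at_reserved)
{
    unsigned int seen = 0;
    domid_t next = 0;

    while ( seen < 100 )
    {
        if ( stop_at_reserved && next >= DOMID_FIRST_RESERVED )
            break;
        int d = getdomaininfo(next, translate_self);
        if ( d < 0 )
            break;
        seen++;
        next = (domid_t)(d + 1);
    }
    return seen;
}
```

With DOMID_SELF translated and no caller-side bound, the loop wraps back to dom0 after the last domain and only stops at the cap of 100; bounding the loop at DOMID_FIRST_RESERVED visits each of the three domains once, even with the translation in place.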

Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev

2024-06-25 Thread Chen, Jiqian

On 2024/6/24 16:13, Jan Beulich wrote:
> On 21.06.2024 10:15, Chen, Jiqian wrote:
>> On 2024/6/20 18:37, Jan Beulich wrote:
>>> On 20.06.2024 12:23, Chen, Jiqian wrote:
>>>> On 2024/6/20 15:43, Jan Beulich wrote:
>>>>> On 20.06.2024 09:03, Chen, Jiqian wrote:
>>>>>> On 2024/6/18 17:13, Jan Beulich wrote:
>>>>>>> On 18.06.2024 10:10, Chen, Jiqian wrote:
>>>>>>>> On 2024/6/17 23:10, Jan Beulich wrote:
>>>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>>>> --- a/tools/libs/light/libxl_pci.c
>>>>>>>>>> +++ b/tools/libs/light/libxl_pci.c
>>>>>>>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>>>>>>>  #endif
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> +#define PCI_DEVID(bus, devfn)\
>>>>>>>>>> +((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>>>>>>>> +
>>>>>>>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>>>>>>>> +((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>>>>>>>
>>>>>>>>> I'm not a maintainer of this file; if I were, I'd ask that for readability's
>>>>>>>>> sake all excess parentheses be dropped from these.
>>>>>>>> Isn't it a coding requirement to enclose each element in parentheses in the macro definition?
>>>>>>>> Other files seem to do this too; see tools/libs/light/libxl_internal.h.
>>>>>>>
>>>>>>> As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
>>>>>>> has its own CODING_STYLE, I can't spot anything towards excessive use of
>>>>>>> parentheses there.
>>>>>> So, which parentheses do you think are excessive?
>>>>>
>>>>> #define PCI_DEVID(bus, devfn)\
>>>>> (((uint16_t)(bus) << 8) | ((devfn) & 0xff))
>>>>>
>>>>> #define PCI_SBDF(seg, bus, devfn) \
>>>>> (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))
>>>> Thanks, will change in next version.
>>>>
>>>>>
>>>>>>>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>>>>  goto out_no_irq;
>>>>>>>>>>  }
>>>>>>>>>>  if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>>>> +sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>>>>>>>> +(PCI_DEVFN(pci->dev, pci->func)));
>>>>>>>>>> +gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>>>>>>>> +/*
>>>>>>>>>> + * Old kernel version may not support this function,
>>>>>>>>>
>>>>>>>>> Just kernel?
>>>>>>>> Yes, xc_physdev_gsi_from_dev depends on a function implemented on the Linux kernel side.
>>>>>>>
>>>>>>> Okay, and when the kernel supports it but the underlying hypervisor doesn't
>>>>>>> support what the kernel wants to use in order to fulfill the request, all
>>>>>> I don't know which things you mean the hypervisor doesn't support,
>>>>>> because xc_physdev_gsi_from_dev gets the gsi of a pcidev from its sbdf,
>>>>>> and that relationship can only be obtained in dom0, not in the Xen hypervisor.
>>>>>>
>>>>>>> is fine? (See also below for what may be needed in the hypervisor, even if
>>>>>> Do you mean xc_physdev_map_pirq needs the gsi?
>>>>>
>>>>> I'd put it slightly differently: You arrange for that function to now take a
>>>>> GSI when the caller is PVH. But yes, the function, when used with
>>>>> MAP_PIRQ_TYPE_GSI, clearly e

Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev

2024-06-21 Thread Chen, Jiqian

On 2024/6/20 22:38, Anthony PERARD wrote:
> On Mon, Jun 17, 2024 at 05:00:34PM +0800, Jiqian Chen wrote:
>> diff --git a/tools/include/xencall.h b/tools/include/xencall.h
>> index fc95ed0fe58e..750aab070323 100644
>> --- a/tools/include/xencall.h
>> +++ b/tools/include/xencall.h
>> @@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
>>   uint64_t arg1, uint64_t arg2, uint64_t arg3,
>>   uint64_t arg4, uint64_t arg5);
>>  
>> +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
> 
> I don't think that's an appropriate library for this new feature.
> libxencall is a generic lib to make hypercall.
Do you have a suggestion for where to put this new function?
It gets the gsi of a PCI device and depends only on the dom0 kernel;
it doesn't need to interact with the hypervisor.

> 
>>  /* Variant(s) of the above, as needed, returning "long" instead of "int". */
>>  long xencall2L(xencall_handle *xcall, unsigned int op,
>> uint64_t arg1, uint64_t arg2);
>> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
>> index 9ceca0cffc2f..a0381f74d24b 100644
>> --- a/tools/include/xenctrl.h
>> +++ b/tools/include/xenctrl.h
>> @@ -1641,6 +1641,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>uint32_t domid,
>>int pirq);
>>  
>> +int xc_physdev_gsi_from_dev(xc_interface *xch, uint32_t sbdf);
>> +
>>  /*
>>   *  LOGGING AND ERROR REPORTING
>>   */
>> diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c
>> index 02c4f8e1aefa..6dae50c9a6ba 100644
>> --- a/tools/libs/call/core.c
>> +++ b/tools/libs/call/core.c
>> @@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op,
>>  return osdep_hypercall(xcall, );
>>  }
>>  
>> +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf)
>> +{
>> +return osdep_oscall(xcall, sbdf);
>> +}
>> +
>>  /*
>>   * Local variables:
>>   * mode: C
>> diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map
>> index d18a3174e9dc..b92a0b5dc12c 100644
>> --- a/tools/libs/call/libxencall.map
>> +++ b/tools/libs/call/libxencall.map
>> @@ -10,6 +10,8 @@ VERS_1.0 {
>>  xencall4;
>>  xencall5;
>>  
>> +xen_oscall_gsi_from_dev;
> 
> FYI, never change an already released version of a library; this would add
> a new function to libxencall.1.0. Instead, when adding a new function
> to a library that is supposed to be stable (they have a *.map file in
> Xen's case), add it to a new section, which would be VERS_1.4 in this case.
> But libxencall isn't appropriate for this new function, so this is just for
> future reference.
> 
>>  xencall_alloc_buffer;
>>  xencall_free_buffer;
>>  xencall_alloc_buffer_pages;
>> diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c
>> index 6d588e6bea8f..92c740e176f2 100644
>> --- a/tools/libs/call/linux.c
>> +++ b/tools/libs/call/linux.c
>> @@ -85,6 +85,21 @@ long osdep_hypercall(xencall_handle *xcall, 
>> privcmd_hypercall_t *hypercall)
>>  return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall);
>>  }
>>  
>> +int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
>> +{
>> +privcmd_gsi_from_dev_t dev_gsi = {
>> +.sbdf = sbdf,
>> +.gsi = -1,
>> +};
>> +
>> +if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_DEV, &dev_gsi)) {
> 
> Looks like libxencall is only for hypercall, and so I don't think
> it's the right place to introducing another ioctl() call.
IOCTL_PRIVCMD_HYPERCALL seems to be for hypercalls only, while what I do here
is introduce a new call on the privcmd fd.
Maybe I can open "/dev/xen/privcmd" directly, so that I don't have to add the
*_oscall function.
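As a rough illustration of the "open /dev/xen/privcmd directly" option, the helper below opens the node, issues one ioctl and closes the fd again, without going through libxencall. It is a sketch under assumptions: the struct mirrors the proposed privcmd_gsi_from_pcidev layout (with the GSI as an unsigned 32-bit field, as suggested in review), while the request number HYPOTHETICAL_IOCTL_GSI_FROM_PCIDEV and the helper name gsi_from_pcidev() are placeholders. The real values have to come from the canonical kernel header once the kernel patch is merged.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical mirror of the proposed kernel structure. The canonical
 * definition belongs to the Linux privcmd header and may differ.
 */
struct gsi_from_pcidev {
    uint32_t sbdf;  /* segment << 16 | bus << 8 | devfn */
    uint32_t gsi;   /* filled in by the kernel on success */
};

/* Placeholder request number, for illustration only. */
#define HYPOTHETICAL_IOCTL_GSI_FROM_PCIDEV \
    _IOC(_IOC_NONE, 'P', 10, sizeof(struct gsi_from_pcidev))

/*
 * Open the privcmd node, issue a single ioctl and close the fd again.
 * Returns the GSI on success, or -1 with errno describing the failure
 * (e.g. ENOENT if the node is missing).
 */
static int gsi_from_pcidev(const char *node, uint32_t sbdf)
{
    struct gsi_from_pcidev arg = { .sbdf = sbdf, .gsi = 0 };
    int fd, rc, saved_errno;

    fd = open(node, O_RDWR | O_CLOEXEC);
    if (fd < 0)
        return -1;

    rc = ioctl(fd, HYPOTHETICAL_IOCTL_GSI_FROM_PCIDEV, &arg);
    saved_errno = errno;   /* close() may clobber errno */
    close(fd);
    errno = saved_errno;

    return rc < 0 ? -1 : (int)arg.gsi;
}
```

Keeping errno intact lets a caller tell a missing node apart from a kernel that does not know the request; an unknown ioctl would typically fail with ENOTTY, which could be treated as "not supported" rather than as a hard error.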

> 
>> +PERROR("failed to get gsi from dev");
>> +return -1;
>> +}
>> +
>> +return dev_gsi.gsi;
>> +}
>> +
>>  static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages)
>>  {
>>  void *p;
>> diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h
>> index 9c3aa432efe2..cd6eb5a3e66f 100644
>> --- a/tools/libs/call/private.h
>> +++ b/tools/libs/call/private.h
>> @@ -57,6 +57,15 @@ int osdep_xencall_close(xencall_handle *xcall);
>>  
>>  long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall);
>>  
>> +#if defined(__linux__)
>> +int osdep_oscall(xencall_handle *xcall, unsigned int sbdf);
>> +#else
>> +static inline int osdep_oscall(xencall_handle *xcall, unsigned int sbdf)
>> +{
>> +return -1;
>> +}
>> +#endif
>> +
>>  void *osdep_alloc_pages(xencall_handle *xcall, size_t nr_pages);
>>  void osdep_free_pages(xencall_handle *xcall, void *p, size_t nr_pages);
>>  
>> diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c
>> index 460a8e779ce8..c1458f3a38b5 100644
>> --- a/tools/libs/ctrl/xc_physdev.c
>> +++ b/tools/libs/ctrl/xc_physdev.c
>> @@ -111,3 +111,7 @@ int xc_physdev_unmap_pirq(xc_interface *xch,
>>  return rc;
>>  }
>>  
>>

Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-21 Thread Chen, Jiqian

On 2024/6/20 18:42, Jan Beulich wrote:
> On 20.06.2024 11:40, Chen, Jiqian wrote:
>> On 2024/6/18 17:23, Jan Beulich wrote:
>>> On 18.06.2024 10:23, Chen, Jiqian wrote:
>>>> On 2024/6/17 23:32, Jan Beulich wrote:
>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>  rc = ERROR_FAIL;
>>>>>>  goto out;
>>>>>>  }
>>>>>> -r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>>>>> +#ifdef CONFIG_X86
>>>>>> +/* If dom0 doesn't have PIRQs, need to use 
>>>>>> xc_domain_gsi_permission */
>>>>>> +r = xc_domain_getinfo_single(ctx->xch, 0, );
>>>>>
>>>>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, 
>>>>> but
>>>>> I didn't check if that can be used with the underlying hypercall(s). 
>>>>> Otherwise
>> Since commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf, DOMID_SELF has not been
>> allowed for XEN_DOMCTL_getdomaininfo.
>> And XEN_DOMCTL_getdomaininfo now looks up the domain through rcu_lock_domain_by_id.
>>
>>>>> you want to pass the actual domid of the local domain here.
>> What is the local domain here?
> 
> The domain your code is running in.
> 
>> What is the method for me to get its domid?
> 
> I hope there's an available function in one of the libraries to do that.
I didn't find a related function.
Hi Anthony, do you know?

> But I wouldn't even know what to look for; that's a question to (primarily)
> Anthony then, who sadly continues to be our only tool stack maintainer.
> 
> Alternatively we could maybe enable XEN_DOMCTL_getdomaininfo to permit
> DOMID_SELF.
DOMID_SELF has not been permitted since the commit below. Would permitting
DOMID_SELF again bring back the same problem?

commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf
Author: kfraser@localhost.localdomain 
Date:   Tue Aug 14 09:56:46 2007 +0100

xen: Do not accept DOMID_SELF as input to DOMCTL_getdomaininfo.
This was screwing up callers that loop on getdomaininfo(), if there
was a domain with domid DOMID_FIRST_RESERVED-1 (== DOMID_SELF-1).
They would see DOMID_SELF-1, then look up DOMID_SELF, which has domid
0 of course, and then start their domain-finding loop all over again!
Found by Kouya Shimura . Thanks!
Signed-off-by: Keir Fraser 

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 09a1e84d98e0..5d29667b7c3d 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -463,19 +463,13 @@ long do_domctl(XEN_GUEST_HANDLE(xen_domctl_t) u_domctl)
 case XEN_DOMCTL_getdomaininfo:
 {
 struct domain *d;
-domid_t dom;
-
-dom = op->domain;
-if ( dom == DOMID_SELF )
-dom = current->domain->domain_id;
+domid_t dom = op->domain;

 rcu_read_lock(_read_lock);

 for_each_domain ( d )
-{
 if ( d->domain_id >= dom )
 break;
-}

 if ( d == NULL )
 {
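The failure mode described in that commit message can be reproduced with a small user-space model. Everything here is a toy: the domain list, toy_getdomaininfo() and enumerate() are hypothetical stand-ins for the hypervisor's domain list and for a tool-side loop that repeatedly asks for "the first domain with domid >= next". Only the DOMID_FIRST_RESERVED/DOMID_SELF values are taken from Xen's public headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t domid_t;

#define DOMID_FIRST_RESERVED 0x7FF0U
#define DOMID_SELF           0x7FF0U

/* Toy domain list, sorted by domid; the last one is DOMID_SELF - 1. */
static const domid_t domains[] = { 0, 5, DOMID_FIRST_RESERVED - 1 };
#define NR_DOMAINS (sizeof(domains) / sizeof(domains[0]))

/*
 * Model of getdomaininfo: return the first domain with id >= dom, or -1.
 * With translate_self, DOMID_SELF is first rewritten to the caller's own
 * domid (assumed to be 0 here), i.e. the behaviour removed in 2007.
 */
static int toy_getdomaininfo(domid_t dom, bool translate_self)
{
    if (translate_self && dom == DOMID_SELF)
        dom = 0;
    for (size_t i = 0; i < NR_DOMAINS; i++)
        if (domains[i] >= dom)
            return domains[i];
    return -1;
}

/*
 * Caller loop enumerating all domains via "next >= id" lookups.
 * Returns the number of successful lookups, stopping at 'cap' so that
 * a non-terminating loop can be detected.
 */
static unsigned int enumerate(bool translate_self, unsigned int cap)
{
    unsigned int found = 0;
    domid_t next = 0;
    int d;

    while (found < cap && (d = toy_getdomaininfo(next, translate_self)) >= 0) {
        found++;
        next = (domid_t)(d + 1);
    }
    return found;
}
```

With translation enabled, the lookup of DOMID_SELF (reached right after domain 0x7FEF) returns Dom0 again, so the enumeration restarts and only the cap stops it; without translation the loop visits the three domains and terminates.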

> 
>>>> But the action of granting permission is from dom0 to domU; what I need to
>>>> get is the information of dom0.
>>>> The actual domid here is domU's id, I think, so it is not useful.
>>>
>>> Note how I said DOMID_SELF and "local domain". There's no talk of using the
>>> DomU's domid. But what you apparently neglect is the fact that the hardware
>>> domain isn't necessarily Dom0 (see CONFIG_LATE_HWDOM in the hypervisor).
>>> While benign in most cases, this is relevant when it comes to referencing
>>> the hardware domain by domid. And it is the hardware domain which is going
>>> to drive the device re-assignment, as that domain is who's in possession of
>>> all the devices not yet assigned to any DomU.
>> OK, so I need to get the information of the hardware domain here?
> 
> Right, with (for this purpose) "hardware domain" == "local domain".
> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev

2024-06-21 Thread Chen, Jiqian

On 2024/6/20 18:37, Jan Beulich wrote:
> On 20.06.2024 12:23, Chen, Jiqian wrote:
>> On 2024/6/20 15:43, Jan Beulich wrote:
>>> On 20.06.2024 09:03, Chen, Jiqian wrote:
>>>> On 2024/6/18 17:13, Jan Beulich wrote:
>>>>> On 18.06.2024 10:10, Chen, Jiqian wrote:
>>>>>> On 2024/6/17 23:10, Jan Beulich wrote:
>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>> --- a/tools/libs/light/libxl_pci.c
>>>>>>>> +++ b/tools/libs/light/libxl_pci.c
>>>>>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>>>>>  #endif
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +#define PCI_DEVID(bus, devfn)\
>>>>>>>> +((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>>>>>> +
>>>>>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>>>>>> +((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>>>>>
>>>>>>> I'm not a maintainer of this file; if I were, I'd ask that for 
>>>>>>> readability's
>>>>>>> sake all excess parentheses be dropped from these.
>>>>>> Isn't it a coding requirement to enclose each element in parentheses in 
>>>>>> the macro definition?
>>>>>> It seems other files also do this. See tools/libs/light/libxl_internal.h
>>>>>
>>>>> As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
>>>>> has its own CODING_STYLE, I can't spot anything towards excessive use of
>>>>> parentheses there.
>>>> So, which parentheses do you consider excessive?
>>>
>>> #define PCI_DEVID(bus, devfn)\
>>> (((uint16_t)(bus) << 8) | ((devfn) & 0xff))
>>>
>>> #define PCI_SBDF(seg, bus, devfn) \
>>> (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))
>> Thanks, will change it in the next version.
>>
>>>
>>>>>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>>  goto out_no_irq;
>>>>>>>>  }
>>>>>>>>  if ((fscanf(f, "%u", ) == 1) && irq) {
>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>> +sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>>>>>> +(PCI_DEVFN(pci->dev, pci->func)));
>>>>>>>> +gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>>>>>> +/*
>>>>>>>> + * Old kernel version may not support this function,
>>>>>>>
>>>>>>> Just kernel?
>>>>>> Yes, xc_physdev_gsi_from_dev depends on the function implemented on the
>>>>>> Linux kernel side.
>>>>>
>>>>> Okay, and when the kernel supports it but the underlying hypervisor 
>>>>> doesn't
>>>>> support what the kernel wants to use in order to fulfill the request, all
>>>> I don't understand which hypervisor support you mean, because
>>>> xc_physdev_gsi_from_dev gets the GSI of a PCI device from its SBDF, and that
>>>> relationship is only known in dom0, not in the Xen hypervisor.
>>>>
>>>>> is fine? (See also below for what may be needed in the hypervisor, even if
>>>> Do you mean xc_physdev_map_pirq needs a GSI?
>>>
>>> I'd put it slightly differently: You arrange for that function to now take a
>>> GSI when the caller is PVH. But yes, the function, when used with
>>> MAP_PIRQ_TYPE_GSI, clearly expects a GSI as input (see also below).
>>>
>>>>> this IOCTL would be satisfied by the kernel without needing to interact 
>>>>> with
>>>>> the hypervisor.)
>>>>>
>>>>>>>> + * so if fail, keep using irq; if success, use gsi
>>>>>>>> + */
>>>>>>>> +if (gsi > 0) {
>>>>>>>> +irq = gsi;
>>>>>>>
>>>>>>> I'm still puzzled by this, when by now I think we've sufficiently 
>>>>>>> clarified
>>>>>>> that IRQs and GSIs use two distinct numbering spaces.
>>>>>>>

Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev

2024-06-20 Thread Chen, Jiqian

On 2024/6/20 15:43, Jan Beulich wrote:
> On 20.06.2024 09:03, Chen, Jiqian wrote:
>> On 2024/6/18 17:13, Jan Beulich wrote:
>>> On 18.06.2024 10:10, Chen, Jiqian wrote:
>>>> On 2024/6/17 23:10, Jan Beulich wrote:
>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>> --- a/tools/libs/light/libxl_pci.c
>>>>>> +++ b/tools/libs/light/libxl_pci.c
>>>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>>>  #endif
>>>>>>  }
>>>>>>  
>>>>>> +#define PCI_DEVID(bus, devfn)\
>>>>>> +((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>>>> +
>>>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>>>> +((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>>>
>>>>> I'm not a maintainer of this file; if I were, I'd ask that for 
>>>>> readability's
>>>>> sake all excess parentheses be dropped from these.
>>>> Isn't it a coding requirement to enclose each element in parentheses in 
>>>> the macro definition?
>>>> It seems other files also do this. See tools/libs/light/libxl_internal.h
>>>
>>> As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
>>> has its own CODING_STYLE, I can't spot anything towards excessive use of
>>> parentheses there.
>> So, which parentheses do you consider excessive?
> 
> #define PCI_DEVID(bus, devfn)\
> (((uint16_t)(bus) << 8) | ((devfn) & 0xff))
> 
> #define PCI_SBDF(seg, bus, devfn) \
> (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))
Thanks, will change it in the next version.
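For reference, the simplified macros can be checked in isolation. PCI_DEVFN is already used by the libxl diff; the local definition below assumes the standard layout (5-bit device, 3-bit function) only so that this sketch compiles on its own. Parenthesizing each macro parameter still matters after dropping the excess parentheses: with a bus argument such as `2 | 1`, an unparenthesized `bus` would bind as `(uint16_t)2 | (1 << 8)`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed standard layout: devfn = device[4:0] << 3 | function[2:0]. */
#define PCI_DEVFN(dev, fn)   ((((dev) & 0x1f) << 3) | ((fn) & 0x07))

/* The forms suggested in review: parameters stay parenthesized. */
#define PCI_DEVID(bus, devfn) \
    (((uint16_t)(bus) << 8) | ((devfn) & 0xff))

#define PCI_SBDF(seg, bus, devfn) \
    (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))
```

For example, 0000:03:00.1 encodes to 0x00000301 and 0001:0a:1f.7 to 0x00010aff.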

> 
>>>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>  goto out_no_irq;
>>>>>>  }
>>>>>>  if ((fscanf(f, "%u", ) == 1) && irq) {
>>>>>> +#ifdef CONFIG_X86
>>>>>> +sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>>>> +(PCI_DEVFN(pci->dev, pci->func)));
>>>>>> +gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>>>> +/*
>>>>>> + * Old kernel version may not support this function,
>>>>>
>>>>> Just kernel?
>>>> Yes, xc_physdev_gsi_from_dev depends on the function implemented on the
>>>> Linux kernel side.
>>>
>>> Okay, and when the kernel supports it but the underlying hypervisor doesn't
>>> support what the kernel wants to use in order to fulfill the request, all
>> I don't understand which hypervisor support you mean, because
>> xc_physdev_gsi_from_dev gets the GSI of a PCI device from its SBDF, and that
>> relationship is only known in dom0, not in the Xen hypervisor.
>>
>>> is fine? (See also below for what may be needed in the hypervisor, even if
>> Do you mean xc_physdev_map_pirq needs a GSI?
> 
> I'd put it slightly differently: You arrange for that function to now take a
> GSI when the caller is PVH. But yes, the function, when used with
> MAP_PIRQ_TYPE_GSI, clearly expects a GSI as input (see also below).
> 
>>> this IOCTL would be satisfied by the kernel without needing to interact with
>>> the hypervisor.)
>>>
>>>>>> + * so if fail, keep using irq; if success, use gsi
>>>>>> + */
>>>>>> +if (gsi > 0) {
>>>>>> +irq = gsi;
>>>>>
>>>>> I'm still puzzled by this, when by now I think we've sufficiently 
>>>>> clarified
>>>>> that IRQs and GSIs use two distinct numbering spaces.
>>>>>
>>>>> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
>>>>> the assumption that it'll fail. What if we decide to make the 
>>>>> functionality
>>>>> available there, too (if only for informational purposes, or for
>>>>>>> consistency)? Suddenly your fallback logic wouldn't work anymore, and
>>>>> you'd call ...
>>>>>
>>>>>> +}
>>>>>> +#endif
>>>>>>  r = xc_physdev_map_pirq(ctx->xch, domid, irq, );
>>>>>
>>>>> ... the function with a GSI when a pIRQ is meant. Imo, as suggested 
>>>>> before,
>>>>> yo

Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-20 Thread Chen, Jiqian

On 2024/6/18 17:23, Jan Beulich wrote:
> On 18.06.2024 10:23, Chen, Jiqian wrote:
>> On 2024/6/17 23:32, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>  rc = ERROR_FAIL;
>>>>  goto out;
>>>>  }
>>>> -r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>>> +#ifdef CONFIG_X86
>>>> +/* If dom0 doesn't have PIRQs, need to use 
>>>> xc_domain_gsi_permission */
>>>> +r = xc_domain_getinfo_single(ctx->xch, 0, );
>>>
>>> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
>>> I didn't check if that can be used with the underlying hypercall(s). 
>>> Otherwise
Since commit 10ef7a91b5a8cb8c58903c60e2dd16ed490b3bcf, DOMID_SELF has not been
allowed for XEN_DOMCTL_getdomaininfo.
And XEN_DOMCTL_getdomaininfo now looks up the domain through rcu_lock_domain_by_id.

>>> you want to pass the actual domid of the local domain here.
What is the local domain here?
What is the method for me to get its domid?

>> But the action of granting permission is from dom0 to domU; what I need to
>> get is the information of dom0.
>> The actual domid here is domU's id, I think, so it is not useful.
> 
> Note how I said DOMID_SELF and "local domain". There's no talk of using the
> DomU's domid. But what you apparently neglect is the fact that the hardware
> domain isn't necessarily Dom0 (see CONFIG_LATE_HWDOM in the hypervisor).
> While benign in most cases, this is relevant when it comes to referencing
> the hardware domain by domid. And it is the hardware domain which is going
> to drive the device re-assignment, as that domain is who's in possession of
> all the devices not yet assigned to any DomU.
OK, so I need to get the information of the hardware domain here?

> 
>>>> @@ -237,6 +238,48 @@ long arch_do_domctl(
>>>>  break;
>>>>  }
>>>>  
>>>> +case XEN_DOMCTL_gsi_permission:
>>>> +{
>>>> +unsigned int gsi = domctl->u.gsi_permission.gsi;
>>>> +int irq;
>>>> +bool allow = domctl->u.gsi_permission.allow_access;
>>>
>>> See my earlier comments on this conversion of 8 bits into just one.
>> Do you mean that I need to check allow_access is >= 0?
>> But allow_access is u8, it can't be negative.
> 
> Right. What I can only re-iterate from earlier commenting is that you
> want to check for 0 or 1 (can be viewed as looking at just the low bit),
> rejecting everything else. It is only this way that down the road we
> could assign meaning to the other bits, without risking to break existing
> callers. That's the same as the requirement to check padding fields to be
> zero.
OK, I will add a check that all bits other than the lowest one are zero.
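The check Jan asks for on the 8-bit allow_access field fits in a few lines: accept only 0 or 1 and reject anything else, so the remaining bits stay available for future use. decode_allow_access() is a hypothetical helper name, not code from the patch, and the sketch is plain user-space C rather than hypervisor code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decode an 8-bit flags field in which only bit 0 has a meaning today.
 * Any other set bit is rejected, so those bits can be given a meaning
 * later without changing behaviour for existing, well-behaved callers.
 */
static int decode_allow_access(uint8_t raw, bool *allow)
{
    if (raw & ~1u)
        return -EINVAL;   /* reserved bits must be zero */

    *allow = raw & 1;
    return 0;
}
```

This is the same rule as requiring padding fields to be zero: rejecting unknown bits today is what makes it safe to assign them later.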

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev

2024-06-20 Thread Chen, Jiqian

On 2024/6/18 17:13, Jan Beulich wrote:
> On 18.06.2024 10:10, Chen, Jiqian wrote:
>> On 2024/6/17 23:10, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> --- a/tools/include/xen-sys/Linux/privcmd.h
>>>> +++ b/tools/include/xen-sys/Linux/privcmd.h
>>>> @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
>>>>__u64 addr;
>>>>  } privcmd_mmap_resource_t;
>>>>  
>>>> +typedef struct privcmd_gsi_from_dev {
>>>> +  __u32 sbdf;
>>>
>>> That's PCI-centric, without struct and IOCTL names reflecting this fact.
>> So, change to privcmd_gsi_from_pcidev ?
> 
> That's what I'd suggest, yes. But remember that it's the kernel maintainers
> who have the ultimate say here, as here you're only making a copy of what
> the canonical header (in the kernel tree) is going to have.
OK, then let's wait for the corresponding patch on the kernel side to be
accepted first.
> 
>>>> +  int gsi;
>>>
>>> Is "int" legitimate to use here? Doesn't this want to similarly be __u32?
>> I want to set gsi to negative if there is no record of this translation.
> 
> There are surely more explicit ways to signal that case?
Maybe. I will reconsider the implementation on the kernel side.
> 
>>>> --- a/tools/libs/light/libxl_pci.c
>>>> +++ b/tools/libs/light/libxl_pci.c
>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>  #endif
>>>>  }
>>>>  
>>>> +#define PCI_DEVID(bus, devfn)\
>>>> +((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>> +
>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>> +((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>
>>> I'm not a maintainer of this file; if I were, I'd ask that for readability's
>>> sake all excess parentheses be dropped from these.
>> Isn't it a coding requirement to enclose each element in parentheses in the 
>> macro definition?
>> It seems other files also do this. See tools/libs/light/libxl_internal.h
> 
> As said, I'm not a maintainer of this code. Yet while I'm aware that libxl
> has its own CODING_STYLE, I can't spot anything towards excessive use of
> parentheses there.
So, which parentheses do you consider excessive?
> 
>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>  goto out_no_irq;
>>>>  }
>>>>  if ((fscanf(f, "%u", ) == 1) && irq) {
>>>> +#ifdef CONFIG_X86
>>>> +sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>> +(PCI_DEVFN(pci->dev, pci->func)));
>>>> +gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>> +/*
>>>> + * Old kernel version may not support this function,
>>>
>>> Just kernel?
>> Yes, xc_physdev_gsi_from_dev depends on the function implemented on the
>> Linux kernel side.
> 
> Okay, and when the kernel supports it but the underlying hypervisor doesn't
> support what the kernel wants to use in order to fulfill the request, all
I don't understand which hypervisor support you mean, because
xc_physdev_gsi_from_dev gets the GSI of a PCI device from its SBDF, and that
relationship is only known in dom0, not in the Xen hypervisor.

> is fine? (See also below for what may be needed in the hypervisor, even if
Do you mean xc_physdev_map_pirq needs a GSI?

> this IOCTL would be satisfied by the kernel without needing to interact with
> the hypervisor.)
> 
>>>> + * so if fail, keep using irq; if success, use gsi
>>>> + */
>>>> +if (gsi > 0) {
>>>> +irq = gsi;
>>>
>>> I'm still puzzled by this, when by now I think we've sufficiently clarified
>>> that IRQs and GSIs use two distinct numbering spaces.
>>>
>>> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
>>> the assumption that it'll fail. What if we decide to make the functionality
>>> available there, too (if only for informational purposes, or for
>>> consistency)? Suddenly your fallback logic wouldn't work anymore, and
>>> you'd call ...
>>>
>>>> +}
>>>> +#endif
>>>>  r = xc_physdev_map_pirq(ctx->xch, domid, irq, );
>>>
>>> ... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
>>> you strictly want to avoid the call on PV Dom0.
>>>
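The control flow Jan suggests (never issue the GSI query on a PV Dom0, and do not silently fall back to the sysfs IRQ on PVH, since IRQs and GSIs are distinct numbering spaces) can be sketched as a small decision function. pick_map_source(), the gsi_lookup_fn type and stub_lookup() are hypothetical; how the toolstack learns whether the local (hardware) domain has PIRQ support is assumed to be answered by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shape of a GSI lookup such as xc_physdev_gsi_from_dev(): -1 on failure. */
typedef int (*gsi_lookup_fn)(void *ctx, uint32_t sbdf);

/*
 * Pick the number to hand to the pIRQ-mapping call.
 * - Dom0 with PIRQ support (PV): use the IRQ read from sysfs and never
 *   query a GSI, so a future PV implementation of the lookup cannot
 *   change behaviour.
 * - Dom0 without PIRQ support (PVH): a GSI is required; a failed lookup
 *   is reported as an error instead of falling back to the IRQ.
 * Returns the number to use, or -1 on failure.
 */
static int pick_map_source(bool dom0_has_pirq, int sysfs_irq,
                           gsi_lookup_fn lookup, void *ctx, uint32_t sbdf)
{
    if (dom0_has_pirq)
        return sysfs_irq;
    return lookup(ctx, sbdf);
}

/* Stub lookup: counts calls and only "knows" 0000:03:00.1 (GSI 28). */
static int stub_lookup(void *ctx, uint32_t sbdf)
{
    int *calls = ctx;

    (*calls)++;
    return sbdf == 0x0301 ? 28 : -1;
}
```

The call counter in the stub makes the first property checkable: on a PV Dom0 the lookup is not invoked at all.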

Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-19 Thread Chen, Jiqian

On 2024/6/19 17:49, Jan Beulich wrote:
> On 19.06.2024 10:51, Chen, Jiqian wrote:
>> On 2024/6/19 16:06, Jan Beulich wrote:
>>> On 19.06.2024 09:53, Chen, Jiqian wrote:
>>>> On 2024/6/18 16:55, Jan Beulich wrote:
>>>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>>>> able to be mapped into a hvm domU.
>>>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>>>
>>>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>>>> register gsi when dom0 is PVH.
>>>>>>>>
>>>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>>>> purpose.
>>>>>>>>
>>>>>>>> Signed-off-by: Jiqian Chen 
>>>>>>>> Signed-off-by: Huang Rui 
>>>>>>>> Signed-off-by: Jiqian Chen 
>>>>>>>> ---
>>>>>>>> The code link that will call this hypercall on linux kernel side is as 
>>>>>>>> follows:
>>>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
>>>>>>>
>>>>>>> One of my v9 comments was addressed, thanks. Repeating the other, 
>>>>>>> unaddressed
>>>>>>> one here:
>>>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>>>  operation, I think it'll also want/need explaining why what is 
>>>>>>> sufficient for
>>>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>>>> Following this v9 comment, I have modified the commit message to describe
>>>>>> why GSIs not being registered can cause passthrough to fail:
>>>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin 
>>>>>> list, and the handler of irq_desc is not set, then when passthrough a 
>>>>>> device, setting ioapic affinity and vector will fail."
>>>>>> What description do you want me to add?
>>>>>
>>>>> What I'd first like to have clarification on (i.e. before putting it in
>>>>> the description one way or another): How come Dom0 alone gets away fine
>>>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>>>> perhaps that it just so happened that for Dom0 things have been working
>>>>> on systems where it was tested, but the call should in principle have been
>>>>> there in this case, too [1]? That (to me at least) would make quite a
>>>>> difference for both this patch's description and us accepting it.
>>>> Oh, I think I understand your concern now. Thanks.
>>>> First question: why does the GSI of a device work on PVH dom0?
>>>> Because when a driver is probed for a normal device, the Linux kernel calls
>>>> pci_device_probe -> request_threaded_irq -> irq_startup -> __unmask_ioapic ->
>>>> io_apic_write, which traps into Xen: hvmemul_do_io -> hvm_io_intercept ->
>>>> hvm_process_io_intercept -> vioapic_write_indirect -> vioapic_hwdom_map_gsi ->
>>>> mp_register_gsi. That is how the GSI gets registered.
>>>> Second question: why doesn't the GSI of a passthrough device work on PVH dom0?
>>>> Because when a device is assigned for passthrough, pciback probes it via
>>>> pcistub_probe, and nothing in the pcistub_probe call stack unmasks the GSI.
>>>> On the Xen side, vioapic_hwdom_map_gsi -> mp_register_gsi is only called when
>>>> the GSI is unmasked, so the GSI never gets registered for the passthrough
>>>> device.
>>>
>>> And why exactly would the fake IRQ handler not be set up by pciback? Its
>>> setting up ought to lead to those same IO-APIC RTE wr

Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-19 Thread Chen, Jiqian

On 2024/6/19 16:06, Jan Beulich wrote:
> On 19.06.2024 09:53, Chen, Jiqian wrote:
>> On 2024/6/18 16:55, Jan Beulich wrote:
>>> On 18.06.2024 08:57, Chen, Jiqian wrote:
>>>> On 2024/6/17 22:52, Jan Beulich wrote:
>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>> The gsi of a passthrough device must be configured for it to be
>>>>>> able to be mapped into a hvm domU.
>>>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>>>> and the handler of irq_desc is not set, then when passthrough a
>>>>>> device, setting ioapic affinity and vector will fail.
>>>>>>
>>>>>> To fix above problem, on Linux kernel side, a new code will
>>>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>>>> register gsi when dom0 is PVH.
>>>>>>
>>>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>>>> purpose.
>>>>>>
>>>>>> Signed-off-by: Jiqian Chen 
>>>>>> Signed-off-by: Huang Rui 
>>>>>> Signed-off-by: Jiqian Chen 
>>>>>> ---
>>>>>> The code link that will call this hypercall on linux kernel side is as 
>>>>>> follows:
>>>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
>>>>>
>>>>> One of my v9 comments was addressed, thanks. Repeating the other, 
>>>>> unaddressed
>>>>> one here:
>>>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>>>  operation, I think it'll also want/need explaining why what is 
>>>>> sufficient for
>>>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>>>> Following this v9 comment, I have modified the commit message to describe
>>>> why GSIs not being registered can cause passthrough to fail:
>>>> " it causes the info of apic, pin and irq not be added into irq_2_pin 
>>>> list, and the handler of irq_desc is not set, then when passthrough a 
>>>> device, setting ioapic affinity and vector will fail."
>>>> What description do you want me to add?
>>>
>>> What I'd first like to have clarification on (i.e. before putting it in
>>> the description one way or another): How come Dom0 alone gets away fine
>>> without making the call, yet for passthrough-to-DomU it's needed? Is it
>>> perhaps that it just so happened that for Dom0 things have been working
>>> on systems where it was tested, but the call should in principle have been
>>> there in this case, too [1]? That (to me at least) would make quite a
>>> difference for both this patch's description and us accepting it.
>> Oh, I think I understand your concern now. Thanks.
>> First question: why does the GSI of a device work on PVH dom0?
>> Because when a driver is probed for a normal device, the Linux kernel calls
>> pci_device_probe -> request_threaded_irq -> irq_startup -> __unmask_ioapic ->
>> io_apic_write, which traps into Xen: hvmemul_do_io -> hvm_io_intercept ->
>> hvm_process_io_intercept -> vioapic_write_indirect -> vioapic_hwdom_map_gsi ->
>> mp_register_gsi. That is how the GSI gets registered.
>> Second question: why doesn't the GSI of a passthrough device work on PVH dom0?
>> Because when a device is assigned for passthrough, pciback probes it via
>> pcistub_probe, and nothing in the pcistub_probe call stack unmasks the GSI.
>> On the Xen side, vioapic_hwdom_map_gsi -> mp_register_gsi is only called when
>> the GSI is unmasked, so the GSI never gets registered for the passthrough
>> device.
> 
> And why exactly would the fake IRQ handler not be set up by pciback? Its
> setting up ought to lead to those same IO-APIC RTE writes that Xen
> intercepts.
Because isr_on is not set: when xen_pcibk_control_isr is called, it returns
early due to "!dev_data->isr_on", so the fake IRQ handler is not installed.
It seems isr_on is set through the driver sysfs node "irq_handler_state", for a
level-triggered device that is to be shared with a guest while its IRQ is also
shared with the initial domain.

> 
> In any event, imo a summary of the above wants to be part of the patch
> description.
OK, I will add it to the commit message in the next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-19 Thread Chen, Jiqian

On 2024/6/18 16:55, Jan Beulich wrote:
> On 18.06.2024 08:57, Chen, Jiqian wrote:
>> On 2024/6/17 22:52, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> The gsi of a passthrough device must be configured for it to be
>>>> able to be mapped into a hvm domU.
>>>> But When dom0 is PVH, the gsis don't get registered, it causes
>>>> the info of apic, pin and irq not be added into irq_2_pin list,
>>>> and the handler of irq_desc is not set, then when passthrough a
>>>> device, setting ioapic affinity and vector will fail.
>>>>
>>>> To fix above problem, on Linux kernel side, a new code will
>>>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>>>> register gsi when dom0 is PVH.
>>>>
>>>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>>>> purpose.
>>>>
>>>> Signed-off-by: Jiqian Chen 
>>>> Signed-off-by: Huang Rui 
>>>> Signed-off-by: Jiqian Chen 
>>>> ---
>>>> The code link that will call this hypercall on linux kernel side is as 
>>>> follows:
>>>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
>>>
>>> One of my v9 comments was addressed, thanks. Repeating the other, 
>>> unaddressed
>>> one here:
>>> "As to GSIs not being registered: If that's not a problem for Dom0's own
>>>  operation, I think it'll also want/need explaining why what is sufficient 
>>> for
>>>  Dom0 alone isn't sufficient when pass-through comes into play."
>> Following this v9 comment, I have modified the commit message to describe
>> why GSIs not being registered can cause passthrough to fail:
>> " it causes the info of apic, pin and irq not be added into irq_2_pin list, 
>> and the handler of irq_desc is not set, then when passthrough a device, 
>> setting ioapic affinity and vector will fail."
>> What description do you want me to add?
> 
> What I'd first like to have clarification on (i.e. before putting it in
> the description one way or another): How come Dom0 alone gets away fine
> without making the call, yet for passthrough-to-DomU it's needed? Is it
> perhaps that it just so happened that for Dom0 things have been working
> on systems where it was tested, but the call should in principle have been
> there in this case, too [1]? That (to me at least) would make quite a
> difference for both this patch's description and us accepting it.
Oh, I think I understand your concern now. Thanks.
First question: why does the GSI of a device work on PVH dom0?
Because when a driver is probed for a normal device, the Linux kernel calls
pci_device_probe -> request_threaded_irq -> irq_startup -> __unmask_ioapic ->
io_apic_write, which traps into Xen: hvmemul_do_io -> hvm_io_intercept ->
hvm_process_io_intercept -> vioapic_write_indirect -> vioapic_hwdom_map_gsi ->
mp_register_gsi. That is how the GSI gets registered.
Second question: why doesn't the GSI of a passthrough device work on PVH dom0?
Because when a device is assigned for passthrough, pciback probes it via
pcistub_probe, and nothing in the pcistub_probe call stack unmasks the GSI.
On the Xen side, vioapic_hwdom_map_gsi -> mp_register_gsi is only called when
the GSI is unmasked, so the GSI never gets registered for the passthrough
device.

> 
> Jan
> 
> [1] Alternative e.g. being that because of other actions PVH Dom0 takes,
> like the IO-APIC RTE programming it does for IRQs it wants to use for
> itself, the necessary information is already suitably conveyed to Xen in
> that case. In such a case imo it's relevant to mention in the description.
> Not the least because iirc the pciback driver sets up a fake IRQ handler
> in such cases, which ought to lead to similar IO-APIC RTE programming, at
> which point the question would again arise why the hypercall needs
> exposing.

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-06-18 Thread Chen, Jiqian

On 2024/6/18 16:38, Jan Beulich wrote:
> On 18.06.2024 08:49, Chen, Jiqian wrote:
>> On 2024/6/17 22:45, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>>>> a passthrough device by using gsi, see qemu code
>>>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>>>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>>>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>>>> is not allowed because currd is PVH dom0 and PVH has no
>>>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>>>
>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>>>> PHYSDEVOP_unmap_pirq for the failed path to unmap pirq.
>>>
>>> Why "failed path"? Isn't unmapping also part of normal device removal
>>> from a guest?
>> Yes, both. I will change to also "allow PHYSDEVOP_unmap_pirq for the device 
>> removal path to unmap pirq".
>>
>>>
>>>> And
>>>> add a new check to prevent self map when subject domain has no
>>>> PIRQ flag.
>>>
>>> You still talk of only self mapping, and the code also still does only
>>> that. As pointed out before: Why would you allow mapping into a PVH
>>> DomU? IOW what purpose do the "d == currd" checks have?
>> The checking I added has two purpose, first is I need to allow this case:
>> Dom0(without PIRQ) + DomU(with PIRQ), because the original code just do 
>> (!has_pirq(currd)) will cause map_pirq fail in this case.
>> Second I need to disallow self-mapping:
>> DomU(without PIRQ) do map_pirq, the "d==currd" means the currd is the 
>> subject domain itself.
>>
>> Emmm, I think I know what's your concern.
>> Do you mean I need to
>> " Prevent map_pirq when currd has no X86_EMU_USE_PIRQ flag "
>> instead of
>> " Prevent self-map when currd has no X86_EMU_USE_PIRQ flag ",
> 
> No. What I mean is that I continue to fail to see why you mention "currd".
> IOW it would be more like "prevent mapping when the subject domain has no
> X86_EMU_USE_PIRQ" (which, as a specific sub-case, includes self-mapping
> if the caller specifies DOMID_SELF for the subject domain).
Oh, I see. It is not only about preventing self-mapping: if the subject domain has
no PIRQs, we should reject the request, and self-mapping is just one sub-case of that.
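Written as code, the rule described above checks the subject domain `d` instead of comparing it with the caller. Self-mapping (DOMID_SELF) then falls out as one sub-case. The sketch below is minimal and rests on a few assumptions: `struct sketch_dom` is a hypothetical stand-in for Xen's `struct domain` and `has_pirq()`, the error value is an assumption, and XSM/privilege checks are omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for Xen's struct domain. */
struct sketch_dom {
    bool has_pirq;   /* X86_EMU_USE_PIRQ set for this domain */
};

/*
 * 'currd' is the caller, 'd' the subject domain named in the hypercall
 * (the same object as currd when DOMID_SELF is passed).
 */
int sketch_check_map_pirq(const struct sketch_dom *currd,
                          const struct sketch_dom *d)
{
    /* The caller's own PIRQ support is deliberately not consulted. */
    (void)currd;

    /* Reject whenever the subject domain has no pIRQs, self or not. */
    if ( !d->has_pirq )
        return -EOPNOTSUPP;

    return 0;
}
```

With this shape, a PVH dom0 without pIRQs can map into an HVM domU that has them, and both foreign and self-mapping into a PIRQ-less domain are refused by the same test.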

> 
>> so I need to remove "d==currd", right?
> 
> Removing this check is what I'm after, yes. Yet that's not in sync with
> either of the two quoted sentences above.
> 
>>>> So that domU with PIRQ flag can success to map pirq for
>>>> passthrough devices even dom0 has no PIRQ flag.
>>>
>>> There's still a description problem here. Much like the first sentence,
>>> this last one also says that the guest would itself map the pIRQ. In
>>> which case there would still not be any reason to expose the sub-
>>> functions to Dom0.
>> If change to " So that the interrupt of a passthrough device can success to 
>> be mapped to pirq for domU with PIRQ flag when dom0 is PVH.",
>> Is it OK?
> 
> Kind of, yes. "can be successfully mapped" is one of the various possibilities
> of making this read a little more smoothly.
OK.

> 
> Jan


Re: [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device

2024-06-18 Thread Chen, Jiqian

On 2024/6/18 16:33, Jan Beulich wrote:
> On 18.06.2024 08:25, Chen, Jiqian wrote:
>> On 2024/6/17 22:17, Jan Beulich wrote:
>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>> --- a/xen/drivers/pci/physdev.c
>>>> +++ b/xen/drivers/pci/physdev.c
>>>> @@ -2,11 +2,17 @@
>>>>  #include 
>>>>  #include 
>>>>  #include 
>>>> +#include 
>>>>  
>>>>  #ifndef COMPAT
>>>>  typedef long ret_t;
>>>>  #endif
>>>>  
>>>> +static const struct pci_device_state_reset_method
>>>> +pci_device_state_reset_methods[] = {
>>>> +[ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
>>>> +};
>>>
>>> What about the other three DEVICE_RESET_*? In particular ...
>> I don't know how to implement the other three types of reset.
>> This is a design form so that corresponding processing functions can be 
>> added later if necessary. Do I need to set them to NULL pointers in this 
>> array?
> 
> No.
> 
>> Does this form conform to your previous suggestion of using only one 
>> hypercall to handle all types of resets?
> 
> Yes, at least in principle. Question here is: To be on the safe side,
> wouldn't we better reset state for all forms of reset, leaving possible
> relaxation of that for later? At which point the function wouldn't need
> calling indirectly, and instead would be passed the reset type as an
> argument.
If I understood correctly, should the next version look like this?
Use macros to represent the different reset types.
Add switch cases in PHYSDEVOP_pci_device_state_reset to handle the different reset
types.
Add reset_type as a parameter of vpci_reset_device_state for possible future use.

+    case PHYSDEVOP_pci_device_state_reset:
+    {
+        struct pci_device_state_reset dev_reset;
+        struct pci_dev *pdev;
+        pci_sbdf_t sbdf;
+
+        if ( !is_pci_passthrough_enabled() )
+            return -EOPNOTSUPP;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
+            break;
+
+        sbdf = PCI_SBDF(dev_reset.dev.seg,
+                        dev_reset.dev.bus,
+                        dev_reset.dev.devfn);
+
+        ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
+        if ( ret )
+            break;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(NULL, sbdf);
+        if ( !pdev )
+        {
+            pcidevs_unlock();
+            ret = -ENODEV;
+            break;
+        }
+
+        write_lock(&pdev->domain->pci_lock);
+        pcidevs_unlock();
+        /* Reset vPCI state for every reset type; may be relaxed per type later. */
+        switch ( dev_reset.reset_type )
+        {
+        case PCI_DEVICE_STATE_RESET_COLD:
+        case PCI_DEVICE_STATE_RESET_WARM:
+        case PCI_DEVICE_STATE_RESET_HOT:
+        case PCI_DEVICE_STATE_RESET_FLR:
+            ret = vpci_reset_device_state(pdev, dev_reset.reset_type);
+            break;
+
+        default:
+            /* Unknown type: don't report success from the XSM check above. */
+            ret = -EINVAL;
+            break;
+        }
+        write_unlock(&pdev->domain->pci_lock);
+
+        if ( ret )
+            dprintk(XENLOG_ERR,
+                    "%pp: failed to reset vPCI device state\n", &sbdf);
+        break;
+    }

> 
>>> Also, nit (further up): Opening figure braces for a new scope go onto their
>> OK, will change in next version.
>>> own line. Then again I notice that apparently _all_ other instances in this
>>> file are doing it the wrong way, too.
>> Do I need to change them in this patch?
> 
> No.
> 
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
>>>>  
>>>>  return rc;
>>>>  }
>>>> +
>>>> +int vpci_reset_device_state(struct pci_dev *pdev)
>>>
>>> As a target of an indirect call this needs to be annotated cf_check (both
>>> here and in the declaration, unlike __must_check, which is sufficient to
>>> have on just the declaration).
>> OK, will add cf_check in next version.
> 
> Which may not be necessary if you go the route suggested above.
> 
>>>> --- a/xen/include/xen/pci.h
>>>> +++ b/xen/include/xen/pci.h
>>>> @@ -156,6 +156,22 @@ struct pci_dev {
>>>>  struct vpci *vpci;
>>>>  };
>>>>  
>>>> +struct pci_device_state_reset_method {
>>>> +int (*reset_fn)(struct pci_dev *pdev);
>>>> +};
>>>> +
>>>> +enum pci_device_state_reset_type {
>>>> +DEVICE_RESET_FLR,
>>>> +DEVICE_RESET_COLD,
>>>> +DEVICE_RESET_WARM,
>>>> +DEVICE_RESET_HOT,

Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-18 Thread Chen, Jiqian

On 2024/6/17 23:32, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> Some type of domain don't have PIRQs, like PVH, it doesn't do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest base on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
>> irq on Xen side.
>> What's more, current hypercall XEN_DOMCTL_irq_permission requires
>> passing in pirq, it is not suitable for dom0 that doesn't have
>> PIRQs.
>>
>> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant the
>> permission of irq(translate from gsi) to domU when dom0 has no
>> PIRQs.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>> RFC: it needs review and needs to wait for the corresponding third patch on 
>> linux kernel side to be merged.
>> ---
>>  tools/include/xenctrl.h|  5 +++
>>  tools/libs/ctrl/xc_domain.c| 15 +++
>>  tools/libs/light/libxl_pci.c   | 67 +++---
>>  xen/arch/x86/domctl.c  | 43 +++
>>  xen/arch/x86/include/asm/io_apic.h |  2 +
>>  xen/arch/x86/io_apic.c | 17 
>>  xen/arch/x86/mpparse.c |  3 +-
>>  xen/include/public/domctl.h|  8 
>>  xen/xsm/flask/hooks.c  |  1 +
>>  9 files changed, 153 insertions(+), 8 deletions(-)
>>
>> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
>> index a0381f74d24b..f3feb6848e25 100644
>> --- a/tools/include/xenctrl.h
>> +++ b/tools/include/xenctrl.h
>> @@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
>>   uint32_t pirq,
>>   bool allow_access);
>>  
>> +int xc_domain_gsi_permission(xc_interface *xch,
>> + uint32_t domid,
>> + uint32_t gsi,
>> + bool allow_access);
>> +
>>  int xc_domain_iomem_permission(xc_interface *xch,
>> uint32_t domid,
>> unsigned long first_mfn,
>> diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
>> index f2d9d14b4d9f..8540e84fda93 100644
>> --- a/tools/libs/ctrl/xc_domain.c
>> +++ b/tools/libs/ctrl/xc_domain.c
>> @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
>>  return do_domctl(xch, &domctl);
>>  }
>>  
>> +int xc_domain_gsi_permission(xc_interface *xch,
>> + uint32_t domid,
>> + uint32_t gsi,
>> + bool allow_access)
>> +{
>> +struct xen_domctl domctl = {
>> +.cmd = XEN_DOMCTL_gsi_permission,
>> +.domain = domid,
>> +.u.gsi_permission.gsi = gsi,
>> +.u.gsi_permission.allow_access = allow_access,
>> +};
>> +
>> +return do_domctl(xch, &domctl);
>> +}
>> +
>>  int xc_domain_iomem_permission(xc_interface *xch,
>> uint32_t domid,
>> unsigned long first_mfn,
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 376f91759ac6..f027f22c0028 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -1431,6 +1431,9 @@ static void pci_add_dm_done(libxl__egc *egc,
>>  uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>>  uint32_t domainid = domid;
>>  bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
>> +#ifdef CONFIG_X86
>> +xc_domaininfo_t info;
>> +#endif
>>  
>>  /* Convenience aliases */
>>  bool starting = pas->starting;
>> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>>  rc = ERROR_FAIL;
>>  goto out;
>>  }
>> -r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>> +#ifdef CONFIG_X86
>> +/* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
>> +r = xc_domain_getinfo_single(ctx->xch, 0, &info);
> 
> Hard-coded 0 is imposing limitations. Ideally you would use DOMID_SELF, but
> I didn't check if that can be used with the underlying hypercall(s). Otherwise
> you want to pass the actual domid of the local domain here.
But the permission is granted by dom0 to domU, so what I need is the information
about dom0. I think the actual domid here is domU's ID, so it is not useful here.

> 
>>  if (r < 0) {
>> -LOGED(ERROR, domainid,
>> -  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
>> +LOGED(ERROR, domainid, "getdomaininfo failed (error=%d)", errno);
>>  fclose(f);
>>  rc = ERROR_FAIL;
>>  goto out;
>>  }
>> +if (info.flags & XEN_DOMINF_hvm_guest &&
> 
> You want to parenthesize the & here.
Will change in next version.

> 
>> +!(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ) &&
>> +

Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev

2024-06-18 Thread Chen, Jiqian

On 2024/6/17 23:10, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> In PVH dom0, it uses the linux local interrupt mechanism,
>> when it allocs irq for a gsi, it is dynamic, and follow
>> the principle of applying first, distributing first. And
>> irq number is alloced from small to large, but the applying
>> gsi number is not, may gsi 38 comes before gsi 28, that
>> causes the irq number is not equal with the gsi number.
> 
> Hmm, see my earlier explanations on patch 5: GSI and IRQ generally aren't
> the same anyway. Therefore this part of the description, while not wrong,
> is at least at risk of misleading people.
OK, I will change it to say "irq is not the same as gsi".
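The point that IRQ and GSI numbers live in different spaces can be made concrete with a toy model. It is not Linux's allocator. IRQ numbers are handed out in request order from a hypothetical base, so the IRQ that a GSI receives depends on when it was requested. The table size, base value and function names are assumptions for illustration. The model's takeaway is that a caller needs an explicit GSI-to-IRQ lookup and cannot assume the two numbers are equal.

```c
#define SKETCH_MAX_GSI   64
#define SKETCH_IRQ_BASE  24   /* hypothetical first dynamically allocated IRQ */

/* gsi -> irq; 0 means "no record" (toy IRQs always start above 0). */
static int sketch_gsi_to_irq[SKETCH_MAX_GSI];
static int sketch_next_irq = SKETCH_IRQ_BASE;

/* First come, first served: the next free IRQ goes to whichever GSI asks. */
int sketch_alloc_irq_for_gsi(unsigned int gsi)
{
    if ( gsi >= SKETCH_MAX_GSI )
        return -1;
    if ( sketch_gsi_to_irq[gsi] == 0 )
        sketch_gsi_to_irq[gsi] = sketch_next_irq++;
    return sketch_gsi_to_irq[gsi];
}

/* Translation must go through the table; returns -1 if never allocated. */
int sketch_lookup_irq(unsigned int gsi)
{
    if ( gsi >= SKETCH_MAX_GSI || sketch_gsi_to_irq[gsi] == 0 )
        return -1;
    return sketch_gsi_to_irq[gsi];
}
```

If GSI 38 is requested before GSI 28, the model gives them IRQs 24 and 25. Neither number matches its GSI.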
> 
>> --- a/tools/include/xen-sys/Linux/privcmd.h
>> +++ b/tools/include/xen-sys/Linux/privcmd.h
>> @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource {
>>  __u64 addr;
>>  } privcmd_mmap_resource_t;
>>  
>> +typedef struct privcmd_gsi_from_dev {
>> +__u32 sbdf;
> 
> That's PCI-centric, without struct and IOCTL names reflecting this fact.
So, should I change it to privcmd_gsi_from_pcidev?

> 
>> +int gsi;
> 
> Is "int" legitimate to use here? Doesn't this want to similarly be __u32?
I want to set gsi to a negative value if there is no record of this translation.

> 
>> --- a/tools/include/xencall.h
>> +++ b/tools/include/xencall.h
>> @@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op,
>>   uint64_t arg1, uint64_t arg2, uint64_t arg3,
>>   uint64_t arg4, uint64_t arg5);
>>  
>> +int xen_oscall_gsi_from_dev(xencall_handle *xcall, unsigned int sbdf);
> 
> Hmm, something (by name at least) OS-specific being in the public header
> and ...
> 
>> --- a/tools/libs/call/libxencall.map
>> +++ b/tools/libs/call/libxencall.map
>> @@ -10,6 +10,8 @@ VERS_1.0 {
>>  xencall4;
>>  xencall5;
>>  
>> +xen_oscall_gsi_from_dev;
> 
> ... map file. I'm not sure things are intended to be this way.
Let's see what the other maintainers think.

> 
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>  #endif
>>  }
>>  
>> +#define PCI_DEVID(bus, devfn)\
>> +((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>> +
>> +#define PCI_SBDF(seg, bus, devfn) \
>> +((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
> 
> I'm not a maintainer of this file; if I were, I'd ask that for readability's
> sake all excess parentheses be dropped from these.
Isn't it a coding requirement to enclose each element of a macro definition in
parentheses? Other files seem to do this too; see tools/libs/light/libxl_internal.h.
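For reference, here is a standalone sketch of how the two macros above pack a PCI address into a 32-bit SBDF value. The segment sits in bits 31..16, the bus in bits 15..8 and devfn in bits 7..0, where devfn is the 5-bit device number shifted left by 3 and ORed with the 3-bit function. The helper names are hypothetical. They only mirror the macro layout and are not libxl's definitions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers mirroring PCI_DEVFN / PCI_DEVID / PCI_SBDF. */
uint8_t sketch_devfn(unsigned int dev, unsigned int func)
{
    /* 5-bit device number, 3-bit function number. */
    return (uint8_t)(((dev & 0x1f) << 3) | (func & 0x07));
}

uint16_t sketch_devid(unsigned int bus, unsigned int devfn)
{
    return (uint16_t)(((bus & 0xff) << 8) | (devfn & 0xff));
}

uint32_t sketch_sbdf(unsigned int seg, unsigned int bus, unsigned int devfn)
{
    return ((uint32_t)(seg & 0xffff) << 16) | sketch_devid(bus, devfn);
}

/* Print in the usual SSSS:BB:DD.F notation. */
void sketch_print_sbdf(uint32_t sbdf)
{
    printf("%04x:%02x:%02x.%u\n",
           (unsigned int)(sbdf >> 16),
           (unsigned int)((sbdf >> 8) & 0xff),
           (unsigned int)((sbdf >> 3) & 0x1f),
           (unsigned int)(sbdf & 0x7));
}
```

For example, `sketch_sbdf(0, 0x03, sketch_devfn(0, 1))` yields 0x301, and `sketch_print_sbdf(0x301)` prints `0000:03:00.1`.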

> 
>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>  goto out_no_irq;
>>  }
>>  if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> +#ifdef CONFIG_X86
>> +sbdf = PCI_SBDF(pci->domain, pci->bus,
>> +(PCI_DEVFN(pci->dev, pci->func)));
>> +gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>> +/*
>> + * Old kernel version may not support this function,
> 
> Just kernel?
Yes, xc_physdev_gsi_from_dev depends on the corresponding function implemented on
the Linux kernel side.
> 
>> + * so if fail, keep using irq; if success, use gsi
>> + */
>> +if (gsi > 0) {
>> +irq = gsi;
> 
> I'm still puzzled by this, when by now I think we've sufficiently clarified
> that IRQs and GSIs use two distinct numbering spaces.
> 
> Also, as previously indicated, you call this for PV Dom0 as well. Aiui on
> the assumption that it'll fail. What if we decide to make the functionality
> available there, too (if only for informational purposes, or for
> consistency)? Suddenly you're fallback logic wouldn't work anymore, and
> you'd call ...
> 
>> +}
>> +#endif
>>  r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
> 
> ... the function with a GSI when a pIRQ is meant. Imo, as suggested before,
> you strictly want to avoid the call on PV Dom0.
> 
> Also for PVH Dom0: I don't think I've seen changes to the hypercall
> handling, yet. How can that be when GSI and IRQ aren't the same, and hence
> incoming GSI would need translating to IRQ somewhere? I can once again only
> assume all your testing was done with IRQs whose numbers happened to match
> their GSI numbers. (The difference, imo, would also need calling out in the
> public header, where the respective interface struct(s) is/are defined.)
I feel like you missed many of the previous discussions.
Without my changes, the original code uses the irq (read from the file
/sys/bus/pci/devices//irq) to call xc_physdev_map_pirq, but xc_physdev_map_pirq
requires a gsi rather than an irq. So we need to use the gsi whether dom0 is PV or
PVH, which means the original code is wrong. It is only by chance that the irq value
in the Linux kernel of a PV dom0 equals the gsi value, which is why the original PV
passthrough had no problem.
But not

Re: [XEN PATCH v10 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-18 Thread Chen, Jiqian

On 2024/6/17 22:52, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> The gsi of a passthrough device must be configured for it to be
>> able to be mapped into a hvm domU.
>> But When dom0 is PVH, the gsis don't get registered, it causes
>> the info of apic, pin and irq not be added into irq_2_pin list,
>> and the handler of irq_desc is not set, then when passthrough a
>> device, setting ioapic affinity and vector will fail.
>>
>> To fix above problem, on Linux kernel side, a new code will
>> need to call PHYSDEVOP_setup_gsi for passthrough devices to
>> register gsi when dom0 is PVH.
>>
>> So, add PHYSDEVOP_setup_gsi into hvm_physdev_op for above
>> purpose.
>>
>> Signed-off-by: Jiqian Chen 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>> The code link that will call this hypercall on linux kernel side is as 
>> follows:
>> https://lore.kernel.org/xen-devel/20240607075109.126277-3-jiqian.c...@amd.com/
> 
> One of my v9 comments was addressed, thanks. Repeating the other, unaddressed
> one here:
> "As to GSIs not being registered: If that's not a problem for Dom0's own
>  operation, I think it'll also want/need explaining why what is sufficient for
>  Dom0 alone isn't sufficient when pass-through comes into play."
Following this v9 comment, I have modified the commit message to describe why GSIs
not being registered causes passthrough to fail:
" it causes the info of apic, pin and irq not be added into irq_2_pin list, and 
the handler of irq_desc is not set, then when passthrough a device, setting 
ioapic affinity and vector will fail."
What description do you want me to add?

> 
> Jan
> 


Re: [XEN PATCH v10 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-06-18 Thread Chen, Jiqian

On 2024/6/17 22:45, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>> a passthrough device by using gsi, see qemu code
>> xen_pt_realize->xc_physdev_map_pirq and libxl code
>> pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq
>> will call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>> is not allowed because currd is PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, it will fail at has_pirq check.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
>> PHYSDEVOP_unmap_pirq for the failed path to unmap pirq.
> 
> Why "failed path"? Isn't unmapping also part of normal device removal
> from a guest?
Yes, both. I will change it to "also allow PHYSDEVOP_unmap_pirq for the device
removal path to unmap pirq".

> 
>> And
>> add a new check to prevent self map when subject domain has no
>> PIRQ flag.
> 
> You still talk of only self mapping, and the code also still does only
> that. As pointed out before: Why would you allow mapping into a PVH
> DomU? IOW what purpose do the "d == currd" checks have?
The check I added has two purposes. First, I need to allow this case:
Dom0 (without PIRQ) + DomU (with PIRQ), because the original code only checks
!has_pirq(currd), which makes map_pirq fail in this case.
Second, I need to disallow self-mapping:
a DomU (without PIRQ) calling map_pirq; "d==currd" means that currd is the subject
domain itself.

Hmm, I think I see your concern now.
Do you mean I need to
" Prevent map_pirq when currd has no X86_EMU_USE_PIRQ flag "
instead of
" Prevent self-map when currd has no X86_EMU_USE_PIRQ flag ",
so I need to remove "d==currd", right?

> 
>> So that domU with PIRQ flag can success to map pirq for
>> passthrough devices even dom0 has no PIRQ flag.
> 
> There's still a description problem here. Much like the first sentence,
> this last one also says that the guest would itself map the pIRQ. In
> which case there would still not be any reason to expose the sub-
> functions to Dom0.
Would it be OK to change it to " So that the interrupt of a passthrough device can
success to be mapped to pirq for domU with PIRQ flag when dom0 is PVH."?

> 
> Jan


Re: [XEN PATCH v10 1/5] xen/vpci: Clear all vpci status of device

2024-06-18 Thread Chen, Jiqian

On 2024/6/17 22:17, Jan Beulich wrote:
> On 17.06.2024 11:00, Jiqian Chen wrote:
>> --- a/xen/drivers/pci/physdev.c
>> +++ b/xen/drivers/pci/physdev.c
>> @@ -2,11 +2,17 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #ifndef COMPAT
>>  typedef long ret_t;
>>  #endif
>>  
>> +static const struct pci_device_state_reset_method
>> +pci_device_state_reset_methods[] = {
>> +[ DEVICE_RESET_FLR ].reset_fn = vpci_reset_device_state,
>> +};
> 
> What about the other three DEVICE_RESET_*? In particular ...
I don't know how to implement the other three types of reset.
The table is laid out so that the corresponding handler functions can be added later
if necessary. Do I need to set them to NULL explicitly in this array?
Does this approach match your previous suggestion of using a single hypercall to
handle all types of reset?

> 
>> @@ -67,6 +73,43 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  break;
>>  }
>>  
>> +case PHYSDEVOP_pci_device_state_reset: {
>> +struct pci_device_state_reset dev_reset;
>> +struct physdev_pci_device *dev;
>> +struct pci_dev *pdev;
>> +pci_sbdf_t sbdf;
>> +
>> +if ( !is_pci_passthrough_enabled() )
>> +return -EOPNOTSUPP;
>> +
>> +ret = -EFAULT;
>> +if ( copy_from_guest(&dev_reset, arg, 1) != 0 )
>> +break;
>> +dev = &dev_reset.dev;
>> +sbdf = PCI_SBDF(dev->seg, dev->bus, dev->devfn);
>> +
>> +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>> +if ( ret )
>> +break;
>> +
>> +pcidevs_lock();
>> +pdev = pci_get_pdev(NULL, sbdf);
>> +if ( !pdev )
>> +{
>> +pcidevs_unlock();
>> +ret = -ENODEV;
>> +break;
>> +}
>> +
>> +write_lock(&pdev->domain->pci_lock);
>> +pcidevs_unlock();
>> +ret = pci_device_state_reset_methods[dev_reset.reset_type].reset_fn(pdev);
> 
> ... you're setting this up for calling NULL. In fact there's also no bounds
> check for the array index.
Oh, right. I will add the checks in the next version.
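Below is a minimal sketch of the two checks discussed here. The index is validated against the table size before the table is dereferenced, and an empty slot is treated as unsupported instead of being called through NULL. `struct sketch_pdev`, `sketch_flr_reset`, the enum and the chosen error values are hypothetical stand-ins. Xen's own code would use `struct pci_dev`, `ARRAY_SIZE()` and its own conventions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for Xen's struct pci_dev. */
struct sketch_pdev {
    unsigned int resets;
};

enum sketch_reset_type {
    SKETCH_RESET_FLR,
    SKETCH_RESET_COLD,
    SKETCH_RESET_WARM,
    SKETCH_RESET_HOT,
    SKETCH_RESET_NR,      /* number of reset types, used to size the table */
};

struct sketch_reset_method {
    int (*reset_fn)(struct sketch_pdev *pdev);
};

static int sketch_flr_reset(struct sketch_pdev *pdev)
{
    pdev->resets++;
    return 0;
}

/* Only FLR has a handler; the other slots are implicitly NULL. */
static const struct sketch_reset_method sketch_methods[SKETCH_RESET_NR] = {
    [SKETCH_RESET_FLR].reset_fn = sketch_flr_reset,
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

int sketch_dispatch_reset(struct sketch_pdev *pdev, unsigned int type)
{
    /* Bounds check: the index comes straight from the hypercall argument. */
    if ( type >= SKETCH_ARRAY_SIZE(sketch_methods) )
        return -EINVAL;

    /* NULL check: a valid type may simply have no handler yet. */
    if ( !sketch_methods[type].reset_fn )
        return -EOPNOTSUPP;

    return sketch_methods[type].reset_fn(pdev);
}
```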

> 
> Also, nit (further up): Opening figure braces for a new scope go onto their
OK, will change in next version.
> own line. Then again I notice that apparently _all_ other instances in this
> file are doing it the wrong way, too.
Do I need to change them in this patch?
> 
> Finally, is the "dev" local variable really needed? It effectively hides that
> PCI_SBDF() is invoked on the hypercall arguments.
Will remove "dev" in next version.
> 
>> +write_unlock(&pdev->domain->pci_lock);
>> +if ( ret )
>> +printk(XENLOG_ERR "%pp: failed to reset vPCI device state\n", &sbdf);
> 
> Maybe downgrade to dprintk()? The caller ought to handle the error anyway.
Will downgrade in next version.
> 
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -172,6 +172,15 @@ int vpci_assign_device(struct pci_dev *pdev)
>>  
>>  return rc;
>>  }
>> +
>> +int vpci_reset_device_state(struct pci_dev *pdev)
> 
> As a target of an indirect call this needs to be annotated cf_check (both
> here and in the declaration, unlike __must_check, which is sufficient to
> have on just the declaration).
OK, will add cf_check in next version.
> 
>> --- a/xen/include/xen/pci.h
>> +++ b/xen/include/xen/pci.h
>> @@ -156,6 +156,22 @@ struct pci_dev {
>>  struct vpci *vpci;
>>  };
>>  
>> +struct pci_device_state_reset_method {
>> +int (*reset_fn)(struct pci_dev *pdev);
>> +};
>> +
>> +enum pci_device_state_reset_type {
>> +DEVICE_RESET_FLR,
>> +DEVICE_RESET_COLD,
>> +DEVICE_RESET_WARM,
>> +DEVICE_RESET_HOT,
>> +};
>> +
>> +struct pci_device_state_reset {
>> +struct physdev_pci_device dev;
>> +enum pci_device_state_reset_type reset_type;
>> +};
> 
> This is the struct to use as hypercall argument. How can it live outside of
> any public header? Also, when moving it there, beware that you should not
> use enum-s there. Only handles and fixed-width types are permitted.
Yes, I had put them there before, but enums are not permitted there.
Since an enum can't be used in the public header, what type would you suggest for
distinguishing the different reset types?
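One way to satisfy the fixed-width rule is sketched below. The reset type is carried as a `uint32_t`, and its values are defined as macros. This is the direction the follow-up proposal in this thread takes. The `PCI_DEVICE_STATE_RESET_*` names match that proposal, but their numeric values are assumptions, and the local copy of `struct physdev_pci_device` (seg/bus/devfn) only mirrors the public physdev.h layout for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copy of the public physdev.h layout (seg/bus/devfn). */
struct physdev_pci_device {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
};

/* Reset types as plain macros instead of an enum (values are assumptions). */
#define PCI_DEVICE_STATE_RESET_COLD 0
#define PCI_DEVICE_STATE_RESET_WARM 1
#define PCI_DEVICE_STATE_RESET_HOT  2
#define PCI_DEVICE_STATE_RESET_FLR  3

struct pci_device_state_reset {
    struct physdev_pci_device dev;  /* 4 bytes */
    uint32_t reset_type;            /* one of PCI_DEVICE_STATE_RESET_* */
};

/* The layout no longer depends on how a compiler chooses to size an enum. */
_Static_assert(sizeof(struct pci_device_state_reset) == 8,
               "unexpected hypercall argument size");
_Static_assert(offsetof(struct pci_device_state_reset, reset_type) == 4,
               "unexpected reset_type offset");
```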

> 
>> --- a/xen/include/xen/vpci.h
>> +++ b/xen/include/xen/vpci.h
>> @@ -38,6 +38,7 @@ int __must_check vpci_assign_device(struct pci_dev *pdev);
>>  
>>  /* Remove all handlers and free vpci related structures. */
>>  void vpci_deassign_device(struct pci_dev *pdev);
>> +int __must_check vpci_reset_device_state(struct pci_dev *pdev);
> 
> What's the purpose of this __must_check, when the sole caller is calling
> this through a function pointer, which isn't similarly annotated?
I added this before introducing the function pointers and didn't revisit it after
changing the implementation.
Per your comment above, I will remove __must_check and use cf_check instead.

> 
> Jan


Re: [XEN PATCH v10 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-17 Thread Chen, Jiqian

Hi Daniel,

On 2024/6/17 17:00, Jiqian Chen wrote:
> Some type of domain don't have PIRQs, like PVH, it doesn't do
> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
> to guest base on PVH dom0, callstack
> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at function
> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq and
> irq on Xen side.
> What's more, current hypercall XEN_DOMCTL_irq_permission requires
> passing in pirq, it is not suitable for dom0 that doesn't have
> PIRQs.
> 
> So, add a new hypercall XEN_DOMCTL_gsi_permission to grant the
> permission of irq(translate from gsi) to domU when dom0 has no
> PIRQs.
> 
> Signed-off-by: Jiqian Chen 
> Signed-off-by: Huang Rui 
> Signed-off-by: Jiqian Chen 
> ---
> RFC: it needs review and needs to wait for the corresponding third patch on 
> linux kernel side to be merged.
> ---
>  tools/include/xenctrl.h|  5 +++
>  tools/libs/ctrl/xc_domain.c| 15 +++
>  tools/libs/light/libxl_pci.c   | 67 +++---
>  xen/arch/x86/domctl.c  | 43 +++
>  xen/arch/x86/include/asm/io_apic.h |  2 +
>  xen/arch/x86/io_apic.c | 17 
>  xen/arch/x86/mpparse.c |  3 +-
>  xen/include/public/domctl.h|  8 
>  xen/xsm/flask/hooks.c  |  1 +
>  9 files changed, 153 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
> index a0381f74d24b..f3feb6848e25 100644
> --- a/tools/include/xenctrl.h
> +++ b/tools/include/xenctrl.h
> @@ -1382,6 +1382,11 @@ int xc_domain_irq_permission(xc_interface *xch,
>   uint32_t pirq,
>   bool allow_access);
>  
> +int xc_domain_gsi_permission(xc_interface *xch,
> + uint32_t domid,
> + uint32_t gsi,
> + bool allow_access);
> +
>  int xc_domain_iomem_permission(xc_interface *xch,
> uint32_t domid,
> unsigned long first_mfn,
> diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
> index f2d9d14b4d9f..8540e84fda93 100644
> --- a/tools/libs/ctrl/xc_domain.c
> +++ b/tools/libs/ctrl/xc_domain.c
> @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
>  return do_domctl(xch, &domctl);
>  }
>  
> +int xc_domain_gsi_permission(xc_interface *xch,
> + uint32_t domid,
> + uint32_t gsi,
> + bool allow_access)
> +{
> +struct xen_domctl domctl = {
> +.cmd = XEN_DOMCTL_gsi_permission,
> +.domain = domid,
> +.u.gsi_permission.gsi = gsi,
> +.u.gsi_permission.allow_access = allow_access,
> +};
> +
> +return do_domctl(xch, &domctl);
> +}
> +
>  int xc_domain_iomem_permission(xc_interface *xch,
> uint32_t domid,
> unsigned long first_mfn,
> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> index 376f91759ac6..f027f22c0028 100644
> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -1431,6 +1431,9 @@ static void pci_add_dm_done(libxl__egc *egc,
>  uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>  uint32_t domainid = domid;
>  bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
> +#ifdef CONFIG_X86
> +xc_domaininfo_t info;
> +#endif
>  
>  /* Convenience aliases */
>  bool starting = pas->starting;
> @@ -1516,14 +1519,39 @@ static void pci_add_dm_done(libxl__egc *egc,
>  rc = ERROR_FAIL;
>  goto out;
>  }
> -r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
> +#ifdef CONFIG_X86
> +/* If dom0 doesn't have PIRQs, need to use xc_domain_gsi_permission */
> +r = xc_domain_getinfo_single(ctx->xch, 0, &info);
>  if (r < 0) {
> -LOGED(ERROR, domainid,
> -  "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
> +LOGED(ERROR, domainid, "getdomaininfo failed (error=%d)", errno);
>  fclose(f);
>  rc = ERROR_FAIL;
>  goto out;
>  }
> +if (info.flags & XEN_DOMINF_hvm_guest &&
> +!(info.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ) &&
> +gsi > 0) {
> +r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
> +if (r < 0) {
> +LOGED(ERROR, domainid,
> +"xc_domain_gsi_permission gsi=%d (error=%d)", gsi, errno);
> +fclose(f);
> +rc = ERROR_FAIL;
> +goto out;
> +}
> +}
> +else
> +#endif
> +{
> +r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
> +if (r < 0) {
> +LOGED(ERROR, domainid,
> +"xc_domain_irq_permission irq=%d

Re: [RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-14 Thread Chen, Jiqian

On 2024/6/14 14:41, Jan Beulich wrote:
> On 14.06.2024 05:11, Chen, Jiqian wrote:
>> On 2024/6/13 20:51, Anthony PERARD wrote:
>>> On Wed, Jun 12, 2024 at 10:55:14AM +, Chen, Jiqian wrote:
>>>> On 2024/6/12 18:34, Jan Beulich wrote:
>>>>> On 12.06.2024 12:12, Chen, Jiqian wrote:
>>>>>> On 2024/6/11 22:39, Jan Beulich wrote:
>>>>>>> On 07.06.2024 10:11, Jiqian Chen wrote:
>>>>>>>> +r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map);
>>>>>>>
>>>>>>> Looking at the hypervisor side, this will fail for PV Dom0. In which 
>>>>>>> case imo
>>>>>>> you better would avoid making the call in the first place.
>>>>>> Yes, for PV dom0, the errno is EOPNOTSUPP, then it will do below 
>>>>>> xc_domain_irq_permission.
>>>>>
>>>>> Hence why call xc_domain_gsi_permission() at all on a PV Dom0?
>>>> Is there a function to distinguish that current dom0 is PV or PVH dom0 in 
>>>> tools/libs?
>>>
>>> That might have never been needed before, so probably not. There's
>>> libxl__domain_type() but if that works with dom0 it might return "HVM"
>>> for PVH dom0. So if xc_domain_getinfo_single() works and give the right
>>> info about dom0, libxl__domain_type() could be extended to deal with
>>> dom0 I guess. I don't know if there's a good way to find out which
>>> flavor of dom0 is running.
>> Thanks Anthony!
>> I think here we really need to check is that whether current domain has PIRQ 
>> flag(X86_EMU_USE_PIRQ) or not.
>> And it seems xc_domain_gsi_permission already return the information.
> 
> By way of failing, if I'm not mistaken? As indicated before, I don't
> think you should invoke the function when it's clear it's going to fail.
Sorry, I wrote that wrongly. It should be "And it seems xc_domain_getinfo_single
already returns the information."
The next version will look like this:
xc_domaininfo_t xcinfo;
xc_domain_getinfo_single(xc_handle, domid, &xcinfo);
if ( xcinfo.arch_config.emulation_flags & XEN_X86_EMU_USE_PIRQ )
    xc_domain_irq_permission
else
    xc_domain_gsi_permission
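The selection above can be written as a small predicate. The sketch follows the v10 libxl hunk quoted in this thread and also tests the HVM flag. A PV dom0 presumably never has `XEN_X86_EMU_USE_PIRQ` set, so a bare flag test would send it down the GSI path, which is expected to fail there. The two flag values are local stand-ins for `XEN_DOMINF_hvm_guest` and `XEN_X86_EMU_USE_PIRQ`; real code should use Xen's public definitions. The enum naming the libxc call is hypothetical. The `&` tests are parenthesized, as requested in review.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values for illustration; use the real Xen definitions. */
#define SKETCH_DOMINF_HVM_GUEST  (1u << 1)
#define SKETCH_EMU_USE_PIRQ      (1u << 9)

enum sketch_perm_call {
    SKETCH_USE_IRQ_PERMISSION,  /* xc_domain_irq_permission() */
    SKETCH_USE_GSI_PERMISSION,  /* xc_domain_gsi_permission() */
};

/*
 * dominf_flags / emulation_flags describe the control domain doing the
 * granting (dom0 here), not the guest that receives the device.
 */
enum sketch_perm_call sketch_pick_permission_call(uint32_t dominf_flags,
                                                  uint32_t emulation_flags,
                                                  int gsi)
{
    bool hvm = (dominf_flags & SKETCH_DOMINF_HVM_GUEST) != 0;
    bool has_pirq = (emulation_flags & SKETCH_EMU_USE_PIRQ) != 0;

    /* PVH control domain without pIRQs: grant by GSI. */
    if ( hvm && !has_pirq && gsi > 0 )
        return SKETCH_USE_GSI_PERMISSION;

    /* PV (or PIRQ-enabled) control domain: keep the pIRQ-based call. */
    return SKETCH_USE_IRQ_PERMISSION;
}
```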

> 
> Jan
> 
>> If current domain has no PIRQs, then I should use xc_domain_gsi_permission 
>> to grant permission, otherwise I should
>> keep the original function xc_domain_irq_permission.
>>
>>>
>>> Cheers,
>>>
>>
> 


Re: [RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-13 Thread Chen, Jiqian

Hi Daniel,

On 2024/6/11 22:39, Jan Beulich wrote:
> On 07.06.2024 10:11, Jiqian Chen wrote: 
>> +    case XEN_DOMCTL_gsi_permission:
>> +    {
>> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
>> +        int irq = gsi_2_irq(gsi);
>> +        bool allow = domctl->u.gsi_permission.allow_access;
>> +        /*
>> +         * If current domain is PV or it has PIRQ flag, it has a mapping
>> +         * of gsi, pirq and irq, so it should use XEN_DOMCTL_irq_permission
>> +         * to grant irq permission.
>> +         */
>> +        if ( is_pv_domain(current->domain) || has_pirq(current->domain) )
>> +        {
>> +            ret = -EOPNOTSUPP;
>> +            break;
>> +        }
>> +
>> +        if ( gsi >= nr_irqs_gsi || irq < 0 )
>> +        {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        if ( !irq_access_permitted(current->domain, irq) ||
>> +             xsm_irq_permission(XSM_HOOK, d, irq, allow) )
> 
> Daniel, is it okay to issue the XSM check using the translated value, not
> the one that was originally passed into the hypercall?
Is it okay?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-13 Thread Chen, Jiqian

On 2024/6/13 20:51, Anthony PERARD wrote:
> On Wed, Jun 12, 2024 at 10:55:14AM +0000, Chen, Jiqian wrote:
>> On 2024/6/12 18:34, Jan Beulich wrote:
>>> On 12.06.2024 12:12, Chen, Jiqian wrote:
>>>> On 2024/6/11 22:39, Jan Beulich wrote:
>>>>> On 07.06.2024 10:11, Jiqian Chen wrote:
>>>>>> +r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map);
>>>>>
>>>>> Looking at the hypervisor side, this will fail for PV Dom0. In which case imo
>>>>> you better would avoid making the call in the first place.
>>>> Yes, for PV dom0 the errno is EOPNOTSUPP, and it then falls back to the
>>>> xc_domain_irq_permission call below.
>>>
>>> Hence why call xc_domain_gsi_permission() at all on a PV Dom0?
>> Is there a function in tools/libs to tell whether the current dom0 is a PV
>> or a PVH dom0?
> 
> That might have never been needed before, so probably not. There's
> libxl__domain_type() but if that works with dom0 it might return "HVM"
> for PVH dom0. So if xc_domain_getinfo_single() works and gives the right
> info about dom0, libxl__domain_type() could be extended to deal with
> dom0 I guess. I don't know if there's a good way to find out which
> flavor of dom0 is running.
Thanks Anthony!
I think what we really need to check here is whether the current domain has the
PIRQ flag (X86_EMU_USE_PIRQ).
And it seems xc_domain_gsi_permission already returns the information.
If the current domain has no PIRQs, then I should use xc_domain_gsi_permission to
grant permission; otherwise I should keep the original function
xc_domain_irq_permission.

> 
> Cheers,
> 

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-12 Thread Chen, Jiqian

On 2024/6/12 18:34, Jan Beulich wrote:
> On 12.06.2024 12:12, Chen, Jiqian wrote:
>> On 2024/6/11 22:39, Jan Beulich wrote:
>>> On 07.06.2024 10:11, Jiqian Chen wrote:
>>>> Some types of domain, like PVH, don't have PIRQs and do not do
>>>> PHYSDEVOP_map_pirq for each gsi. When passing through a device
>>>> to a guest on a PVH dom0, the callstack
>>>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at
>>>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq
>>>> and irq on the Xen side.
>>>
>>> All of this is, to me at least, in pretty sharp contradiction to what
>>> patch 2 says and does. IOW: Do we want the concept of pIRQ in PVH, or
>>> do we want to keep that to PV?
>> It's not contradictory.
>> What I did does not add the concept of PIRQs to PVH.
> 
> After your further explanations on patch 2 - yes, I see now. But in particular
> there it needs making more clear what case it is that is being enabled by the
> changes.
OK, I will add some description in the next version.

> 
>>>> Signed-off-by: Huang Rui 
>>>> Signed-off-by: Jiqian Chen 
>>>
>>> A problem throughout the series as it seems: Who's the author of these
>>> patches? There's no From: saying it's not you, but your S-o-b also
>>> isn't first.
>> So I need to change it to:
>> Signed-off-by: Jiqian Chen  means I am the author.
>> Signed-off-by: Huang Rui  means Rui first sent them upstream.
>> Signed-off-by: Jiqian Chen  means I am continuing to upstream them.
> 
> I guess so, yes.
Thanks.

> 
>>>> --- a/tools/libs/light/libxl_pci.c
>>>> +++ b/tools/libs/light/libxl_pci.c
>>>> @@ -1412,6 +1412,37 @@ static bool pci_supp_legacy_irq(void)
>>>>  #define PCI_SBDF(seg, bus, devfn) \
>>>>              ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>>  
>>>> +static int pci_device_set_gsi(libxl_ctx *ctx,
>>>> +                              libxl_domid domid,
>>>> +                              libxl_device_pci *pci,
>>>> +                              bool map,
>>>> +                              int *gsi_back)
>>>> +{
>>>> +    int r, gsi, pirq;
>>>> +    uint32_t sbdf;
>>>> +
>>>> +    sbdf = PCI_SBDF(pci->domain, pci->bus, (PCI_DEVFN(pci->dev, pci->func)));
>>>> +    r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>> +    *gsi_back = r;
>>>> +    if (r < 0)
>>>> +        return r;
>>>> +
>>>> +    gsi = r;
>>>> +    pirq = r;
>>>
>>> r is a GSI as per above; why would you store such in a variable named pirq?
>>> And how can ...
>>>
>>>> +    if (map)
>>>> +        r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
>>>> +    else
>>>> +        r = xc_physdev_unmap_pirq(ctx->xch, domid, pirq);
>>>
>>> ... that value be the correct one to pass into here? In fact, the pIRQ number
>>> you obtain above in the "map" case isn't handed to the caller, i.e. it is
>>> effectively lost. Yet that's what would need passing into such an unmap call.
>> Yes, r is the GSI, and I know pirq will be overwritten by xc_physdev_map_pirq.
>> I do "pirq = r" for xc_physdev_unmap_pirq, since unmap needs the pirq passed in,
>> and the pirq number is always equal to the gsi.
> 
> Why would that be? pIRQ is purely a software construct (of Xen's), I
> don't think there's any guarantee whatsoever on the numbering. And even
> if there was (for e.g. non-MSI ones), it would be pIRQ == IRQ. And recall
> that elsewhere I think I meanwhile succeeded in explaining to you that
> IRQ != GSI (in the common case, even if in most cases they match).
OK, I will change this in the next version.
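To illustrate the point above with a toy model (not Xen's allocator): the pIRQ is whatever number the map operation hands back, so the caller has to remember that value and pass it to the unmap operation instead of reconstructing it from the GSI. `toy_map_pirq()` and `toy_unmap_pirq()` are hypothetical stand-ins for `xc_physdev_map_pirq()` and `xc_physdev_unmap_pirq()`.

```c
#define TOY_NR_PIRQS 64

/*
 * Toy pIRQ table: entry i holds the GSI bound to pIRQ i, or -1 if pIRQ i
 * is free. Numbers are handed out from the top down only to make it
 * obvious that a pIRQ need not equal its GSI.
 */
static int toy_pirq_to_gsi[TOY_NR_PIRQS];

static void toy_init(void)
{
    for (int i = 0; i < TOY_NR_PIRQS; i++)
        toy_pirq_to_gsi[i] = -1;
}

/* Bind gsi to a free pIRQ and report the chosen number via *pirq. */
static int toy_map_pirq(int gsi, int *pirq)
{
    for (int i = TOY_NR_PIRQS - 1; i >= 0; i--) {
        if (toy_pirq_to_gsi[i] < 0) {
            toy_pirq_to_gsi[i] = gsi;
            *pirq = i;
            return 0;
        }
    }
    return -1; /* table full */
}

/* Unbind a pIRQ; fails if that pIRQ number is not currently mapped. */
static int toy_unmap_pirq(int pirq)
{
    if (pirq < 0 || pirq >= TOY_NR_PIRQS || toy_pirq_to_gsi[pirq] < 0)
        return -1;
    toy_pirq_to_gsi[pirq] = -1;
    return 0;
}
```

Mapping GSI 24 here yields pIRQ 63; unmapping "pIRQ 24" fails, while unmapping the returned 63 succeeds. A caller such as pci_device_set_gsi would therefore need to hand the mapped pIRQ back (or look it up) for the unmap path.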

> 
>>>> +    if (r)
>>>> +        return r;
>>>> +
>>>> +    r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map);
>>>
>>> Looking at the hypervisor side, this will fail for PV Dom0. In which case imo
>>> you better would avoid making the call in the first place.
>> Yes, for PV dom0 the errno is EOPNOTSUPP, and it then falls back to the
>> xc_domain_irq_permission call below.
> 
> Hence why call xc_domain_gsi_permission() at all on a PV Dom0?
Is there a function in tools/libs to tell whether the current dom0 is a PV
or a PVH dom0?

> 
>>>> +    if (r && errno == EOPNOTSUPP)
>>>
>>> Before here you d

Re: [XEN PATCH v9 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-06-12 Thread Chen, Jiqian

On 2024/6/12 17:21, Jan Beulich wrote:
> On 12.06.2024 11:07, Chen, Jiqian wrote:
>> On 2024/6/12 16:53, Jan Beulich wrote:
>>> On 12.06.2024 04:43, Chen, Jiqian wrote:
>>>> On 2024/6/10 23:58, Jan Beulich wrote:
>>>>> On 07.06.2024 10:11, Jiqian Chen wrote:
>>>>>> If Xen runs with a PVH dom0 and an hvm domU, the hvm domU will map a pirq
>>>>>> for a passthrough device by using its gsi; see the qemu code
>>>>>> xen_pt_realize->xc_physdev_map_pirq and the libxl code
>>>>>> pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
>>>>>> calls into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>>>>>> is not allowed because currd is the PVH dom0 and PVH has no
>>>>>> X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.
>>>>>>
>>>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
>>>>>> PHYSDEVOP_unmap_pirq for the failure path to unmap the pirq. Also
>>>>>> add a new check to prevent self-mapping when the subject domain has no
>>>>>> PIRQ flag.
>>>>>>
>>>>>> Signed-off-by: Huang Rui 
>>>>>> Signed-off-by: Jiqian Chen 
>>>>>> Reviewed-by: Stefano Stabellini 
>>>>>
>>>>> What's imo missing in the description is a clarification / justification of
>>>>> why it is going to be a good idea (or at least an acceptable one) to expose
>>>>> the concept of PIRQs to PVH. If I'm not mistaken that concept so far has
>>>>> been entirely a PV one.
>>>> I didn't intend to expose the concept of PIRQs to PVH.
>>>> This patch is for HVM domains, which use PIRQs; what I said in the commit message
>>>> is that the HVM domU maps a pirq for the gsi, not PVH.
>>>> The original code checks "!has_pirq(currd)", but currd is the PVH dom0, so the
>>>> check fails. So I need to allow PHYSDEVOP_map_pirq even when currd has no PIRQs,
>>>> as long as the subject domain has them.
>>>
>>> But that's not what you're enforcing in do_physdev_op(). There you only
>>> prevent self-mapping. If I'm not mistaken all you need to do is drop the
>>> "d == current->domain" checks from those conditionals.
>> What I want is to allow PHYSDEVOP_map_pirq when currd doesn't have PIRQs but
>> the subject domain does.
>> If I only added "break" in hvm_physdev_op without any checks, that would cause
>> self-mapping problems.
>> And in a previous mail thread, you suggested that I prevent self-mapping when
>> the subject domain doesn't have PIRQs.
>> So I added checks in do_physdev_op.
> 
> Self-mapping was a primary concern of mine. Yet why deal with only a subset
> of what needs preventing, when generalizing things actually can be done by
> having less code.
Makes sense. I will rebase my branch once your code is merged.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v9 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-06-12 Thread Chen, Jiqian

On 2024/6/11 00:04, Jan Beulich wrote:
> On 07.06.2024 10:11, Jiqian Chen wrote:
>> On PVH dom0, the gsis don't get registered, but
>> the gsi of a passthrough device must be configured for it to
>> be able to be mapped into an hvm domU.
>> On the Linux kernel side, it calls PHYSDEVOP_setup_gsi for
>> passthrough devices to register the gsi when dom0 is PVH.
> 
> "it calls" implies that ...
> 
>> So, add PHYSDEVOP_setup_gsi for above purpose.
>>
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>> The code that will call this hypercall on the Linux kernel side is at:
>> https://lore.kernel.org/lkml/20240607075109.126277-3-jiqian.c...@amd.com/T/#u
> 
> ... the code only to be added there would already be upstream. As I think the
> hypervisor change wants to come first, this part of the description will want
> re-wording along the lines of "will need to" or some such.
Thanks, I will change it in the next version.

> 
> As to GSIs not being registered: If that's not a problem for Dom0's own
> operation, I think it'll also want/need explaining why what is sufficient for
> Dom0 alone isn't sufficient when pass-through comes into play.
OK, I will add that in the next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-12 Thread Chen, Jiqian

Hi Jan,

On 2024/6/11 22:39, Jan Beulich wrote:
> On 07.06.2024 10:11, Jiqian Chen wrote:
>> Some types of domain, like PVH, don't have PIRQs and do not do
>> PHYSDEVOP_map_pirq for each gsi. When passing through a device
>> to a guest on a PVH dom0, the callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will fail at
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq
>> and irq on the Xen side.
> 
> All of this is, to me at least, in pretty sharp contradiction to what
> patch 2 says and does. IOW: Do we want the concept of pIRQ in PVH, or
> do we want to keep that to PV?
It's not contradictory.
What I did does not add the concept of PIRQs to PVH.
All the previous passthrough code was implemented on the basis of PV dom0 + HVM
domU. PV dom0 has PIRQs, and HVM domU has PIRQs too.
So the code is not suitable for PVH dom0 + HVM domU, because PVH dom0 has no
PIRQs.
Patch 2 does PHYSDEVOP_map_pirq for the HVM domU even when dom0 is PVH instead of
PV; it doesn't add PIRQs for PVH.
This patch grants the irq (obtained from the gsi) to the HVM domU. The reason
XEN_DOMCTL_irq_permission doesn't work is that PVH has no PIRQs, so we can't get
the irq through a pirq as PV does.

> 
>> What's more, the current hypercall XEN_DOMCTL_irq_permission requires
>> passing in a pirq to grant access to an irq, which is not suitable
>> for a dom0 that has no PIRQ flag, because passing through a device
>> starts from the gsi and must grant the corresponding irq to the guest.
>> So, add a new hypercall to grant gsi permission when dom0 is not PV
>> and has no PIRQ flag.
>>
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
> 
> A problem throughout the series as it seems: Who's the author of these
> patches? There's no From: saying it's not you, but your S-o-b also
> isn't first.
So I need to change it to:
Signed-off-by: Jiqian Chen  means I am the author.
Signed-off-by: Huang Rui  means Rui first sent them upstream.
Signed-off-by: Jiqian Chen  means I am continuing to upstream them.

> 
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -1412,6 +1412,37 @@ static bool pci_supp_legacy_irq(void)
>>  #define PCI_SBDF(seg, bus, devfn) \
>>              ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>  
>> +static int pci_device_set_gsi(libxl_ctx *ctx,
>> +                              libxl_domid domid,
>> +                              libxl_device_pci *pci,
>> +                              bool map,
>> +                              int *gsi_back)
>> +{
>> +    int r, gsi, pirq;
>> +    uint32_t sbdf;
>> +
>> +    sbdf = PCI_SBDF(pci->domain, pci->bus, (PCI_DEVFN(pci->dev, pci->func)));
>> +    r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>> +    *gsi_back = r;
>> +    if (r < 0)
>> +        return r;
>> +
>> +    gsi = r;
>> +    pirq = r;
> 
> r is a GSI as per above; why would you store such in a variable named pirq?
> And how can ...
> 
>> +    if (map)
>> +        r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
>> +    else
>> +        r = xc_physdev_unmap_pirq(ctx->xch, domid, pirq);
> 
> ... that value be the correct one to pass into here? In fact, the pIRQ number
> you obtain above in the "map" case isn't handed to the caller, i.e. it is
> effectively lost. Yet that's what would need passing into such an unmap call.
Yes, r is the GSI, and I know pirq will be overwritten by xc_physdev_map_pirq.
I do "pirq = r" for xc_physdev_unmap_pirq, since unmap needs the pirq passed in,
and the pirq number is always equal to the gsi.

> 
>> +    if (r)
>> +        return r;
>> +
>> +    r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map);
> 
> Looking at the hypervisor side, this will fail for PV Dom0. In which case imo
> you better would avoid making the call in the first place.
Yes, for PV dom0 the errno is EOPNOTSUPP, and it then falls back to the
xc_domain_irq_permission call below.

> 
>> +    if (r && errno == EOPNOTSUPP)
> 
> Before here you don't really need the pIRQ number; if all it really is needed
> for is ...
> 
>> +        r = xc_domain_irq_permission(ctx->xch, domid, pirq, map);
> 
> ... this, then it probably also should only be obtained when it's needed. Yet
> overall the intentions here aren't quite clear to me.
The function pci_device_set_gsi is added for PVH dom0, while also keeping
compatibility with PV dom0.
On a PVH dom0, it calls xc_physdev_map_pirq and xc_domain_gsi_permission (the new
hypercall for PVH dom0).
On a PV dom0, it keeps the same behavior as the previous code: it calls
xc_physdev_map_pirq and xc_domain_irq_permission.

> 
>> @@ -1485,6 +1516,19 @@ static void pci_add_dm_done(libxl__egc *egc,
>>      fclose(f);
>>      if (!pci_supp_legacy_irq())
>>          goto out_no_irq;
>> +
>> +    r = pci_device_set_gsi(ctx, domid, pci, 1, &gsi);
>> +    if (gsi >= 0) {
>> +        if (r < 0) {
> 
> This unusual way of error checking likely wants a comment.
I will add one in the next version.

> 
>> +            rc = ERROR_FAIL;
>> +            LOGED(ERROR, domainid,
>> +                  "pci_device_set_gsi gsi=%d (error=%d)", gsi, errno);

Re: [XEN PATCH v9 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-06-12 Thread Chen, Jiqian

On 2024/6/12 16:53, Jan Beulich wrote:
> On 12.06.2024 04:43, Chen, Jiqian wrote:
>> On 2024/6/10 23:58, Jan Beulich wrote:
>>> On 07.06.2024 10:11, Jiqian Chen wrote:
>>>> If Xen runs with a PVH dom0 and an hvm domU, the hvm domU will map a pirq
>>>> for a passthrough device by using its gsi; see the qemu code
>>>> xen_pt_realize->xc_physdev_map_pirq and the libxl code
>>>> pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
>>>> calls into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>>>> is not allowed because currd is the PVH dom0 and PVH has no
>>>> X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.
>>>>
>>>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
>>>> PHYSDEVOP_unmap_pirq for the failure path to unmap the pirq. Also
>>>> add a new check to prevent self-mapping when the subject domain has no
>>>> PIRQ flag.
>>>>
>>>> Signed-off-by: Huang Rui 
>>>> Signed-off-by: Jiqian Chen 
>>>> Reviewed-by: Stefano Stabellini 
>>>
>>> What's imo missing in the description is a clarification / justification of
>>> why it is going to be a good idea (or at least an acceptable one) to expose
>>> the concept of PIRQs to PVH. If I'm not mistaken that concept so far has
>>> been entirely a PV one.
>> I didn't intend to expose the concept of PIRQs to PVH.
>> This patch is for HVM domains, which use PIRQs; what I said in the commit message
>> is that the HVM domU maps a pirq for the gsi, not PVH.
>> The original code checks "!has_pirq(currd)", but currd is the PVH dom0, so the
>> check fails. So I need to allow PHYSDEVOP_map_pirq even when currd has no PIRQs,
>> as long as the subject domain has them.
> 
> But that's not what you're enforcing in do_physdev_op(). There you only
> prevent self-mapping. If I'm not mistaken all you need to do is drop the
> "d == current->domain" checks from those conditionals.
What I want is to allow PHYSDEVOP_map_pirq when currd doesn't have PIRQs but
the subject domain does.
If I only added "break" in hvm_physdev_op without any checks, that would cause
self-mapping problems.
And in a previous mail thread, you suggested that I prevent self-mapping when
the subject domain doesn't have PIRQs.
So I added checks in do_physdev_op.
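A sketch of the difference Jan points at, assuming the v9 conditional in do_physdev_op() combines `is_hvm_domain(d)`, `!has_pirq(d)` and `d == current->domain` (as the v8->v9 changelog and the commit message suggest). Dropping the last term refuses map/unmap for any HVM-type subject domain without PIRQs, not only self-mapping. `struct toy_domain` and both predicates are hypothetical models of the conditions, not Xen code, and they return "allowed" rather than any particular errno.

```c
#include <stdbool.h>

/* Minimal model of the subject domain d of PHYSDEVOP_{,un}map_pirq. */
struct toy_domain {
    bool is_hvm;   /* stands in for is_hvm_domain(d) */
    bool has_pirq; /* stands in for has_pirq(d), i.e. X86_EMU_USE_PIRQ */
};

/* Assumed v9 shape: refuse only when an HVM domain without PIRQs maps itself. */
static bool map_allowed_self_only(const struct toy_domain *d,
                                  const struct toy_domain *curr)
{
    return !(d->is_hvm && !d->has_pirq && d == curr);
}

/* Generalized shape: refuse for any HVM-type subject domain without PIRQs. */
static bool map_allowed_general(const struct toy_domain *d)
{
    return !(d->is_hvm && !d->has_pirq);
}
```

In this model the self-only variant still lets a PVH dom0 map a pIRQ into a PVH domU (which has no PIRQs), while the general variant refuses it and needs less code.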

> 
> Further see also
> https://lists.xen.org/archives/html/xen-devel/2024-06/msg00540.html.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v9 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-06-11 Thread Chen, Jiqian

On 2024/6/10 23:58, Jan Beulich wrote:
> On 07.06.2024 10:11, Jiqian Chen wrote:
>> If Xen runs with a PVH dom0 and an hvm domU, the hvm domU will map a pirq
>> for a passthrough device by using its gsi; see the qemu code
>> xen_pt_realize->xc_physdev_map_pirq and the libxl code
>> pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
>> calls into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>> is not allowed because currd is the PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
>> PHYSDEVOP_unmap_pirq for the failure path to unmap the pirq. Also
>> add a new check to prevent self-mapping when the subject domain has no
>> PIRQ flag.
>>
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> Reviewed-by: Stefano Stabellini 
> 
> What's imo missing in the description is a clarification / justification of
> why it is going to be a good idea (or at least an acceptable one) to expose
> the concept of PIRQs to PVH. If I'm not mistaken that concept so far has
> been entirely a PV one.
I didn't intend to expose the concept of PIRQs to PVH.
This patch is for HVM domains, which use PIRQs; what I said in the commit message
is that the HVM domU maps a pirq for the gsi, not PVH.
The original code checks "!has_pirq(currd)", but currd is the PVH dom0, so the
check fails. So I need to allow PHYSDEVOP_map_pirq even when currd has no PIRQs,
as long as the subject domain has them.

> 
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -71,8 +71,14 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  
>>      switch ( cmd )
>>      {
>> +    /*
>> +     * Only being permitted for management of other domains.
>> +     * Further restrictions are enforced in do_physdev_op.
>> +     */
>>      case PHYSDEVOP_map_pirq:
>>      case PHYSDEVOP_unmap_pirq:
>> +        break;
> 
> Nit: Imo such a comment ought to be indented like code (statements), not
> like the case labels.
Thanks, I will change it in the next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v9 0/5] Support device passthrough when dom0 is PVH on Xen

2024-06-11 Thread Chen, Jiqian

On 2024/6/11 00:07, Jan Beulich wrote:
> On 07.06.2024 10:11, Jiqian Chen wrote:
>> Hi All,
>> This is the v9 series to support passthrough when dom0 is PVH
>> v8->v9 changes:
>> * patch#1: Move pcidevs_unlock below write_lock, and remove "ASSERT(pcidevs_locked());" from vpci_reset_device_state;
>>            Add pci_device_state_reset_type to distinguish the reset types.
>> * patch#2: Add a comment above PHYSDEVOP_map_pirq to describe why this hypercall is needed.
>>            Change "!is_pv_domain(d)" to "is_hvm_domain(d)", and "map.domid == DOMID_SELF" to "d == current->domain".
>> * patch#3: Remove the check of PHYSDEVOP_setup_gsi, since there is the same check below.
> 
> Having looked at patch 3, what check(s) is (are) being talked about here?
> It feels as if to understand this revision log entry, one would still need
> to go back to the earlier version. Yet the purpose of these is that one
> (preferably) wouldn't need to do so.
Sorry, it should be:
patch#3: Remove the check of PHYSDEVOP_setup_gsi, since the same check exists
below. Although their return values differ, this difference is acceptable for
the sake of code consistency:
        if ( !is_hardware_domain(currd) )
            return -ENOSYS;
        break;
I will change it in the next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-05 Thread Chen, Jiqian

On 2024/6/5 18:09, Jan Beulich wrote:
> On 05.06.2024 09:04, Chen, Jiqian wrote:
>> On 2024/6/5 01:17, Jan Beulich wrote:
>>> On 04.06.2024 10:18, Chen, Jiqian wrote:
>>>> I tried to get more debug information from my environment, and I attach it
>>>> here; maybe you can spot some problems.
>>>> acpi_parse_madt_ioapic_entries
>>>>     acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_parse_int_src_ovr, MAX_IRQ_SOURCES);
>>>>         acpi_parse_int_src_ovr
>>>>             mp_override_legacy_irq
>>>>                 only processes two entries: irq 0 gsi 2 and irq 9 gsi 9
>>>> There are only two entries whose type is ACPI_MADT_TYPE_INTERRUPT_OVERRIDE
>>>> in the MADT table. Is that normal?
>>>
>>> Yes, that's what you'd typically see (or just one such entry).
>> OK, let me summarize: acpi_parse_int_src_ovr gets two entries from the MADT
>> table and adds them to mp_irqs. They are [irq, gsi] = [0, 2] and [9, 9].
>> Then the following function, mp_config_acpi_legacy_irqs, initializes the 1:1
>> mapping of irq and gsi for [0~15 except 2 and 9], and adds those to mp_irqs.
>> But for high GSIs (>= 16), no mapping is set up.
>> Right?
> 
> On that specific system of yours - yes. In the general case high GSIs
> may have entries, too.
> 
>> Does the Xen hypervisor lack some handling of high GSIs?
> 
> I don't think so. Unless you can point out something?
OK, so the implementation should still get the mapping from mp_irqs; I will
change it in the next version.
Thank you.

> 
>> For now, if the hypervisor gets a high GSI, it can't be translated to an irq,
>> because there is no mapping between them.
> 
> No, in the absence of a source override (note the word "override") the
> default identity mapping applies.
What is the identity mapping? Is it like what mp_config_acpi_legacy_irqs does?
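Put simply, the identity mapping means a GSI that no Interrupt Source Override mentions is used unchanged as the IRQ number. A minimal sketch of that rule, hard-coding the two overrides reported for this system's MADT (IRQ 0 -> GSI 2, IRQ 9 -> GSI 9). The table and `gsi_to_irq()` are hypothetical; the sketch ignores polarity/trigger flags, multiple IO-APICs, and legacy GSIs whose IRQ was displaced by an override.

```c
#include <stddef.h>

/* One ACPI MADT Interrupt Source Override: ISA IRQ `irq` is wired to `gsi`. */
struct int_src_ovr {
    unsigned int irq;
    unsigned int gsi;
};

/* The two overrides reported for the system discussed in this thread. */
static const struct int_src_ovr ovr[] = {
    { .irq = 0, .gsi = 2 },
    { .irq = 9, .gsi = 9 },
};

/*
 * Translate a GSI to an IRQ: use an override if one targets this GSI,
 * otherwise fall back to the identity mapping (IRQ == GSI).
 */
static unsigned int gsi_to_irq(unsigned int gsi)
{
    for (size_t i = 0; i < sizeof(ovr) / sizeof(ovr[0]); i++)
        if (ovr[i].gsi == gsi)
            return ovr[i].irq;
    return gsi;
}
```

With this table, GSI 2 translates to IRQ 0, while GSI 24 (the dGPU in this thread) has no override and stays IRQ 24.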

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-05 Thread Chen, Jiqian

On 2024/6/5 01:17, Jan Beulich wrote:
> On 04.06.2024 10:18, Chen, Jiqian wrote:
>> I tried to get more debug information from my environment, and I attach it
>> here; maybe you can spot some problems.
>> acpi_parse_madt_ioapic_entries
>>     acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_parse_int_src_ovr, MAX_IRQ_SOURCES);
>>         acpi_parse_int_src_ovr
>>             mp_override_legacy_irq
>>                 only processes two entries: irq 0 gsi 2 and irq 9 gsi 9
>> There are only two entries whose type is ACPI_MADT_TYPE_INTERRUPT_OVERRIDE
>> in the MADT table. Is that normal?
> 
> Yes, that's what you'd typically see (or just one such entry).
OK, let me summarize: acpi_parse_int_src_ovr gets two entries from the MADT
table and adds them to mp_irqs. They are [irq, gsi] = [0, 2] and [9, 9].
Then the following function, mp_config_acpi_legacy_irqs, initializes the 1:1
mapping of irq and gsi for [0~15 except 2 and 9], and adds those to mp_irqs.
But for high GSIs (>= 16), no mapping is set up.
Right?

Does the Xen hypervisor lack some handling of high GSIs?
For now, if the hypervisor gets a high GSI, it can't be translated to an irq,
because there is no mapping between them.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-04 Thread Chen, Jiqian

On 2024/6/4 14:36, Jan Beulich wrote:
> On 04.06.2024 08:33, Chen, Jiqian wrote:
>> On 2024/6/4 14:12, Jan Beulich wrote:
>>> On 04.06.2024 08:01, Chen, Jiqian wrote:
>>>> On 2024/6/4 13:55, Jan Beulich wrote:
>>>>> On 04.06.2024 05:04, Chen, Jiqian wrote:
>>>>>> On 2024/5/30 23:51, Jan Beulich wrote:
>>>>>>> On 30.05.2024 13:19, Chen, Jiqian wrote:
>>>>>>>> It seems only legacy irqs and gsi[0:15] have a mapping in mp_irqs.
>>>>>>>> Can the other gsis be considered 1:1 mapped to irqs? Or are there other
>>>>>>>> places that reflect the mapping between irq and gsi?
>>>>>>>
>>>>>>> It may be uncommon to have overrides for higher GSIs, but I don't think ACPI
>>>>>>> disallows that.
>>>>>> Are you suggesting that I add overrides for higher GSIs to the mp_irqs array?
>>>>>
>>>>> Why "add"? That's what mp_override_legacy_irq() already does, isn't it?
>>>> No. mp_override_legacy_irq only overrides gsi < 16, not gsi >= 16
>>>> (I dumped all mappings from the mp_irqs array).
>>>
>>> I assume you mean you observe so ...
>> No, after starting a Xen PVH dom0, I dumped all entries from mp_irqs.
> 
> IOW really your answer is "yes" ...
> 
>>>> In my environment, the gsi of my dGPU is 24.
>>>
>>> ... on one specific system?
> 
> ... to this question I raised. Whatever you dump on any number of
> systems, there's always the chance that there's another system
> where things are different.
> 
>>> The function is invoked from
>>> acpi_parse_int_src_ovr(), and I can't spot any restriction to
>>> IRQs less than 16 there.
>> I didn't see any restriction either, but the dump results show only 16 entries;
>> see my previous email.
> 
> Hence why I tried to point out that going from observations on a
> particular system isn't enough.
Anyway, I agree with you that I need to get the mapping from mp_irqs.
I tried to get more debug information from my environment, and I attach it
here; maybe you can spot some problems.
acpi_parse_madt_ioapic_entries
    acpi_table_parse_madt(ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, acpi_parse_int_src_ovr, MAX_IRQ_SOURCES);
        acpi_parse_int_src_ovr
            mp_override_legacy_irq
                only processes two entries: irq 0 gsi 2 and irq 9 gsi 9
There are only two entries whose type is ACPI_MADT_TYPE_INTERRUPT_OVERRIDE in
the MADT table. Is that normal?
And
acpi_parse_madt_ioapic_entries
    mp_config_acpi_legacy_irqs
        processes the other GSIs (< 16), so that the total number of mp_irqs
        entries is 16.
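The legacy fill described above can be modelled roughly as follows, for a single IO-APIC with GSI base 0: start from the override entries, then add an identity (ISA IRQ i -> pin i) entry for each i in 0..15 unless that ISA IRQ already has an entry or that pin is already in use. This is a simplified model, not Xen's mp_config_acpi_legacy_irqs(); `struct route`, the constants and `fill_legacy()` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_LEGACY_IRQS 16
#define MAX_ENTRIES    32

/* Simplified routing entry for one IO-APIC: ISA IRQ -> IO-APIC pin. */
struct route {
    unsigned int srcbusirq; /* ISA IRQ */
    unsigned int dstirq;    /* IO-APIC pin (equal to the GSI when gsi_base is 0) */
};

/*
 * Append identity entries for legacy IRQs that have neither an entry for
 * the same ISA IRQ nor an entry using the same pin. `n` is the number of
 * entries already present (the overrides); returns the new count.
 */
static size_t fill_legacy(struct route *r, size_t n)
{
    for (unsigned int i = 0; i < NR_LEGACY_IRQS && n < MAX_ENTRIES; i++) {
        bool taken = false;

        for (size_t k = 0; k < n; k++) {
            if (r[k].srcbusirq == i || r[k].dstirq == i) {
                taken = true;
                break;
            }
        }
        if (!taken) {
            r[n].srcbusirq = i;
            r[n].dstirq = i;
            n++;
        }
    }
    return n;
}
```

Seeded with the two overrides from this system ({0, 2} and {9, 9}), the fill adds identity entries for ISA IRQs 1, 3-8 and 10-15: IRQs 0 and 9 already have overrides, and IRQ 2 is skipped because its pin is taken by the IRQ 0 override. That matches the srcbusirq/dstirq pairs in the find_irq_entry dump in this thread. Nothing is added for GSIs >= 16; those rely on the identity default.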

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-04 Thread Chen, Jiqian

On 2024/6/4 14:12, Jan Beulich wrote:
> On 04.06.2024 08:01, Chen, Jiqian wrote:
>> On 2024/6/4 13:55, Jan Beulich wrote:
>>> On 04.06.2024 05:04, Chen, Jiqian wrote:
>>>> On 2024/5/30 23:51, Jan Beulich wrote:
>>>>> On 30.05.2024 13:19, Chen, Jiqian wrote:
>>>>>> I dumped all mpc_config_intsrc entries of the mp_irqs array; it shows:
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 0 dstapic 33 dstirq 2
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 15 srcbus 0 srcbusirq 9 dstapic 33 dstirq 9
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 1 dstapic 33 dstirq 1
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 3 dstapic 33 dstirq 3
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 4 dstapic 33 dstirq 4
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 5 dstapic 33 dstirq 5
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 6 dstapic 33 dstirq 6
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 7 dstapic 33 dstirq 7
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 8 dstapic 33 dstirq 8
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 10 dstapic 33 dstirq 10
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 11 dstapic 33 dstirq 11
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 12 dstapic 33 dstirq 12
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 13 dstapic 33 dstirq 13
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 14 dstapic 33 dstirq 14
>>>>>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 15 dstapic 33 dstirq 15
>>>>>>
>>>>>> It seems only legacy irqs and gsi[0:15] have a mapping in mp_irqs.
>>>>>> Can the other gsis be considered 1:1 mapped to irqs? Or are there other
>>>>>> places that reflect the mapping between irq and gsi?
>>>>>
>>>>> It may be uncommon to have overrides for higher GSIs, but I don't think ACPI
>>>>> disallows that.
>>>> Are you suggesting that I add overrides for higher GSIs to the mp_irqs array?
>>>
>>> Why "add"? That's what mp_override_legacy_irq() already does, isn't it?
>> No. mp_override_legacy_irq only overrides gsi < 16, not gsi >= 16
>> (I dumped all mappings from the mp_irqs array).
> 
> I assume you mean you observe so ...
No, after starting a Xen PVH dom0, I dumped all entries from mp_irqs.

> 
>> In my environment, the gsi of my dGPU is 24.
> 
> ... on one specific system? The function is invoked from
> acpi_parse_int_src_ovr(), and I can't spot any restriction to
> IRQs less than 16 there.
I didn't see any restriction either, but the dump results show only 16 entries;
see my previous email.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-04 Thread Chen, Jiqian

On 2024/6/4 13:55, Jan Beulich wrote:
> On 04.06.2024 05:04, Chen, Jiqian wrote:
>> On 2024/5/30 23:51, Jan Beulich wrote:
>>> On 30.05.2024 13:19, Chen, Jiqian wrote:
>>>> On 2024/5/29 20:22, Jan Beulich wrote:
>>>>> On 29.05.2024 13:13, Chen, Jiqian wrote:
>>>>>> On 2024/5/29 15:10, Jan Beulich wrote:
>>>>>>> On 29.05.2024 08:56, Chen, Jiqian wrote:
>>>>>>>> On 2024/5/29 14:31, Jan Beulich wrote:
>>>>>>>>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>>>>>>>>> But I found in function init_irq_data:
>>>>>>>>>>     for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>>>>>>>>     {
>>>>>>>>>>         int rc;
>>>>>>>>>>
>>>>>>>>>>         desc = irq_to_desc(irq);
>>>>>>>>>>         desc->irq = irq;
>>>>>>>>>>
>>>>>>>>>>         rc = init_one_irq_desc(desc);
>>>>>>>>>>         if ( rc )
>>>>>>>>>>             return rc;
>>>>>>>>>>     }
>>>>>>>>>> Does it mean that when irq < nr_irqs_gsi, gsi and irq have a 1:1
>>>>>>>>>> mapping?
>>>>>>>>>
>>>>>>>>> No, as explained before. I also don't see how you would derive that from the code above.
>>>>>>>> Because desc->irq = irq is set here, and there seems to be no other place that
>>>>>>>> changes desc->irq, gsi 1 is considered to be irq 1.
>>>>>>>
>>>>>>> What are you taking this from? The loop bound isn't nr_gsis, and the iteration
>>>>>>> variable isn't in GSI space either; it's in IRQ numbering space. In this loop
>>>>>>> we're merely leveraging that every GSI has a corresponding IRQ;
>>>>>>> there are no assumptions made about the mapping between the two. Afaics at least.
>>>>>>>
>>>>>>>>> "nr_irqs_gsi" describes what its name says: The number of
>>>>>>>>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>>>>>>>>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>>>>>>>>> mapping.
>>>>>>>>>
>>>>>>>>>> What's more, when using PHYSDEVOP_setup_gsi, it calls 
>>>>>>>>>> mp_register_gsi,
>>>>>>>>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get 
>>>>>>>>>> irq_desc directly.
>>>>>>>>>
>>>>>>>>> Which may be wrong, while that wrong-ness may not have hit anyone in
>>>>>>>>> practice (for reasons that would need working out).
>>>>>>>>>
>>>>>>>>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi 
>>>>>>>>>> ?
>>>>>>>>>
>>>>>>>>> Again - no.
>>>>>>>> Since you are certain that they are not equal, could you tell me where 
>>>>>>>> show they are not equal or where build their mappings,
>>>>>>>> so that I can know how to do a conversion gsi from irq.
>>>>>>>
>>>>>>> I did point you at the ACPI Interrupt Source Override structure before.
>>>>>>> We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
>>>>>>> start going from.
>>>>>> Oh! I think I know.
>>>>>> If I want to transform gsi to irq, I need to do below:
>>>>>>  int irq, entry, ioapic, pin;
>>>>>>
>>>>>>  ioapic = mp_find_ioapic(gsi);
>>>>>>  pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
>>>>>>  entry = find_irq_entry(ioapic, pin, mp_INT);
>>>>>>  irq = pin_2_irq(entry, ioapic, pin);
>>>>>>
>>>>>> Am I right?
>>>>>
>>>>> This looks plausible, yes.
>>>> I dump all mpc_config_intsrc of array mp_ir

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-06-03 Thread Chen, Jiqian

On 2024/5/30 23:51, Jan Beulich wrote:
> On 30.05.2024 13:19, Chen, Jiqian wrote:
>> On 2024/5/29 20:22, Jan Beulich wrote:
>>> On 29.05.2024 13:13, Chen, Jiqian wrote:
>>>> On 2024/5/29 15:10, Jan Beulich wrote:
>>>>> On 29.05.2024 08:56, Chen, Jiqian wrote:
>>>>>> On 2024/5/29 14:31, Jan Beulich wrote:
>>>>>>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>>>>>>> But I found in function init_irq_data:
>>>>>>>> for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>>>>>> {
>>>>>>>> int rc;
>>>>>>>>
>>>>>>>> desc = irq_to_desc(irq);
>>>>>>>> desc->irq = irq;
>>>>>>>>
>>>>>>>> rc = init_one_irq_desc(desc);
>>>>>>>> if ( rc )
>>>>>>>> return rc;
>>>>>>>> }
>>>>>>>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
>>>>>>>
>>>>>>> No, as explained before. I also don't see how you would derive that from the code above.
>>>>>> Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.
>>>>>
>>>>> What are you taking this from? The loop bound isn't nr_gsis, and the iteration
>>>>> variable isn't in GSI space either; it's in IRQ numbering space. In this loop
>>>>> we're merely leveraging that every GSI has a corresponding IRQ;
>>>>> there are no assumptions made about the mapping between the two. Afaics at least.
>>>>>
>>>>>>> "nr_irqs_gsi" describes what its name says: The number of
>>>>>>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>>>>>>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>>>>>>> mapping.
>>>>>>>
>>>>>>>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>>>>>>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
>>>>>>>
>>>>>>> Which may be wrong, while that wrong-ness may not have hit anyone in
>>>>>>> practice (for reasons that would need working out).
>>>>>>>
>>>>>>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
>>>>>>>
>>>>>>> Again - no.
>>>>>> Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
>>>>>> so that I can know how to do a conversion gsi from irq.
>>>>>
>>>>> I did point you at the ACPI Interrupt Source Override structure before.
>>>>> We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
>>>>> start going from.
>>>> Oh! I think I know.
>>>> If I want to transform gsi to irq, I need to do below:
>>>>  int irq, entry, ioapic, pin;
>>>>
>>>>  ioapic = mp_find_ioapic(gsi);
>>>>  pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
>>>>  entry = find_irq_entry(ioapic, pin, mp_INT);
>>>>  irq = pin_2_irq(entry, ioapic, pin);
>>>>
>>>> Am I right?
>>>
>>> This looks plausible, yes.
>> I dump all mpc_config_intsrc of array mp_irqs, it shows:
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 0 dstapic 33 dstirq 2
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 15 srcbus 0 srcbusirq 9 dstapic 33 dstirq 9
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 1 dstapic 33 dstirq 1
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 3 dstapic 33 dstirq 3
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 4 dstapic 33 dstirq 4
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 5 dstapic 33 dstirq 5
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 6 dstapic 33 dstirq 6
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 7 dstapic 33 dstirq 7
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 8 dstapic 33 dstirq 8
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 10 dstapic 33 dstirq 10
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 11 dstapic 33 dstirq 11
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 12 dstapic 33 dstirq 12
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 13 dstapic 33 dstirq 13
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 14 dstapic 33 dstirq 14
>> (XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 15 dstapic 33 dstirq 15
>>
>> It seems only Legacy irq and gsi[0:15] has a mapping in mp_irqs.
>> Other gsi can be considered 1:1 mapping with irq? Or are there other places reflect the mapping between irq and gsi?
> 
> It may be uncommon to have overrides for higher GSIs, but I don't think ACPI
> disallows that.
Are you suggesting that I add overrides for higher GSIs to the mp_irqs array?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-05-30 Thread Chen, Jiqian

On 2024/5/29 20:22, Jan Beulich wrote:
> On 29.05.2024 13:13, Chen, Jiqian wrote:
>> On 2024/5/29 15:10, Jan Beulich wrote:
>>> On 29.05.2024 08:56, Chen, Jiqian wrote:
>>>> On 2024/5/29 14:31, Jan Beulich wrote:
>>>>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>>>>> But I found in function init_irq_data:
>>>>>> for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>>>> {
>>>>>> int rc;
>>>>>>
>>>>>> desc = irq_to_desc(irq);
>>>>>> desc->irq = irq;
>>>>>>
>>>>>> rc = init_one_irq_desc(desc);
>>>>>> if ( rc )
>>>>>> return rc;
>>>>>> }
>>>>>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
>>>>>
>>>>> No, as explained before. I also don't see how you would derive that from the code above.
>>>> Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.
>>>
>>> What are you taking this from? The loop bound isn't nr_gsis, and the iteration
>>> variable isn't in GSI space either; it's in IRQ numbering space. In this loop
>>> we're merely leveraging that every GSI has a corresponding IRQ;
>>> there are no assumptions made about the mapping between the two. Afaics at least.
>>>
>>>>> "nr_irqs_gsi" describes what its name says: The number of
>>>>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>>>>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>>>>> mapping.
>>>>>
>>>>>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>>>>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
>>>>>
>>>>> Which may be wrong, while that wrong-ness may not have hit anyone in
>>>>> practice (for reasons that would need working out).
>>>>>
>>>>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
>>>>>
>>>>> Again - no.
>>>> Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
>>>> so that I can know how to do a conversion gsi from irq.
>>>
>>> I did point you at the ACPI Interrupt Source Override structure before.
>>> We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
>>> start going from.
>> Oh! I think I know.
>> If I want to transform gsi to irq, I need to do below:
>>  int irq, entry, ioapic, pin;
>>
>>  ioapic = mp_find_ioapic(gsi);
>>  pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
>>  entry = find_irq_entry(ioapic, pin, mp_INT);
>>  irq = pin_2_irq(entry, ioapic, pin);
>>
>> Am I right?
> 
> This looks plausible, yes.
I dumped all mpc_config_intsrc entries of the mp_irqs array; the output is:
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 0 dstapic 33 dstirq 2
(XEN) find_irq_entry type 3 irqtype 0 irqflag 15 srcbus 0 srcbusirq 9 dstapic 33 dstirq 9
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 1 dstapic 33 dstirq 1
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 3 dstapic 33 dstirq 3
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 4 dstapic 33 dstirq 4
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 5 dstapic 33 dstirq 5
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 6 dstapic 33 dstirq 6
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 7 dstapic 33 dstirq 7
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 8 dstapic 33 dstirq 8
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 10 dstapic 33 dstirq 10
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 11 dstapic 33 dstirq 11
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 12 dstapic 33 dstirq 12
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 13 dstapic 33 dstirq 13
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 14 dstapic 33 dstirq 14
(XEN) find_irq_entry type 3 irqtype 0 irqflag 0 srcbus 0 srcbusirq 15 dstapic 33 dstirq 15

It seems

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-05-29 Thread Chen, Jiqian

On 2024/5/29 15:10, Jan Beulich wrote:
> On 29.05.2024 08:56, Chen, Jiqian wrote:
>> On 2024/5/29 14:31, Jan Beulich wrote:
>>> On 29.05.2024 04:41, Chen, Jiqian wrote:
>>>> Hi,
>>>> On 2024/5/17 19:50, Jan Beulich wrote:
>>>>> On 17.05.2024 13:14, Chen, Jiqian wrote:
>>>>>> On 2024/5/17 18:51, Jan Beulich wrote:
>>>>>>> On 17.05.2024 12:45, Chen, Jiqian wrote:
>>>>>>>> On 2024/5/16 22:01, Jan Beulich wrote:
>>>>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>>>>> +if ( gsi >= nr_irqs_gsi )
>>>>>>>>>> +{
>>>>>>>>>> +ret = -EINVAL;
>>>>>>>>>> +break;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +if ( !irq_access_permitted(current->domain, gsi) ||
>>>>>>>>>
>>>>>>>>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>>>>>>>>> source overrides may be surfaced by ACPI?
>>>>>>>> All irqs smaller than nr_irqs_gsi are gsi, aren't they?
>>>>>>>
>>>>>>> They are, but there's not necessarily a 1:1 mapping.
>>>>>> Oh, so do I need to add a new gsi_caps to store granted gsi?
>>>>>
>>>>> Probably not. You ought to be able to translate between GSI and IRQ,
>>>>> and then continue to record in / check against IRQ permissions.
>>>> But I found in function init_irq_data:
>>>> for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>>>> {
>>>> int rc;
>>>>
>>>> desc = irq_to_desc(irq);
>>>> desc->irq = irq;
>>>>
>>>> rc = init_one_irq_desc(desc);
>>>> if ( rc )
>>>> return rc;
>>>> }
>>>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
>>>
>>> No, as explained before. I also don't see how you would derive that from the code above.
>> Because here set desc->irq = irq, and it seems there is no other place to change this desc->irq, so, gsi 1 is considered to irq 1.
> 
> What are you taking this from? The loop bound isn't nr_gsis, and the iteration
> variable isn't in GSI space either; it's in IRQ numbering space. In this loop
> we're merely leveraging that every GSI has a corresponding IRQ;
> there are no assumptions made about the mapping between the two. Afaics at least.
> 
>>> "nr_irqs_gsi" describes what its name says: The number of
>>> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
>>> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
>>> mapping.
>>>
>>>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>>>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
>>>
>>> Which may be wrong, while that wrong-ness may not have hit anyone in
>>> practice (for reasons that would need working out).
>>>
>>>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
>>>
>>> Again - no.
>> Since you are certain that they are not equal, could you tell me where show they are not equal or where build their mappings,
>> so that I can know how to do a conversion gsi from irq.
> 
> I did point you at the ACPI Interrupt Source Override structure before.
> We're parsing those in acpi_parse_int_src_ovr(), to give you a place to
> start going from.
Oh, I think I understand now.
To translate a GSI to an IRQ, I would need to do the following:
  int irq, entry, ioapic, pin;

  ioapic = mp_find_ioapic(gsi);
  pin = gsi - mp_ioapic_routing[ioapic].gsi_base;
  entry = find_irq_entry(ioapic, pin, mp_INT);
  irq = pin_2_irq(entry, ioapic, pin);

Am I right?
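To make the four steps above easier to check, here is a self-contained model in C. It is a sketch, not Xen code: the sketch_* types, tables and helpers are hypothetical stand-ins for mp_ioapic_routing[], mp_irqs[], mp_find_ioapic(), find_irq_entry() and pin_2_irq(). It assumes a single IO-APIC with GSI base 0 and 24 pins, and fills the table with the ISA entries from the mp_irqs dump shown elsewhere in this thread (ISA IRQ 0 routed to pin 2). For pins with no entry it falls back to IRQ == GSI purely as a placeholder; as Jan points out, that is not a guarantee Xen makes.

```c
#include <stddef.h>

/* Hypothetical stand-in for one mp_ioapic_routing[] slot. */
struct sketch_ioapic_route {
    unsigned int gsi_base;   /* first GSI served by this IO-APIC */
    unsigned int nr_pins;    /* number of redirection-table entries */
};

/* Hypothetical stand-in for one mp_irqs[] (mpc_config_intsrc) entry. */
struct sketch_int_src {
    int ioapic;               /* index into sketch_routes[] */
    unsigned int pin;         /* "dstirq" in the dump */
    unsigned int src_bus_irq; /* "srcbusirq" in the dump (ISA IRQ) */
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumption: a single IO-APIC with GSI base 0 and 24 pins. */
static const struct sketch_ioapic_route sketch_routes[] = { { 0, 24 } };

/* The ISA entries from the dump: IRQ 0 -> pin 2, the rest identity. */
static const struct sketch_int_src sketch_mp_irqs[] = {
    { 0, 2, 0 },   { 0, 9, 9 },   { 0, 1, 1 },   { 0, 3, 3 },
    { 0, 4, 4 },   { 0, 5, 5 },   { 0, 6, 6 },   { 0, 7, 7 },
    { 0, 8, 8 },   { 0, 10, 10 }, { 0, 11, 11 }, { 0, 12, 12 },
    { 0, 13, 13 }, { 0, 14, 14 }, { 0, 15, 15 },
};

/* Step 1 (cf. mp_find_ioapic): which IO-APIC serves this GSI? */
static int sketch_find_ioapic(unsigned int gsi)
{
    for ( size_t i = 0; i < SKETCH_ARRAY_SIZE(sketch_routes); i++ )
        if ( gsi >= sketch_routes[i].gsi_base &&
             gsi < sketch_routes[i].gsi_base + sketch_routes[i].nr_pins )
            return (int)i;
    return -1;
}

/* Step 3 (cf. find_irq_entry): entry index for (ioapic, pin), or -1. */
static int sketch_find_irq_entry(int ioapic, unsigned int pin)
{
    for ( size_t i = 0; i < SKETCH_ARRAY_SIZE(sketch_mp_irqs); i++ )
        if ( sketch_mp_irqs[i].ioapic == ioapic &&
             sketch_mp_irqs[i].pin == pin )
            return (int)i;
    return -1;
}

/* All four steps; returns -1 if no IO-APIC serves the GSI. */
static int sketch_gsi_to_irq(unsigned int gsi)
{
    int ioapic = sketch_find_ioapic(gsi);
    unsigned int pin;
    int entry;

    if ( ioapic < 0 )
        return -1;

    pin = gsi - sketch_routes[ioapic].gsi_base;   /* step 2 */
    entry = sketch_find_irq_entry(ioapic, pin);   /* step 3 */

    /* Step 4 (cf. pin_2_irq): an ISA entry yields its bus IRQ. */
    if ( entry >= 0 )
        return (int)sketch_mp_irqs[entry].src_bus_irq;

    /* Placeholder only: pins without an entry are modelled as IRQ == GSI. */
    return (int)gsi;
}
```

With this data, GSI 2 comes back as IRQ 0, which is the one case in the dump where reading the GSI number as an IRQ number would be wrong.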

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-05-29 Thread Chen, Jiqian

On 2024/5/29 14:31, Jan Beulich wrote:
> On 29.05.2024 04:41, Chen, Jiqian wrote:
>> Hi,
>> On 2024/5/17 19:50, Jan Beulich wrote:
>>> On 17.05.2024 13:14, Chen, Jiqian wrote:
>>>> On 2024/5/17 18:51, Jan Beulich wrote:
>>>>> On 17.05.2024 12:45, Chen, Jiqian wrote:
>>>>>> On 2024/5/16 22:01, Jan Beulich wrote:
>>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>>> +if ( gsi >= nr_irqs_gsi )
>>>>>>>> +{
>>>>>>>> +ret = -EINVAL;
>>>>>>>> +break;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +if ( !irq_access_permitted(current->domain, gsi) ||
>>>>>>>
>>>>>>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>>>>>>> source overrides may be surfaced by ACPI?
>>>>>> All irqs smaller than nr_irqs_gsi are gsi, aren't they?
>>>>>
>>>>> They are, but there's not necessarily a 1:1 mapping.
>>>> Oh, so do I need to add a new gsi_caps to store granted gsi?
>>>
>>> Probably not. You ought to be able to translate between GSI and IRQ,
>>> and then continue to record in / check against IRQ permissions.
>> But I found in function init_irq_data:
>> for ( irq = 0; irq < nr_irqs_gsi; irq++ )
>> {
>> int rc;
>>
>> desc = irq_to_desc(irq);
>> desc->irq = irq;
>>
>> rc = init_one_irq_desc(desc);
>> if ( rc )
>> return rc;
>> }
>> Does it mean that when irq < nr_irqs_gsi, the gsi and irq is a 1:1 mapping?
> 
> No, as explained before. I also don't see how you would derive that from the code above.
Because desc->irq = irq is set here, and there seems to be no other place that
changes desc->irq, I assumed that GSI 1 corresponds to IRQ 1.

> "nr_irqs_gsi" describes what its name says: The number of
> IRQs mapping to a (_some_) GSI. That's to tell them from the non-GSI (i.e.
> mainly MSI) ones. There's no implication whatsoever on the IRQ <-> GSI
> mapping.
> 
>> What's more, when using PHYSDEVOP_setup_gsi, it calls mp_register_gsi,
>> and in mp_register_gsi, it uses " desc = irq_to_desc(gsi); " to get irq_desc directly.
> 
> Which may be wrong, while that wrong-ness may not have hit anyone in
> practice (for reasons that would need working out).
> 
>> Combining above, can we consider "gsi == irq" when irq < nr_irqs_gsi ?
> 
> Again - no.
Since you are certain that they are not equal, could you point me to where the
difference shows up, or where the mapping between them is built, so that I know
how to convert between a GSI and an IRQ?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-05-28 Thread Chen, Jiqian

Hi,
On 2024/5/17 19:50, Jan Beulich wrote:
> On 17.05.2024 13:14, Chen, Jiqian wrote:
>> On 2024/5/17 18:51, Jan Beulich wrote:
>>> On 17.05.2024 12:45, Chen, Jiqian wrote:
>>>> On 2024/5/16 22:01, Jan Beulich wrote:
>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>> +if ( gsi >= nr_irqs_gsi )
>>>>>> +{
>>>>>> +ret = -EINVAL;
>>>>>> +break;
>>>>>> +}
>>>>>> +
>>>>>> +if ( !irq_access_permitted(current->domain, gsi) ||
>>>>>
>>>>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>>>>> source overrides may be surfaced by ACPI?
>>>> All irqs smaller than nr_irqs_gsi are gsi, aren't they?
>>>
>>> They are, but there's not necessarily a 1:1 mapping.
>> Oh, so do I need to add a new gsi_caps to store granted gsi?
> 
> Probably not. You ought to be able to translate between GSI and IRQ,
> and then continue to record in / check against IRQ permissions.
But I found this in init_irq_data():
    for ( irq = 0; irq < nr_irqs_gsi; irq++ )
    {
        int rc;

        desc = irq_to_desc(irq);
        desc->irq = irq;

        rc = init_one_irq_desc(desc);
        if ( rc )
            return rc;
    }
Does this mean that GSIs and IRQs are mapped 1:1 when irq < nr_irqs_gsi?
Moreover, PHYSDEVOP_setup_gsi calls mp_register_gsi(), which obtains the
irq_desc directly via "desc = irq_to_desc(gsi);".

Given the above, can we treat gsi == irq when irq < nr_irqs_gsi?
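Jan's suggestion quoted at the top of this message (translate between GSI and IRQ, then keep recording and checking permissions in IRQ space) can be sketched as follows. Everything here is hypothetical: a tiny fixed GSI-to-IRQ table replaces the real IO-APIC and interrupt source override lookup, and a plain bool array replaces the per-domain IRQ rangeset that irq_permit_access() and irq_access_permitted() operate on. The point is only that the GSI is converted once, at the boundary, and never used as an IRQ index.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_IRQS 64   /* hypothetical size of the IRQ space */

/* Hypothetical GSI -> IRQ pairs; the real answer comes from the
 * IO-APIC routing plus any ACPI interrupt source overrides. */
static const struct { unsigned int gsi; unsigned int irq; } sketch_gsi_map[] = {
    { 2, 0 },    /* ISA IRQ 0 delivered on GSI 2, as in the dump */
    { 9, 9 },
    { 20, 20 },
};

static int sketch_lookup_irq(unsigned int gsi)
{
    for ( size_t i = 0; i < sizeof(sketch_gsi_map) / sizeof(sketch_gsi_map[0]); i++ )
        if ( sketch_gsi_map[i].gsi == gsi )
            return (int)sketch_gsi_map[i].irq;
    return -ENOENT;
}

/* Hypothetical per-domain permission record, indexed by IRQ, never by GSI. */
struct sketch_domain {
    bool irq_allowed[SKETCH_NR_IRQS];
};

/* Grant or revoke access to a GSI by recording it in IRQ space. */
static int sketch_gsi_permission(struct sketch_domain *d, unsigned int gsi,
                                 bool allow)
{
    int irq = sketch_lookup_irq(gsi);

    if ( irq < 0 )
        return irq;
    if ( irq >= SKETCH_NR_IRQS )
        return -EINVAL;
    d->irq_allowed[irq] = allow;
    return 0;
}

/* Later checks operate on IRQs only. */
static bool sketch_irq_permitted(const struct sketch_domain *d, int irq)
{
    return irq >= 0 && irq < SKETCH_NR_IRQS && d->irq_allowed[irq];
}
```

Granting GSI 2 in this model marks IRQ 0 as permitted and leaves IRQ 2 untouched.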
> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-05-17 Thread Chen, Jiqian

On 2024/5/17 18:51, Jan Beulich wrote:
> On 17.05.2024 12:45, Chen, Jiqian wrote:
>> On 2024/5/16 22:01, Jan Beulich wrote:
>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>> +if ( gsi >= nr_irqs_gsi )
>>>> +{
>>>> +ret = -EINVAL;
>>>> +break;
>>>> +}
>>>> +
>>>> +if ( !irq_access_permitted(current->domain, gsi) ||
>>>
>>> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
>>> source overrides may be surfaced by ACPI?
>> All irqs smaller than nr_irqs_gsi are gsi, aren't they?
> 
> They are, but there's not necessarily a 1:1 mapping.
Oh, so do I need to add a new gsi_caps to record granted GSIs?
And would Dom0 own the permission for all GSIs by default?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device

2024-05-17 Thread Chen, Jiqian

On 2024/5/17 18:31, Jan Beulich wrote:
> On 17.05.2024 12:00, Chen, Jiqian wrote:
>> On 2024/5/17 17:50, Jan Beulich wrote:
>>> On 17.05.2024 11:28, Chen, Jiqian wrote:
>>>> On 2024/5/17 16:20, Jan Beulich wrote:
>>>>> On 17.05.2024 10:08, Chen, Jiqian wrote:
>>>>>> On 2024/5/16 21:08, Jan Beulich wrote:
>>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>>>  struct physdev_pci_device {
>>>>>>>>  /* IN */
>>>>>>>>  uint16_t seg;
>>>>>>>
>>>>>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>>>>>> possible resets equal, and hence it doesn't need specifying what kind of
>>>>>>> reset was done? For example, other than FLR most reset variants reset all
>>>>>>> functions in one go aiui. Imo that would better require only a single
>>>>>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>>>>>> not reset as many registers as other reset variants would.
>>>>>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>>>>>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
>>>>>
>>>>> It could, yes, but since (aiui) there needs to be an indication of the
>>>>> kind of reset anyway, we can as well avoid relying on the caller doing
>>>>> so (and at the same time simplify what the caller needs to do).
>>>> Since the corresponding kernel patch has been merged into linux_next branch
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515=b272722511d5e8ae580f01830687b8a6b2717f01,
>>>> if it's not very mandatory and necessary, just let the caller handle it temporarily.
>>>
>>> As also mentioned for the other patch having a corresponding kernel one:
>>> The kernel patch would imo better not be merged until the new sub-op is
>>> actually finalized.
>> OK, what should I do next step?
>> Upstream a patch to revert the merged patch on kernel side?
>>
>>>
>>>> Or it can add a new hypercall to reset all functions in one go in future potential requirement, like PHYSDEVOP_pci_device_state_reset_all_func.
>>>
>>> I disagree. We shouldn't introduce incomplete sub-ops. At the very least,
>>> if you want to stick to the present form, I'd expect you to supply reasons
>>> why distinguishing different reset forms is not necessary (now or later).
>> OK, if want to distinguish different reset, is it acceptable to add a parameter, like "u8 flag", and reset every function if corresponding bit is 1?
> 
> I'm afraid a boolean won't do, at least not long term. I think it wants to
> be an enumeration (i.e. a set of enumeration-like #define-s). And just to
> stress it again: The extra argument is _not_ primarily for the looping over
> all functions. It is to convey the kind of reset that was done. The single
> vs all function(s) aspect is just a useful side effect this will have.
Do you mean something like this:
enum RESET_DEVICE_STATE {
    RESET_DEVICE_SINGLE_FUNC,
    RESET_DEVICE_ALL_FUNC,
    /* others */
};
If the caller passes RESET_DEVICE_SINGLE_FUNC, I call what this patch adds, and
for other types I call other functions if they are added in the future?
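Jan's point that the extra argument should describe the kind of reset, with "single versus all functions" falling out as a side effect, could look roughly like the sketch below. It uses enumeration-like #define-s as he suggests, but the SKETCH_RESET_* names and values, the argument layout and the helper are all hypothetical, and it assumes (following Jan's "aiui") that any kind other than FLR affects every function of the slot.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical reset kinds, as enumeration-like #define-s. */
#define SKETCH_RESET_FLR   0  /* function-level reset: this function only */
#define SKETCH_RESET_SLOT  1  /* assumed to reset every function of the slot */

/* Hypothetical argument: device identity plus the kind of reset done. */
struct sketch_reset_arg {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
    uint32_t reset_type;   /* one of SKETCH_RESET_* */
};

/*
 * Decide which functions' emulated state must be cleared.  Bit n of
 * *fn_mask is set for function n of the addressed slot.  Unknown kinds
 * are rejected rather than guessed at.
 */
static int sketch_reset_functions(const struct sketch_reset_arg *arg,
                                  unsigned int *fn_mask)
{
    switch ( arg->reset_type )
    {
    case SKETCH_RESET_FLR:
        *fn_mask = 1u << (arg->devfn & 0x07);
        return 0;

    case SKETCH_RESET_SLOT:
        *fn_mask = 0xffu;   /* all eight functions */
        return 0;

    default:
        return -EOPNOTSUPP;
    }
}
```

A handler would then walk the set bits of the mask and derive each devfn from the slot part of arg->devfn, so a single hypercall covers a multi-function reset.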

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v8 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-05-17 Thread Chen, Jiqian

On 2024/5/16 22:01, Jan Beulich wrote:
> On 16.05.2024 11:52, Jiqian Chen wrote:
>> Some type of domain don't have PIRQ, like PVH, when
>> passthrough a device to guest on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will failed
>> at domain_pirq_to_irq.
>>
>> So, add a new hypercall to grant/revoke gsi permission
>> when dom0 is not PV or dom0 has not PIRQ flag.
> 
> Honestly I find this hard to follow, and thus not really making clear why
> no other existing mechanism could be used.
Sorry, I will describe it more clearly in the next version.

> 
>> Signed-off-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
> 
> Below here in an RFC patch you typically would want to put specific items
> you're seeking feedback on. Without that it's hard to tell why this is
> marked RFC.
OK, I will add "RFC: wait for the third patch on the kernel side to be accepted."
in the next version.

> 
>> --- a/xen/arch/x86/domctl.c
>> +++ b/xen/arch/x86/domctl.c
>> @@ -237,6 +237,37 @@ long arch_do_domctl(
>>  break;
>>  }
>>  
>> +case XEN_DOMCTL_gsi_permission:
>> +{
>> +unsigned int gsi = domctl->u.gsi_permission.gsi;
>> +int allow = domctl->u.gsi_permission.allow_access;
> 
> bool?
Will change.

> 
>> +if ( is_pv_domain(current->domain) || has_pirq(current->domain) )
>> +{
>> +ret = -EOPNOTSUPP;
>> +break;
>> +}
> 
> Such a restriction imo wants explaining in a comment.
Will add one in the next version.

> 
>> +if ( gsi >= nr_irqs_gsi )
>> +{
>> +ret = -EINVAL;
>> +break;
>> +}
>> +
>> +if ( !irq_access_permitted(current->domain, gsi) ||
> 
> I.e. assuming IRQ == GSI? Is that a valid assumption when any number of
> source overrides may be surfaced by ACPI?
All IRQs below nr_irqs_gsi are GSIs, aren't they?

> 
>> + xsm_irq_permission(XSM_HOOK, d, gsi, allow) )
> 
> Here I'm pretty sure you can't very well re-use an existing hook, as the
> value of interest is in a different numbering space, and a possible hook
> function has no way of knowing which one it is. Daniel?
> 
>> +{
>> +ret = -EPERM;
>> +break;
>> +}
>> +
>> +if ( allow )
>> +ret = irq_permit_access(d, gsi);
>> +else
>> +ret = irq_deny_access(d, gsi);
> 
> As above I'm afraid you can't assume IRQ == GSI.
> 
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -447,6 +447,13 @@ struct xen_domctl_irq_permission {
>>  };
>>  
>>  
>> +/* XEN_DOMCTL_gsi_permission */
>> +struct xen_domctl_gsi_permission {
>> +uint32_t gsi;
>> +uint8_t allow_access;/* flag to specify enable/disable of x86 gsi access */
>> +};
> 
> Explicit padding please, including a check that it's zero on input.
Thanks, I will add it in the next version.
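Jan's request for explicit padding plus a zero check on input could look like the sketch below. Only gsi and allow_access come from the patch; the struct name, the pad[] field and the checking helper are hypothetical. Spelling out the padding makes the 8-byte layout visible in the public header, and rejecting non-zero padding keeps those bytes available for later extensions.

```c
#include <errno.h>
#include <stdint.h>

/* Sketch of the public structure with its padding made explicit. */
struct sketch_domctl_gsi_permission {
    uint32_t gsi;
    uint8_t allow_access;  /* flag to enable/disable x86 GSI access */
    uint8_t pad[3];        /* must be zero on input */
};

_Static_assert(sizeof(struct sketch_domctl_gsi_permission) == 8,
               "no implicit padding expected");

/* Reject any request whose padding bytes are not all zero. */
static int sketch_check_gsi_permission(const struct sketch_domctl_gsi_permission *p)
{
    for ( unsigned int i = 0; i < sizeof(p->pad); i++ )
        if ( p->pad[i] )
            return -EINVAL;
    return 0;
}
```

A request with any non-zero pad byte then fails with -EINVAL before the GSI is looked at.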

> 
>> +
>> +
>>  /* XEN_DOMCTL_iomem_permission */
> 
> No double blank lines please. In fact you will want to break the double blank
> lines in leading context, inserting in the middle.
Will remove one.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device

2024-05-17 Thread Chen, Jiqian

Hi Juergen:

On 2024/5/17 18:03, Jürgen Groß wrote:
> On 17.05.24 11:50, Jan Beulich wrote:
>> On 17.05.2024 11:28, Chen, Jiqian wrote:
>>> On 2024/5/17 16:20, Jan Beulich wrote:
>>>> On 17.05.2024 10:08, Chen, Jiqian wrote:
>>>>> On 2024/5/16 21:08, Jan Beulich wrote:
>>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>>   struct physdev_pci_device {
>>>>>>>   /* IN */
>>>>>>>   uint16_t seg;
>>>>>>
>>>>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>>>>> possible resets equal, and hence it doesn't need specifying what kind of
>>>>>> reset was done? For example, other than FLR most reset variants reset all
>>>>>> functions in one go aiui. Imo that would better require only a single
>>>>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>>>>> not reset as many registers as other reset variants would.
>>>>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>>>>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
>>>>
>>>> It could, yes, but since (aiui) there needs to be an indication of the
>>>> kind of reset anyway, we can as well avoid relying on the caller doing
>>>> so (and at the same time simplify what the caller needs to do).
>>> Since the corresponding kernel patch has been merged into linux_next branch
>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515=b272722511d5e8ae580f01830687b8a6b2717f01,
>>> if it's not very mandatory and necessary, just let the caller handle it temporarily.
>>
>> As also mentioned for the other patch having a corresponding kernel one:
>> The kernel patch would imo better not be merged until the new sub-op is
>> actually finalized.
> 
> Oh, sorry to have overlooked that the interface change isn't yet committed on
> Xen side.
> 
> I'll drop the patch from my linux-next branch.
Thanks. I will modify my code according to Jan's requirements and send a new 
version soon.

> 
> 
> Juergen
> 

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device

2024-05-17 Thread Chen, Jiqian

On 2024/5/17 17:50, Jan Beulich wrote:
> On 17.05.2024 11:28, Chen, Jiqian wrote:
>> On 2024/5/17 16:20, Jan Beulich wrote:
>>> On 17.05.2024 10:08, Chen, Jiqian wrote:
>>>> On 2024/5/16 21:08, Jan Beulich wrote:
>>>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>>>  struct physdev_pci_device {
>>>>>>  /* IN */
>>>>>>  uint16_t seg;
>>>>>
>>>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>>>> possible resets equal, and hence it doesn't need specifying what kind of
>>>>> reset was done? For example, other than FLR most reset variants reset all
>>>>> functions in one go aiui. Imo that would better require only a single
>>>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>>>> not reset as many registers as other reset variants would.
>>>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>>>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
>>>
>>> It could, yes, but since (aiui) there needs to be an indication of the
>>> kind of reset anyway, we can as well avoid relying on the caller doing
>>> so (and at the same time simplify what the caller needs to do).
>> Since the corresponding kernel patch has been merged into linux_next branch
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515=b272722511d5e8ae580f01830687b8a6b2717f01,
>> if it's not very mandatory and necessary, just let the caller handle it temporarily.
> 
> As also mentioned for the other patch having a corresponding kernel one:
> The kernel patch would imo better not be merged until the new sub-op is
> actually finalized.
OK, what should my next step be?
Should I send a patch upstream to revert the one already merged on the kernel side?

> 
>> Or it can add a new hypercall to reset all functions in one go in future potential requirement, like PHYSDEVOP_pci_device_state_reset_all_func.
> 
> I disagree. We shouldn't introduce incomplete sub-ops. At the very least,
> if you want to stick to the present form, I'd expect you to supply reasons
> why distinguishing different reset forms is not necessary (now or later).
OK. To distinguish different kinds of reset, would it be acceptable to add a
parameter, such as "u8 flag", and reset every function whose corresponding bit is 1?

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device

2024-05-17 Thread Chen, Jiqian

On 2024/5/17 16:20, Jan Beulich wrote:
> On 17.05.2024 10:08, Chen, Jiqian wrote:
>> On 2024/5/16 21:08, Jan Beulich wrote:
>>> On 16.05.2024 11:52, Jiqian Chen wrote:
>>>>  struct physdev_pci_device {
>>>>  /* IN */
>>>>  uint16_t seg;
>>>
>>> Is re-using this struct for this new sub-op sufficient? IOW are all
>>> possible resets equal, and hence it doesn't need specifying what kind of
>>> reset was done? For example, other than FLR most reset variants reset all
>>> functions in one go aiui. Imo that would better require only a single
>>> hypercall, just to avoid possible confusion. It also reads as if FLR would
>>> not reset as many registers as other reset variants would.
>> If I understood correctly that you mean in this hypercall it needs to support resetting both one function and all functions of a slot(dev)?
>> But it can be done for caller to use a cycle to call this reset hypercall for each slot function.
> 
> It could, yes, but since (aiui) there needs to be an indication of the
> kind of reset anyway, we can as well avoid relying on the caller doing
> so (and at the same time simplify what the caller needs to do).
Since the corresponding kernel patch has already been merged into the linux-next branch
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?h=next-20240515=b272722511d5e8ae580f01830687b8a6b2717f01,
unless this is strictly necessary, could we just let the caller handle it for now?
Alternatively, a new hypercall that resets all functions in one go, such as
PHYSDEVOP_pci_device_state_reset_all_func, could be added later if the need arises.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v8 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-05-17 Thread Chen, Jiqian

On 2024/5/16 21:49, Jan Beulich wrote:
> On 16.05.2024 11:52, Jiqian Chen wrote:
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -76,6 +76,11 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  case PHYSDEVOP_unmap_pirq:
>>  break;
>>  
>> +case PHYSDEVOP_setup_gsi:
>> +if ( !is_hardware_domain(currd) )
>> +return -EOPNOTSUPP;
>> +break;
>> +
>>  case PHYSDEVOP_eoi:
>>  case PHYSDEVOP_irq_status_query:
>>  case PHYSDEVOP_get_free_pirq:
> 
> Below here we have a hardware-domain-only block already. Any reason not
> to add to there? Yes, that uses -ENOSYS. Imo that wants changing anyway,
> but even without that to me it would seem more consistent overall to have
> the new case simply added there.
Ah yes, I remember you suggested using EOPNOTSUPP in v4. If switching to ENOSYS is also fine, I will move the new case into that block in the next version.
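To make Jan's suggestion concrete, the sketch below is a self-contained model (not Xen code) of the sub-op filter with the new case folded into the existing hardware-domain-only group, so that it inherits that group's -ENOSYS return. The `MODEL_OP_*` values are hypothetical stand-ins for the real PHYSDEVOP_* constants, and the `is_hwdom` flag stands in for is_hardware_domain(currd).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical sub-op identifiers; the real PHYSDEVOP_* numbers live in
 * xen/include/public/physdev.h and are not reproduced here. */
enum model_physdevop {
    MODEL_OP_MAP_PIRQ,
    MODEL_OP_UNMAP_PIRQ,
    MODEL_OP_PCI_DEVICE_ADD,   /* stands in for the existing hwdom-only cases */
    MODEL_OP_SETUP_GSI,
    MODEL_OP_UNKNOWN,
};

/* Returns 0 if the sub-op may be forwarded, a negative errno otherwise. */
static long model_hvm_physdev_op(enum model_physdevop cmd, bool is_hwdom)
{
    switch ( cmd )
    {
    case MODEL_OP_MAP_PIRQ:
    case MODEL_OP_UNMAP_PIRQ:
        break;

    /* Hardware-domain-only sub-ops share one check and one error code. */
    case MODEL_OP_PCI_DEVICE_ADD:
    case MODEL_OP_SETUP_GSI:
        if ( !is_hwdom )
            return -ENOSYS;
        break;

    default:
        return -ENOSYS;
    }

    return 0;
}
```

Grouping the case this way means that a later change of that block's error code (which Jan suggests may be warranted anyway) applies to the new sub-op automatically.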

> 
> Just to double check: Is there a respective Linux patch already (if so,
> cross-linking the patches may be helpful)?
Yes, here is my corresponding kernel patch:
https://lore.kernel.org/lkml/20240515065011.13797-1-jiqian.c...@amd.com/T/#mc56b111562d7c67886da905e306f12b3ef8076b4

Do you mean I should add this link to the commit message once the kernel patch is accepted?
> Or does PVH Linux invoke the sub-op already anyway, just that so far it 
> fails? 
> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v8 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-05-17 Thread Chen, Jiqian

On 2024/5/16 21:29, Jan Beulich wrote:
> On 16.05.2024 11:52, Jiqian Chen wrote:
>> If run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
>> a passthrough device by using gsi, see
>> xen_pt_realize->xc_physdev_map_pirq and
>> pci_add_dm_done->xc_physdev_map_pirq.
> 
> xen_pt_realize() is in qemu, which imo wants saying here (for being a different
> repo), the more that pci_add_dm_done() is in libxl.
OK, I will describe this in more detail here (stating that the first is in QEMU and the second in libxl).

> 
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  {
>>  case PHYSDEVOP_map_pirq:
>>  case PHYSDEVOP_unmap_pirq:
>> +break;
> 
> I think this could do with a comment as to why it's permitted as well as giving
> a reference to where further restrictions are enforced (or simply mentioning
> the constraint of this only being permitted for management of other domains).
Thanks, I will add:
/*
 * Only permitted for management of other domains.
 * Further restrictions are enforced in do_physdev_op.
 */

> 
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -305,11 +305,23 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  case PHYSDEVOP_map_pirq: {
>>  physdev_map_pirq_t map;
>>  struct msi_info msi;
>> +struct domain *d;
>>  
>>  ret = -EFAULT;
>>  if ( copy_from_guest(, arg, 1) != 0 )
>>  break;
>>  
>> +d = rcu_lock_domain_by_any_id(map.domid);
>> +if ( d == NULL )
>> +return -ESRCH;
>> +/* If caller is the same HVM guest as current, check pirq flag */
> 
> The caller is always current. What I think you mean is "caller is same as
> the subject domain". 
Yes, I want to prevent a self-map when the subject domain (domU) doesn't have the
X86_EMU_USE_PIRQ flag.

> I'm also having trouble with seeing the usefulness of saying "check pirq flag". Instead I think you want to state the
> restriction here that you actually mean to enforce (which would also mean
> mentioning PVH in some way, to distinguish from the "normal HVM" case).
Yes, the restriction covers PVH domains and HVM domains without the X86_EMU_USE_PIRQ flag.
If an HVM domain has the X86_EMU_USE_PIRQ flag, map_pirq should be permitted.

I will change this comment to:
/* Prevent self-map when domain has no X86_EMU_USE_PIRQ flag */

> 
>> +if ( !is_pv_domain(d) && !has_pirq(d) && map.domid == DOMID_SELF )
> 
> You exclude DOMID_SELF but not the domain's ID? Why not simply check d
> being current->domain, thus covering both cases? 
> Plus you could use rcu_lock_domain_by_id() to exclude DOMID_SELF, and you could use
> rcu_lock_remote_domain_by_id() to exclude the local domain altogether.
But there is a case where an HVM domain that has the PIRQ flag performs this pirq mapping
with DOMID_SELF; see physdev_map_pirq().
I think changing the check to test whether d is current->domain is more suitable.

> Finally I'm not even sure you need the RCU lock here (else you could
> use knownalive_domain_from_domid()). But perhaps that's better to cover
> the qemu-in-stubdom case, which we have to consider potentially malicious.
Yes, for potential safety reasons, let's keep the RCU lock.

> 
> I'm also inclined to suggest to use is_hvm_domain() here in favor of
> !is_pv_domain().
OK, I will change it to is_hvm_domain() in the next version.
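Putting these review points together, the permission check might look like the model below: the subject domain is compared with the caller by pointer, which covers both DOMID_SELF and the caller's own numeric ID, and is_hvm_domain() replaces !is_pv_domain(). This is a simplified sketch, not the actual do_physdev_op() code; `struct model_domain` is hypothetical, and returning -EOPNOTSUPP for a refused self-map is an assumption, since the thread does not settle the error code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for Xen's struct domain. */
struct model_domain {
    bool is_hvm;    /* models is_hvm_domain(d) */
    bool has_pirq;  /* models has_pirq(d), i.e. X86_EMU_USE_PIRQ is set */
};

/*
 * caller: the calling domain (current->domain in Xen).
 * d:      the subject domain already resolved from map.domid, so both
 *         DOMID_SELF and the caller's own numeric ID yield d == caller.
 * Returns 0 if the map request may proceed, a negative errno otherwise.
 */
static int model_check_map_pirq(const struct model_domain *caller,
                                const struct model_domain *d)
{
    if ( d == NULL )
        return -ESRCH;

    /* Prevent self-map when domain has no X86_EMU_USE_PIRQ flag. */
    if ( d->is_hvm && !d->has_pirq && d == caller )
        return -EOPNOTSUPP;   /* error code is an assumption */

    return 0;
}
```

A PVH dom0 mapping on behalf of a domU passes, while a domU without the PIRQ flag trying to map into itself is refused regardless of which ID it used.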

> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [XEN PATCH v8 1/5] xen/vpci: Clear all vpci status of device

2024-05-17 Thread Chen, Jiqian

On 2024/5/16 21:08, Jan Beulich wrote:
> On 16.05.2024 11:52, Jiqian Chen wrote:
>> @@ -67,6 +68,41 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  break;
>>  }
>>  
>> +case PHYSDEVOP_pci_device_state_reset: {
>> +struct physdev_pci_device dev;
>> +struct pci_dev *pdev;
>> +pci_sbdf_t sbdf;
>> +
>> +if ( !is_pci_passthrough_enabled() )
>> +return -EOPNOTSUPP;
>> +
>> +ret = -EFAULT;
>> +if ( copy_from_guest(, arg, 1) != 0 )
>> +break;
>> +sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
>> +
>> +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>> +if ( ret )
>> +break;
> 
> Daniel, is re-use of this hook appropriate here?
In v2 of this series, Daniel and Roger agreed that reusing
xsm_resource_setup_pci is sufficient.

> 
>> +pcidevs_lock();
>> +pdev = pci_get_pdev(NULL, sbdf);
>> +if ( !pdev )
>> +{
>> +pcidevs_unlock();
>> +ret = -ENODEV;
>> +break;
>> +}
>> +
>> +write_lock(>domain->pci_lock);
>> +ret = vpci_reset_device_state(pdev);
>> +write_unlock(>domain->pci_lock);
>> +pcidevs_unlock();
> 
> Can't this be done right after the write_lock()?
You are right; I will move it in the next version.
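Jan's point is that the global lock only needs to be held until the domain's pci_lock has been taken. The hand-over can be sketched as a userspace model with POSIX locks, assuming (as the review implies) that holding the domain's pci_lock for writing is sufficient for the reset itself. `pthread_mutex_t` and `pthread_rwlock_t` stand in for pcidevs_lock() and d->pci_lock, and all `model_*` names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct domain and struct pci_dev. */
struct model_domain { pthread_rwlock_t pci_lock; };
struct model_pdev {
    unsigned int sbdf;
    struct model_domain *domain;
    int reset_count;
};

/* Global lock protecting the device list (cf. pcidevs_lock()). */
static pthread_mutex_t model_pcidevs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for vpci_deassign_device() + vpci_assign_device(). */
static int model_reset_state(struct model_pdev *pdev)
{
    pdev->reset_count++;
    return 0;
}

static int model_device_state_reset(struct model_pdev *devs, size_t n,
                                    unsigned int sbdf)
{
    struct model_pdev *pdev = NULL;
    int ret;

    pthread_mutex_lock(&model_pcidevs_lock);
    for ( size_t i = 0; i < n; i++ )
        if ( devs[i].sbdf == sbdf )
        {
            pdev = &devs[i];
            break;
        }
    if ( !pdev )
    {
        pthread_mutex_unlock(&model_pcidevs_lock);
        return -ENODEV;
    }

    /* Take the per-domain lock, then drop the global one right away
     * (assumption: the domain lock alone protects the reset). */
    pthread_rwlock_wrlock(&pdev->domain->pci_lock);
    pthread_mutex_unlock(&model_pcidevs_lock);

    ret = model_reset_state(pdev);

    pthread_rwlock_unlock(&pdev->domain->pci_lock);
    return ret;
}
```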

> 
>> +if ( ret )
>> +printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", );
> 
> s/PCI/vPCI/ (at least as long as nothing else is done here)
OK, I will change it in the next version.

> 
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -115,6 +115,16 @@ int vpci_assign_device(struct pci_dev *pdev)
>>  
>>  return rc;
>>  }
>> +
>> +int vpci_reset_device_state(struct pci_dev *pdev)
>> +{
>> +ASSERT(pcidevs_locked());
> 
> Is this necessary for ...
> 
>> +ASSERT(rw_is_write_locked(>domain->pci_lock));
>> +
>> +vpci_deassign_device(pdev);
>> +return vpci_assign_device(pdev);
> 
> ... either of these calls? Both functions do themselves only have the
> 2nd of the asserts you add.
I checked the code again; I will remove the two asserts here in the next version.

> 
>> --- a/xen/include/public/physdev.h
>> +++ b/xen/include/public/physdev.h
>> @@ -296,6 +296,13 @@ DEFINE_XEN_GUEST_HANDLE(physdev_pci_device_add_t);
>>   */
>>  #define PHYSDEVOP_prepare_msix  30
>>  #define PHYSDEVOP_release_msix  31
>> +/*
>> + * Notify the hypervisor that a PCI device has been reset, so that any
>> + * internally cached state is regenerated.  Should be called after any
>> + * device reset performed by the hardware domain.
>> + */
>> +#define PHYSDEVOP_pci_device_state_reset 32
> 
> Nit: Just a single blank as a separator will do here, for going past the
> columnar arrangement of other #define-s anyway.
OK.
> 
>>  struct physdev_pci_device {
>>  /* IN */
>>  uint16_t seg;
> 
> Is re-using this struct for this new sub-op sufficient? IOW are all
> possible resets equal, and hence it doesn't need specifying what kind of
> reset was done? For example, other than FLR most reset variants reset all
> functions in one go aiui. Imo that would better require only a single
> hypercall, just to avoid possible confusion. It also reads as if FLR would
> not reset as many registers as other reset variants would.
If I understand correctly, you mean this hypercall needs to support resetting
both a single function and all functions of a slot (device)?
The caller can achieve the latter by looping over the slot's functions and
calling this reset hypercall once for each function.
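The caller-side alternative described above, one per-function hypercall for each function of the slot, can be sketched as follows. `struct model_pci_device` mirrors the IN fields of struct physdev_pci_device (seg, bus, devfn), the callback is a hypothetical stand-in for issuing PHYSDEVOP_pci_device_state_reset, and the `func_mask` parameter (which functions are present) is an assumption about what the caller knows.

```c
#include <stdint.h>

/* Local mirror of the IN fields of struct physdev_pci_device. */
struct model_pci_device {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
};

/* Same encoding as PCI_DEVFN(): 5-bit slot, 3-bit function. */
#define MODEL_PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))

/* Hypothetical callback standing in for the hypercall. */
typedef int (*model_reset_fn)(const struct model_pci_device *dev, void *ctx);

/*
 * Report a slot-wide reset by issuing one per-function request for each
 * function set in func_mask.  Stops at, and returns, the first error.
 */
static int model_reset_slot(uint16_t seg, uint8_t bus, uint8_t slot,
                            uint8_t func_mask, model_reset_fn fn, void *ctx)
{
    for ( unsigned int func = 0; func < 8; func++ )
    {
        struct model_pci_device dev = {
            .seg = seg,
            .bus = bus,
            .devfn = (uint8_t)MODEL_PCI_DEVFN(slot, func),
        };
        int rc;

        if ( !(func_mask & (1u << func)) )
            continue;
        rc = fn(&dev, ctx);
        if ( rc )
            return rc;
    }
    return 0;
}
```

Note that nothing in this loop tells the hypervisor what kind of reset happened, which is the information Jan suggests the stable interface may need to carry.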

> 
> This may seem to not matter right now, but this is a stable interface you
> add, and hence modifying it later will be cumbersome, if possible at all.
> 
> Jan

-- 
Best regards,
Jiqian Chen.

Re: [RFC KERNEL PATCH v7 2/2] xen/privcmd: Add new syscall to get gsi from dev

2024-05-16 Thread Chen, Jiqian

On 2024/5/16 06:42, Stefano Stabellini wrote:
> On Wed, 15 May 2024, Jiqian Chen wrote:
>> In PVH dom0, it uses the linux local interrupt mechanism,
>> when it allocs irq for a gsi, it is dynamic, and follow
>> the principle of applying first, distributing first. And
>> the irq number is alloced from small to large, but the
>> applying gsi number is not, may gsi 38 comes before gsi 28,
>> it causes the irq number is not equal with the gsi number.
>> And when passthrough a device, QEMU will use device's gsi
>> number to do pirq mapping, but the gsi number is got from
>> file /sys/bus/pci/devices//irq, irq!= gsi, so it will
>> fail when mapping.
>> And in current linux codes, there is no method to get gsi
>> for userspace.
>>
>> For above purpose, record gsi of pcistub devices when init
>> pcistub and add a new syscall into privcmd to let userspace
>> can get gsi when they have a need.
>>
>> Co-developed-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>>  drivers/xen/privcmd.c  | 28 ++
>>  drivers/xen/xen-pciback/pci_stub.c | 38 +++---
>>  include/uapi/xen/privcmd.h |  7 ++
>>  include/xen/acpi.h |  2 ++
>>  4 files changed, 72 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index 67dfa4778864..5953a03b5cb0 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
>> @@ -45,6 +45,9 @@
>>  #include 
>>  #include 
>>  #include 
>> +#ifdef CONFIG_ACPI
>> +#include 
>> +#endif
>>  
>>  #include "privcmd.h"
>>  
>> @@ -842,6 +845,27 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
>>  return rc;
>>  }
>>  
>> +static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
>> +{
>> +struct privcmd_gsi_from_dev kdata;
>> +
>> +if (copy_from_user(, udata, sizeof(kdata)))
>> +return -EFAULT;
>> +
>> +#ifdef CONFIG_ACPI
>> +kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
>> +if (kdata.gsi == -1)
>> +return -EINVAL;
>> +#else
>> +kdata.gsi = -1;
> 
> Should we return an error instead, like -EINVAL, to make the behavior
> more similar to the CONFIG_ACPI case?
OK, I will return -EINVAL when CONFIG_ACPI is not enabled, like this:
static long privcmd_ioctl_gsi_from_dev(struct file *file, void __user *udata)
{
#ifdef CONFIG_ACPI
	struct privcmd_gsi_from_dev kdata;

	if (copy_from_user(, udata, sizeof(kdata)))
		return -EFAULT;

	kdata.gsi = pcistub_get_gsi_from_sbdf(kdata.sbdf);
	if (kdata.gsi == -1)
		return -EINVAL;

	if (copy_to_user(udata, , sizeof(kdata)))
		return -EFAULT;

	return 0;
#else
	return -EINVAL;
#endif
}

> 
> 
>> +#endif
>> +
>> +if (copy_to_user(udata, , sizeof(kdata)))
>> +return -EFAULT;
>> +
>> +return 0;
>> +}
>> +
>>  #ifdef CONFIG_XEN_PRIVCMD_EVENTFD
>>  /* Irqfd support */
>>  static struct workqueue_struct *irqfd_cleanup_wq;
>> @@ -1529,6 +1553,10 @@ static long privcmd_ioctl(struct file *file,
>>  ret = privcmd_ioctl_ioeventfd(file, udata);
>>  break;
>>  
>> +case IOCTL_PRIVCMD_GSI_FROM_DEV:
>> +ret = privcmd_ioctl_gsi_from_dev(file, udata);
>> +break;
>> +
>>  default:
>>  break;
>>  }
>> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
>> index 2b90d832d0a7..4b62b4d377a9 100644
>> --- a/drivers/xen/xen-pciback/pci_stub.c
>> +++ b/drivers/xen/xen-pciback/pci_stub.c
>> @@ -56,6 +56,9 @@ struct pcistub_device {
>>  
>>  struct pci_dev *dev;
>>  struct xen_pcibk_device *pdev;/* non-NULL if struct pci_dev is in use */
>> +#ifdef CONFIG_ACPI
>> +int gsi;
>> +#endif
>>  };
>>  
>>  /* Access to pcistub_devices & seized_devices lists and the initialize_devices
>> @@ -88,6 +91,9 @@ static struct pcistub_device *pcistub_device_alloc(struct pci_dev *dev)
>>  
>>  kref_init(>kref);
>>  spin_lock_init(>lock);
>> +#ifdef CONFIG_ACPI
>> +psdev->gsi = -1;
>> +#endif
>>  
>>  return psdev;
>>  }
>> @@ -220,6 +226,25 @@ static struct pci_dev *pcistub_device_get_pci_dev(struct xen_pcibk_device *pdev,
>>  return pci_dev;
>>  }
>>  
>> +#ifdef CONFIG_ACPI
>> +int pcistub_get_gsi_from_sbdf(unsigned int sbdf)
>> +{
>> +struct pcistub_device *psdev;
>> +int domain = sbdf >> 16;
>> +int bus = (sbdf >> 8) & 0xff;
>> +int slot = (sbdf >> 3) & 0x1f;
>> +int func = sbdf & 0x7;
> 
> you can use PCI_DEVFN PCI_SLOT PCI_FUNC pci_domain_nr instead of open
> coding.
Thanks, I will use these in the next version.
However, pci_domain_nr() takes a struct pci_bus pointer rather than a raw SBDF value,
so I will change it like this:
int domain = (sbdf >> 16) & 0x;
int bus = PCI_BUS_NUM(sbdf);
int slot = PCI_SLOT(sbdf);
int func = PCI_FUNC(sbdf);
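For reference, the SBDF layout these helpers decode can be checked in userspace. The sketch below redefines the Linux PCI_DEVFN/PCI_SLOT/PCI_FUNC/PCI_BUS_NUM macros locally so that it compiles outside the kernel; the 16-bit segment mask and the `sbdf_pack()`/`sbdf_unpack()` helpers are illustrative assumptions, not kernel API.

```c
#include <stdint.h>

/* Local copies of the <linux/pci.h> helpers, so this builds in userspace. */
#define PCI_DEVFN(slot, func) ((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)       (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)       ((devfn) & 0x07)
#define PCI_BUS_NUM(x)        (((x) >> 8) & 0xff)

/* Assumed layout: segment[31:16] | bus[15:8] | slot[7:3] | func[2:0]. */
static uint32_t sbdf_pack(unsigned int seg, unsigned int bus,
                          unsigned int slot, unsigned int func)
{
	return ((uint32_t)(seg & 0xffff) << 16) | ((bus & 0xff) << 8) |
	       PCI_DEVFN(slot, func);
}

struct sbdf_parts { unsigned int seg, bus, slot, func; };

static struct sbdf_parts sbdf_unpack(uint32_t sbdf)
{
	struct sbdf_parts p = {
		.seg  = (sbdf >> 16) & 0xffff,
		.bus  = PCI_BUS_NUM(sbdf),
		/* PCI_SLOT/PCI_FUNC mask off everything above the devfn byte. */
		.slot = PCI_SLOT(sbdf),
		.func = PCI_FUNC(sbdf),
	};
	return p;
}
```

Because PCI_SLOT() and PCI_FUNC() mask their result, they can be applied to the full SBDF without first extracting the devfn byte.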

> 
> 
>> +
>> +psdev = pcistub_device_find(domain, bus, slot, func);
>> +
>> +if (!psdev)
>> +

Re: [RFC KERNEL PATCH v6 3/3] xen/privcmd: Add new syscall to get gsi from irq

2024-05-13 Thread Chen, Jiqian

Hi,
On 2024/5/10 17:06, Chen, Jiqian wrote:
> Hi,
> 
> On 2024/5/10 14:46, Jürgen Groß wrote:
>> On 19.04.24 05:36, Jiqian Chen wrote:
>>> In PVH dom0, it uses the linux local interrupt mechanism,
>>> when it allocs irq for a gsi, it is dynamic, and follow
>>> the principle of applying first, distributing first. And
>>> the irq number is alloced from small to large, but the
>>> applying gsi number is not, may gsi 38 comes before gsi 28,
>>> it causes the irq number is not equal with the gsi number.
>>> And when passthrough a device, QEMU will use device's gsi
>>> number to do pirq mapping, but the gsi number is got from
>>> file /sys/bus/pci/devices//irq, irq!= gsi, so it will
>>> fail when mapping.
>>> And in current linux codes, there is no method to translate
>>> irq to gsi for userspace.
>>>
>>> For above purpose, record the relationship of gsi and irq
>>> when PVH dom0 do acpi_register_gsi_ioapic for devices and
>>> adds a new syscall into privcmd to let userspace can get
>>> that translation when they have a need.
>>>
>>> Co-developed-by: Huang Rui 
>>> Signed-off-by: Jiqian Chen 
>>> ---
>>>   arch/x86/include/asm/apic.h  |  8 +++
>>>   arch/x86/include/asm/xen/pci.h   |  5 
>>>   arch/x86/kernel/acpi/boot.c  |  2 +-
>>>   arch/x86/pci/xen.c   | 21 +
>>>   drivers/xen/events/events_base.c | 39 
>>>   drivers/xen/privcmd.c    | 19 
>>>   include/uapi/xen/privcmd.h   |  7 ++
>>>   include/xen/events.h |  5 
>>>   8 files changed, 105 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
>>> index 9d159b771dc8..dd4139250895 100644
>>> --- a/arch/x86/include/asm/apic.h
>>> +++ b/arch/x86/include/asm/apic.h
>>> @@ -169,6 +169,9 @@ extern bool apic_needs_pit(void);
>>>     extern void apic_send_IPI_allbutself(unsigned int vector);
>>>   +extern int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
>>> +    int trigger, int polarity);
>>> +
>>>   #else /* !CONFIG_X86_LOCAL_APIC */
>>>   static inline void lapic_shutdown(void) { }
>>>   #define local_apic_timer_c2_ok    1
>>> @@ -183,6 +186,11 @@ static inline void apic_intr_mode_init(void) { }
>>>   static inline void lapic_assign_system_vectors(void) { }
>>>   static inline void lapic_assign_legacy_vector(unsigned int i, bool r) { }
>>>   static inline bool apic_needs_pit(void) { return true; }
>>> +static inline int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
>>> +    int trigger, int polarity)
>>> +{
>>> +    return (int)gsi;
>>> +}
>>>   #endif /* !CONFIG_X86_LOCAL_APIC */
>>>     #ifdef CONFIG_X86_X2APIC
>>> diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
>>> index 9015b888edd6..aa8ded61fc2d 100644
>>> --- a/arch/x86/include/asm/xen/pci.h
>>> +++ b/arch/x86/include/asm/xen/pci.h
>>> @@ -5,6 +5,7 @@
>>>   #if defined(CONFIG_PCI_XEN)
>>>   extern int __init pci_xen_init(void);
>>>   extern int __init pci_xen_hvm_init(void);
>>> +extern int __init pci_xen_pvh_init(void);
>>>   #define pci_xen 1
>>>   #else
>>>   #define pci_xen 0
>>> @@ -13,6 +14,10 @@ static inline int pci_xen_hvm_init(void)
>>>   {
>>>   return -1;
>>>   }
>>> +static inline int pci_xen_pvh_init(void)
>>> +{
>>> +    return -1;
>>> +}
>>>   #endif
>>>   #ifdef CONFIG_XEN_PV_DOM0
>>>   int __init pci_xen_initial_domain(void);
>>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>>> index 85a3ce2a3666..72c73458c083 100644
>>> --- a/arch/x86/kernel/acpi/boot.c
>>> +++ b/arch/x86/kernel/acpi/boot.c
>>> @@ -749,7 +749,7 @@ static int acpi_register_gsi_pic(struct device *dev, u32 gsi,
>>>   }
>>>     #ifdef CONFIG_X86_LOCAL_APIC
>>> -static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
>>> +int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
>>>   int trigger, int polarity)
>>>   {
>>>   int irq = gsi;
>>> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>>> index 652cd53e77f6..f056ab5c0a06 100644
>>> --- a/arch/x86/pci/xen.c

Re: [RFC KERNEL PATCH v6 3/3] xen/privcmd: Add new syscall to get gsi from irq

2024-05-10 Thread Chen, Jiqian

On 2024/5/10 19:27, Jürgen Groß wrote:
> On 10.05.24 12:32, Chen, Jiqian wrote:
>> On 2024/5/10 18:21, Jürgen Groß wrote:
>>> On 10.05.24 12:13, Chen, Jiqian wrote:
>>>> On 2024/5/10 17:53, Jürgen Groß wrote:
>>>>> On 10.05.24 11:06, Chen, Jiqian wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 2024/5/10 14:46, Jürgen Groß wrote:
>>>>>>> On 19.04.24 05:36, Jiqian Chen wrote:
>>>>>>>> +
>>>>>>>> +    info->type = IRQT_PIRQ;
>>>>>> I am considering whether I need to use a new type(like IRQT_GSI) here to 
>>>>>> distinguish with IRQT_PIRQ, because function restore_pirqs will process 
>>>>>> all IRQT_PIRQ.
>>>>>
>>>>> restore_pirqs() already considers gsi == 0 to be not GSI related. Isn't 
>>>>> this
>>>>> enough?
>>>> No, it is not enough.
>>>> xen_pvh_add_gsi_irq_map adds the mapping of gsi and irq, but the value of 
>>>> gsi is not 0,
>>>> once restore_pirqs is called, it will do PHYSDEVOP_map_pirq for that gsi, 
>>>> but in pvh dom0, we shouldn't do PHYSDEVOP_map_pirq.
>>>
>>> Okay, then add a new flag to info->u.pirq.flags for that purpose?
>> I feel like adding "new flag to info->u.pirq.flags" is not as good as adding 
>> " new type to info->type".
>> Because in restore_pirqs, it considers " info->type != IRQT_PIRQ", if adding 
>> " new flag to info->u.pirq.flags", we need to add a new condition in 
>> restore_pirqs.
>> And actually this mapping(gsi and irq of pvh) doesn't have pirq, so it is 
>> not suitable to add to u.pirq.flags.
> 
> Does this mean there is no other IRQT_PIRQ related activity relevant for 
> those GSIs/IRQs?
Yes, I think so.
> In that case I agree to add IRQT_GSI.
Thank you!
> 
> 
> Juergen

-- 
Best regards,
Jiqian Chen.

Re: [RFC KERNEL PATCH v6 3/3] xen/privcmd: Add new syscall to get gsi from irq

2024-05-10 Thread Chen, Jiqian

On 2024/5/10 18:21, Jürgen Groß wrote:
> On 10.05.24 12:13, Chen, Jiqian wrote:
>> On 2024/5/10 17:53, Jürgen Groß wrote:
>>> On 10.05.24 11:06, Chen, Jiqian wrote:
>>>> Hi,
>>>>
>>>> On 2024/5/10 14:46, Jürgen Groß wrote:
>>>>> On 19.04.24 05:36, Jiqian Chen wrote:
>>>>>> +
>>>>>> +    info->type = IRQT_PIRQ;
>>>> I am considering whether I need to use a new type(like IRQT_GSI) here to 
>>>> distinguish with IRQT_PIRQ, because function restore_pirqs will process 
>>>> all IRQT_PIRQ.
>>>
>>> restore_pirqs() already considers gsi == 0 to be not GSI related. Isn't this
>>> enough?
>> No, it is not enough.
>> xen_pvh_add_gsi_irq_map adds the mapping of gsi and irq, but the value of 
>> gsi is not 0,
>> once restore_pirqs is called, it will do PHYSDEVOP_map_pirq for that gsi, 
>> but in pvh dom0, we shouldn't do PHYSDEVOP_map_pirq.
> 
> Okay, then add a new flag to info->u.pirq.flags for that purpose?
I think adding a new type to info->type is better than adding a new flag to info->u.pirq.flags.
restore_pirqs() already skips entries whose info->type != IRQT_PIRQ, whereas a new flag in
info->u.pirq.flags would require an additional condition in restore_pirqs().
Moreover, this GSI-to-IRQ mapping for PVH has no associated PIRQ, so it does not belong in
u.pirq.flags.
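The difference between the two options shows up in a small model of restore_pirqs() (not the kernel code): an entry tagged IRQT_PIRQ with a non-zero GSI is re-mapped, while an entry carrying the proposed IRQT_GSI type is skipped by the existing type check without any new condition. `struct model_irq_info` and the enum are simplified, hypothetical stand-ins for the irq_info bookkeeping in events_base.c.

```c
#include <stddef.h>

/* IRQT_GSI is the new type proposed in this thread; the rest are local
 * stand-ins for the kernel's irq_info types. */
enum model_irq_type { IRQT_UNBOUND, IRQT_PIRQ, IRQT_GSI };

struct model_irq_info {
	enum model_irq_type type;
	unsigned int gsi;	/* 0 means "not GSI related" */
};

/*
 * Model of restore_pirqs(): returns how many PHYSDEVOP_map_pirq requests
 * would be issued for the given entries.
 */
static size_t model_restore_pirqs(const struct model_irq_info *infos, size_t n)
{
	size_t remapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (infos[i].type != IRQT_PIRQ)
			continue;	/* IRQT_GSI entries never get here */
		if (infos[i].gsi == 0)
			continue;	/* existing "not GSI related" check */
		remapped++;		/* would issue PHYSDEVOP_map_pirq */
	}
	return remapped;
}
```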

> 
> 
> Juergen
> 

-- 
Best regards,
Jiqian Chen.

Re: [RFC KERNEL PATCH v6 3/3] xen/privcmd: Add new syscall to get gsi from irq

2024-05-10 Thread Chen, Jiqian

On 2024/5/10 17:53, Jürgen Groß wrote:
> On 10.05.24 11:06, Chen, Jiqian wrote:
>> Hi,
>>
>> On 2024/5/10 14:46, Jürgen Groß wrote:
>>> On 19.04.24 05:36, Jiqian Chen wrote:
>>>> +
>>>> +    info->type = IRQT_PIRQ;
>> I am considering whether I need to use a new type(like IRQT_GSI) here to 
>> distinguish with IRQT_PIRQ, because function restore_pirqs will process all 
>> IRQT_PIRQ.
> 
> restore_pirqs() already considers gsi == 0 to be not GSI related. Isn't this
> enough?
No, it is not enough.
xen_pvh_add_gsi_irq_map() records the GSI-to-IRQ mapping with a non-zero GSI, so once
restore_pirqs() is called it will issue PHYSDEVOP_map_pirq for that GSI, but in a PVH dom0
we must not issue PHYSDEVOP_map_pirq.

> 
> 
> Juergen

-- 
Best regards,
Jiqian Chen.

Re: [RFC KERNEL PATCH v6 3/3] xen/privcmd: Add new syscall to get gsi from irq

2024-05-10 Thread Chen, Jiqian

Hi,

On 2024/5/10 14:46, Jürgen Groß wrote:
> On 19.04.24 05:36, Jiqian Chen wrote:
>> In PVH dom0, it uses the linux local interrupt mechanism,
>> when it allocs irq for a gsi, it is dynamic, and follow
>> the principle of applying first, distributing first. And
>> the irq number is alloced from small to large, but the
>> applying gsi number is not, may gsi 38 comes before gsi 28,
>> it causes the irq number is not equal with the gsi number.
>> And when passthrough a device, QEMU will use device's gsi
>> number to do pirq mapping, but the gsi number is got from
>> file /sys/bus/pci/devices//irq, irq!= gsi, so it will
>> fail when mapping.
>> And in current linux codes, there is no method to translate
>> irq to gsi for userspace.
>>
>> For above purpose, record the relationship of gsi and irq
>> when PVH dom0 do acpi_register_gsi_ioapic for devices and
>> adds a new syscall into privcmd to let userspace can get
>> that translation when they have a need.
>>
>> Co-developed-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>>   arch/x86/include/asm/apic.h  |  8 +++
>>   arch/x86/include/asm/xen/pci.h   |  5 
>>   arch/x86/kernel/acpi/boot.c  |  2 +-
>>   arch/x86/pci/xen.c   | 21 +
>>   drivers/xen/events/events_base.c | 39 
>>   drivers/xen/privcmd.c    | 19 
>>   include/uapi/xen/privcmd.h   |  7 ++
>>   include/xen/events.h |  5 
>>   8 files changed, 105 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
>> index 9d159b771dc8..dd4139250895 100644
>> --- a/arch/x86/include/asm/apic.h
>> +++ b/arch/x86/include/asm/apic.h
>> @@ -169,6 +169,9 @@ extern bool apic_needs_pit(void);
>>     extern void apic_send_IPI_allbutself(unsigned int vector);
>>   +extern int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
>> +    int trigger, int polarity);
>> +
>>   #else /* !CONFIG_X86_LOCAL_APIC */
>>   static inline void lapic_shutdown(void) { }
>>   #define local_apic_timer_c2_ok    1
>> @@ -183,6 +186,11 @@ static inline void apic_intr_mode_init(void) { }
>>   static inline void lapic_assign_system_vectors(void) { }
>>   static inline void lapic_assign_legacy_vector(unsigned int i, bool r) { }
>>   static inline bool apic_needs_pit(void) { return true; }
>> +static inline int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
>> +    int trigger, int polarity)
>> +{
>> +    return (int)gsi;
>> +}
>>   #endif /* !CONFIG_X86_LOCAL_APIC */
>>     #ifdef CONFIG_X86_X2APIC
>> diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
>> index 9015b888edd6..aa8ded61fc2d 100644
>> --- a/arch/x86/include/asm/xen/pci.h
>> +++ b/arch/x86/include/asm/xen/pci.h
>> @@ -5,6 +5,7 @@
>>   #if defined(CONFIG_PCI_XEN)
>>   extern int __init pci_xen_init(void);
>>   extern int __init pci_xen_hvm_init(void);
>> +extern int __init pci_xen_pvh_init(void);
>>   #define pci_xen 1
>>   #else
>>   #define pci_xen 0
>> @@ -13,6 +14,10 @@ static inline int pci_xen_hvm_init(void)
>>   {
>>   return -1;
>>   }
>> +static inline int pci_xen_pvh_init(void)
>> +{
>> +    return -1;
>> +}
>>   #endif
>>   #ifdef CONFIG_XEN_PV_DOM0
>>   int __init pci_xen_initial_domain(void);
>> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
>> index 85a3ce2a3666..72c73458c083 100644
>> --- a/arch/x86/kernel/acpi/boot.c
>> +++ b/arch/x86/kernel/acpi/boot.c
>> @@ -749,7 +749,7 @@ static int acpi_register_gsi_pic(struct device *dev, u32 gsi,
>>   }
>>     #ifdef CONFIG_X86_LOCAL_APIC
>> -static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
>> +int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
>>   int trigger, int polarity)
>>   {
>>   int irq = gsi;
>> diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
>> index 652cd53e77f6..f056ab5c0a06 100644
>> --- a/arch/x86/pci/xen.c
>> +++ b/arch/x86/pci/xen.c
>> @@ -114,6 +114,21 @@ static int acpi_register_gsi_xen_hvm(struct device *dev, u32 gsi,
>>    false /* no mapping of GSI to PIRQ */);
>>   }
>>   +static int acpi_register_gsi_xen_pvh(struct device *dev, u32 gsi,
>> +    int trigger, int polarity)
>> +{
>> +    int irq;
>> +
>> +    irq = acpi_register_gsi_ioapic(dev, gsi, trigger, polarity);
>> +    if (irq < 0)
>> +    return irq;
>> +
>> +    if (xen_pvh_add_gsi_irq_map(gsi, irq) == -EEXIST)
>> +    printk(KERN_INFO "Already map the GSI :%u and IRQ: %d\n", gsi, irq);
>> +
>> +    return irq;
>> +}
>> +
>>   #ifdef CONFIG_XEN_PV_DOM0
>>   static int xen_register_gsi(u32 gsi, int triggering, int polarity)
>>   {
>> @@ -558,6 +573,12 @@ int __init pci_xen_hvm_init(void)
>>   return 0;
>>   }
>>   +int __init pci_xen_pvh_init(void)
>> +{
>> +    __acpi_register_gsi = acpi_register_gsi_xen_pvh;
> 
> No support for unregistering the gsi again?
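As an illustration of the bookkeeping this patch adds, the sketch below models a GSI-to-IRQ table with the -EEXIST behaviour that acpi_register_gsi_xen_pvh() checks for, a reverse lookup for the irq-to-gsi query, and a removal function of the kind asked about above. It is a fixed-size userspace model with hypothetical `model_*` names, not the events_base.c implementation; the IRQ numbers in the usage are illustrative only, chosen to show IRQs being handed out in request order rather than matching the GSI.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_MAPS 16

struct model_gsi_map { unsigned int gsi; int irq; int used; };

static struct model_gsi_map model_maps[MODEL_MAX_MAPS];

/* Record gsi -> irq; -EEXIST if the GSI is already recorded. */
static int model_add_gsi_irq_map(unsigned int gsi, int irq)
{
	struct model_gsi_map *free_slot = NULL;

	for (size_t i = 0; i < MODEL_MAX_MAPS; i++) {
		if (model_maps[i].used && model_maps[i].gsi == gsi)
			return -EEXIST;
		if (!model_maps[i].used && !free_slot)
			free_slot = &model_maps[i];
	}
	if (!free_slot)
		return -ENOSPC;
	free_slot->gsi = gsi;
	free_slot->irq = irq;
	free_slot->used = 1;
	return 0;
}

/* Reverse lookup for an irq -> gsi query; -ENOENT if unknown. */
static int model_gsi_from_irq(int irq)
{
	for (size_t i = 0; i < MODEL_MAX_MAPS; i++)
		if (model_maps[i].used && model_maps[i].irq == irq)
			return (int)model_maps[i].gsi;
	return -ENOENT;
}

/* Hypothetical unregister path: forget the mapping for gsi. */
static int model_remove_gsi_irq_map(unsigned int gsi)
{
	for (size_t i = 0; i < MODEL_MAX_MAPS; i++) {
		if (model_maps[i].used && model_maps[i].gsi == gsi) {
			model_maps[i].used = 0;
			return 0;
		}
	}
	return -ENOENT;
}
```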

Re: [RFC KERNEL PATCH v6 2/3] xen/pvh: Setup gsi for passthrough device

2024-05-10 Thread Chen, Jiqian

Hi,

On 2024/5/10 15:48, Juergen Gross wrote:
> On 19.04.24 05:36, Jiqian Chen wrote:
>> In PVH dom0, the gsis don't get registered, but the gsi of
>> a passthrough device must be configured for it to be able to be
>> mapped into a domU.
>>
>> When assign a device to passthrough, proactively setup the gsi
>> of the device during that process.
>>
>> Co-developed-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> Reviewed-by: Stefano Stabellini 
> 
> This patch is breaking the build.
> 
> On Arm I get:
> 
> In file included from /home/gross/korg/src/drivers/xen/xen-pciback/pci_stub.c:23:0:
> /home/gross/korg/src/include/xen/acpi.h: In function 'xen_acpi_sleep_register':
> /home/gross/korg/src/include/xen/acpi.h:67:3: error: 'acpi_suspend_lowlevel' undeclared (first use in this function); did you mean 'xen_acpi_suspend_lowlevel'?
>    acpi_suspend_lowlevel = xen_acpi_suspend_lowlevel;
>    ^
>    xen_acpi_suspend_lowlevel
> /home/gross/korg/src/include/xen/acpi.h:67:3: note: each undeclared identifier is reported only once for each function it appears in
> make[6]: *** [/home/gross/korg/src/scripts/Makefile.build:244: drivers/xen/xen-pciback/pci_stub.o] Error 1
> make[5]: *** [/home/gross/korg/src/scripts/Makefile.build:485: drivers/xen/xen-pciback] Error 2
> make[4]: *** [/home/gross/korg/src/scripts/Makefile.build:485: drivers/xen] Error 2
Thanks for testing on Arm. It seems I should use the CONFIG_X86 macro to isolate
the modifications to this file.

> 
> Additionally I'm seeing this warning on x86_64:
> 
> /home/gross/korg/src/arch/x86/xen/enlighten_pvh.c:97:5: warning: no previous prototype for ‘xen_pvh_passthrough_gsi’ [-Wmissing-prototypes]
>  int xen_pvh_passthrough_gsi(struct pci_dev *dev)
I think I need to add " #include  " to enlighten_pvh.c.

> 
> 
> Juergen

-- 
Best regards,
Jiqian Chen.

Re: [RFC KERNEL PATCH v4 3/3] PCI/sysfs: Add gsi sysfs for pci_dev

2024-04-08 Thread Chen, Jiqian

Hi Bjorn,
It has been almost two months since your last reply.
This series is blocked on this patch, since patches on the Xen and QEMU side
depend on it.
Do you still have any concerns about this patch, or do you have other suggestions?
If not, may I have your Reviewed-by?

On 2024/3/1 15:57, Chen, Jiqian wrote:
> Hi Bjorn,
> Looking forward to getting your more inputs and suggestions.
> It seems /sys/bus/acpi/devices/PNP0A03:00/ is not a good place to create gsi 
> sysfs.
> 
> On 2024/2/15 16:37, Roger Pau Monné wrote:
>> On Mon, Feb 12, 2024 at 01:18:58PM -0600, Bjorn Helgaas wrote:
>>> On Mon, Feb 12, 2024 at 10:13:28AM +0100, Roger Pau Monné wrote:
>>>> On Fri, Feb 09, 2024 at 03:05:49PM -0600, Bjorn Helgaas wrote:
>>>>> On Thu, Feb 01, 2024 at 09:39:49AM +0100, Roger Pau Monné wrote:
>>>>>> On Wed, Jan 31, 2024 at 01:00:14PM -0600, Bjorn Helgaas wrote:
>>>>>>> On Wed, Jan 31, 2024 at 09:58:19AM +0100, Roger Pau Monné wrote:
>>>>>>>> On Tue, Jan 30, 2024 at 02:44:03PM -0600, Bjorn Helgaas wrote:
>>>>>>>>> On Tue, Jan 30, 2024 at 10:07:36AM +0100, Roger Pau Monné wrote:
>>>>>>>>>> On Mon, Jan 29, 2024 at 04:01:13PM -0600, Bjorn Helgaas wrote:
>>>>>>>>>>> On Thu, Jan 25, 2024 at 07:17:24AM +, Chen, Jiqian wrote:
>>>>>>>>>>>> On 2024/1/24 00:02, Bjorn Helgaas wrote:
>>>>>>>>>>>>> On Tue, Jan 23, 2024 at 10:13:52AM +, Chen, Jiqian wrote:
>>>>>>>>>>>>>> On 2024/1/23 07:37, Bjorn Helgaas wrote:
>>>>>>>>>>>>>>> On Fri, Jan 05, 2024 at 02:22:17PM +0800, Jiqian Chen wrote:
>>>>>>>>>>>>>>>> There is a need for some scenarios to use gsi sysfs.
>>>>>>>>>>>>>>>> For example, when xen passthrough a device to dumU, it will
>>>>>>>>>>>>>>>> use gsi to map pirq, but currently userspace can't get gsi
>>>>>>>>>>>>>>>> number.
>>>>>>>>>>>>>>>> So, add gsi sysfs for that and for other potential scenarios.
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I don't know enough about Xen to know why it needs the GSI in
>>>>>>>>>>>>>>> userspace.  Is this passthrough brand new functionality that 
>>>>>>>>>>>>>>> can't be
>>>>>>>>>>>>>>> done today because we don't expose the GSI yet?
>>>>>>>>>>>
>>>>>>>>>>> I assume this must be new functionality, i.e., this kind of
>>>>>>>>>>> passthrough does not work today, right?
>>>>>>>>>>>
>>>>>>>>>>>>>> has ACPI support and is responsible for detecting and controlling
>>>>>>>>>>>>>> the hardware, also it performs privileged operations such as the
>>>>>>>>>>>>>> creation of normal (unprivileged) domains DomUs. When we give to 
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>> DomU direct access to a device, we need also to route the 
>>>>>>>>>>>>>> physical
>>>>>>>>>>>>>> interrupts to the DomU. In order to do so Xen needs to setup and 
>>>>>>>>>>>>>> map
>>>>>>>>>>>>>> the interrupts appropriately.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What kernel interfaces are used for this setup and mapping?
>>>>>>>>>>>>
>>>>>>>>>>>> For passthrough devices, the setup and mapping of routing physical
>>>>>>>>>>>> interrupts to DomU are done on Xen hypervisor side, hypervisor only
>>>>>>>>>>>> need userspace to provide the GSI info, see Xen code:
>>>>>>>>>>>> xc_physdev_map_pirq require GSI and then will call hypercall to 
>>>>>>>>>>>> pass
>>>>>>>>>>>> GSI into hypervisor and then hypervisor will do th

Re: [RFC XEN PATCH v6 4/5] libxl: Use gsi instead of irq for mapping pirq

2024-03-28 Thread Chen, Jiqian


On 2024/3/29 01:32, Anthony PERARD wrote:
> On Thu, Mar 28, 2024 at 02:34:01PM +0800, Jiqian Chen wrote:
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 96cb4da0794e..2cec83e0b734 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -1478,8 +1478,14 @@ static void pci_add_dm_done(libxl__egc *egc,
>>  fclose(f);
>>  if (!pci_supp_legacy_irq())
>>  goto out_no_irq;
>> -sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>> +sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
>>  pci->bus, pci->dev, pci->func);
>> +r = access(sysfs_path, F_OK);
>> +if (r && errno == ENOENT) {
>> +/* To compitable with old version of kernel, still need to use irq */
> 
> There's a typo, this would be "To be compatible ...". Also maybe
> something like "Fallback to "/irq" for compatibility with old version of
> the kernel." might sound better.
> 
>> +sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>> +   pci->bus, pci->dev, pci->func);
>> +}
>>  f = fopen(sysfs_path, "r");
>>  if (f == NULL) {
>>  LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
>> @@ -2229,9 +2235,15 @@ skip_bar:
>>  if (!pci_supp_legacy_irq())
>>  goto skip_legacy_irq;
>>  
>> -sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>> +sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
>> pci->bus, pci->dev, pci->func);
>>  
>> +rc = access(sysfs_path, F_OK);
> 
> Please, don't use the variable `rc` here, this one is reserved for libxl
> error/return code in libxl. Introduce `int r` instead.
> 
>> +if (rc && errno == ENOENT) {
>> +/* To compitable with old version of kernel, still need to use irq */
>> +sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>> +   pci->bus, pci->dev, pci->func);
>> +}
>>  f = fopen(sysfs_path, "r");
>>  if (f == NULL) {
>>  LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
> 
> With those two things fixed: Reviewed-by: Anthony PERARD 
> 
Thank you very much!
I will fix those two in the next version.

> 
> Thanks,
> 

-- 
Best regards,
Jiqian Chen.
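
The `/gsi`-then-`/irq` fallback reviewed in this thread can be sketched as a standalone C helper. This is a minimal sketch, not libxl code: the helper name `read_legacy_irq()`, the `base` parameter (standing in for `SYSFS_PCI_DEV`, i.e. `/sys/bus/pci/devices`), the use of `snprintf()` instead of `GCSPRINTF()`, and the `%04x:%02x:%02x.%01x` BDF format are assumptions made for illustration. As suggested in review, only `ENOENT` triggers the fallback; any other `access()` failure is left for `fopen()` to report.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not libxl code): read a PCI device's legacy
 * interrupt number, preferring the proposed "gsi" sysfs node and falling
 * back to "irq" on kernels that do not provide it. `base` stands in for
 * /sys/bus/pci/devices so the logic can be exercised on a scratch tree.
 * Returns 0 on success or a negative errno value.
 */
int read_legacy_irq(const char *base, unsigned int dom, unsigned int bus,
                    unsigned int dev, unsigned int func,
                    unsigned int *value, bool *is_gsi)
{
    char path[512];
    FILE *f;
    int r;

    snprintf(path, sizeof(path), "%s/%04x:%02x:%02x.%01x/gsi",
             base, dom, bus, dev, func);
    *is_gsi = true;

    /* Only a missing node triggers the fallback; other errors are
     * reported by fopen() below. */
    r = access(path, F_OK);
    if (r && errno == ENOENT) {
        /* Fallback to "irq" for compatibility with older kernels. */
        snprintf(path, sizeof(path), "%s/%04x:%02x:%02x.%01x/irq",
                 base, dom, bus, dev, func);
        *is_gsi = false;
    }

    f = fopen(path, "r");
    if (f == NULL)
        return -errno;
    r = (fscanf(f, "%u", value) == 1) ? 0 : -EINVAL;
    fclose(f);
    return r;
}
```

Reporting `is_gsi` alongside the value lets the caller decide later which permission call to use, instead of inferring it again from the path.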

Re: [RFC KERNEL PATCH v4 3/3] PCI/sysfs: Add gsi sysfs for pci_dev

2024-02-29 Thread Chen, Jiqian

Hi Bjorn,
I look forward to more of your input and suggestions.
It seems /sys/bus/acpi/devices/PNP0A03:00/ is not a good place to create the gsi
sysfs node.

On 2024/2/15 16:37, Roger Pau Monné wrote:
> On Mon, Feb 12, 2024 at 01:18:58PM -0600, Bjorn Helgaas wrote:
>> On Mon, Feb 12, 2024 at 10:13:28AM +0100, Roger Pau Monné wrote:
>>> On Fri, Feb 09, 2024 at 03:05:49PM -0600, Bjorn Helgaas wrote:
>>>> On Thu, Feb 01, 2024 at 09:39:49AM +0100, Roger Pau Monné wrote:
>>>>> On Wed, Jan 31, 2024 at 01:00:14PM -0600, Bjorn Helgaas wrote:
>>>>>> On Wed, Jan 31, 2024 at 09:58:19AM +0100, Roger Pau Monné wrote:
>>>>>>> On Tue, Jan 30, 2024 at 02:44:03PM -0600, Bjorn Helgaas wrote:
>>>>>>>> On Tue, Jan 30, 2024 at 10:07:36AM +0100, Roger Pau Monné wrote:
>>>>>>>>> On Mon, Jan 29, 2024 at 04:01:13PM -0600, Bjorn Helgaas wrote:
>>>>>>>>>> On Thu, Jan 25, 2024 at 07:17:24AM +, Chen, Jiqian wrote:
>>>>>>>>>>> On 2024/1/24 00:02, Bjorn Helgaas wrote:
>>>>>>>>>>>> On Tue, Jan 23, 2024 at 10:13:52AM +0000, Chen, Jiqian wrote:
>>>>>>>>>>>>> On 2024/1/23 07:37, Bjorn Helgaas wrote:
>>>>>>>>>>>>>> On Fri, Jan 05, 2024 at 02:22:17PM +0800, Jiqian Chen wrote:
>>>>>>>>>>>>>>> There is a need for some scenarios to use gsi sysfs.
>>>>>>>>>>>>>>> For example, when xen passthrough a device to dumU, it will
>>>>>>>>>>>>>>> use gsi to map pirq, but currently userspace can't get gsi
>>>>>>>>>>>>>>> number.
>>>>>>>>>>>>>>> So, add gsi sysfs for that and for other potential scenarios.
>>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>>>> I don't know enough about Xen to know why it needs the GSI in
>>>>>>>>>>>>>> userspace.  Is this passthrough brand new functionality that can't be
>>>>>>>>>>>>>> done today because we don't expose the GSI yet?
>>>>>>>>>>
>>>>>>>>>> I assume this must be new functionality, i.e., this kind of
>>>>>>>>>> passthrough does not work today, right?
>>>>>>>>>>
>>>>>>>>>>>>> has ACPI support and is responsible for detecting and controlling
>>>>>>>>>>>>> the hardware, also it performs privileged operations such as the
>>>>>>>>>>>>> creation of normal (unprivileged) domains DomUs. When we give to a
>>>>>>>>>>>>> DomU direct access to a device, we need also to route the physical
>>>>>>>>>>>>> interrupts to the DomU. In order to do so Xen needs to setup and map
>>>>>>>>>>>>> the interrupts appropriately.
>>>>>>>>>>>>
>>>>>>>>>>>> What kernel interfaces are used for this setup and mapping?
>>>>>>>>>>>
>>>>>>>>>>> For passthrough devices, the setup and mapping of routing physical
>>>>>>>>>>> interrupts to DomU are done on Xen hypervisor side, hypervisor only
>>>>>>>>>>> need userspace to provide the GSI info, see Xen code:
>>>>>>>>>>> xc_physdev_map_pirq require GSI and then will call hypercall to pass
>>>>>>>>>>> GSI into hypervisor and then hypervisor will do the mapping and
>>>>>>>>>>> routing, kernel doesn't do the setup and mapping.
>>>>>>>>>>
>>>>>>>>>> So we have to expose the GSI to userspace not because userspace itself
>>>>>>>>>> uses it, but so userspace can turn around and pass it back into the
>>>>>>>>>> kernel?
>>>>>>>>>
>>>>>>>>> No, the point is to pass it back to Xen, which doesn't know the
>>>>>>>>> mapping between GSIs and PCI devices because it can't execute the ACPI
>>>>

Re: [RFC XEN PATCH v5 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-02-25 Thread Chen, Jiqian

On 2024/2/23 23:59, Anthony PERARD wrote:
> On Thu, Feb 22, 2024 at 07:22:45AM +0000, Chen, Jiqian wrote:
>> On 2024/2/21 23:03, Anthony PERARD wrote:
>>> On Fri, Jan 12, 2024 at 02:13:17PM +0800, Jiqian Chen wrote:
>>>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>>>> index a1c6e82631e9..4136a860a048 100644
>>>> --- a/tools/libs/light/libxl_pci.c
>>>> +++ b/tools/libs/light/libxl_pci.c
>>>> @@ -1421,6 +1421,8 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>  uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>>>>  uint32_t domainid = domid;
>>>>  bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
>>>> +int gsi;
>>>> +bool has_gsi = true;
>>>
>>> Why does this function have a "gsi" variable, and "gsi = irq", while the
>>> next function pci_remove_detached() only has an "irq" variable?
>> Because pci_add_dm_done() calls "r = xc_physdev_map_pirq(ctx->xch, domid,
>> irq, &irq);", which passes a pointer to irq, so irq gets overwritten.
>> That is why I need gsi to save the original value.
> 
> I'm sorry but unconditional "gsi = irq" looks very wrong, that line
Makes sense, I should add a condition and some comments here. Maybe:
/* If the gsi sysfs node exists, fscanf() reads the GSI from it into irq.
 * Save the original GSI value, because xc_physdev_map_pirq() overwrites irq. */
if (has_gsi)
    gsi = irq;

> needs to be removed or changed, otherwise we have two functions doing the
> same thing just after that (xc_domain_irq_permission and
> xc_domain_gsi_permission). Instead, my guess is that the
> arguments of xc_physdev_map_pirq() need to be changed. What does `irq`
> contain after the map_pirq() call? A "pirq" of some kind? Maybe
Yes, pirq.

> xc_physdev_map_pirq(ctx->xch, domid, irq, ) would make things more
> clear about what's going on.
> 
> 
> And BTW, maybe renaming the variable "has_gsi" to "is_gsi" might make
> things a bit clearer as well, as in: "the variable 'irq' $is_gsi",
> instead of a variable that has a meaning of "the system $has_gsi"
> without necessarily meaning that the code is using it.
gsi is a new sysfs node added by my kernel patch, and it is still under
discussion with the PCI maintainer.
For compatibility with older kernels that don't have the gsi sysfs node,
we still need to use irq here.
That is why I named this variable has_gsi. Would changing it to
has_gsi_sysfs_node be clearer?
Maybe I also need to add some comments in the code to make the usage of gsi clear.

> 
> Maybe using (abusing?) the variable name "irq" might be a bit wrong now?
> What does "GSI" stand for anyway? And about "PIRQ"? This is just some
GSI (Global System Interrupt) is an x86-specific concept; it is related to the
IOAPIC pin. PIRQ is used to route interrupts in Xen.

> question to figure out if there's potentially a better name for the
> variables names.
> 
> Thanks,
> 

-- 
Best regards,
Jiqian Chen.
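
The reason `pci_add_dm_done()` needs a second variable is that the map call's last argument is an in/out pointer, and the same `irq` variable is passed both as the index and as that output, so the GSI read from sysfs is overwritten by the returned pirq. The sketch below isolates that aliasing hazard. `fake_map_pirq()` and `map_and_remember()` are hypothetical; the pirq allocation policy (counting up from 100) is invented for the example, and `-1` is used as an assumed "no GSI was read" marker.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a map call with an in/out pirq argument:
 * whatever `*pirq` held on entry is overwritten with the allocated pirq.
 * The allocation policy is made up for the example.
 */
static int next_free_pirq = 100;

int fake_map_pirq(int index, int *pirq)
{
    if (index < 0 || pirq == NULL)
        return -1;
    *pirq = next_free_pirq++;
    return 0;
}

/*
 * The pattern under review: `irq` is passed as both the index and the
 * output, so the GSI read from sysfs must be copied before the call.
 */
int map_and_remember(int irq, bool is_gsi, int *gsi_out, int *pirq_out)
{
    int gsi = -1;            /* -1: the value did not come from "gsi" */
    int r;

    if (is_gsi)
        gsi = irq;           /* save the original GSI first */

    r = fake_map_pirq(irq, &irq);   /* from here on, irq holds a pirq */
    if (r < 0)
        return r;

    *gsi_out = gsi;
    *pirq_out = irq;
    return 0;
}
```

Passing a separate output variable (for example a `pirq`) to the map call would remove the need for the copy, which is the direction the review comment about the call's arguments points in.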

Re: [RFC XEN PATCH v5 3/5] x86/pvh: Add PHYSDEVOP_setup_gsi for PVH dom0

2024-02-25 Thread Chen, Jiqian


On 2024/2/23 08:44, Stefano Stabellini wrote:
> On Fri, 12 Jan 2024, Jiqian Chen wrote:
>> On PVH dom0, the gsis don't get registered, but
>> the gsi of a passthrough device must be configured for it to
>> be able to be mapped into a hvm domU.
>> On the Linux kernel side, it calls PHYSDEVOP_setup_gsi for
>> passthrough devices to register gsi when dom0 is PVH.
>> So, add PHYSDEVOP_setup_gsi for above purpose.
>>
>> Co-developed-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>>  xen/arch/x86/hvm/hypercall.c | 6 ++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 493998b42ec5..46f51ee459f6 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -76,6 +76,12 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  case PHYSDEVOP_unmap_pirq:
>>  break;
>>  
>> +case PHYSDEVOP_setup_gsi:
>> +if ( !is_hardware_domain(currd) )
>> +return -EOPNOTSUPP;
>> +ASSERT(!has_pirq(currd));
> 
> Do we really need this assert? I understand that the use case right now
> is for !has_pirq(currd) but in general it doesn't seem to me that
> PHYSDEVOP_setup_gsi and !has_pirq should be tied together.
Makes sense, thanks for the explanation. I will delete this in the next version.

> 
> Aside from that, it looks fine.
> 
> 
>> +break;
>> +
>>  case PHYSDEVOP_eoi:
>>  case PHYSDEVOP_irq_status_query:
>>  case PHYSDEVOP_get_free_pirq:
>> -- 
>> 2.34.1
>>

-- 
Best regards,
Jiqian Chen.

Re: [RFC KERNEL PATCH v4 2/3] xen/pvh: Setup gsi for passthrough device

2024-02-22 Thread Chen, Jiqian

On 2024/2/23 08:23, Stefano Stabellini wrote:
> On Fri, 5 Jan 2024, Jiqian Chen wrote:
>> In PVH dom0, the gsis don't get registered, but the gsi of
>> a passthrough device must be configured for it to be able to be
>> mapped into a domU.
>>
>> When assign a device to passthrough, proactively setup the gsi
>> of the device during that process.
>>
>> Co-developed-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>>  arch/x86/xen/enlighten_pvh.c   | 90 ++
>>  drivers/acpi/pci_irq.c |  2 +-
>>  drivers/xen/xen-pciback/pci_stub.c |  8 +++
>>  include/linux/acpi.h   |  1 +
>>  include/xen/acpi.h |  6 ++
>>  5 files changed, 106 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/xen/enlighten_pvh.c b/arch/x86/xen/enlighten_pvh.c
>> index ada3868c02c2..ecadd966c684 100644
>> --- a/arch/x86/xen/enlighten_pvh.c
>> +++ b/arch/x86/xen/enlighten_pvh.c
>> @@ -1,6 +1,7 @@
>>  // SPDX-License-Identifier: GPL-2.0
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #include 
>>  
>> @@ -25,6 +26,95 @@
>>  bool __ro_after_init xen_pvh;
>>  EXPORT_SYMBOL_GPL(xen_pvh);
>>  
>> +typedef struct gsi_info {
>> +int gsi;
>> +int trigger;
>> +int polarity;
>> +} gsi_info_t;
>> +
>> +struct acpi_prt_entry {
>> +struct acpi_pci_id  id;
>> +u8  pin;
>> +acpi_handle link;
>> +u32 index;  /* GSI, or link _CRS index */
>> +};
>> +
>> +static int xen_pvh_get_gsi_info(struct pci_dev *dev,
>> +gsi_info_t *gsi_info)
>> +{
>> +int gsi;
>> +u8 pin = 0;
>> +struct acpi_prt_entry *entry;
>> +int trigger = ACPI_LEVEL_SENSITIVE;
>> +int polarity = acpi_irq_model == ACPI_IRQ_MODEL_GIC ?
>> +  ACPI_ACTIVE_HIGH : ACPI_ACTIVE_LOW;
>> +
>> +if (dev)
>> +pin = dev->pin;
> 
> This is minor, but you can just move the pin = dev->pin after the !dev
> check below.
Will change it in the next version.

> 
> With that change, and assuming the Xen-side and QEMU-side patches are
> accepted:
> 
> Reviewed-by: Stefano Stabellini 
Thank you very much!

> 
> 
> 
> 
>> +if (!dev || !pin || !gsi_info)
>> +return -EINVAL;
>> +
>> +entry = acpi_pci_irq_lookup(dev, pin);
>> +if (entry) {
>> +if (entry->link)
>> +gsi = acpi_pci_link_allocate_irq(entry->link,
>> + entry->index,
>> + &trigger, &polarity,
>> + NULL);
>> +else
>> +gsi = entry->index;
>> +} else
>> +gsi = -1;
>> +
>> +if (gsi < 0)
>> +return -EINVAL;
>> +
>> +gsi_info->gsi = gsi;
>> +gsi_info->trigger = trigger;
>> +gsi_info->polarity = polarity;
>> +
>> +return 0;
>> +}
>> +
>> +static int xen_pvh_setup_gsi(gsi_info_t *gsi_info)
>> +{
>> +struct physdev_setup_gsi setup_gsi;
>> +
>> +if (!gsi_info)
>> +return -EINVAL;
>> +
>> +setup_gsi.gsi = gsi_info->gsi;
>> +setup_gsi.triggering = (gsi_info->trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
>> +setup_gsi.polarity = (gsi_info->polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
>> +
>> +return HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
>> +}
>> +
>> +int xen_pvh_passthrough_gsi(struct pci_dev *dev)
>> +{
>> +int ret;
>> +gsi_info_t gsi_info;
>> +
>> +if (!dev)
>> +return -EINVAL;
>> +
>> +ret = xen_pvh_get_gsi_info(dev, &gsi_info);
>> +if (ret) {
>> +xen_raw_printk("Fail to get gsi info!\n");
>> +return ret;
>> +}
>> +
>> +ret = xen_pvh_setup_gsi(&gsi_info);
>> +if (ret == -EEXIST) {
>> +xen_raw_printk("Already setup the GSI :%d\n", gsi_info.gsi);
>> +ret = 0;
>> +} else if (ret)
>> +xen_raw_printk("Fail to setup GSI (%d)!\n", gsi_info.gsi);
>> +
>> +return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(xen_pvh_passthrough_gsi);
>> +
>>  void __init xen_pvh_init(struct boot_params *boot_params)
>>  {
>>  u32 msr;
>> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
>> index ff30ceca2203..630fe0a34bc6 100644
>> --- a/drivers/acpi/pci_irq.c
>> +++ b/drivers/acpi/pci_irq.c
>> @@ -288,7 +288,7 @@ static int acpi_reroute_boot_interrupt(struct pci_dev *dev,
>>  }
>>  #endif /* CONFIG_X86_IO_APIC */
>>  
>> -static struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
>> +struct acpi_prt_entry *acpi_pci_irq_lookup(struct pci_dev *dev, int pin)
>>  {
>>  struct acpi_prt_entry *entry = NULL;
>>  struct pci_dev *bridge;
>> diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c
>> index 46c40ec8a18e..22d4380d2b04 100644
>> --- a/drivers/xen/xen-pciback/pci_stub.c
>> +++ b/drivers/xen/xen-pciback/pci_stub.c
>> @@
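
The two helpers in this patch reduce to a small decision: take the GSI from the `_PRT` entry (either the static index, or whatever the link device allocates, which also reports trigger and polarity), translate the ACPI trigger/polarity values into the 0/1 encoding used for `PHYSDEVOP_setup_gsi`, and treat `-EEXIST` as success. The following is a standalone sketch of that logic, not kernel code: `struct prt_entry`, `struct setup_gsi_args`, the `link_alloc_fn` callback and `example_link_alloc()` are hypothetical simplifications, and the local constants are assumed to match ACPICA's `ACPI_LEVEL_SENSITIVE`/`ACPI_EDGE_SENSITIVE` and `ACPI_ACTIVE_HIGH`/`ACPI_ACTIVE_LOW` values.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the ACPI encodings (assumed to match ACPICA's values). */
enum { LEVEL_SENSITIVE = 0, EDGE_SENSITIVE = 1 };
enum { ACTIVE_HIGH = 0, ACTIVE_LOW = 1 };

/* Hypothetical, simplified _PRT entry: routed via a link device or fixed. */
struct prt_entry {
    bool has_link;
    uint32_t index;          /* GSI, or link _CRS index */
};

/* Hypothetical argument block with the fields filled in by the patch. */
struct setup_gsi_args {
    int gsi;
    uint8_t triggering;      /* 0: edge, 1: level (per the patch's mapping) */
    uint8_t polarity;        /* 0: active high, 1: active low */
};

/* A link allocator returns a GSI (or < 0) and reports trigger/polarity. */
typedef int (*link_alloc_fn)(uint32_t index, int *trigger, int *polarity);

/* Hypothetical link allocator for illustration: index n -> GSI 16 + n,
 * edge-triggered, active high. */
int example_link_alloc(uint32_t index, int *trigger, int *polarity)
{
    *trigger = EDGE_SENSITIVE;
    *polarity = ACTIVE_HIGH;
    return 16 + (int)index;
}

/* Resolve the GSI following the same branches as xen_pvh_get_gsi_info(). */
int resolve_gsi(const struct prt_entry *entry, link_alloc_fn link_alloc,
                int *trigger, int *polarity)
{
    int gsi;

    if (entry == NULL)
        gsi = -1;
    else if (entry->has_link)
        gsi = link_alloc(entry->index, trigger, polarity);
    else
        gsi = (int)entry->index;

    return gsi < 0 ? -EINVAL : gsi;
}

/* Translate ACPI trigger/polarity into the setup_gsi 0/1 encoding. */
struct setup_gsi_args encode_setup_gsi(int gsi, int trigger, int polarity)
{
    struct setup_gsi_args args = {
        .gsi = gsi,
        .triggering = (trigger == EDGE_SENSITIVE) ? 0 : 1,
        .polarity = (polarity == ACTIVE_HIGH) ? 0 : 1,
    };
    return args;
}

/* A GSI that was already set up is not an error for passthrough. */
int normalize_setup_result(int ret)
{
    return ret == -EEXIST ? 0 : ret;
}
```

Keeping the defaults (level-triggered, active low) in the caller's variables and letting only the link path overwrite them mirrors how the patch initialises `trigger` and `polarity` before the lookup.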

Re: [RFC XEN PATCH v5 2/5] x86/pvh: Allow (un)map_pirq when dom0 is PVH

2024-02-22 Thread Chen, Jiqian

On 2024/2/23 08:40, Stefano Stabellini wrote:
> On Fri, 12 Jan 2024, Jiqian Chen wrote:
>> When Xen runs with a PVH dom0 and an HVM domU, a pirq is mapped for
>> a passthrough device by using its gsi, see
>> xen_pt_realize->xc_physdev_map_pirq and
>> pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
>> calls into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
>> is not allowed because currd is a PVH dom0 and PVH has no
>> X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.
>>
>> So, allow PHYSDEVOP_map_pirq when dom0 is PVH, and also allow
>> PHYSDEVOP_unmap_pirq for the failure path to unmap the pirq. Also
>> add a new check to prevent self-mapping when the caller has no PIRQ
>> flag.
>>
>> Co-developed-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>>  xen/arch/x86/hvm/hypercall.c |  2 ++
>>  xen/arch/x86/physdev.c   | 22 ++
>>  2 files changed, 24 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
>> index 6ad5b4d5f11f..493998b42ec5 100644
>> --- a/xen/arch/x86/hvm/hypercall.c
>> +++ b/xen/arch/x86/hvm/hypercall.c
>> @@ -74,6 +74,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  {
>>  case PHYSDEVOP_map_pirq:
>>  case PHYSDEVOP_unmap_pirq:
>> +break;
>> +
>>  case PHYSDEVOP_eoi:
>>  case PHYSDEVOP_irq_status_query:
>>  case PHYSDEVOP_get_free_pirq:
>> diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
>> index 47c4da0af7e1..7f2422c2a483 100644
>> --- a/xen/arch/x86/physdev.c
>> +++ b/xen/arch/x86/physdev.c
>> @@ -303,11 +303,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  case PHYSDEVOP_map_pirq: {
>>  physdev_map_pirq_t map;
>>  struct msi_info msi;
>> +struct domain *d;
>>  
>>  ret = -EFAULT;
>>  if ( copy_from_guest(&map, arg, 1) != 0 )
>>  break;
>>  
>> +d = rcu_lock_domain_by_any_id(map.domid);
>> +if ( d == NULL )
>> +return -ESRCH;
>> +if ( !is_pv_domain(d) && !has_pirq(d) )
> 
> I think this could just be:
> 
> if ( !has_pirq(d) )
> 
> Right?
No. Initially I used only this condition here, but it broke PV dom0:
a PV domain doesn't have the pirq flag either, so it matches this condition
and gets -EOPNOTSUPP, which makes PHYSDEVOP_map_pirq fail.

> 
> 
>> +{
>> +rcu_unlock_domain(d);
>> +return -EOPNOTSUPP;
>> +}
>> +rcu_unlock_domain(d);
>> +
>>  switch ( map.type )
>>  {
>>  case MAP_PIRQ_TYPE_MSI_SEG:
>> @@ -341,11 +352,22 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  
>>  case PHYSDEVOP_unmap_pirq: {
>>  struct physdev_unmap_pirq unmap;
>> +struct domain *d;
>>  
>>  ret = -EFAULT;
>>  if ( copy_from_guest(&unmap, arg, 1) != 0 )
>>  break;
>>  
>> +d = rcu_lock_domain_by_any_id(unmap.domid);
>> +if ( d == NULL )
>> +return -ESRCH;
>> +if ( !is_pv_domain(d) && !has_pirq(d) )
> 
> same here
> 
> 
> Other than that, everything looks fine to me
> 
> 
>> +{
>> +rcu_unlock_domain(d);
>> +return -EOPNOTSUPP;
>> +}
>> +rcu_unlock_domain(d);
>> +
>>  ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
>>  break;
>>  }
>> -- 
>> 2.34.1
>>

-- 
Best regards,
Jiqian Chen.
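
The answer above about `!is_pv_domain(d) && !has_pirq(d)` versus the simpler `!has_pirq(d)` can be checked as a small truth table. This is a sketch that models only the two flags involved: `struct dom_info` and both functions are hypothetical simplifications of the domain looked up from `map.domid`, and it assumes, as stated in the reply, that a PV domain does not have the pirq emulation flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, minimal view of the target domain for this check. */
struct dom_info {
    bool is_pv;      /* PV guest, including a PV dom0 */
    bool has_pirq;   /* X86_EMU_USE_PIRQ emulation flag set */
};

/* The check as written in the patch: only non-PV domains without
 * the pirq flag are rejected. */
int map_pirq_allowed(const struct dom_info *d)
{
    if (!d->is_pv && !d->has_pirq)
        return -EOPNOTSUPP;
    return 0;
}

/* The simplification suggested in review, for comparison: it also
 * rejects PV domains, because they lack the pirq flag too. */
int map_pirq_allowed_simplified(const struct dom_info *d)
{
    return d->has_pirq ? 0 : -EOPNOTSUPP;
}
```

The two versions only disagree for a PV domain, which is exactly the PV dom0 breakage described in the reply.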

Re: [RFC XEN PATCH v5 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

2024-02-21 Thread Chen, Jiqian

Hi Anthony,

On 2024/2/21 23:03, Anthony PERARD wrote:
> On Fri, Jan 12, 2024 at 02:13:17PM +0800, Jiqian Chen wrote:
>> diff --git a/tools/libs/ctrl/xc_domain.c b/tools/libs/ctrl/xc_domain.c
>> index f2d9d14b4d9f..448ba2c59ae1 100644
>> --- a/tools/libs/ctrl/xc_domain.c
>> +++ b/tools/libs/ctrl/xc_domain.c
>> @@ -1394,6 +1394,21 @@ int xc_domain_irq_permission(xc_interface *xch,
>>  return do_domctl(xch, &domctl);
>>  }
>>  
>> +int xc_domain_gsi_permission(xc_interface *xch,
>> + uint32_t domid,
>> + uint32_t gsi,
>> + bool allow_access)
>> +{
>> +struct xen_domctl domctl = {};
>> +
>> +domctl.cmd = XEN_DOMCTL_gsi_permission;
>> +domctl.domain = domid;
>> +domctl.u.gsi_permission.gsi = gsi;
>> +domctl.u.gsi_permission.allow_access = allow_access;
> 
> This could be written as:
> struct xen_domctl domctl = {
> .cmd = XEN_DOMCTL_gsi_permission,
> .domain = domid,
> .u.gsi_permission.gsi = gsi,
> .u.gsi_permission.allow_access = allow_access,
> };
> 
That's fine, I will change to this in the next version.

> but your change is fine too.
> 
> 
>> +
>> +return do_domctl(xch, &domctl);
>> +}
>> +
>>  int xc_domain_iomem_permission(xc_interface *xch,
>> uint32_t domid,
>> unsigned long first_mfn,
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index a1c6e82631e9..4136a860a048 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -1421,6 +1421,8 @@ static void pci_add_dm_done(libxl__egc *egc,
>>  uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
>>  uint32_t domainid = domid;
>>  bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
>> +int gsi;
>> +bool has_gsi = true;
> 
> Why does this function have a "gsi" variable, and "gsi = irq", while the
> next function pci_remove_detached() only has an "irq" variable?
Because pci_add_dm_done() calls "r = xc_physdev_map_pirq(ctx->xch, domid,
irq, &irq);", which passes a pointer to irq, so irq gets overwritten.
That is why I need gsi to save the original value.

> 
> So, gsi can only be a positive integer, right? Could we forgo `has_gsi` and
> just set "gsi = -1" when there's no gsi?
No, we can't forgo 'has_gsi', because we need to set gsi = irq to save the
original value.

> 
>>  /* Convenience aliases */
>>  bool starting = pas->starting;
>> @@ -1482,6 +1484,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>>  pci->bus, pci->dev, pci->func);
>>  
>>  if ( access(sysfs_path, F_OK) != 0 ) {
>> +has_gsi = false;
>>  if ( errno == ENOENT )
>>  sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>>  pci->bus, pci->dev, pci->func);
>> @@ -1497,6 +1500,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>>  goto out_no_irq;
>>  }
>>  if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> +gsi = irq;
> 
> Why do you do this? Does that mean that `gsi` and `irq` are the same? Or
> does `irq` happen to sometimes contain a `gsi` value?
For the reason above.

> 
>>  r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>  if (r < 0) {
>>  LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
>> @@ -1505,7 +1509,10 @@ static void pci_add_dm_done(libxl__egc *egc,
>>  rc = ERROR_FAIL;
>>  goto out;
>>  }
>> -r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>> +if ( has_gsi )
>> +r = xc_domain_gsi_permission(ctx->xch, domid, gsi, 1);
>> +if ( !has_gsi || r == -EOPNOTSUPP )
>> +r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
>>  if (r < 0) {
>>  LOGED(ERROR, domainid,
>>"xc_domain_irq_permission irq=%d (error=%d)", irq, r);
> 
> Looks like this error message could be wrong, I think we want to know
> which call between xc_domain_irq_permission() and
> xc_domain_gsi_permission() has failed.
You are right, I will separate them in the next version.

> 
>> @@ -2185,6 +2192,7 @@ static void pci_remove_detached(libxl__egc *egc,
>>  FILE *f;
>>  uint32_t domainid = prs->domid;
>>  bool isstubdom;
>> +bool has_gsi = true;
>>  
>>  /* Convenience aliases */
>>  libxl_device_pci *const pci = &prs->pci;
>> @@ -2244,6 +2252,7 @@ skip_bar:
>> pci->bus, pci->dev, pci->func);
>>  
>>  if ( access(sysfs_path, F_OK) != 0 ) {
>> +has_gsi = false;
>>  if ( errno == ENOENT )
>>  sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>>  pci->bus, pci->dev, pci->func);
>> @@ -2270,7 +2279,10 @@ skip_bar:
>>   */
>>  LOGED(ERROR, domid, "xc_physdev_unmap_pirq irq=%d", irq);
>>  }
>> -rc =
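
Anthony's point about the shared error message can be addressed by recording which of the two permission calls failed. The following is a minimal sketch of the fallback order in this patch (GSI call first when the gsi node was read, irq call when there is no gsi or the GSI call is unsupported), not libxl code: `perm_fn`, `grant_legacy_irq()` and the `stub_*` functions are hypothetical, and it assumes, as the patch's `r == -EOPNOTSUPP` check does, that an unsupported GSI domctl is reported as a -EOPNOTSUPP return value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical common shape of the two permission calls in this sketch. */
typedef int (*perm_fn)(unsigned int domid, unsigned int num, bool allow);

/* Grant access, preferring the GSI-based call; on failure, *failed_call
 * names the call that failed so the caller can log a distinct message. */
int grant_legacy_irq(unsigned int domid, bool is_gsi, unsigned int gsi,
                     unsigned int irq, perm_fn gsi_perm, perm_fn irq_perm,
                     const char **failed_call)
{
    int r = 0;

    *failed_call = NULL;

    if (is_gsi) {
        r = gsi_perm(domid, gsi, true);
        if (r < 0 && r != -EOPNOTSUPP) {
            *failed_call = "xc_domain_gsi_permission";
            return r;
        }
    }

    /* No gsi node, or the GSI domctl is unsupported: use the irq call. */
    if (!is_gsi || r == -EOPNOTSUPP) {
        r = irq_perm(domid, irq, true);
        if (r < 0) {
            *failed_call = "xc_domain_irq_permission";
            return r;
        }
    }
    return 0;
}

/* Example stubs standing in for the real calls (illustration only). */
int stub_ok(unsigned int domid, unsigned int num, bool allow)
{
    (void)domid; (void)num; (void)allow;
    return 0;
}

int stub_unsupported(unsigned int domid, unsigned int num, bool allow)
{
    (void)domid; (void)num; (void)allow;
    return -EOPNOTSUPP;
}

int stub_denied(unsigned int domid, unsigned int num, bool allow)
{
    (void)domid; (void)num; (void)allow;
    return -EPERM;
}
```

Returning the name instead of logging inside the helper keeps the sketch testable; in libxl the equivalent would be one LOGED() per call site.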

Re: [RFC XEN PATCH v5 4/5] libxl: Use gsi instead of irq for mapping pirq

2024-02-21 Thread Chen, Jiqian

Hi Anthony,

On 2024/2/21 21:38, Anthony PERARD wrote:
> Hi Jiqian,
> 
> On Fri, Jan 12, 2024 at 02:13:16PM +0800, Jiqian Chen wrote:
>> A PVH dom0 uses the Linux local interrupt mechanism, in
>> which the irq allocated for a gsi is assigned dynamically on
>> a first-come, first-served basis. irq numbers are allocated
>> in increasing order, but gsis are not requested in order
>> (gsi 38 may be requested before gsi 28), so the irq number
>> does not necessarily equal the gsi number. When passing a
>> device through, xl wants to use the gsi to map the pirq, see
>> pci_add_dm_done->xc_physdev_map_pirq, but the current code
>> reads the number from /sys/bus/pci/devices//irq,
>> so the mapping fails.
>>
>> So, use the real gsi number read from the gsi sysfs node.
>>
>> Co-developed-by: Huang Rui 
> 
> The "Co-developed-by" tag isn't really defined in Xen, it's fine to keep
> I guess, but it means that another tag is missing, I'm pretty sure you
> need to add "Signed-off-by: Huang Rui "
OK, I will add this in the next version.

> 
> Beyond that, the patch looks good, I've only coding style issues.
There are two coding styles in the current file, and I struggled with which
one to follow when writing my code; it seems that I made the wrong choice.
Thank you very much for your suggestions. I will fix all the coding style
issues that you pointed out in the next version.

> 
>> Signed-off-by: Jiqian Chen 
>> Reviewed-by: Stefano Stabellini 
>> ---
>>  tools/libs/light/libxl_pci.c | 25 +++--
>>  1 file changed, 23 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
>> index 96cb4da0794e..a1c6e82631e9 100644
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -1478,8 +1478,19 @@ static void pci_add_dm_done(libxl__egc *egc,
>>  fclose(f);
>>  if (!pci_supp_legacy_irq())
>>  goto out_no_irq;
>> -sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>> +sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/gsi", pci->domain,
>>  pci->bus, pci->dev, pci->func);
>> +
>> +if ( access(sysfs_path, F_OK) != 0 ) {
> 
> So, the coding style in libxl is a bit different, first could you store
> the return value of access() in 'r', then check that value? Then
> "if (r)" will be enough, no need for "!= 0".
> 
> Second, there's no space around the condition in the if statement, so
> just "if (r)".
> 
>> +if ( errno == ENOENT )
>> +sysfs_path = GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/irq", pci->domain,
>> +pci->bus, pci->dev, pci->func);
> 
> As the else part of this if..else is in a {}-block, could you add { }
> here, for the if part? To quote the libxl CODING_STYLE document: "To
> avoid confusion, either all the blocks in an if...else chain have
> braces, or none of them do."
> 
>> +else {
>> +LOGED(ERROR, domainid, "Can't access %s", sysfs_path);
> 
> This error message is kind of redundant, we could simply let the code
> try fopen() on this "/gsi" path, even if we know that it's not going to
> work, as the error check on fopen() will produce the same error message.
> ;-)
> 
> And without that else part, the code could be simplified to:
> 
> /* If /gsi path doesn't exist, fallback to /irq */
> if (r && errno == ENOENT)
> sysfs_path = "/irq";
> 
> 
> 
> And those comments also apply to the changes in pci_remove_detached().
> 
> Thanks,
> 

-- 
Best regards,
Jiqian Chen.
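
The commit message's argument for why `irq` and `gsi` differ (irq numbers handed out in increasing order, in the order gsis are requested) can be illustrated with a toy allocator. This is only a model built on that description, not Linux's actual allocator: `struct irq_allocator`, `MAX_GSI`, the starting irq number and both functions are hypothetical.

```c
#include <stddef.h>

#define MAX_GSI 64   /* hypothetical upper bound for the example */

/* Toy model (assumption): irqs are assigned sequentially from first_irq,
 * in the order gsis are registered; registering a gsi twice returns the
 * irq it already has. */
struct irq_allocator {
    int next_irq;
    int gsi_to_irq[MAX_GSI];   /* -1 = gsi not registered yet */
};

void irq_allocator_init(struct irq_allocator *a, int first_irq)
{
    a->next_irq = first_irq;
    for (size_t i = 0; i < MAX_GSI; i++)
        a->gsi_to_irq[i] = -1;
}

/* Returns the irq for `gsi`, allocating one on first use, or -1. */
int register_gsi(struct irq_allocator *a, int gsi)
{
    if (gsi < 0 || gsi >= MAX_GSI)
        return -1;
    if (a->gsi_to_irq[gsi] < 0)
        a->gsi_to_irq[gsi] = a->next_irq++;
    return a->gsi_to_irq[gsi];
}
```

With `first_irq = 24`, registering gsi 38 and then gsi 28 yields irqs 24 and 25, so neither irq equals its gsi and reading `/irq` from sysfs gives the wrong number for `xc_physdev_map_pirq()`.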

Re: [RFC XEN PATCH v5 1/5] xen/vpci: Clear all vpci status of device

2024-02-21 Thread Chen, Jiqian

Hi Stewart,

On 2024/2/10 02:02, Stewart Hildebrand wrote:
> On 1/12/24 01:13, Jiqian Chen wrote:
>> When a device has been reset on dom0 side, the vpci on Xen
>> side won't get notification, so the cached state in vpci is
>> all out of date compared with the real device state.
>> To solve that problem, add a new hypercall to clear all vpci
>> device state. When the state of device is reset on dom0 side,
>> dom0 can call this hypercall to notify vpci.
>>
>> Co-developed-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
> 
> Reviewed-by: Stewart Hildebrand 
Thanks, I will add it in the next version.

> 
> If you send another version, the RFC tag may be dropped.
Should the RFC tag be dropped from only this patch, or from all patches in this series?

> 
> One thing to keep an eye out for below (not requesting any changes).
Thanks for the reminder. I will keep rebasing my code on the latest staging
branch before sending a new version.

> 
>> ---
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 72ef277c4f8e..c6df2c6a9561 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -107,6 +107,16 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>  
>>  return rc;
>>  }
>> +
>> +int vpci_reset_device_state(struct pci_dev *pdev)
>> +{
>> +ASSERT(pcidevs_locked());
>> +ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +
>> +vpci_remove_device(pdev);
>> +return vpci_add_handlers(pdev);
> 
> Note that these two functions may be renamed soon by the patch at [1].
> Whichever patch goes in later will need to be rebased to account for the
> rename.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2024-02/msg00134.html
> 
>> +}
>> +
>>  #endif /* __XEN__ */
>>  
>>  static int vpci_register_cmp(const struct vpci_register *r1,
> 

-- 
Best regards,
Jiqian Chen.

Re: [RFC KERNEL PATCH v4 3/3] PCI/sysfs: Add gsi sysfs for pci_dev

2024-01-24 Thread Chen, Jiqian

On 2024/1/24 00:02, Bjorn Helgaas wrote:
> On Tue, Jan 23, 2024 at 10:13:52AM +0000, Chen, Jiqian wrote:
>> On 2024/1/23 07:37, Bjorn Helgaas wrote:
>>> On Fri, Jan 05, 2024 at 02:22:17PM +0800, Jiqian Chen wrote:
>>>> There is a need for some scenarios to use gsi sysfs.
>>>> For example, when xen passthrough a device to dumU, it will
>>>> use gsi to map pirq, but currently userspace can't get gsi
>>>> number.
>>>> So, add gsi sysfs for that and for other potential scenarios.
>> ...
> 
>>> I don't know enough about Xen to know why it needs the GSI in
>>> userspace.  Is this passthrough brand new functionality that can't be
>>> done today because we don't expose the GSI yet?
>>
>> In Xen architecture, there is a privileged domain named Dom0 that
>> has ACPI support and is responsible for detecting and controlling
>> the hardware, also it performs privileged operations such as the
>> creation of normal (unprivileged) domains DomUs. When we give to a
>> DomU direct access to a device, we need also to route the physical
>> interrupts to the DomU. In order to do so Xen needs to setup and map
>> the interrupts appropriately.
> 
> What kernel interfaces are used for this setup and mapping?
For passthrough devices, the setup and mapping of routing physical interrupts
to DomU are done on the Xen hypervisor side; the hypervisor only needs
userspace to provide the GSI info. See the Xen code: xc_physdev_map_pirq
requires the GSI and then calls a hypercall to pass the GSI into the
hypervisor, which then does the mapping and routing. The kernel doesn't do
the setup and mapping.
For devices on a PVH Dom0, Dom0 sets up interrupts for devices as the
baremetal Linux kernel does, through acpi_pci_irq_enable->acpi_register_gsi->
__acpi_register_gsi->acpi_register_gsi_ioapic.

-- 
Best regards,
Jiqian Chen.

Re: [RFC KERNEL PATCH v4 3/3] PCI/sysfs: Add gsi sysfs for pci_dev

2024-01-23 Thread Chen, Jiqian

On 2024/1/23 07:37, Bjorn Helgaas wrote:
> On Fri, Jan 05, 2024 at 02:22:17PM +0800, Jiqian Chen wrote:
>> There is a need for some scenarios to use gsi sysfs.
>> For example, when xen passthrough a device to dumU, it will
>> use gsi to map pirq, but currently userspace can't get gsi
>> number.
>> So, add gsi sysfs for that and for other potential scenarios.
> 
> Isn't GSI really an ACPI-specific concept?
I also added the ACPI maintainers to get some input.
Hi Rafael J. Wysocki and Len Brown, do you have any suggestions about this
patch?

> 
> I don't know enough about Xen to know why it needs the GSI in
> userspace.  Is this passthrough brand new functionality that can't be
> done today because we don't expose the GSI yet?
In Xen architecture, there is a privileged domain named Dom0 that has ACPI
support and is responsible for detecting and controlling the hardware; it also
performs privileged operations such as the creation of normal (unprivileged)
domains, DomUs. When we give a DomU direct access to a device, we also need
to route the physical interrupts to the DomU. In order to do so, Xen needs to
set up and map the interrupts appropriately. For GSI interrupts, since Xen
cannot get the ACPI routing info in the hypervisor itself, it needs to get
this info from Dom0. One way would be to expose this info in sysfs, so that
the Xen toolstack running in Dom0's userspace can read it from sysfs and then
pass it to Xen.

I have also tried another approach in past versions of the patches: keeping
irq-to-gsi mappings in the kernel, with the Xen tools consulting the map via a
syscall and passing the info to Xen. But it was rejected by the Xen
maintainers, because they thought the mappings and translations were all
Linux-internal actions that have nothing to do with Xen, so they suggested
exposing the GSI in sysfs, because it is cleaner and easier to retrieve from
userspace.
This is my past version:
Kernel: 
https://lore.kernel.org/lkml/20231124103123.3263471-1-jiqian.c...@amd.com/T/#m8d20edd326cf7735c2804f0371e8a63b6beec60c
Xen: 
https://lore.kernel.org/xen-devel/20231124104136.3263722-1-jiqian.c...@amd.com/T/#m9f9068d558822af0a5b28cd241cab4d779e36974

> 
> How does userspace use the GSI?  I see "to map pirq", but I think we
> should have more concrete details about exactly what is needed and how
> it is used before adding something new in sysfs.
See the reasoning above.

> 
> Is there some more generic kernel interface we could use
> for this?
No, I don't think there is one for now.

> 
> s/dumU/DomU/ ?  (I dunno, but https://www.google.com/search?q=xen+dumu
> suggests it :))
> 
>> Co-developed-by: Huang Rui 
>> Signed-off-by: Jiqian Chen 
>> ---
>>  drivers/acpi/pci_irq.c  |  1 +
>>  drivers/pci/pci-sysfs.c | 11 +++
>>  include/linux/pci.h |  2 ++
>>  3 files changed, 14 insertions(+)
>>
>> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
>> index 630fe0a34bc6..739a58755df2 100644
>> --- a/drivers/acpi/pci_irq.c
>> +++ b/drivers/acpi/pci_irq.c
>> @@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
>>  kfree(entry);
>>  return 0;
>>  }
>> +dev->gsi = gsi;
>>  
>>  rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
>>  if (rc < 0) {
>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>> index 2321fdfefd7d..c51df88d079e 100644
>> --- a/drivers/pci/pci-sysfs.c
>> +++ b/drivers/pci/pci-sysfs.c
>> @@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
>>  }
>>  static DEVICE_ATTR_RO(irq);
>>  
>> +static ssize_t gsi_show(struct device *dev,
>> +struct device_attribute *attr,
>> +char *buf)
>> +{
>> +struct pci_dev *pdev = to_pci_dev(dev);
>> +
>> +return sysfs_emit(buf, "%u\n", pdev->gsi);
>> +}
>> +static DEVICE_ATTR_RO(gsi);
>> +
>>  static ssize_t broken_parity_status_show(struct device *dev,
>>   struct device_attribute *attr,
>>   char *buf)
>> @@ -596,6 +606,7 @@ static struct attribute *pci_dev_attrs[] = {
>>  &dev_attr_revision.attr,
>>  &dev_attr_class.attr,
>>  &dev_attr_irq.attr,
>> +&dev_attr_gsi.attr,
>>  &dev_attr_local_cpus.attr,
>>  &dev_attr_local_cpulist.attr,
>>  &dev_attr_modalias.attr,
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index dea043bc1e38..0618d4a87a50 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -529,6 +529,8 @@ struct pci_dev {
>>  
>>  /* These methods index pci_reset_fn_methods[] */
>>  u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
>> +
>> +unsigned int gsi;
>>  };
>>  
>>  static inline struct pci_dev *pci_physfn(struct pci_dev *dev)
>> -- 
>> 2.34.1
>>
>>

-- 
Best regards,
Jiqian Chen.

Re: [RFC KERNEL PATCH v4 3/3] PCI/sysfs: Add gsi sysfs for pci_dev

2024-01-21 Thread Chen, Jiqian

Hi Bjorn Helgaas,

Do you have any comments on this patch?

On 2024/1/5 14:22, Chen, Jiqian wrote:
> There is a need in some scenarios to use a gsi sysfs attribute.
> For example, when Xen passes through a device to DomU, it uses
> the GSI to map a pirq, but currently userspace can't get the GSI
> number.
> So, add a gsi sysfs attribute for that and for other potential scenarios.
> 
> Co-developed-by: Huang Rui 
> Signed-off-by: Jiqian Chen 
> ---
>  drivers/acpi/pci_irq.c  |  1 +
>  drivers/pci/pci-sysfs.c | 11 +++
>  include/linux/pci.h |  2 ++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> index 630fe0a34bc6..739a58755df2 100644
> --- a/drivers/acpi/pci_irq.c
> +++ b/drivers/acpi/pci_irq.c
> @@ -449,6 +449,7 @@ int acpi_pci_irq_enable(struct pci_dev *dev)
>   kfree(entry);
>   return 0;
>   }
> + dev->gsi = gsi;
>  
>   rc = acpi_register_gsi(&dev->dev, gsi, triggering, polarity);
>   if (rc < 0) {
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 2321fdfefd7d..c51df88d079e 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -71,6 +71,16 @@ static ssize_t irq_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(irq);
>  
> +static ssize_t gsi_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + return sysfs_emit(buf, "%u\n", pdev->gsi);
> +}
> +static DEVICE_ATTR_RO(gsi);
> +
>  static ssize_t broken_parity_status_show(struct device *dev,
>struct device_attribute *attr,
>char *buf)
> @@ -596,6 +606,7 @@ static struct attribute *pci_dev_attrs[] = {
>   &dev_attr_revision.attr,
>   &dev_attr_class.attr,
>   &dev_attr_irq.attr,
> + &dev_attr_gsi.attr,
>   &dev_attr_local_cpus.attr,
>   &dev_attr_local_cpulist.attr,
>   &dev_attr_modalias.attr,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index dea043bc1e38..0618d4a87a50 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -529,6 +529,8 @@ struct pci_dev {
>  
>   /* These methods index pci_reset_fn_methods[] */
>   u8 reset_methods[PCI_NUM_RESET_METHODS]; /* In priority order */
> +
> + unsigned int gsi;
>  };
>  
>  static inline struct pci_dev *pci_physfn(struct pci_dev *dev)

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v4 4/5] domctl: Use gsi to grant/revoke irq permission

2024-01-10 Thread Chen, Jiqian

On 2024/1/11 04:09, Stewart Hildebrand wrote:
> On 1/5/24 02:09, Jiqian Chen wrote:
>> diff --git a/xen/common/domctl.c b/xen/common/domctl.c
>> index f5a71ee5f78d..eeb975bd0194 100644
>> --- a/xen/common/domctl.c
>> +++ b/xen/common/domctl.c
>> @@ -653,12 +653,20 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
>>  unsigned int pirq = op->u.irq_permission.pirq, irq;
>>  int allow = op->u.irq_permission.allow_access;
>>  
>> -if ( pirq >= current->domain->nr_pirqs )
>> +if ( pirq >= nr_irqs_gsi )
> 
> This doesn't build on ARM, as nr_irqs_gsi is x86 only. This is a wild guess: 
> we may want keep the existing current->domain->nr_pirqs check, then add the 
> new nr_irqs_gsi check wrapped in #ifdef CONFIG_X86.
Since I will add a new hypercall in the next version, this change will go away. 
Thank you.

> 
>>  {
>>  ret = -EINVAL;
>>  break;
>>  }
>> -irq = pirq_access_permitted(current->domain, pirq);
>> +
>> +if ( irq_access_permitted(current->domain, pirq) )
>> +irq = pirq;
>> +else
>> +{
>> +ret = -EPERM;
>> +break;
>> +}
>> +
>>  if ( !irq || xsm_irq_permission(XSM_HOOK, d, irq, allow) )
>>  ret = -EPERM;
>>  else if ( allow )

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v4 1/5] xen/vpci: Clear all vpci status of device

2024-01-10 Thread Chen, Jiqian

On 2024/1/10 22:09, Stewart Hildebrand wrote:
> On 1/10/24 01:24, Chen, Jiqian wrote:
>> On 2024/1/9 23:24, Stewart Hildebrand wrote:
>>> On 1/5/24 02:09, Jiqian Chen wrote:
>>>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
>>>> index 42db3e6d133c..552ccbf747cb 100644
>>>> --- a/xen/drivers/pci/physdev.c
>>>> +++ b/xen/drivers/pci/physdev.c
>>>> @@ -67,6 +68,39 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>>>  break;
>>>>  }
>>>>  
>>>> +case PHYSDEVOP_pci_device_state_reset: {
>>>> +struct physdev_pci_device dev;
>>>> +struct pci_dev *pdev;
>>>> +pci_sbdf_t sbdf;
>>>> +
>>>> +if ( !is_pci_passthrough_enabled() )
>>>> +return -EOPNOTSUPP;
>>>> +
>>>> +ret = -EFAULT;
>>>> +if ( copy_from_guest(&dev, arg, 1) != 0 )
>>>> +break;
>>>> +sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
>>>> +
>>>> +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>>>> +if ( ret )
>>>> +break;
>>>> +
>>>> +pcidevs_lock();
>>>> +pdev = pci_get_pdev(NULL, sbdf);
>>>> +if ( !pdev )
>>>> +{
>>>> +pcidevs_unlock();
>>>> +ret = -ENODEV;
>>>> +break;
>>>> +}
>>>> +
>>>
>>> write_lock(&pdev->domain->pci_lock);
>>>
>>>> +ret = vpci_reset_device_state(pdev);
>>>
>>> write_unlock(&pdev->domain->pci_lock);
>> vpci_reset_device_state() only resets the vPCI state of pdev without removing 
>> pdev from the domain, and pcidevs_lock is already held here, so is there any 
>> need to take pci_lock?
> 
> Strictly speaking, it is not enforced yet. However, an upcoming change [1] 
> will expand the scope of d->pci_lock to include protecting the pdev->vpci 
> structure to an extent, so it will be required once that change is committed. 
> In my opinion there is no harm in adding the additional lock now. If you 
> prefer to wait I would not object, but in this case I would at least ask for 
> a TODO comment to help remind us to address it later.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2024-01/msg00446.html
> 
OK, I see. I will take pci_lock in the next version. Thank you for the reminder.

>>
>>>
>>>> +pcidevs_unlock();
>>>> +if ( ret )
>>>> +printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
>>>> +break;
>>>> +}
>>>> +
>>>>  default:
>>>>  ret = -ENOSYS;
>>>>  break;
>>>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>>>> index 72ef277c4f8e..3c64cb10ccbb 100644
>>>> --- a/xen/drivers/vpci/vpci.c
>>>> +++ b/xen/drivers/vpci/vpci.c
>>>> @@ -107,6 +107,15 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>>>  
>>>>  return rc;
>>>>  }
>>>> +
>>>> +int vpci_reset_device_state(struct pci_dev *pdev)
>>>> +{
>>>> +ASSERT(pcidevs_locked());
>>>
>>> ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>>>
>>>> +
>>>> +vpci_remove_device(pdev);
>>>> +return vpci_add_handlers(pdev);
>>>> +}
>>>> +
>>>>  #endif /* __XEN__ */
>>>>  
>>>>  static int vpci_register_cmp(const struct vpci_register *r1,
>>

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v4 4/5] domctl: Use gsi to grant/revoke irq permission

2024-01-10 Thread Chen, Jiqian

Thanks, Jan and Roger. I think I now understand how to add a new hypercall, 
XEN_DOMCTL_gsi_permission; I will implement it in the next version.

On 2024/1/9 18:46, Jan Beulich wrote:
> On 09.01.2024 11:16, Chen, Jiqian wrote:
>> On 2024/1/9 17:38, Jan Beulich wrote:
>>> On 09.01.2024 09:18, Chen, Jiqian wrote:
>>>> A new hypercall for granting a GSI? If so, how does the caller know which 
>>>> hypercall to call to grant permission: XEN_DOMCTL_irq_permission or the 
>>>> new one?
>>>
>>> Either we add a feature indicator, or the caller simply tries the
>>> new GSI interface first.
>> I am still not sure how to use and implement it.
>> Taking pci_add_dm_done as an example, for now its implementation is:
>> pci_add_dm_done
>>  xc_physdev_map_pirq
>>  xc_domain_irq_permission(,,pirq,)
>>  XEN_DOMCTL_irq_permission
>>
>> And assume the new hypercall is XEN_DOMCTL_gsi_permission, do you mean:
>> pci_add_dm_done
>>  xc_physdev_map_pirq
>>  ret = xc_domain_gsi_permission(,,gsi,)
>>  XEN_DOMCTL_gsi_permission
>>  if ( ret != 0 )
>>  xc_domain_irq_permission(,,pirq,)
>>  XEN_DOMCTL_irq_permission
> 
> No, falling back shouldn't be "blind". Fallback should only happen
> when the new sub-op isn't implemented (hence why a feature indicator
> may be necessary), and only if calling the existing sub-op promises
> to be useful (which iirc would limit that to the PV Dom0 case).
> 
>> But if so, I have a question: in XEN_DOMCTL_gsi_permission, when should it fail 
>> and when should it succeed?
> 
> I'm afraid I don't understand the question. Behavior there isn't to
> be fundamentally different from that for XEN_DOMCTL_irq_permission.
> It's just that the incoming value is in another value space.
> 
>> Or do you mean:
>> pci_add_dm_done
>>  xc_physdev_map_pirq
>>  ret = xc_domain_irq_permission(,,pirq,)
>>  XEN_DOMCTL_irq_permission
>>  if ( ret != 0 )
>>  xc_domain_gsi_permission(,,gsi,)
>>  XEN_DOMCTL_gsi_permission
> 
> No, this looks the wrong way round.
> 
>> And in XEN_DOMCTL_gsi_permission, as long as the current domain has access to 
>> the GSI, granting the GSI to the caller should succeed. Right?
> 
> I think so; see above.
> 
> Jan
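Jan's rule for the fallback (try the new GSI sub-op first; fall back to XEN_DOMCTL_irq_permission only when the new sub-op is not implemented, and only when the old sub-op is useful, which per his recollection means the PV Dom0 case) can be captured as a small decision helper. This is a sketch under stated assumptions: `after_gsi_permission()` and `enum perm_next_step` are hypothetical names, and detecting "not implemented" via `-ENOSYS` is an assumption; a feature indicator may be needed instead.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Assumption for this sketch: a hypervisor lacking the new sub-op reports
 * it as -ENOSYS. A real implementation may need a feature indicator instead.
 */
#define GSI_PERMISSION_UNIMPLEMENTED (-ENOSYS)

enum perm_next_step {
    PERM_DONE,      /* use the result of the GSI-based call as-is */
    PERM_TRY_PIRQ,  /* retry with the pirq-based XEN_DOMCTL_irq_permission */
};

/*
 * Decide what to do after the (hypothetical) GSI-based permission call
 * returned gsi_rc: 0 on success or a negative errno value.
 */
static enum perm_next_step after_gsi_permission(int gsi_rc, bool is_pv_dom0)
{
    /* Success, or a genuine failure such as -EPERM: never fall back blindly. */
    if ( gsi_rc != GSI_PERMISSION_UNIMPLEMENTED )
        return PERM_DONE;

    /* The pirq-based sub-op is only expected to help a PV Dom0 caller. */
    return is_pv_dom0 ? PERM_TRY_PIRQ : PERM_DONE;
}
```

Under these assumptions, a caller such as pci_add_dm_done would issue the GSI-based call first and only call xc_domain_irq_permission() with the pirq when the helper returns PERM_TRY_PIRQ.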

-- 
Best regards,
Jiqian Chen.

Re: [RFC XEN PATCH v4 1/5] xen/vpci: Clear all vpci status of device

2024-01-09 Thread Chen, Jiqian

On 2024/1/9 23:24, Stewart Hildebrand wrote:
> On 1/5/24 02:09, Jiqian Chen wrote:
>> diff --git a/xen/drivers/pci/physdev.c b/xen/drivers/pci/physdev.c
>> index 42db3e6d133c..552ccbf747cb 100644
>> --- a/xen/drivers/pci/physdev.c
>> +++ b/xen/drivers/pci/physdev.c
>> @@ -67,6 +68,39 @@ ret_t pci_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  break;
>>  }
>>  
>> +case PHYSDEVOP_pci_device_state_reset: {
>> +struct physdev_pci_device dev;
>> +struct pci_dev *pdev;
>> +pci_sbdf_t sbdf;
>> +
>> +if ( !is_pci_passthrough_enabled() )
>> +return -EOPNOTSUPP;
>> +
>> +ret = -EFAULT;
>> +if ( copy_from_guest(&dev, arg, 1) != 0 )
>> +break;
>> +sbdf = PCI_SBDF(dev.seg, dev.bus, dev.devfn);
>> +
>> +ret = xsm_resource_setup_pci(XSM_PRIV, sbdf.sbdf);
>> +if ( ret )
>> +break;
>> +
>> +pcidevs_lock();
>> +pdev = pci_get_pdev(NULL, sbdf);
>> +if ( !pdev )
>> +{
>> +pcidevs_unlock();
>> +ret = -ENODEV;
>> +break;
>> +}
>> +
> 
> write_lock(&pdev->domain->pci_lock);
> 
>> +ret = vpci_reset_device_state(pdev);
> 
> write_unlock(&pdev->domain->pci_lock);
vpci_reset_device_state() only resets the vPCI state of pdev without removing pdev 
from the domain, and pcidevs_lock is already held here, so is there any need to 
take pci_lock?

> 
>> +pcidevs_unlock();
>> +if ( ret )
>> +printk(XENLOG_ERR "%pp: failed to reset PCI device state\n", &sbdf);
>> +break;
>> +}
>> +
>>  default:
>>  ret = -ENOSYS;
>>  break;
>> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
>> index 72ef277c4f8e..3c64cb10ccbb 100644
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -107,6 +107,15 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>  
>>  return rc;
>>  }
>> +
>> +int vpci_reset_device_state(struct pci_dev *pdev)
>> +{
>> +ASSERT(pcidevs_locked());
> 
> ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> 
>> +
>> +vpci_remove_device(pdev);
>> +return vpci_add_handlers(pdev);
>> +}
>> +
>>  #endif /* __XEN__ */
>>  
>>  static int vpci_register_cmp(const struct vpci_register *r1,

-- 
Best regards,
Jiqian Chen.
