> From: Yishai Hadas <[email protected]>
> Sent: Wednesday, December 13, 2023 8:25 PM
>
> On 13/12/2023 10:23, Tian, Kevin wrote:
> >> From: Yishai Hadas <[email protected]>
> >> Sent: Thursday, December 7, 2023 6:28 PM
> >>
> >> +
> >> +static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev,
> >> + char __user *buf, size_t count,
> >> + loff_t *ppos)
> >> +{
> >> + struct virtiovf_pci_core_device *virtvdev = container_of(
> >> + core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> + loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> + size_t register_offset;
> >> + loff_t copy_offset;
> >> + size_t copy_count;
> >> + __le32 val32;
> >> + __le16 val16;
> >> + u8 val8;
> >> + int ret;
> >> +
> >> + ret = vfio_pci_core_read(core_vdev, buf, count, ppos);
> >> + if (ret < 0)
> >> + return ret;
> >> +
> >> + if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16),
> >> + ©_offset, ©_count,
> >> ®ister_offset)) {
> >> + val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET);
> >> + if (copy_to_user(buf + copy_offset, (void *)&val16 +
> >> register_offset, copy_count))
> >> + return -EFAULT;
> >> + }
> >> +
> >> + if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) &&
> >> + range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16),
> >> + ©_offset, ©_count,
> >> ®ister_offset)) {
> >> + if (copy_from_user((void *)&val16 + register_offset, buf +
> >> copy_offset,
> >> + copy_count))
> >> + return -EFAULT;
> >> + val16 |= cpu_to_le16(PCI_COMMAND_IO);
> >> + if (copy_to_user(buf + copy_offset, (void *)&val16 +
> >> register_offset,
> >> + copy_count))
> >> + return -EFAULT;
> >> + }
> >
> > the write handler calls vfio_pci_core_write() for PCI_COMMAND so
> > the core vconfig should have the latest copy of the IO bit value which
> > is copied to the user buffer by vfio_pci_core_read(). then not necessary
> > to update it again.
>
> You assume the the 'vconfig' mechanism/flow is always applicable for
> that specific field, this should be double-checked.
> However, as for now the driver doesn't rely / use the vconfig for other
> fields as it doesn't match and need a big refactor, I prefer to not rely
> on it at all and have it here.
iiuc this driver does relies on vconfig for other fields. It first calls
vfio_pci_core_read() and then modifies selected fields which needs
special tweak in this driver.
btw what virtio-net specific tweak is applied to PCI_COMMAND? You
basically cache the cmd value and then set PCI_COMMAND_IO if
it's set in the cached value. But this is already covered by pci
vconfig. If anything is broken there then we already have a big
trouble.
The trick for bar0 makes sense as it doesn't exist physically then
the vconfig mechanism may not give the expected result. But
I didn't see the same rationale for PCI_COMMAND.
> >> +
> >> +static ssize_t
> >> +virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user
> >> *buf,
> >> + size_t count, loff_t *ppos)
> >> +{
> >> + struct virtiovf_pci_core_device *virtvdev = container_of(
> >> + core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> >> + loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
> >> +
> >> + if (!count)
> >> + return 0;
> >> +
> >> + if (index == VFIO_PCI_CONFIG_REGION_INDEX) {
> >> + size_t register_offset;
> >> + loff_t copy_offset;
> >> + size_t copy_count;
> >> +
> >> + if (range_intersect_range(pos, count, PCI_COMMAND,
> >> sizeof(virtvdev->pci_cmd),
> >> + ©_offset, ©_count,
> >> + ®ister_offset)) {
> >> + if (copy_from_user((void *)&virtvdev->pci_cmd +
> >> register_offset,
> >> + buf + copy_offset,
> >> + copy_count))
> >> + return -EFAULT;
> >> + }
> >> +
> >> + if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0,
> >> + sizeof(virtvdev->pci_base_addr_0),
> >> + ©_offset, ©_count,
> >> + ®ister_offset)) {
> >> + if (copy_from_user((void *)&virtvdev-
> >>> pci_base_addr_0 + register_offset,
> >> + buf + copy_offset,
> >> + copy_count))
> >> + return -EFAULT;
> >> + }
> >> + }
> >
> > wrap above into virtiovf_pci_write_config() to be symmetric with
> > the read path.
>
> Upon the read path, we do the full flow and return to the user. Here we
> just save some data and continue to call the core, so I'm not sure that
> this worth a dedicated function.
I don't see a real difference.
for the read path you first call vfio_pci_core_read() then modifies some
fields.
for the write path you save some data then call vfio_pci_core_write().
>
> However, this can be done, do you still suggest it for V8 ?
yes
> >> +static int virtiovf_pci_init_device(struct vfio_device *core_vdev)
> >> +{
> >> + struct virtiovf_pci_core_device *virtvdev = container_of(
> >> + core_vdev, struct virtiovf_pci_core_device, core_device.vdev);
> >> + struct pci_dev *pdev;
> >> + int ret;
> >> +
> >> + ret = vfio_pci_core_init_dev(core_vdev);
> >> + if (ret)
> >> + return ret;
> >> +
> >> + pdev = virtvdev->core_device.pdev;
> >> + ret = virtiovf_read_notify_info(virtvdev);
> >> + if (ret)
> >> + return ret;
> >> +
> >> + /* Being ready with a buffer that supports MSIX */
> >> + virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) +
> >> + virtiovf_get_device_config_size(pdev-
> >>> device);
> >
> > which code is relevant to MSIX?
>
> The buffer size must include the MSIX part to match the virtio-net
> specification.
>
> As part of virtiovf_issue_legacy_rw_cmd() we may use the full buffer
> upon read/write.
at least mention that MSIX is in the device config region otherwise
it's not helpful to people w/o virtio background.
> >> +
> >> +static const struct vfio_device_ops virtiovf_vfio_pci_ops = {
> >> + .name = "virtio-vfio-pci",
> >> + .init = vfio_pci_core_init_dev,
> >> + .release = vfio_pci_core_release_dev,
> >> + .open_device = virtiovf_pci_open_device,
> >
> > could be vfio_pci_core_open_device(). Given virtiovf specific init func
> > is not called virtiovf_pci_open_device() is essentially same as the
> > core func.
>
> We don't have today vfio_pci_core_open_device() as an exported function.
>
> The virtiovf_pci_open_device() matches both cases so I don't see a real
> reason to export it now.
>
> By the way, it follows other drivers in vfio, see for example here [1].
>
> [1]
> https://elixir.bootlin.com/linux/v6.7-
> rc5/source/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c#L1383
ah, yes.
> >> +
> >> +static int virtiovf_pci_probe(struct pci_dev *pdev,
> >> + const struct pci_device_id *id)
> >> +{
> >> + const struct vfio_device_ops *ops = &virtiovf_vfio_pci_ops;
> >> + struct virtiovf_pci_core_device *virtvdev;
> >> + int ret;
> >> +
> >> + if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) &&
> >> + !virtiovf_bar0_exists(pdev))
> >> + ops = &virtiovf_vfio_pci_tran_ops;
> >
> > I have a confusion here.
> >
> > why do we want to allow this driver binding to non-matching VF or
> > even PF?
>
> The intention is to allow the binding of any virtio-net device (i.e. PF,
> VF which is not transitional capable) to have a single driver over VFIO
> for all virtio-net devices.
>
> This enables any user space application to bind and use any virtio-net
> device without the need to care.
>
> In case the device is not transitional capable, it will simply use the
> generic vfio functionality.
Is it useful to print a message here?
>
> >
> > if that is the intention then the naming/description should be adjusted
> > to not specific to vf throughout this patch.
> >
> > e.g. don't use "virtiovf_" prefix...
>
> The main functionality is to supply the transitional device to user
> space for the VF, hence the prefix and the description for that driver
> refers to VF.
>
> Let's stay with that.
ok
>
> >
> > the config option is generic:
> >
> > +config VIRTIO_VFIO_PCI
> > + tristate "VFIO support for VIRTIO NET PCI devices"
> >
> > but the description is specific to vf:
> >
> > + This provides support for exposing VIRTIO NET VF devices which
> support
> > + legacy IO access, using the VFIO framework that can work with a
> legacy
> > + virtio driver in the guest.
> >
> > then the module description is generic again:
> >
> > +MODULE_DESCRIPTION(
> > + "VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");
> >
>
> Yes, as the binding allows that, it looks fine to me.
>
what about revising the kconfig message to make it clear that it's
for all virtio-net device with special trick to make VF as a
transitional device?