On Wed, Apr 27, 2022 at 01:58:23AM +0300, Max Gurtovoy wrote:
> Introduce the concept of a management and a managed device and add an
> example of using this concept to manage resources.
> 
> A management device supports the VIRTIO_ADMIN_DEVICE_MGMT and
> VIRTIO_ADMIN_DEVICE_MGMT_ATTRS admin commands to manage some resources
> of a managed device.
> 
> A typical cloud provider SR-IOV use case is to create many VFs for use
> by guest VMs. The VFs may not be assigned to a VM until a user requests
> a VM of a certain size, e.g., number of CPUs. A VF may need MSI-X
> vectors proportional to the number of CPUs in the VM, but there is no
> standard way today in the spec to change the number of MSI-X vectors
> supported by a VF, although there are some operating systems that
> support this.
> 
> The new admin mechanism manages the MSI-X interrupt vector assignment
> of a managed PCI device (i.e. VF) by its management device (i.e. its
> parent PF), but can easily be extended to any other generic resource
> management.
> 
> Reviewed-by: Parav Pandit <pa...@nvidia.com>
> Signed-off-by: Max Gurtovoy <mgurto...@nvidia.com>


I'd like to see the concept of type 1 group in a separate patch
from MSI-X.

I am not sure the MSI-X parts are ready, but the grouping part looks mostly
ok to me.

> ---
>  admin.tex        | 132 +++++++++++++++++++++++++++++++++++++++++++++--
>  content.tex      |  81 +++++++++++++++++++++++++++++
>  introduction.tex |  32 +++++++++++-
>  3 files changed, 241 insertions(+), 4 deletions(-)
> 
> diff --git a/admin.tex b/admin.tex
> index d09683d..5b54743 100644
> --- a/admin.tex
> +++ b/admin.tex
> @@ -79,12 +79,20 @@ \section{Administration command set}\label{sec:Basic Facilities of a Virtio Devi
>  \hline
>  0001h   & VIRTIO_ADMIN_DEVICE_CAPS_ACCEPT    & M  \\
>  \hline
> -0002h - 7FFFh   & Generic admin cmds    & -  \\
> +0002h   & VIRTIO_ADMIN_DEVICE_MGMT    & O  \\
> +\hline
> +0003h   & VIRTIO_ADMIN_DEVICE_MGMT_ATTRS    & O  \\
> +\hline
> +0004h - 7FFFh   & Generic admin cmds    & -  \\
>  \hline
>  8000h - FFFFh   & Reserved    & - \\
>  \hline
>  \end{tabular}
>  
> +\begin{note}
> +{The following commands are mandatory for management devices: VIRTIO_ADMIN_DEVICE_MGMT and VIRTIO_ADMIN_DEVICE_MGMT_ATTRS.}
> +\end{note}
> +
>  \subsection{VIRTIO ADMIN DEVICE CAPS IDENTIFY command}\label{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE CAPS IDENTIFY command}
>  
>  The VIRTIO_ADMIN_DEVICE_CAPS_IDENTIFY command has no command specific data set by the driver.
> @@ -102,13 +110,20 @@ \subsection{VIRTIO ADMIN DEVICE CAPS IDENTIFY command}\label{sec:Basic Facilitie
>         le64 attrs_mask;
>         /* This field indicates which of the below admin
>          * capabilities are supported by the device:
> -        * Bits 0 - 63 - reserved for future capabilities.
> +        * Bit 0 - if set, the device is a management device
> +        * Bit 1 - if set, the device is a type 1 management device that supports
> +        *         MSI-X vector mgmt of its type 1 managed devices
> +        * Bits 2 - 63 - reserved for future capabilities.
>          */
>         le64 device_admin_caps;
>         u8 reserved[112];
>  };
>  \end{lstlisting}
>  
> +\begin{note}
> +{For more details on MSI-X vector management support see section \ref{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Admin command set / MSI-X vector management}.}
> +\end{note}
> +
>  \subsection{VIRTIO ADMIN DEVICE CAPS ACCEPT command}\label{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE CAPS ACCEPT command}
>  
>  The VIRTIO_ADMIN_DEVICE_CAPS_ACCEPT command is used by the driver to acknowledge those admin capabilities it understands and wishes to use.
> @@ -125,13 +140,124 @@ \subsection{VIRTIO ADMIN DEVICE CAPS ACCEPT command}\label{sec:Basic Facilities
>         le64 attrs_mask;
>         /* This field indicates which of the below admin
>          * capabilities are supported by the driver:
> -        * Bits 0 - 63 - reserved for future capabilities.
> +        * Bit 0 - if set, the driver accepted the device as a management device
> +        * Bit 1 - if set, the driver accepted the device as a type 1 management device
> +        *         that supports MSI-X vector mgmt of its type 1 managed devices
> +        * Bits 2 - 63 - reserved for future capabilities.
>          */
>         le64 driver_admin_caps;
>         u8 reserved[112];
>  };
>  \end{lstlisting}
>  
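For illustration only, a minimal C sketch of how a driver might derive driver_admin_caps for CAPS_ACCEPT from the device_admin_caps bits proposed above: accept only what the device offered and the driver understands, and never set reserved bits 2-63. The macro and function names are hypothetical, and dropping bit 1 when bit 0 is absent is my assumption, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as proposed for device_admin_caps / driver_admin_caps.
 * Names are hypothetical; the patch only defines the bit numbers. */
#define ADMIN_CAP_MGMT_DEVICE     (UINT64_C(1) << 0) /* management device */
#define ADMIN_CAP_TYPE1_MSIX_MGMT (UINT64_C(1) << 1) /* type 1 MSI-X vector mgmt */
#define ADMIN_CAP_KNOWN_MASK      (ADMIN_CAP_MGMT_DEVICE | ADMIN_CAP_TYPE1_MSIX_MGMT)

/* Value for driver_admin_caps: intersection of what the device reported in
 * CAPS_IDENTIFY and what this driver supports; reserved bits stay clear. */
static uint64_t accept_admin_caps(uint64_t device_admin_caps, uint64_t driver_supported)
{
    uint64_t caps = device_admin_caps & driver_supported & ADMIN_CAP_KNOWN_MASK;

    /* Assumption: type 1 MSI-X mgmt (bit 1) only makes sense for a
     * management device (bit 0), so drop it when bit 0 is not accepted. */
    if (!(caps & ADMIN_CAP_MGMT_DEVICE))
        caps &= ~ADMIN_CAP_TYPE1_MSIX_MGMT;
    return caps;
}

static bool is_type1_msix_manager(uint64_t accepted_caps)
{
    return (accepted_caps & ADMIN_CAP_TYPE1_MSIX_MGMT) != 0;
}
```

With a device reporting 0x3 and a driver that only understands bit 0, accept_admin_caps(0x3, 0x1) yields 0x1.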
> +\subsection{VIRTIO ADMIN DEVICE MGMT command}\label{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE MGMT command}
> +
> +The VIRTIO_ADMIN_DEVICE_MGMT command is used by a management device to manage resources of managed virtio devices.
> +The \field{command} is set to VIRTIO_ADMIN_DEVICE_MGMT by the driver.
> +
> +The command specific data set by the driver is of form:
> +\begin{lstlisting}
> +struct virtio_admin_device_mgmt_data {
> +        /*
> +         * 0 - reserved
> +         * 1 - assign resource to the designated vdev_id
> +         * 2 - query resource of the designated vdev_id
> +         * 3 - 255 are reserved
> +         */
> +        u8 operation;
> +        /*
> +         * 0 - MSI-X vector
> +         * 1 - 65535 are reserved
> +         */
> +        le16 resource;
> +        /*
> +         * The value to the given resource:
> +         * if resource = 0 (MSI-X vector), it's a 1-based count.
> +         */
> +        le64 resource_val;
> +        u8 reserved[5];
> +};
> +\end{lstlisting}
> +
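A minimal C sketch of the wire layout of virtio_admin_device_mgmt_data, assuming the fields are laid out back to back with no padding (1 + 2 + 8 + 5 = 16 bytes; the patch does not say so explicitly, and a naive C struct would insert padding after the u8). Multi-byte fields are serialized little-endian byte by byte. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Values taken from the comments in the proposed struct. */
enum { MGMT_OP_ASSIGN = 1, MGMT_OP_QUERY = 2 };
enum { MGMT_RSC_MSIX_VECTOR = 0 };

#define MGMT_DATA_LEN 16 /* 1 + 2 + 8 + 5 bytes, assuming no padding */

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Fill the command specific data; reserved bytes are zeroed.
 * For a query, resource_val is presumably ignored, so pass 0. */
static void build_mgmt_data(uint8_t buf[MGMT_DATA_LEN], uint8_t op,
                            uint16_t resource, uint64_t resource_val)
{
    memset(buf, 0, MGMT_DATA_LEN);
    buf[0] = op;                     /* offset 0: operation */
    put_le16(&buf[1], resource);     /* offset 1: resource (le16) */
    put_le64(&buf[3], resource_val); /* offset 3: resource_val (le64) */
    /* offsets 11..15: reserved[5], left as zero */
}
```

Assigning 8 vectors, build_mgmt_data(buf, MGMT_OP_ASSIGN, MGMT_RSC_MSIX_VECTOR, 8) produces 01 00 00 08 followed by twelve zero bytes.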
> +The following table describes the command specific error codes:
> +
> +\begin{tabular}{|l|l|l|}
> +\hline
> +Code & Status & Description \\
> +\hline \hline
> +00h   & VIRTIO_ADMIN_CS_ERR_VDEV_IN_USE    & designated device is in use, operation failed   \\
> +\hline
> +01h   & VIRTIO_ADMIN_CS_RSC_VAL_INVALID    & resource value is invalid  \\
> +\hline
> +02h   & VIRTIO_ADMIN_CS_RSC_UNSUPPORTED    & unsupported or invalid resource  \\
> +\hline
> +03h   & VIRTIO_ADMIN_CS_OP_UNSUPPORTED    & unsupported or invalid operation  \\
> +\hline
> +04h - FFh   & Reserved    & -  \\
> +\hline
> +\end{tabular}
> +
> +The device, upon success, returns a result that describes the information according to the requested operation.
> +This result is of form:
> +\begin{lstlisting}
> +struct virtio_admin_device_mgmt_result {
> +        le64 resource_val;
> +        u8 reserved[8];
> +};
> +\end{lstlisting}
> +
> +If the driver requested the "assign resource to the designated vdev_id" operation, the device returns the \field{resource_val} of the resources assigned
> +to the designated vdev_id. Upon success, this value should be equal to the \field{resource_val} of the virtio_admin_device_mgmt_data
> +structure set by the driver. In case of a failure, the value of this field is undefined and will be ignored by the driver.
> +
> +If the driver requested the "query resource of the designated vdev_id" operation, the device returns, upon success, the \field{resource_val} of the resources currently assigned
> +to the designated vdev_id. In case of a failure, the value of this field is undefined and will be ignored by the driver.
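A minimal C sketch of how a driver might interpret virtio_admin_device_mgmt_result for the two operations described above. The struct mgmt_completion type and the function names are hypothetical; treating a success whose resource_val differs from the request as a failure is a design choice I am assuming, since the text only says the value "should be" equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver-side view of a completed VIRTIO_ADMIN_DEVICE_MGMT
 * command: ok is the command status, result_val is resource_val from
 * virtio_admin_device_mgmt_result. */
struct mgmt_completion {
    bool ok;
    uint64_t result_val;
};

/* Assign: on success resource_val should equal the requested value.
 * On failure the field is undefined and is not looked at. */
static bool assign_confirmed(struct mgmt_completion c, uint64_t requested)
{
    if (!c.ok)
        return false;                 /* result_val undefined: ignore it */
    return c.result_val == requested; /* assumption: mismatch = failure */
}

/* Query: on success resource_val is the currently assigned count.
 * *count is left untouched on failure. */
static bool query_msix_count(struct mgmt_completion c, uint64_t *count)
{
    if (!c.ok)
        return false;
    *count = c.result_val;
    return true;
}
```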
> +
> +\begin{note}
> +{The MSI-X vector resource type is valid only for PCI devices. The VIRTIO_ADMIN_CS_RSC_UNSUPPORTED error is
> +returned by the device when the designated vdev_id is not a PCI device.}
> +\end{note}
> +
> +\begin{note}
> +{For this command, if the driver sets \field{resource} to the MSI-X vector type, \field{vdev_id} must correspond to a Virtual Function whose
> +VF index is at least 1 and at most the NumVFs value as defined in the PCI specification. The device returns an error when \field{vdev_id} is out of this range.}
> +\end{note}
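The range rule in the note above, as a minimal C sketch. The width of vdev_id is not given in this excerpt, so uint32_t is an assumption; NumVFs is the 16-bit register in the PF's SR-IOV Extended Capability. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* For the MSI-X vector resource, vdev_id must name a VF:
 * 1 <= vdev_id <= NumVFs. vdev_id 0 (the PF itself) is rejected. */
static bool msix_vdev_id_valid(uint32_t vdev_id, uint16_t num_vfs)
{
    return vdev_id >= 1 && vdev_id <= num_vfs;
}
```

With SR-IOV disabled (NumVFs == 0) no vdev_id is accepted.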
> +
> +\subsection{VIRTIO ADMIN DEVICE MGMT ATTRS command}\label{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE MGMT ATTRS command}
> +
> +The VIRTIO_ADMIN_DEVICE_MGMT_ATTRS command has no command specific data set by the driver.
> +The \field{command} is set to VIRTIO_ADMIN_DEVICE_MGMT_ATTRS.
> +
> +The device, upon success, returns a result that describes the management device attributes.
> +This result is of form:
> +\begin{lstlisting}
> +struct virtio_admin_device_mgmt_attrs_result {
> +        /* Indicates which of the below fields were returned
> +         * (1 means that field was returned):
> +         * Bit 0 - vfs_total_msix_count
> +         * Bit 1 - vfs_assigned_msix_count
> +         * Bit 2 - per_vf_max_msix_count
> +         * Bits 3 - 63 - reserved for future fields
> +         */
> +        le64 attrs_mask;
> +
> +        /* Total number of msix vectors for the total number of VFs */
> +        le32 vfs_total_msix_count;
> +        /* Assigned number of msix vectors for the enabled VFs */
> +        le32 vfs_assigned_msix_count;
> +        /* Max number of msix vectors that can be assigned for a single VF */
> +        le16 per_vf_max_msix_count;
> +
> +        u8 reserved[110];
> +};
> +\end{lstlisting}
> +
> +\begin{note}
> +{The \field{vfs_total_msix_count}, \field{vfs_assigned_msix_count} and \field{per_vf_max_msix_count} fields are returned by the device if the
> +designated vdev_id is a management device that can allocate/deallocate MSI-X resources for PCI VF devices. Otherwise,
> +the associated bits in \field{attrs_mask} are zeroed by the device.}
> +\end{note}
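A minimal C sketch of decoding the 128-byte virtio_admin_device_mgmt_attrs_result, assuming packed little-endian fields (8 + 4 + 4 + 2 + 110 = 128 bytes) and honouring attrs_mask so that fields the device did not report are not used. The per-VF maximum pre-check is a hypothetical driver-side helper, not part of the proposal; all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MGMT_ATTRS_LEN 128 /* 8 + 4 + 4 + 2 + 110, assuming no padding */

#define ATTR_VFS_TOTAL_MSIX    (UINT64_C(1) << 0)
#define ATTR_VFS_ASSIGNED_MSIX (UINT64_C(1) << 1)
#define ATTR_PER_VF_MAX_MSIX   (UINT64_C(1) << 2)

struct mgmt_attrs {
    uint64_t attrs_mask;
    uint32_t vfs_total_msix_count;
    uint32_t vfs_assigned_msix_count;
    uint16_t per_vf_max_msix_count;
};

/* Read an n-byte little-endian unsigned value. */
static uint64_t get_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode the result; fields whose attrs_mask bit is clear stay 0. */
static struct mgmt_attrs parse_mgmt_attrs(const uint8_t buf[MGMT_ATTRS_LEN])
{
    struct mgmt_attrs a = {0};
    a.attrs_mask = get_le(&buf[0], 8);
    if (a.attrs_mask & ATTR_VFS_TOTAL_MSIX)
        a.vfs_total_msix_count = (uint32_t)get_le(&buf[8], 4);
    if (a.attrs_mask & ATTR_VFS_ASSIGNED_MSIX)
        a.vfs_assigned_msix_count = (uint32_t)get_le(&buf[12], 4);
    if (a.attrs_mask & ATTR_PER_VF_MAX_MSIX)
        a.per_vf_max_msix_count = (uint16_t)get_le(&buf[16], 2);
    return a;
}

/* Hypothetical pre-check before an assign: the count must not exceed the
 * per-VF maximum; fails if the device did not report that attribute. */
static bool within_per_vf_max(const struct mgmt_attrs *a, uint64_t count)
{
    return (a->attrs_mask & ATTR_PER_VF_MAX_MSIX) &&
           count <= a->per_vf_max_msix_count;
}
```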
> +
>  \section{Admin Virtqueues}\label{sec:Basic Facilities of a Virtio Device / Admin Virtqueues}
>  
>  An admin virtqueue is a management interface of a device that can be used to send administrative
> diff --git a/content.tex b/content.tex
> index 0c1d44f..81e5850 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -451,6 +451,18 @@ \section{Exporting Objects}\label{sec:Basic Facilities of a Virtio Device / Expo
>  
>  \input{admin.tex}
>  
> +\section{Device management}\label{sec:Basic Facilities of a Virtio Device / Device management}
> +
> +A device group might consist of one or more virtio devices. For example, a virtio PCI SR-IOV PF and its VFs compose a type 1 device group.
> +A capable PCI SR-IOV PF virtio device might act as the management device in this group, and its PCI SR-IOV VFs are the managed devices.
> +A management device might have various management capabilities and attributes to manage its managed devices.

This makes my eyes glaze over.
Please find all instances that say "manage" more than once and
rephrase.

> The capabilities are exposed
> +in the result of the VIRTIO_ADMIN_DEVICE_CAPS_IDENTIFY command (see section \ref{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE CAPS IDENTIFY command}
> +for more details) and the attributes are exposed in the result of the VIRTIO_ADMIN_DEVICE_MGMT_ATTRS command
> +(see section \ref{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE MGMT ATTRS command} for more details).
> +
> +The management device will use the VIRTIO_ADMIN_DEVICE_MGMT admin command to manage its managed devices (see section
> +\ref{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE MGMT command} for more details).
> +
>  \chapter{General Initialization And Device Operation}\label{sec:General Initialization And Device Operation}
>  
>  We start with an overview of device initialization, then expand on the
> @@ -1763,6 +1775,75 @@ \subsubsection{Driver Handling Interrupts}\label{sec:Virtio Transport Options /
>      \end{itemize}
>  \end{itemize}
>  
> +\subsection{PCI-specific Admin capabilities}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Admin capabilities}
> +
> +This section documents the group of admin capabilities for PCI virtio devices. Each capability is
> +implemented using one or more Admin commands.
> +
> +\subsubsection{MSI-X vector management}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Admin command set / MSI-X vector management}
> +
> +This capability enables a virtio management device to control the assignment of MSI-X interrupt vectors
> +for its managed devices. In PCI, a management device can be the PF device and the managed device can be the VF (for example in a type 1 device group).
> +Capable management devices will need to implement VIRTIO_ADMIN_DEVICE_MGMT and VIRTIO_ADMIN_DEVICE_MGMT_ATTRS admin commands, report the MSI-X attributes in the result of
> +VIRTIO_ADMIN_DEVICE_MGMT_ATTRS and report that MSI-X vector resource management is supported in the result of VIRTIO_ADMIN_DEVICE_CAPS_IDENTIFY admin command.
> +See sections \ref{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE CAPS IDENTIFY command} and
> +\ref{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE MGMT ATTRS command} for more details.
> +
> +In the result of the VIRTIO_ADMIN_DEVICE_MGMT_ATTRS admin command, a capable management device returns the total number of
> +MSI-X vectors for its VFs in the \field{vfs_total_msix_count} field, the number of already assigned MSI-X vectors for its VFs in the
> +\field{vfs_assigned_msix_count} field and the maximal number of MSI-X vectors that can be assigned to a single VF in the
> +\field{per_vf_max_msix_count} field. In addition, bits 0, 1 and 2 of the \field{attrs_mask} field of the result buffer are set to indicate
> +that these 3 fields are valid.
> +See section \ref{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE MGMT ATTRS command} for more details.
> +
> +The default assignment of the MSI-X vectors for managed devices is out of the scope of this specification.
> +Using VIRTIO_ADMIN_DEVICE_MGMT, a driver can update the MSI-X assignment of a specific managed device.
> +In the data of the VIRTIO_ADMIN_DEVICE_MGMT admin command, the driver sets the \field{resource} type to MSI-X vector and
> +\field{resource_val} to the number of MSI-X interrupt vectors to assign to the designated managed device. The managed device id is set in the \field{vdev_id} field.
> +
> +A successful operation guarantees that the requested number of MSI-X interrupt vectors was assigned to the designated device.
> +This value is also returned in the virtio_admin_device_mgmt_result structure.
> +Also, a successful operation guarantees that accesses to the MSI-X capability of the designated PCI device, as defined by the PCI specification, reflect
> +the new configuration in all relevant fields. For example, if the PCI VF was assigned 4 MSI-X vectors by default and VIRTIO_ADMIN_DEVICE_MGMT
> +increases this to 8, reading the Table Size field of the MSI-X Message Control register then returns 7, since Table Size is encoded as N-1.
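The N-1 encoding in that example, as a minimal C sketch: Table Size occupies bits 10:0 of the MSI-X Message Control register. The macro and function names are local and hypothetical (they only mirror the field layout, not any particular OS header).

```c
#include <assert.h>
#include <stdint.h>

/* Message Control bits 10:0 = Table Size, encoded as N - 1. */
#define MSIX_CTRL_TABLE_SIZE 0x07FFu

/* Value the Table Size field should read after assigning nvec vectors
 * (nvec in 1..2048). */
static uint16_t msix_table_size_field(uint16_t nvec)
{
    return (uint16_t)((nvec - 1u) & MSIX_CTRL_TABLE_SIZE);
}

/* Number of vectors implied by a raw Message Control value; the MSI-X
 * Enable (bit 15) and Function Mask (bit 14) bits are ignored. */
static uint16_t msix_vectors_from_ctrl(uint16_t msg_ctrl)
{
    return (uint16_t)((msg_ctrl & MSIX_CTRL_TABLE_SIZE) + 1u);
}
```

So after growing a VF from 4 to 8 vectors, msix_table_size_field(8) is 7, and a Message Control read of 0x8007 (enabled) decodes back to 8 vectors.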
> +
> +It is beyond the scope of the virtio specification to define necessary synchronization in system software to ensure that a virtio PCI VF device
> +interrupt configuration modification is reflected in the PCI device.

IMHO it is very much in scope of the specification. The scope of the
specification is to allow device interoperability and this very much
fits the bill.

> However, it is expected that any modern system software implementing virtio
> +drivers and PCI subsystem will ensure that any changes occurring in the VF interrupt configuration are either updated in the PCI VF device or
> +such configuration fails.

OK. Anything more? What exactly does "interrupt configuration" mean here?

> For example, one way to implement that is to make sure that there is no driver bounded to the virtio PCI SR-IOV VF during
> +this operation.

bounded in what sense?

And why do you say VF? Is this command limited to type 1? You only
limit it to PCI above.

same elsewhere

> +
> +To query amount of MSI-X interrupt vectors that is currently assigned to a managed device, the driver issue VIRTIO_ADMIN_DEVICE_MGMT with \field{operation} set to

issues

lots of grammar errors like this elsewhere, pls find and correct.

> +"query resource of the designated vdev_id" value (== 2). The driver also sets the \field{resource} type to MSI-X vector and the managed device id is set in \field{vdev_id}
> +field. In the result of a successful operation,

meaning "in case"?

> the amount of MSI-X interrupt vectors that is currently assigned to the designated managed device is
> +returned by the device in \field{resource_val} field of the virtio_admin_device_mgmt_result structure.
> +See section \ref{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE MGMT command} for more details.
> +
> +\paragraph{MSI-X configuration sequence example}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI-specific Admin command set / VF MSI-X control / MSI-X configuration sequence example }
> +
> +A typical sequence for configuring MSI-X vectors for PCI VFs using MSI-X vector management mechanism is following:

rephrase to simplify

The driver uses the following sequence for configuring MSI-X vectors
....



> +
> +\begin{enumerate}
> +\item Ensure that the VF driver is not running and that it is safe to change MSI-X (e.g. disable SR-IOV auto probing)
> +
> +\item Load the PF driver
> +
> +\item Enable SR-IOV by following the PCI specification
> +
> +\item Query the management device capabilities using the VIRTIO_ADMIN_DEVICE_CAPS_IDENTIFY and VIRTIO_ADMIN_DEVICE_MGMT_ATTRS commands
> +
> +\item Find the managed VF vdev_id (for a type 1 device group, the vdev_id of a PCI VF is equal to its VF number)
> +
> +\item Query the VF MSI-X configuration using command VIRTIO_ADMIN_DEVICE_MGMT (query operation)
> +
> +\item Assign the desired MSI-X configuration for the VF using command VIRTIO_ADMIN_DEVICE_MGMT (assign operation)
> +
> +\item After successful completion of the assignment, load the VF driver
> +
> +\item Assign the VF to a VM
> +
> +\end{enumerate}
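To sanity-check the query/assign steps above, a minimal C sketch against an in-memory stand-in for a type 1 management device. Everything here is hypothetical: struct mock_pf and the mock_* functions model the proposed semantics and are not a real driver API. The failure cases loosely mirror the proposed status codes, and two rules are my assumptions: a count must be at least 1 (reading "1-based count" that way), and the new count must fit in vfs_total_msix_count once the VF's current vectors are returned to the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_MAX_VFS 8

/* Hypothetical model of a PF acting as type 1 management device.
 * num_vfs must not exceed MOCK_MAX_VFS. */
struct mock_pf {
    uint16_t num_vfs;                    /* NumVFs after enabling SR-IOV */
    uint32_t vfs_total_msix_count;
    uint16_t per_vf_max_msix_count;
    uint16_t vf_msix[MOCK_MAX_VFS + 1];  /* index = vdev_id; 0 is the PF */
    bool vf_driver_bound[MOCK_MAX_VFS + 1];
};

static uint32_t mock_assigned_total(const struct mock_pf *pf)
{
    uint32_t sum = 0;
    for (uint16_t id = 1; id <= pf->num_vfs; id++)
        sum += pf->vf_msix[id];
    return sum;
}

/* Query operation: fails on an out-of-range vdev_id. */
static bool mock_query(const struct mock_pf *pf, uint16_t vdev_id, uint64_t *val)
{
    if (vdev_id < 1 || vdev_id > pf->num_vfs)
        return false;
    *val = pf->vf_msix[vdev_id];
    return true;
}

/* Assign operation. */
static bool mock_assign(struct mock_pf *pf, uint16_t vdev_id, uint64_t val)
{
    if (vdev_id < 1 || vdev_id > pf->num_vfs)
        return false;
    if (pf->vf_driver_bound[vdev_id])
        return false;                    /* like VDEV_IN_USE */
    if (val < 1 || val > pf->per_vf_max_msix_count)
        return false;                    /* like RSC_VAL_INVALID */
    uint32_t others = mock_assigned_total(pf) - pf->vf_msix[vdev_id];
    if (others + val > pf->vfs_total_msix_count)
        return false;                    /* pool exhausted (assumption) */
    pf->vf_msix[vdev_id] = (uint16_t)val;
    return true;
}

/* Query, assign if needed, then re-query to confirm before the VF driver
 * would be loaded. */
static bool configure_vf_msix(struct mock_pf *pf, uint16_t vdev_id, uint64_t want)
{
    uint64_t cur;
    if (!mock_query(pf, vdev_id, &cur))
        return false;
    if (cur == want)
        return true;
    if (!mock_assign(pf, vdev_id, want))
        return false;
    return mock_query(pf, vdev_id, &cur) && cur == want;
}
```

For example, with two VFs holding 4 vectors each out of a pool of 16, growing VF 1 to 8 succeeds, after which growing VF 2 to 12 fails because 8 + 12 exceeds 16.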
> +
>  \section{Virtio Over MMIO}\label{sec:Virtio Transport Options / Virtio Over MMIO}
>  
>  Virtual environments without PCI support (a common situation in
> diff --git a/introduction.tex b/introduction.tex
> index 4358ab1..bfc5498 100644
> --- a/introduction.tex
> +++ b/introduction.tex
> @@ -164,9 +164,39 @@ \subsection{Device group}\label{sec:Introduction / Terminology / Device group}
>  For now, the supported device groups are:
>  \begin{enumerate}
>  \item Type 1 - A virtio PCI SR-IOV physical function (PF) and its PCI SR-IOV virtual functions (VFs). For this group type, the PF device has vdev_id that is equal to 0
> -and the VF devices have vdev_id's that are equal to their vf_number (according to the PCI SR-IOV specification).
> +and the VF devices have vdev_id's that are equal to their vf_number (according to the PCI SR-IOV specification). A PCI SR-IOV PF device can act as a management device for
> +a type 1 group. A PCI SR-IOV VF device can act as a managed device for a type 1 group (see \ref{sec:Introduction / Terminology / Virtio management device} and
> +\ref{sec:Introduction / Terminology / Virtio managed device} for more information).
>  \end{enumerate}
>  
> +\subsection{Virtio management device}\label{sec:Introduction / Terminology / Virtio management device}
> +
> +A virtio device that supports VIRTIO_ADMIN_DEVICE_MGMT and VIRTIO_ADMIN_DEVICE_MGMT_ATTRS admin commands (see
> +\ref{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE MGMT command} and
> +\ref{sec:Basic Facilities of a Virtio Device / Admin command set / VIRTIO ADMIN DEVICE MGMT ATTRS command} for more information).
> +This device can manage a virtio managed device. A device group may contain zero or more management devices.
> +
> +A PCI SR-IOV Physical Function based virtio device is an example of a possible virtio management device (for a type 1 device group).
> +
> +\subsection{Virtio type 1 management device}\label{sec:Introduction / Terminology / Virtio type 1 management device}
> +
> +A virtio management device for a type 1 device group. This device is a PCI SR-IOV PF that can set \field{dst_type} to 1 (other virtio device in the same device group),
> +and set \field{vdev_id} to an id that corresponds to one of its managed virtio devices (PCI SR-IOV VFs) for the VIRTIO_ADMIN_DEVICE_MGMT admin command.
> +
> +A type 1 device group may contain zero or one management device.
> +
> +\subsection{Virtio managed device}\label{sec:Introduction / Terminology / Virtio managed device}
> +
> +A virtio device that can be managed by a virtio management device.
> +A device group may contain zero or more managed devices.
> +
> +A PCI SR-IOV Virtual Function based virtio device is an example of a possible virtio managed device (for a type 1 group).
> +
> +\subsection{Virtio type 1 managed device}\label{sec:Introduction / Terminology / Virtio type 1 managed device}
> +
> +A virtio managed device for a type 1 device group. This device is a PCI SR-IOV VF and is managed by a virtio type 1 management device (virtio PCI SR-IOV PF).
> +It is implied that all the virtio PCI SR-IOV VFs related to a virtio PCI SR-IOV PF that is a virtio type 1 management device are type 1 managed devices.
> +
>  \section{Structure Specifications}\label{sec:Structure Specifications}
>  
>  Many device and driver in-memory structure layouts are documented using
> -- 
> 2.21.0


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
