On Sun, Aug 28 2022, Dmitry Fomichev <dmitry.fomic...@wdc.com> wrote:
> Introduce support for Zoned Block Devices to virtio. > > Zoned Block Devices (ZBDs) aim to achieve a better capacity, latency > and/or cost characteristics compared to commonly available block > devices by getting the entire LBA space of the device divided to block > regions that are much larger than the LBA size. These regions are > called zones and they can only be written sequentially. More details > about ZBDs can be found at > > https://zonedstorage.io/docs/introduction/zoned-storage . > > In its current form, the virtio protocol for block devices (virtio-blk) > is not aware of ZBDs but it allows the driver to successfully scan a > host-managed drive provided by the virtio block device. As the result, > the host-managed drive is recognized by virtio driver as a regular, > non-zoned drive that will operate erroneously under the most common > write workloads. Host-aware ZBDs are currently usable, but their > performance may not be optimal because the driver can only see them as > non-zoned block devices. > > To fix this, the virtio-blk protocol needs to be extended to add the > capabilities to convey the zone characteristics of ZBDs at the device > side to the driver and to provide support for ZBD-specific commands - > Report Zones, four zone operations (Open, Close, Finish and Reset) and > (optionally) Zone Append. The proposed standard extension aims to > define this new functionality. > > This patch extends the virtio-blk section of virtio specification with > the minimum set of requirements that are necessary to support ZBDs. > The resulting device model is a subset of the models defined in ZAC/ZBC > and ZNS standards documents. The included functionality mirrors > the existing Linux kernel block layer ZBD support and should be > sufficient to handle the host-managed and host-aware HDDs that are on > the market today as well as ZNS SSDs that are entering the market at > the time of submission of this patch. > > I would like to thank the following people for their useful feedback > and suggestions while working on the initial iterations of this patch. > > Damien Le Moal <damien.lem...@opensource.wdc.com> > Matias Bjørling <matias.bjorl...@wdc.com> > Niklas Cassel <niklas.cas...@wdc.com> > Hans Holmberg <hans.holmb...@wdc.com> > > Fixes: https://github.com/oasis-tcs/virtio-spec/issues/143 > Signed-off-by: Dmitry Fomichev <dmitry.fomic...@wdc.com> (...) I'm commenting from a more-or-less pure spec point of view; I'll leave any block-related details to others more familiar with the matter. > @@ -4589,6 +4596,75 @@ \subsection{Device configuration > layout}\label{sec:Device Types / Block Device / > \field{max_secure_erase_sectors} \field{secure_erase_sector_alignment} are > expressed > in 512-byte units if the VIRTIO_BLK_F_SECURE_ERASE feature bit is negotiated. > > +If the VIRTIO_BLK_F_ZONED feature is negotiated, then in > +\field{virtio_blk_zoned_characteristics}, > +\begin{itemize} > +\item \field{zone_sectors} value is expressed in 512-byte sectors. > +\item \field{max_append_sectors} value is expressed in 512-byte sectors. > +\item \field{write_granularity} value is expressed in bytes. > +\end{itemize} > + > +The \field{model} field in \field{zoned} may have the following values: > +VIRTIO_BLK_Z_NONE(0), VIRTIO_BLK_Z_HM(1) and VIRTIO_BLK_Z_HA(2). I think you can drop that line, as you define the values below. > + > +\begin{lstlisting} > +#define VIRTIO_BLK_Z_NONE 0 > +#define VIRTIO_BLK_Z_HM 1 > +#define VIRTIO_BLK_Z_HA 2 > +\end{lstlisting} > + (...) > @@ -4701,6 +4790,30 @@ \subsection{Device Initialization}\label{sec:Device > Types / Block Device / Devic > The driver MUST NOT read \field{writeback} before setting > the FEATURES_OK \field{device status} bit. > > +Drivers SHOULD NOT negotiate VIRTIO_BLK_F_ZONED feature if they are incapable > +of supporting devices with the VIRTIO_BLK_Z_HM or VIRTIO_BLK_Z_HA zoned > model. > + > +If the VIRTIO_BLK_F_ZONED feature is offered by the device with the > +VIRTIO_BLK_Z_HM zone model, then > +\begin{itemize} > +\item the VIRTIO_BLK_F_DISCARD feature must not be offered by the driver. s/must not/MUST NOT/ Also, s/driver/device/, I guess? And it needs to go into the device normative section? > + > +\item if the driver sets both VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_ZONED > + feature bits, the device must fail by not setting FEATURES_OK bit. Hm, shouldn't the device fail already if the driver wants to negotiate a feature that it had not offered in the first place? (Also, s/must/MUST/ if we decide to keep it, and needs to go into the device normative section. The correct terminology would be "the device MUST fail setting the FEATURES_OK bit.") > +\end{itemize} Is it ok for the device to offer both F_DISCARD and F_ZONED with Z_HA or Z_NONE? In that case, is it ok for the driver to negotiate both F_DISCARD and F_ZONED, or does it need to pick at most one of them? > + > +If the VIRTIO_BLK_F_ZONED feature is negotiated, then > +\begin{itemize} > +\item if the driver that can not support host-managed zoned devices > + reads VIRTIO_BLK_Z_HM from the \field{model} field of \field{zoned}, the > + driver MUST NOT set FEATURES_OK flag and instead set the FAILED bit. [I'll skip commenting on "negotiated" vs "offered", as we need to clarify that comprehensively in the whole document.] Could the driver simply reject F_ZONED in that case? The device can always fail setting FEATURES_OK if the driver does not accept a feature that is needed. Maybe make this a "SHOULD NOT"? > + > +\item if the driver that can not support support zoned devices reads > + VIRTIO_BLK_Z_HA from the \field{model} field of \field{zoned}, the driver > + MAY handle the device as a non-zoned device. In this case, the > + driver SHOULD ignore all other fields in \field{zoned}. I would expect a driver that does not support zoned devices at all to always reject F_ZONED (including older drivers). Should we mandate that the driver does not accept F_ZONED if it cannot handle it? > +\end{itemize} > + > \devicenormative{\subsubsection}{Device Initialization}{Device Types / Block > Device / Device Initialization} > > Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it > @@ -4712,6 +4825,74 @@ \subsection{Device Initialization}\label{sec:Device > Types / Block Device / Devic > The device MUST initialize padding bytes \field{unused0} and > \field{unused1} to 0. > > +If the device that is being initialized is a not a zoned device, the device > +SHOULD NOT offer the VIRTIO_BLK_F_ZONED feature. > + > +If the VIRTIO_BLK_F_ZONED bit is not set by the driver, If it is not accepted, I guess? > +\begin{itemize} > +\item the device with the VIRTIO_BLK_Z_HA zone model SHOULD proceed with the > + initialization while setting all zoned characteristics fields to zero. > + > +\item the device with the VIRTIO_BLK_Z_HM zone model MUST fail to set the > + FEATURES_OK device status bit when the driver writes the Device Status > + field. > +\end{itemize} > + > +If the VIRTIO_BLK_F_ZONED feature is negotiated, then the \field{model} > field in > +\field{zoned} struct in the configuration space MUST be set by the device > +\begin{itemize} > +\item to the value of VIRTIO_BLK_Z_NONE if it operates as a drive-managed > + zoned block device or a non-zoned block device. > + > +\item to the value of VIRTIO_BLK_Z_HM if it operates as a host-managed zoned > + block device. > + > +\item to the value of VIRTIO_BLK_Z_HA if it operates as a host-aware zoned > + block device. > +\end{itemize} > + > +If the VIRTIO_BLK_F_ZONED feature is negotiated, > +\begin{itemize} > +\item the \field{zone_sectors} field of \field{zoned} MUST be set by the > device > + to the size of a single zone on the device. All zones of the device have > the > + same size indicated by \field{zone_sectors} except for the last zone that > + MAY be smaller than all other zones. The driver can calculate the number > of > + zones on the device as > + \begin{lstlisting} > + nr_zones = (capacity + zone_sectors - 1) / zone_sectors; > + \end{lstlisting} > + and the size of the last zone as > + \begin{lstlisting} > + zs_last = capacity - (nr_zones - 1) * zone_sectors; > + \end{lstlisting} > + > +\item The \field{max_open_zones} field of the \field{zoned} structure MUST be > + set by the device to the maximum number of zones that can be open on the > + device (zones in the implicit open or explicit open state). A value > + of zero indicates that the device does not have any limit on the number > of > + open zones. > + > +\item The \field{max_active_zones} field of the \field{zoned} structure MUST > + be set by the device to the maximum number zones that can be active on > the > + device (zones in the implicit open, explicit open or closed state). A > value > + of zero indicates that the device does not have any limit on the number > of > + active zones. > + > +\item the \field{max_append_sectors} field of \field{zoned} MUST be set by > + the device to the maximum data size of a VIRTIO_BLK_T_ZONE_APPEND request > + that can be successfully issued to the device. The value of this field > MUST > + NOT exceed the \field{seg_max} * \field{size_max} value. A device MAY set > + the \field{max_append_sectors} to zero if it doesn't support > + VIRTIO_BLK_T_ZONE_APPEND requests. > + > +\item the \field{write_granularity} field of \field{zoned} MUST be set by the > + device to the offset and size alignment constraint for VIRTIO_BLK_T_OUT > + and VIRTIO_BLK_T_ZONE_APPEND requests issued to a sequential zone of the > + device. > + > +\item the device MUST initialize padding bytes \field{unused2} to 0. > +\end{itemize} As this depends on FEATURES_OK a lot, I guess that F_ZONED cannot be implemented for legacy devices. Do we need to be explicit about that? > + > \subsubsection{Legacy Interface: Device Initialization}\label{sec:Device > Types / Block Device / Device Initialization / Legacy Interface: Device > Initialization} > > Because legacy devices do not have FEATURES_OK, transitional devices (...) > +The zone descriptor field \field{z_type} \field{virtio_blk_zone_descriptor} > +indicates the type of the zone. The available types are > VIRTIO_BLK_ZT_CONV(1), > +VIRTIO_BLK_ZT_SWR(2) or VIRTIO_BLK_ZT_SWP(3). "The following types are available:" ? > + > +\begin{lstlisting} > +#define VIRTIO_BLK_ZT_CONV 1 > +#define VIRTIO_BLK_ZT_SWR 2 > +#define VIRTIO_BLK_ZT_SWP 3 > +\end{lstlisting} > + > +Read and write operations into zones with the VIRTIO_BLK_ZT_CONV > (Conventional) > +type have the same behavior as read and write operations on a regular block > +device. Any block in a conventional zone can be read or written at any time > and > +in any order. > + > +Zones with VIRTIO_BLK_ZT_SWR can be read randomly, but must be written > +sequentially at a certain point in the zone called the Write Pointer (WP). > With > +every write, the Write Pointer is incremented by the number of sectors > written. > + > +Zones with VIRTIO_BLK_ZT_SWP can be read randomly and should be written > +sequentially, similarly to SWR zones. However, SWP zones can accept random > write > +operations, that is, VIRTIO_BLK_T_OUT requests with a start sector different > +from the zone write pointer position. > + > +The field \field{z_state} of \field{virtio_blk_zone_descriptor} indicates the > +state of the device zone. The available zone states are > VIRTIO_BLK_ZS_NOT_WP(0), > +VIRTIO_BLK_ZS_EMPTY(1), VIRTIO_BLK_ZS_IOPEN(2), VIRTIO_BLK_ZS_EOPEN(3), > +VIRTIO_BLK_ZS_CLOSED(4), VIRTIO_BLK_ZS_RDONLY(13), VIRTIO_BLK_ZS_FULL(14) and > +VIRTIO_BLK_ZS_OFFLINE(15). "The following zone states are available:" ? > + > +\begin{lstlisting} > +#define VIRTIO_BLK_ZS_NOT_WP 0 > +#define VIRTIO_BLK_ZS_EMPTY 1 > +#define VIRTIO_BLK_ZS_IOPEN 2 > +#define VIRTIO_BLK_ZS_EOPEN 3 > +#define VIRTIO_BLK_ZS_CLOSED 4 > +#define VIRTIO_BLK_ZS_RDONLY 13 > +#define VIRTIO_BLK_ZS_FULL 14 > +#define VIRTIO_BLK_ZS_OFFLINE 15 > +\end{lstlisting} (...) > +The device SHALL report all other error conditions related to zoned block > model s/SHALL report/reports/ ? This is outside of a normative section if I'm not confused. > +operation by setting the VIRTIO_BLK_S_ZONE_INVALID_CMD value in > +\field{status} of \field{virtio_blk_req} structure. > + > \drivernormative{\subsubsection}{Device Operation}{Device Types / Block > Device / Device Operation} > > A driver MUST NOT submit a request which would cause a read or write --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org