On Thu, Jun 09, 2022 at 08:43:21PM -0400, Dmitry Fomichev wrote: > +Host-managed zoned block devices have their LBA range divided to Sequential > +Write Required (SWR) zones that require some additional handling from the > host > +for sustainable operation. All write requests to SWR zones must be sequential > +and the zones with some data need to be reset before that data can be > rewritten. > +Host-managed devices support a set of ZBD-specific I/O requests that can be > used > +by the host to manage device zones. Host-managed devices report > VIRTIO_BLK_Z_HM > +value in the \field{model} field in \field{zoned}.
One of "report the VIRTIO_BLK_Z_HM value in ..." "report a VIRTIO_BLK_Z_HM value in ..." "report VIRTIO_BLK_Z_HM in ..." reads more naturally. > + > +Host-aware zoned block devices have their LBA range divided to Sequential > +Write Preferred (SWP) zones that support the random write access, similar to > +regular non-zoned devices. However, the device I/O performance might not be > +optimal if SWP zones are used in a random I/O pattern. SWP zones also support > +the same set of ZBD-specific I/O requests as host-managed devices that allow > +host-aware devices to be managed by any host that supports zoned block > devices > +to achieve its optimum performance. Host-aware devices report VIRTIO_BLK_Z_HA > +value in the \field{model} field in \field{zoned}. Same as above. > + > +During device operation, SWR and SWP zones can be in one of the following > states: > +empty, implicitly-open, explicitly-open, closed and full. The state machine > that > +governs the transitions between these states is described later in this > document. > + > +SWR and SWP zones consume volatile device resources while being in certain > +states and the device may set limits on the number of zones that can be in > these > +states simultaneously. > + > +Zoned block devices use two internal counters to account for the device > +resources in use, the number of currently open zones and the number of > currently > +active zones. > + > +Any zone state transition from a state that doesn't consume a zone resource > to a > +state that consumes the same resource increments the internal device counter > for > +that resource. Any zone transition out of a state that consumes a zone > resource > +to a state that doesn't consume the same resource decrements the counter. Any > +request that causes the device to exceed the reported zone resource limits is > +terminated by the device with a "zone resources exceeded" error as defined > for > +specific commands later. > + > \begin{lstlisting} > struct virtio_blk_config { > le64 capacity; > @@ -4623,6 +4694,15 @@ \subsection{Device configuration > layout}\label{sec:Device Types / Block Device / > le32 max_secure_erase_sectors; > le32 max_secure_erase_seg; > le32 secure_erase_sector_alignment; > + struct virtio_blk_zoned_characteristics { > + le32 zone_sectors; > + le32 max_open_zones; > + le32 max_active_zones; > + le32 max_append_sectors; > + le32 write_granularity; > + u8 model; > + u8 unused2[3]; > + } zoned; > }; > \end{lstlisting} > > @@ -4686,6 +4766,10 @@ \subsection{Device Initialization}\label{sec:Device > Types / Block Device / Devic > \field{secure_erase_sector_alignment} can be used by OS when splitting a > request based on alignment. > > +\item If the VIRTIO_BLK_F_ZONED feature is negotiated, the fields in > + \field{zoned} can be read by the driver to determine the zone > + characteristics of the device. All \field{zoned} fields are read-only. > + > \end{enumerate} > > \drivernormative{\subsubsection}{Device Initialization}{Device Types / Block > Device / Device Initialization} > @@ -4701,6 +4785,33 @@ \subsection{Device Initialization}\label{sec:Device > Types / Block Device / Devic > The driver MUST NOT read \field{writeback} before setting > the FEATURES_OK \field{device status} bit. > > +Drivers SHOULD NOT negotiate VIRTIO_BLK_F_ZONED feature if they are incapable > +of supporting devices with the VIRTIO_BLK_Z_HM or VIRTIO_BLK_Z_HA zoned > model. > + > +Drivers MAY operate with VIRTIO_BLK_F_ZONED feature negotiated when the > device > +reports VIRTIO_BLK_Z_NONE zoned model for testing and development. Specifying a particular use case ("testing and development") could be interpreted to imply that other use cases MAY NOT operate with VIRTIO_BLK_F_ZONED. I suggest dropping the use case from the sentence or moving that sub-point outside the normative section to where the text describes VIRTIO_BLK_Z_NONE. Something like "Devices that offer VIRTIO_BLK_F_ZONED with VIRTIO_BLK_Z_NONE commonly do so for testing and development purposes". > +Each request is of form: > + > +\begin{lstlisting} > +struct virtio_blk_zoned_req { > + le32 type; > + le32 reserved; > + le64 sector; > + struct { > + /* ALL zone operation flag */ > + le32 mgmt_send_all:1; Bit-fields cannot be used in external interfaces (like hardware interfaces) because the C standard says (N1570 6.7.2.1 11): The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. So this C syntax does not define a specific binary representation, it's up to the compiler or system ABI. Please use a le32 field and define a bit number: #define VIRTIO_BLK_ZONE_MGMT_F_SEND_ALL (1u << 0) le32 flags; > + le32 reserved:31; > + } zone; > + u8 data[]; > + le64 append_sector; > + u8 reserved1[3]; > + u8 status; > +}; > +\end{lstlisting} Now that the status field is at the end of the struct the zoned commands can be treated like any other struct virtio_blk_req. The contents of the data[] field for ZONE_OPEN, ZONE_CLOSE, ZONE_FINISH, and ZONE_RESET requests is: #define VIRTIO_BLK_ZONE_MGMT_F_SEND_ALL (1u << 0) struct virtio_blk_zoned_mgmt_send { le32 flags; }; The contents of the data[] field for ZONE_APPEND is: u8 payload[]; /* driver -> device */ followed by: le64 append_sector; /* device -> driver */ The contents of the data[] field for ZONE_REPORT is struct virtio_blk_zone_report. The reason I'm pushing for getting rid of struct virtio_blk_zoned_req is that the zbd-specific fields are not common to all ZONE_* requests. I don't see a reason to define a common request layout anymore. Also, it would make the zoned command design similar to struct virtio_blk_discard_write_zeroes where request metadata is a separate struct that is located in struct virtio_blk_req's data[] field instead of a new top-level struct virtio_blk_discard_write_zeroes_request that duplicates struct virtio_blk_req fields. Stefan
signature.asc
Description: PGP signature