On Tue, 9 Jul 2019 13:41:34 +0530 Pankaj Gupta <pagu...@redhat.com> wrote:
> This patch proposes a virtio specification for new
> virtio pmem device. Virtio pmem is a paravirtualized
> device which solves two problems:
>
> - Provides emulation of persistent memory on host regular
>   (non NVDIMM) storage.
> - Allows the guest to bypass the page cache.
>
> This is changed version from previous RFC [1], incorporated
> changes suggested by Stefan, Michael & Cornerlia.
>
> [1] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
>
> Signed-off-by: Pankaj Gupta <pagu...@redhat.com>
> ---
> conformance.tex | 23 ++++++++++-
> content.tex | 1 +
> virtio-pmem.tex | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 140 insertions(+), 2 deletions(-)
> create mode 100644 virtio-pmem.tex
(...)
> diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> new file mode 100644
> index 0000000..fa64c6b
> --- /dev/null
> +++ b/virtio-pmem.tex
> @@ -0,0 +1,118 @@
> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> +
> +virtio pmem is an emulated persistent memory device. The
> +device is plugged to the guest using virtio bus.

"virtio bus" sounds too Linux-specific. Maybe just say "virtio pmem is
an emulated persistent memory device using virtio."?

> +
> +Device works as fake nvdimm device when emulated on host regular

s/Device/The device/ (more of these below)

> +(non NVDIMM) device. Device provides a virtio based asynchronous
> +flush mechanism to persists the guest writes. This avoids the

s/persists/persist/

> +need of separate caching inside the guest and host side caching
> +is used. Under memory pressure, host makes efficient memory reclaim

s/host/the host/

> +decisions on uniform view of memory.
> +
> +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> + 27
> +
> +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> +\begin{description}
> +\item[0] req_vq
> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
> +
> +There are currently no feature bits defined for this device.
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> +
> +\begin{lstlisting}
> +struct virtio_pmem_config {
> + uint64_t start;
> + uint64_t size;
> +};
> +\end{lstlisting}
> +
> +\field{start} contains the physical address of the start of the persistent memory range.
> +\field{size} in bytes length of address range.

"contains the length of the address range in bytes"?

> +
> +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
> +
> +Device hot-plugs physical memory to guest address space. Persistent memory device
> +is emulated at host side.
> +
> +\begin{enumerate}
> + \item The driver reads the physical start address from \field{start}.
> + \item The driver reads the length of the persistent memory range from \field{size}.
> + \end{enumerate}
> +
> +\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization}
> +
> +Host memory region MUST be mapped to guest address space in a

s/Host memory region/The host memory region/

> +way so that updates are visible to other processes mapping the
> +same memory region.
> +
> +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization}
> +
> +Memory stores to the persistent memory range are not guaranteed to be
> +persistent without further action.
> +An explicit flush command is
> +required to ensure persistence. The req_vq is used to perform flush
> +commands.
> +
> +\drivernormative{\subsubsection}{Driver Initialization: flush callback}{Device Types / PMEM Driver / Driver Initialization / flush callback}
> +
> +Driver MUST implement a virtio based flush callback, otherwise driver

s/Driver/The driver/

> +will fail to register.

I'm not sure what a "virtio based flush callback" is supposed to mean,
though? Also, what does it mean if a driver "fails to register"? I
don't think you can translate this concept to the spec, it seems to be
an implementation detail.

> +
> +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
> +
> +The VIRTIO_PMEM_REQ_TYPE_FLUSH command persists memory writes that were performed

Maybe "all memory writes to the persistent memory range", to be more
clear?

> +before the command was submitted. Once the command completes the

s/the/those/, maybe?

> +writes are guaranteed to be persistent.
> +
> +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> +
> +The driver MUST submit a VIRTIO_PMEM_REQ_TYPE_FLUSH command after performing
> +memory writes that require persistence.
> +
> +The driver MUST wait for the VIRTIO_PMEM_REQ_TYPE_FLUSH command to complete before
> +assuming previous writes are persistent.
> +
> +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation}
> +
> +\devicenormative{\subsubsection}{Device Operations: Virtqueue flush}{Device Types / PMEM Device / Device Operation / Virtqueue flush}
> +
> +Device SHOULD handle multiple flush requests simultaneously using
> +corresponding host flush mechanisms.
> +
> +\devicenormative{\subsubsection}{Device operations: Virtqueue return}{Device Types / PMEM Device / Device Operation / Virtqueue return}
> +
> +Device SHOULD return integer '0' for success and '1' for failure.
> +These errors are converted to corresponding error codes by guest as
> +per architecture.

I think what the driver actually does with the 0/1 return values is
already an implementation detail. Also, why 'SHOULD'? The driver needs
to be sure that it won't interpret success as failure, and vice versa.
At least "return 0 on success, !0 otherwise" needs to be a MUST, I
think.

> +
> +\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications}
> +
> +If two devices actually share the same memory, this creates a

Maybe "Two devices actually sharing the same memory creates a..."?

> +potential information leak as access patterns of one driver could
> +be observable by another driver.
> +
> +This can happen for example if two devices are implemented in software
> +by a hyper-visor, and two drivers are parts of VMs running on the
> +hyper-visor. In this case, the timing of access to device memory

I think "hypervisor" (without the hyphen) is the more common spelling
here.

> +might leak information about access patterns from one VM to another.
> +
> +This can include, but might not be limited to:
> +\begin{enumerate}
> +\item Configurations sharing a single region of device memory (even in a read-only configuration)
> +\item Configurations with a shared cache between devices (e.g.
> linux page cache)

s/linux/Linux/

> +\item Configurations with memory deduplication techniques such as KSM similar side-channels
> + might be present if the device memory is shared with another system, e.g. information about
> + the hyper-visor/host page cache might leak into a VM guest.

I'm not quite able to make sense of this sentence -- is there a
semicolon missing after 'KSM'?

> +\end{enumerate}
> +
> +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures}
> +Solution is to avoid sharing resources between devices.
> +\begin{enumerate}
> +\item Each VM must have its own device memory, not shared with any other VM or process.
> +\item If VM workload is a special application and there is no risk, its okay to share the device memory.

s/VM workload/the VM workload/
s/its/it is/

> +\item Don't allow host cache eviction from VM when device memory is shared with other VM or host process.
> +\end{enumerate}

Generally, I think this has moved a lot into the right direction; some
general nits from my side, and some remaining things that sound a bit
too Linux-specific.
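
P.S.: To make the point about the return value a bit more concrete,
here is a rough, compile-only sketch of how I picture the driver side
of a flush on req_vq. The request/response structs, the field names,
the numeric value of VIRTIO_PMEM_REQ_TYPE_FLUSH and the helper are all
my own inventions for illustration (the patch does not define a
request layout for req_vq), so please don't read this as proposed spec
text:

/* Illustrative only -- none of these names come from the patch. */

#include <stdint.h>
#include <stdio.h>

#define VIRTIO_PMEM_REQ_TYPE_FLUSH 0   /* value assumed for the example */

/* Guessed layout of the device-readable request buffer on req_vq. */
struct virtio_pmem_req {
	uint32_t type;                 /* VIRTIO_PMEM_REQ_TYPE_FLUSH */
};

/* Guessed layout of the device-writable response buffer. */
struct virtio_pmem_resp {
	uint32_t ret;                  /* 0 on success, anything else is failure */
};

/*
 * Stand-in for "add the buffers to req_vq, notify the device and wait
 * for the used buffer"; a real driver would go through its transport
 * here.
 */
static int req_vq_do_flush(const struct virtio_pmem_req *req,
			   struct virtio_pmem_resp *resp)
{
	(void)req;
	resp->ret = 0;                 /* pretend the host-side flush worked */
	return 0;
}

/*
 * Driver-side flush: writes to the pmem range may only be considered
 * persistent once this has returned 0.
 */
static int virtio_pmem_flush(void)
{
	struct virtio_pmem_req req = { .type = VIRTIO_PMEM_REQ_TYPE_FLUSH };
	struct virtio_pmem_resp resp = { .ret = 0 };

	if (req_vq_do_flush(&req, &resp))
		return -1;             /* transport error */

	/* the driver has to treat any non-zero value as failure */
	return resp.ret ? -1 : 0;
}

int main(void)
{
	printf("flush %s\n", virtio_pmem_flush() == 0 ? "succeeded" : "failed");
	return 0;
}

That last check is exactly why I think "0 on success, !0 on failure"
needs to be a MUST for the device rather than a SHOULD.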