Re: [Xen-devel] [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains

2017-10-26 Thread Haozhong Zhang
On 10/27/17 11:26 +0800, Chao Peng wrote:
> On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> > Overview
> > ========
> >
> > (RFC v2 can be found at
> > https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)
> >
> > Well, this RFC v3 has changed and grown considerably from previous
> > versions. The primary changes are listed below; most of them are meant
> > to simplify the first implementation and avoid further growth.
> > 
> > 1. Drop the support for maintaining the frametable and M2P table of PMEM
> >    in RAM. In the future, we may add this support back.
> 
> I don't find any discussion about this in v2, but I think putting
> those Xen data structures in RAM is sometimes useful (e.g. when
> performance is important). It's better not to make a hard restriction
> on this.

Well, this is to reduce the complexity; as you can see, the current
patch series is already quite big. In addition, an NVDIMM can be very
large, e.g. several terabytes or more, which would require a large
amount of RAM to store its frametable and M2P table (~10 MB per 1 GB of
PMEM) and leave less RAM for guest usage.
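
For reference, a back-of-the-envelope sketch of where the ~10 MB per
1 GB figure comes from (assuming 4 KB pages and the 32-byte frametable /
8-byte M2P entry sizes quoted in the cover letter; not actual Xen code):

    /* Sketch: RAM overhead of PMEM metadata under the assumptions above. */
    #include <stdio.h>

    int main(void)
    {
        const unsigned long page_size = 4096;      /* bytes per PMEM page */
        const unsigned long per_page  = 32 + 8;    /* frametable + M2P entry */
        const unsigned long pmem      = 1UL << 30; /* 1 GB of PMEM */
        unsigned long overhead = pmem / page_size * per_page;

        /* prints: overhead per 1 GB of PMEM: 10485760 bytes (~10 MB) */
        printf("overhead per 1 GB of PMEM: %lu bytes (~%lu MB)\n",
               overhead, overhead >> 20);
        return 0;
    }

At that rate, a 4 TB NVDIMM would need roughly 40 GB of RAM just for
its frametable and M2P table, which is why this series keeps those
tables on the PMEM itself (in the management areas described below).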

> 
> > 
> > 2. Hide the host NFIT and deny access to host PMEM from Dom0. In other
> >    words, the kernel NVDIMM driver is not loaded in Dom0 and the
> >    existing management utilities (e.g. ndctl) no longer work in Dom0.
> >    This is to work around the interference between PMEM accesses from
> >    Dom0 and the Xen hypervisor. In the future, we may add a stub driver
> >    in Dom0 which will hold the PMEM pages being used by the Xen
> >    hypervisor and/or other domains.
> > 
> > 3. As there is no NVDIMM driver and no management utilities in Dom0 now,
> >    we cannot easily specify an area of host NVDIMM (e.g., by /dev/pmem0)
> >    or manage NVDIMM in Dom0 (e.g., creating labels). Instead, we have
> >    to specify the exact MFNs of host PMEM pages in xl domain
> >    configuration files and in the newly added Xen NVDIMM management
> >    utility xen-ndctl.
> > 
> >    If there are tasks that can only be handled by the existing driver
> >    and management utilities, such as recovery from hardware failures,
> >    they have to be accomplished outside of the Xen environment.
> 
> What kind of recovery can happen, and can the recovery happen at
> runtime? For example, can we recover a portion of NVDIMM assigned to a
> certain VM while other VMs keep using the NVDIMM?

For example, evaluating ACPI _DSM methods (maybe vendor-specific) for
error recovery and/or scrubbing bad blocks, etc.

> 
> > 
> >    After 2. is solved in the future, we will be able to make the
> >    existing driver and management utilities work in Dom0 again.
> 
> Is there any reason why we can't do it now? If the existing ndctl (with
> additional patches) can work, then we don't need to introduce xen-ndctl
> at all? I think that keeps the user interface clearer.

The simple reason is that I want to reduce the number of components
(Xen/kernel/QEMU) touched by the first patch set (whose primary target
is to implement the basic functionality, i.e. mapping host NVDIMM to a
guest as a virtual NVDIMM). As you said, leaving a driver (the nvdimm
driver and/or a stub driver) in Dom0 would make the user interface
clearer. Let's see what I can get into the next version.

Thanks,
Haozhong



Re: [Xen-devel] [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains

2017-10-26 Thread Chao Peng
On Mon, 2017-09-11 at 12:37 +0800, Haozhong Zhang wrote:
> Overview
> ========
>
> (RFC v2 can be found at
> https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)
>
> Well, this RFC v3 has changed and grown considerably from previous
> versions. The primary changes are listed below; most of them are meant
> to simplify the first implementation and avoid further growth.
> 
> 1. Drop the support for maintaining the frametable and M2P table of PMEM
>    in RAM. In the future, we may add this support back.

I don't find any discussion about this in v2, but I think putting
those Xen data structures in RAM is sometimes useful (e.g. when
performance is important). It's better not to make a hard restriction
on this.

> 
> 2. Hide the host NFIT and deny access to host PMEM from Dom0. In other
>    words, the kernel NVDIMM driver is not loaded in Dom0 and the
>    existing management utilities (e.g. ndctl) no longer work in Dom0.
>    This is to work around the interference between PMEM accesses from
>    Dom0 and the Xen hypervisor. In the future, we may add a stub driver
>    in Dom0 which will hold the PMEM pages being used by the Xen
>    hypervisor and/or other domains.
> 
> 3. As there is no NVDIMM driver and no management utilities in Dom0 now,
>    we cannot easily specify an area of host NVDIMM (e.g., by /dev/pmem0)
>    or manage NVDIMM in Dom0 (e.g., creating labels). Instead, we have
>    to specify the exact MFNs of host PMEM pages in xl domain
>    configuration files and in the newly added Xen NVDIMM management
>    utility xen-ndctl.
> 
>    If there are tasks that can only be handled by the existing driver
>    and management utilities, such as recovery from hardware failures,
>    they have to be accomplished outside of the Xen environment.

What kind of recovery can happen, and can the recovery happen at
runtime? For example, can we recover a portion of NVDIMM assigned to a
certain VM while other VMs keep using the NVDIMM?

> 
>    After 2. is solved in the future, we will be able to make the
>    existing driver and management utilities work in Dom0 again.

Is there any reason why we can't do it now? If the existing ndctl (with
additional patches) can work, then we don't need to introduce xen-ndctl
at all? I think that keeps the user interface clearer.

Chao


[Xen-devel] [RFC XEN PATCH v3 00/39] Add vNVDIMM support to HVM domains

2017-09-10 Thread Haozhong Zhang
Overview
========

(RFC v2 can be found at 
https://lists.xen.org/archives/html/xen-devel/2017-03/msg02401.html)

Well, this RFC v3 has changed and grown considerably from previous
versions. The primary changes are listed below; most of them are meant
to simplify the first implementation and avoid further growth.

1. Drop the support for maintaining the frametable and M2P table of PMEM
   in RAM. In the future, we may add this support back.

2. Hide the host NFIT and deny access to host PMEM from Dom0. In other
   words, the kernel NVDIMM driver is not loaded in Dom0 and the
   existing management utilities (e.g. ndctl) no longer work in Dom0.
   This is to work around the interference between PMEM accesses from
   Dom0 and the Xen hypervisor. In the future, we may add a stub driver
   in Dom0 which will hold the PMEM pages being used by the Xen
   hypervisor and/or other domains.

3. As there is no NVDIMM driver and no management utilities in Dom0 now,
   we cannot easily specify an area of host NVDIMM (e.g., by /dev/pmem0)
   or manage NVDIMM in Dom0 (e.g., creating labels). Instead, we have
   to specify the exact MFNs of host PMEM pages in xl domain
   configuration files and in the newly added Xen NVDIMM management
   utility xen-ndctl.

   If there are tasks that can only be handled by the existing driver
   and management utilities, such as recovery from hardware failures,
   they have to be accomplished outside of the Xen environment.

   After 2. is solved in the future, we will be able to make the
   existing driver and management utilities work in Dom0 again.

All patches can be found at
  Xen:  https://github.com/hzzhan9/xen.git nvdimm-rfc-v3
  QEMU: https://github.com/hzzhan9/qemu.git xen-nvdimm-rfc-v3


How to Test
===========

1. Build and install this patchset with the associated QEMU patches.

2. Use xen-ndctl to get a list of PMEM regions detected by the Xen
   hypervisor, e.g.

 # xen-ndctl list --raw
 Raw PMEM regions:
  0: MFN 0x480000 - 0x880000, PXM 3

   which indicates a PMEM region is present at MFN 0x480000 - 0x880000.
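
   (With 4 KB pages that MFN range corresponds to machine addresses
   0x480000000 - 0x880000000, i.e. a 16 GB PMEM region; a trivial
   sketch of the conversion, just for orientation:)

    #include <stdio.h>

    int main(void)
    {
        /* MFN range from the example output above; 4 KB pages assumed */
        unsigned long start_mfn = 0x480000, end_mfn = 0x880000;

        /* prints: addresses 0x480000000 - 0x880000000, size 16 GB */
        printf("addresses 0x%lx - 0x%lx, size %lu GB\n",
               start_mfn << 12, end_mfn << 12,
               (end_mfn - start_mfn) >> 18);   /* pages -> GB */
        return 0;
    }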

3. Setup a management area to manage the guest data areas.

 # xen-ndctl setup-mgmt 0x480000 0x4c0000
 # xen-ndctl list --mgmt
 Management PMEM regions:
  0: MFN 0x480000 - 0x4c0000, used 0xc00

   The first command sets up the PMEM area in MFN 0x480000 - 0x4c0000
   (1 GB) as a management area, which is also used to manage itself.
   The second command lists all management areas; the 'used' field
   shows the number of pages that have been used from the beginning of
   that area.

   The size ratio between a management area and the areas it manages
   (including itself) should be at least 1 : 100 (i.e., 32 bytes of
   frametable and 8 bytes of M2P table per 4 KB page).

   The size of a management area, as well as of a data area below, is
   currently restricted to a multiple of 256 MBytes, and its alignment
   is restricted to a multiple of 2 MBytes.
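
   As a rough illustration of how the ratio and the alignment above
   combine (a sketch only, not what xen-ndctl or Xen actually computes;
   the per-page metadata sizes are the 32 + 8 bytes quoted above):

    #include <stdio.h>

    #define PAGE_SIZE    4096UL
    #define META_PER_PG  (32UL + 8UL)  /* frametable + M2P, from above */
    #define ALIGN_PAGES  0x200UL       /* 2 MB alignment, in 4 KB pages */

    /* Minimum management pages needed for a data area of 'data_pages'. */
    static unsigned long mgmt_pages(unsigned long data_pages)
    {
        unsigned long pages =
            (data_pages * META_PER_PG + PAGE_SIZE - 1) / PAGE_SIZE;
        return (pages + ALIGN_PAGES - 1) & ~(ALIGN_PAGES - 1);
    }

    int main(void)
    {
        /* the 1 GB management area managing itself: 0x40000 pages */
        printf("0x40000 pages need at least 0x%lx mgmt pages\n",
               mgmt_pages(0x40000));
        /* the data area set up in step 4 below: 0x4c0000 - 0x880000 */
        printf("0x3c0000 pages need at least 0x%lx mgmt pages\n",
               mgmt_pages(0x3c0000));
        return 0;
    }

   (The 'used' figures reported by xen-ndctl are somewhat larger than
   these minimums, presumably due to additional bookkeeping and rounding
   inside Xen.)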

4. Setup a data area that can be used by guest.

 # xen-ndctl setup-data 0x4c0000 0x880000 0x480c00 0x4c0000
 # xen-ndctl list --data
 Data PMEM regions:
  0: MFN 0x4c0000 - 0x880000, MGMT MFN 0x480c00 - 0x48b000

   The first command sets up the remaining PMEM pages, from MFN 0x4c0000
   to 0x880000, as a data area. The management area from MFN 0x480c00 to
   0x4c0000 is specified to manage this data area. The management pages
   actually used can be found with the second command.

5. Assign data pages to an HVM domain by adding the following line to
   the domain configuration.

 vnvdimms = [ 'type=mfn, backend=0x4c0000, nr_pages=0x100000' ]

   which assigns 4 GBytes of PMEM starting from MFN 0x4c0000 to that
   domain. A 4 GByte PMEM device should be present in the guest (e.g.,
   as /dev/pmem0) after the above setup steps.

   There can be one or more entries in vnvdimms, and they must not
   overlap with each other. Sharing PMEM pages between domains is not
   supported, so the PMEM pages assigned to different domains must not
   overlap either.
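
   The non-overlap requirement is just an interval check; a minimal
   sketch of it (purely illustrative, not the actual xl/libxl
   validation code):

    #include <stdbool.h>
    #include <stdio.h>

    /* one vnvdimms entry: starting host MFN and number of 4 KB pages */
    struct vnvdimm { unsigned long backend, nr_pages; };

    static bool overlap(const struct vnvdimm *a, const struct vnvdimm *b)
    {
        return a->backend < b->backend + b->nr_pages &&
               b->backend < a->backend + a->nr_pages;
    }

    int main(void)
    {
        /* e.g. two domains, each given 4 GB (0x100000 pages) of PMEM
         * carved out of the data area set up in step 4 */
        struct vnvdimm dom1 = { 0x4c0000, 0x100000 };
        struct vnvdimm dom2 = { 0x5c0000, 0x100000 };

        /* prints: dom1/dom2 overlap: no */
        printf("dom1/dom2 overlap: %s\n",
               overlap(&dom1, &dom2) ? "yes" : "no");
        return 0;
    }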


Patch Organization
==================

This RFC v3 is composed of the following 6 parts, according to the
tasks they are going to solve. The toolstack patches are collected and
distributed into the corresponding parts.

- Part 0. Bug fix and code cleanup
[01/39] x86_64/mm: fix the PDX group check in mem_hotadd_check()
[02/39] x86_64/mm: drop redundant MFN to page conversions in cleanup_frame_table()
[03/39] x86_64/mm: avoid cleaning the unmapped frame table

- Part 1. Detect host PMEM
  Detect host PMEM via NFIT. No frametable and M2P table for them are
  created in this part.

[04/39] xen/common: add Kconfig item for pmem support
[05/39] x86/mm: exclude PMEM regions from initial frametable
[06/39] acpi: probe valid PMEM regions via NFIT
[07/39] xen/pmem: register valid PMEM regions to Xen hypervisor
[08/39] xen/pmem: hide NFIT and deny access to PMEM from Dom0
[09/39] xen/pmem: add framework for hypercall XEN_SYSCTL_nvdimm_op
[10/39] xen/pmem: add