[Xen-devel] [v4][PATCH 2/2] libxl: introduce gfx_passthru_kind

2015-09-18 Thread Tiejun Chen
Although we already have 'gfx_passthru' in b_info, it does not suffice
once we want to handle IGD specifically. This patch defines a new field,
gfx_passthru_kind, to indicate that an IGD is being passed through.
Other specific devices can be supported later simply by extending
gfx_passthru_kind. It cooperates with gfx_passthru to address the IGD
case as follows:

gfx_passthru = 0        => sets build_info.u.gfx_passthru to false
gfx_passthru = 1        => sets build_info.u.gfx_passthru to true and
                           build_info.u.gfx_passthru_kind to DEFAULT
gfx_passthru = "igd"    => sets build_info.u.gfx_passthru to true
                           and build_info.u.gfx_passthru_kind to IGD

If gfx_passthru_kind = DEFAULT, we call libxl__is_igd_vga_passthru()
to check whether the device matches that table and the option therefore
needs to be passed to qemu. If gfx_passthru_kind = "igd" we always
pass it.
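
For reference, a minimal sketch (not the literal patch hunk) of how the
upstream-qemu argument builder can act on the new field; b_info,
guest_config, machinearg, GCSPRINTF() and the LIBXL_GFX_PASSTHRU_KIND_*
enum values are names assumed from the libxl context and this series:

    /* Hedged sketch: map the new field onto the upstream qemu machine
     * option; libxl__is_igd_vga_passthru() comes from patch 1/2. */
    if (libxl_defbool_val(b_info->u.hvm.gfx_passthru)) {
        switch (b_info->u.hvm.gfx_passthru_kind) {
        case LIBXL_GFX_PASSTHRU_KIND_DEFAULT:
            /* Probe the IGD table; only pass the option on a match. */
            if (libxl__is_igd_vga_passthru(gc, guest_config))
                machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg);
            break;
        case LIBXL_GFX_PASSTHRU_KIND_IGD:
            /* Explicitly requested: always pass the option. */
            machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg);
            break;
        default:
            return ERROR_INVAL;
        }
    }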

And "-gfx_passthru" is just introduced to work for qemu-xen-traditional
so we should get this away from libxl__build_device_model_args_new() in
the case of qemu upstream.

Signed-off-by: Tiejun Chen <tiejun.c...@intel.com>
---
 docs/man/xl.cfg.pod.5   | 35 ++
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_dm.c  | 46 +
 tools/libxl/libxl_types.idl |  6 ++
 tools/libxl/xl_cmdimpl.c| 14 --
 5 files changed, 97 insertions(+), 10 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index db4a163..4aa7b05 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -812,7 +812,7 @@ through to this VM. See L<seize|/"seize_boolean"> above.
 devices passed through to this VM. See L<power_mgt|/"power_mgmt_boolean">
 above.
 
-=item B<gfx_passthru=BOOLEAN>

[Xen-devel] [v4][PATCH 0/2] libxl: try to support IGD passthrough for qemu upstream

2015-09-18 Thread Tiejun Chen
Ian,

As we discussed previously,

http://patchwork.ozlabs.org/patch/457055/

now it's time to push this on the xen/tools side since all the qemu
pieces have been merged.

https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg02094.html

v4:

Ian,

Actually we had v3.5 online previously, which was reviewed by you.

http://permalink.gmane.org/gmane.comp.emulators.qemu/329100

So here I just refine the code a little for patch #2 according to our
last conversation.

v3:

* Refine some code in patch #2 based on Ian Campbell's feedback; thanks
  for the kind guidance
* Update the manpages in patch #2

v2:

* Refine patch #2's head description
* Improve code quality in patch #1 based on Wei's comments
* Rewrite the summary in patch #0 based on Konrad's and Wei's suggestions

When we're working to support IGD GFX passthrough with qemu
upstream, instead of "-gfx_passthru" we'd like to make that
a machine option, "-machine xxx,igd-passthru=on".

https://lists.nongnu.org/archive/html/qemu-devel/2015-01/msg02050.html

This requires a change on the tools side.

After a discussion with Ian Campbell, we'd like to construct a table
recording all IGD devices we can support. If a device matches that table,
we should pass that option. So we also introduce a new field,
'gfx_passthru_kind', which cooperates with 'gfx_passthru' to cover all
scenarios like this:

gfx_passthru = 0        => sets build_info.u.gfx_passthru to false
gfx_passthru = 1        => sets build_info.u.gfx_passthru to true and
                           build_info.u.gfx_passthru_kind to DEFAULT
gfx_passthru = "igd"    => sets build_info.u.gfx_passthru to true
                           and build_info.u.gfx_passthru_kind to IGD

Note that the "-gfx_passthru" option was only ever introduced for
qemu-xen-traditional, so we keep it out of
libxl__build_device_model_args_new() in the case of qemu upstream.

----
Tiejun Chen (2):
  libxl: introduce libxl__is_igd_vga_passthru
  libxl: introduce gfx_passthru_kind

 docs/man/xl.cfg.pod.5|  35 --
 tools/libxl/libxl.h  |   6 ++
 tools/libxl/libxl_dm.c   |  46 +++--
 tools/libxl/libxl_internal.h |   2 +
 tools/libxl/libxl_pci.c  | 124 +++
 tools/libxl/libxl_types.idl  |   6 ++
 tools/libxl/xl_cmdimpl.c |  14 +++-
 7 files changed, 223 insertions(+), 10 deletions(-)

Thanks
Tiejun



[Xen-devel] [v4][PATCH 1/2] libxl: introduce libxl__is_igd_vga_passthru

2015-09-18 Thread Tiejun Chen
When working with qemu, IGD is a special case of device passthrough, so
we need to identify it in order to handle it further later. Here we
define a table recording all IGD types we currently support, and
introduce two helper functions that read the vendor and device IDs used
to look up that table.
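
For reference, a hedged sketch (not the full hunk below) of how the
lookup is expected to work, using the two sysfs helpers and the
fixup_ids[] table added by this patch; the loop over d_config->pcidevs
is an assumption about the eventual caller side:

    /* Hedged sketch: return true if any to-be-assigned PCI device is a
     * known IGD, by matching its vendor/device IDs against fixup_ids[]. */
    bool libxl__is_igd_vga_passthru(libxl__gc *gc,
                                    const libxl_domain_config *d_config)
    {
        unsigned int i, j;

        for (i = 0; i < d_config->num_pcidevs; i++) {
            libxl_device_pci *pcidev = &d_config->pcidevs[i];
            uint16_t vendor = sysfs_dev_get_vendor(gc, pcidev);
            uint16_t device = sysfs_dev_get_device(gc, pcidev);

            for (j = 0; j < sizeof(fixup_ids) / sizeof(fixup_ids[0]); j++)
                if (vendor == fixup_ids[j].vendor &&
                    device == fixup_ids[j].device)
                    return true;
        }

        return false;
    }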

Signed-off-by: Tiejun Chen <tiejun.c...@intel.com>
Acked-by: Ian Campbell <ian.campb...@citrix.com>
---
 tools/libxl/libxl_internal.h |   2 +
 tools/libxl/libxl_pci.c  | 124 +++
 2 files changed, 126 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 5fa55a7..5d6c793 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1343,6 +1343,8 @@ _hidden int libxl__device_pci_add(libxl__gc *gc, uint32_t 
domid, libxl_device_pc
 _hidden int libxl__create_pci_backend(libxl__gc *gc, uint32_t domid,
   libxl_device_pci *pcidev, int num);
 _hidden int libxl__device_pci_destroy_all(libxl__gc *gc, uint32_t domid);
+_hidden bool libxl__is_igd_vga_passthru(libxl__gc *gc,
+const libxl_domain_config *d_config);
 
 /* from libxl_dtdev */
 
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 7229a36..8434ec8 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -491,6 +491,130 @@ static int sysfs_dev_unbind(libxl__gc *gc, 
libxl_device_pci *pcidev,
 return 0;
 }
 
+static uint16_t sysfs_dev_get_vendor(libxl__gc *gc, libxl_device_pci *pcidev)
+{
+char *pci_device_vendor_path =
+GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/vendor",
+  pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+uint16_t read_items;
+uint16_t pci_device_vendor;
+
+FILE *f = fopen(pci_device_vendor_path, "r");
+if (!f) {
+LOGE(ERROR,
+ "pci device "PCI_BDF" does not have vendor attribute",
+ pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+return 0xffff;
+}
+read_items = fscanf(f, "0x%hx\n", &pci_device_vendor);
+fclose(f);
+if (read_items != 1) {
+LOGE(ERROR,
+ "cannot read vendor of pci device "PCI_BDF,
+ pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+return 0xffff;
+}
+
+return pci_device_vendor;
+}
+
+static uint16_t sysfs_dev_get_device(libxl__gc *gc, libxl_device_pci *pcidev)
+{
+char *pci_device_device_path =
+GCSPRINTF(SYSFS_PCI_DEV"/"PCI_BDF"/device",
+  pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+uint16_t read_items;
+uint16_t pci_device_device;
+
+FILE *f = fopen(pci_device_device_path, "r");
+if (!f) {
+LOGE(ERROR,
+ "pci device "PCI_BDF" does not have device attribute",
+ pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+return 0xffff;
+}
+read_items = fscanf(f, "0x%hx\n", &pci_device_device);
+fclose(f);
+if (read_items != 1) {
+LOGE(ERROR,
+ "cannot read device of pci device "PCI_BDF,
+ pcidev->domain, pcidev->bus, pcidev->dev, pcidev->func);
+return 0xffff;
+}
+
+return pci_device_device;
+}
+
+typedef struct {
+uint16_t vendor;
+uint16_t device;
+} pci_info;
+
+static const pci_info fixup_ids[] = {
+/* Intel HSW Classic */
+{0x8086, 0x0402}, /* HSWGT1D, HSWD_w7 */
+{0x8086, 0x0406}, /* HSWGT1M, HSWM_w7 */
+{0x8086, 0x0412}, /* HSWGT2D, HSWD_w7 */
+{0x8086, 0x0416}, /* HSWGT2M, HSWM_w7 */
+{0x8086, 0x041E}, /* HSWGT15D, HSWD_w7 */
+/* Intel HSW ULT */
+{0x8086, 0x0A06}, /* HSWGT1UT, HSWM_w7 */
+{0x8086, 0x0A16}, /* HSWGT2UT, HSWM_w7 */
+{0x8086, 0x0A26}, /* HSWGT3UT, HSWM_w7 */
+{0x8086, 0x0A2E}, /* HSWGT3UT28W, HSWM_w7 */
+{0x8086, 0x0A1E}, /* HSWGT2UX, HSWM_w7 */
+{0x8086, 0x0A0E}, /* HSWGT1ULX, HSWM_w7 */
+/* Intel HSW CRW */
+{0x8086, 0x0D26}, /* HSWGT3CW, HSWM_w7 */
+{0x8086, 0x0D22}, /* HSWGT3CWDT, HSWD_w7 */
+/* Intel HSW Server */
+{0x8086, 0x041A}, /* HSWSVGT2, HSWD_w7 */
+/* Intel HSW SRVR */
+{0x8086, 0x040A}, /* HSWSVGT1, HSWD_w7 */
+/* Intel BSW */
+{0x8086, 0x1606}, /* BDWULTGT1, BDWM_w7 */
+{0x8086, 0x1616}, /* BDWULTGT2, BDWM_w7 */
+{0x8086, 0x1626}, /* BDWULTGT3, BDWM_w7 */
+{0x8086, 0x160E}, /* BDWULXGT1, BDWM_w7 */
+{0x8086, 0x161E}, /* BDWULXGT2, BDWM_w7 */
+{0x8086, 0x1602}, /* BDWHALOGT1, BDWM_w7 */
+{0x8086, 0x1612}, /* BDWHALOGT2, BDWM_w7 */
+{0x8086, 0x1622}, /* BDWHALOGT3, BDWM_w7 */
+{0x8086, 0x162B}, /* BDWHALO28W, BDWM_w7 */
+{0x8086, 0x162A}, /* BDWGT3WRKS, BDWM_w7 */
+{0x8086, 0x162D}, /* BDWGT3SRVR, 

[Xen-devel] [PATCH] hw/pci-host/piix: fix one file descriptor leak

2015-09-17 Thread Tiejun Chen
Commit 595a4f07d6bd (piix: create host bridge to passthrough)
introduced a leak of one file descriptor, "config_fd"; fix that.
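
For context, a hedged sketch of an equivalent single-exit variant of
host_pci_config_read()'s error handling, closing config_fd in one place
instead of before each return; read_at() is a hypothetical helper name
and the usual <unistd.h>/<errno.h>/<stdint.h> includes are assumed:

    /* Hedged sketch, not the applied patch. */
    static int read_at(int config_fd, int pos, int len, uint32_t *val)
    {
        int ret = 0;
        ssize_t rc;

        if (lseek(config_fd, pos, SEEK_SET) != pos) {
            ret = -errno;
            goto out;
        }
        do {
            rc = read(config_fd, (uint8_t *)val, len);
        } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
        if (rc != len) {
            ret = -errno;
        }
    out:
        close(config_fd);   /* single cleanup point for the descriptor */
        return ret;
    }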

CC: Michael S. Tsirkin <m...@redhat.com>
CC: Stefano Stabellini <stefano.stabell...@eu.citrix.com>
CC: Paolo Bonzini <pbonz...@redhat.com>
Acked-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com>
Signed-off-by: Tiejun Chen <tiejun.c...@intel.com>
---
 hw/pci-host/piix.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/pci-host/piix.c b/hw/pci-host/piix.c
index 1fb71c8..7d44228 100644
--- a/hw/pci-host/piix.c
+++ b/hw/pci-host/piix.c
@@ -775,15 +775,18 @@ static int host_pci_config_read(int pos, int len, 
uint32_t val)
 }
 
 if (lseek(config_fd, pos, SEEK_SET) != pos) {
+close(config_fd);
 return -errno;
 }
 do {
 rc = read(config_fd, (uint8_t *)&val, len);
 } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
 if (rc != len) {
+close(config_fd);
 return -errno;
 }
 
+close(config_fd);
 return 0;
 }
 
-- 
1.9.1




[Xen-devel] [v2][PATCH] vtd/iommu: correct loglevel when checking group devices

2015-09-16 Thread Tiejun Chen
Since commit 3848058e7dd6 (vtd/iommu: permit group devices to
passthrough in relaxed mode) was introduced, we always print the
message as XENLOG_G_WARNING, but that is not correct in the case of
strict mode. So make the loglevel of this message depend on the
specific mode.

CC: Yang Zhang <yang.z.zh...@intel.com>
CC: Kevin Tian <kevin.t...@intel.com>
CC: Jan Beulich <jbeul...@suse.com>
CC: Wei Liu <wei.l...@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.c...@intel.com>
---
v2:

* Correct wrong place to loglevel.

 xen/drivers/passthrough/vtd/iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 7b45bff..b67b624 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2314,9 +2314,10 @@ static int intel_iommu_assign_device(
 {
 bool_t relaxed = !!(flag & XEN_DOMCTL_DEV_RDM_RELAXED);
 
-printk(XENLOG_G_WARNING VTDPREFIX
+printk(XENLOG_GUEST "%s" VTDPREFIX
" It's %s to assign %04x:%02x:%02x.%u"
" with shared RMRR at %"PRIx64" for Dom%d.\n",
+   relaxed ? XENLOG_WARNING : XENLOG_ERR,
relaxed ? "risky" : "disallowed",
seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
rmrr->base_address, d->domain_id);
-- 
1.9.1




[Xen-devel] [PATCH] vtd/iommu: correct loglevel when checking group devices

2015-09-15 Thread Tiejun Chen
Since commit 3848058e7dd6 (vtd/iommu: permit group devices to
passthrough in relaxed mode) was introduced, we always print the
message as XENLOG_G_WARNING, but that is not correct in the case of
strict mode. So make the loglevel of this message depend on the
specific mode.

CC: Yang Zhang <yang.z.zh...@intel.com>
CC: Kevin Tian <kevin.t...@intel.com>
CC: Jan Beulich <jbeul...@suse.com>
CC: Wei Liu <wei.l...@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.c...@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 7b45bff..53aac18 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2314,10 +2314,11 @@ static int intel_iommu_assign_device(
 {
 bool_t relaxed = !!(flag & XEN_DOMCTL_DEV_RDM_RELAXED);
 
-printk(XENLOG_G_WARNING VTDPREFIX
+printk(XENLOG_GUEST "%s" VTDPREFIX
" It's %s to assign %04x:%02x:%02x.%u"
" with shared RMRR at %"PRIx64" for Dom%d.\n",
relaxed ? "risky" : "disallowed",
+   relaxed ? XENLOG_WARNING : XENLOG_ERR,
seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
rmrr->base_address, d->domain_id);
 if ( !relaxed )
-- 
1.9.1




[Xen-devel] [PATCH] xen/domctl: lower loglevel of XEN_DOMCTL_memory_mapping

2015-09-09 Thread Tiejun Chen
Lower the loglevel to XENLOG_G_DEBUG when mapping or unmapping memory
via XEN_DOMCTL_memory_mapping, since this information is only really
useful when debugging.

CC: Ian Campbell <ian.campb...@citrix.com>
CC: Ian Jackson <ian.jack...@eu.citrix.com>
CC: Jan Beulich <jbeul...@suse.com>
CC: Keir Fraser <k...@xen.org>
CC: Tim Deegan <t...@xen.org>
Signed-off-by: Tiejun Chen <tiejun.c...@intel.com>
---
 xen/common/domctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index 7f959f3..3bf39f1 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -1049,7 +1049,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
u_domctl)
 
 if ( add )
 {
-printk(XENLOG_G_INFO
+printk(XENLOG_G_DEBUG
"memory_map:add: dom%d gfn=%lx mfn=%lx nr=%lx\n",
d->domain_id, gfn, mfn, nr_mfns);
 
@@ -1061,7 +1061,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
u_domctl)
 }
 else
 {
-printk(XENLOG_G_INFO
+printk(XENLOG_G_DEBUG
"memory_map:remove: dom%d gfn=%lx mfn=%lx nr=%lx\n",
d->domain_id, gfn, mfn, nr_mfns);
 
-- 
1.9.1




[Xen-devel] [v2][PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode

2015-09-08 Thread Tiejun Chen
Currently we don't allow passing through any group of devices which
share the same RMRR entry, since that would break isolation between
VMs. Indeed, we expect to figure out a better way to handle this kind
of case completely.

But until group assignment gets implemented, we can make this
permission depend on our RMRR policy. So it is now allowed in the
relaxed mode.

CC: Yang Zhang <yang.z.zh...@intel.com>
CC: Kevin Tian <kevin.t...@intel.com>
CC: Jan Beulich <jbeul...@suse.com>
CC: Wei Liu <wei.l...@citrix.com>
Signed-off-by: Tiejun Chen <tiejun.c...@intel.com>
---
v2:

* Sync code comments
* Refactor variable "relaxed" as bool_t
* s/XENLOG_G_ERR VTDPREFIX/XENLOG_G_WARNING VTDPREFIX
* Try to refine print message

 xen/drivers/passthrough/vtd/iommu.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 836aed5..7b45bff 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2297,7 +2297,9 @@ static int intel_iommu_assign_device(
 /*
  * In rare cases one given rmrr is shared by multiple devices but
  * obviously this would put the security of a system at risk. So
- * we should prevent from this sort of device assignment.
+ * we would prevent from this sort of device assignment. But this
+ * can be permitted if user set
+ *  "pci = [ 'sbdf, rdm_policy=relaxed' ]"
  *
  * TODO: in the future we can introduce group device assignment
  * interface to make sure devices sharing RMRR are assigned to the
@@ -2310,12 +2312,16 @@ static int intel_iommu_assign_device(
 PCI_DEVFN2(bdf) == devfn &&
  rmrr->scope.devices_cnt > 1 )
 {
-printk(XENLOG_G_ERR VTDPREFIX
-   " cannot assign %04x:%02x:%02x.%u"
+bool_t relaxed = !!(flag & XEN_DOMCTL_DEV_RDM_RELAXED);
+
+printk(XENLOG_G_WARNING VTDPREFIX
+   " It's %s to assign %04x:%02x:%02x.%u"
" with shared RMRR at %"PRIx64" for Dom%d.\n",
+   relaxed ? "risky" : "disallowed",
seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
rmrr->base_address, d->domain_id);
-return -EPERM;
+if ( !relaxed )
+return -EPERM;
 }
 }
 
-- 
1.9.1




[Xen-devel] [PATCH] xen/vtd/iommu: permit group devices to passthrough in relaxed mode

2015-09-07 Thread Tiejun Chen
Currently we don't allow passing through any group of devices which
share the same RMRR entry, since that would break isolation between
VMs. Indeed, we expect to figure out a better way to handle this kind
of case completely.

But until group assignment gets implemented, we can make this
permission depend on our RMRR policy. So it is now allowed in the
relaxed mode.

CC: Yang Zhang <yang.z.zh...@intel.com>
CC: Kevin Tian <kevin.t...@intel.com>
CC: Jan Beulich <jbeul...@suse.com>
Signed-off-by: Tiejun Chen <tiejun.c...@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 836aed5..4249cfa 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2310,12 +2310,16 @@ static int intel_iommu_assign_device(
 PCI_DEVFN2(bdf) == devfn &&
  rmrr->scope.devices_cnt > 1 )
 {
+u32 relaxed = flag & XEN_DOMCTL_DEV_RDM_RELAXED;
+
 printk(XENLOG_G_ERR VTDPREFIX
-   " cannot assign %04x:%02x:%02x.%u"
+   " Currently its %s to assign %04x:%02x:%02x.%u"
" with shared RMRR at %"PRIx64" for Dom%d.\n",
+   relaxed ? "risky" : "disallowed",
seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
rmrr->base_address, d->domain_id);
-return -EPERM;
+if ( !relaxed )
+return -EPERM;
 }
 }
 
-- 
1.9.1




[Xen-devel] [PATCH] tools/hvmloader: sync memory map[]

2015-07-28 Thread Tiejun Chen
Currently we always use memory_map[] to help hvmloader construct the
e820 table, but hvmloader may have relocated RAM to support mmio
allocation, or populated extra RAM to ensure there is enough room to
load OVMF. Either way we need to sync these changes into memory_map[].

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
CC: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
 tools/firmware/hvmloader/e820.c | 105 ++--
 tools/firmware/hvmloader/pci.c  |   3 ++
 tools/firmware/hvmloader/util.c |   3 ++
 tools/firmware/hvmloader/util.h |   3 ++
 4 files changed, 57 insertions(+), 57 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index a6cacdf..f4ccacb 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -55,6 +55,54 @@ void memory_map_setup(void)
 }
 }
 
+/*
+ * Sometimes hvmloader may have relocated RAM so low_mem_pgend/high_mem_end
+ * would be changed over there. But memory_map[] just records the
+ * original low/high memory, so we need to sync these entries once
+ * hvmloader modifies low/high memory.
+ */
+void adjust_memory_map(void)
+{
+uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT;
+unsigned int i;
+
+for ( i = 0; i < memory_map.nr_map; i++ )
+{
+uint64_t map_start = memory_map.map[i].addr;
+uint64_t map_size = memory_map.map[i].size;
+uint64_t map_end = map_start + map_size;
+
+/* If we need to adjust lowmem. */
+if ( memory_map.map[i].type == E820_RAM &&
+ low_mem_end > map_start && low_mem_end < map_end )
+{
+memory_map.map[i].size = low_mem_end - map_start;
+continue;
+}
+
+/* Modify the existing highmem region if it exists. */
+if ( memory_map.map[i].type == E820_RAM &&
+ high_mem_end && map_start == ((uint64_t)1 << 32) )
+{
+if ( high_mem_end != map_end )
+memory_map.map[i].size = high_mem_end - map_start;
+high_mem_end = 0;
+continue;
+}
+}
+
+/* If there was no highmem region, just create one. */
+if ( high_mem_end )
+{
+memory_map.map[i].addr = ((uint64_t)1 << 32);
+memory_map.map[i].size =
+((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) -
+memory_map.map[i].addr;
+memory_map.map[i].type = E820_RAM;
+}
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
 uint64_t last_end = 0, start, end;
@@ -107,9 +155,6 @@ int build_e820_table(struct e820entry *e820,
 {
 unsigned int nr = 0, i, j;
 uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
-uint32_t add_high_mem = 0;
-uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT;
-uint64_t map_start, map_size, map_end;
 
 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA0000;
@@ -208,63 +253,9 @@ int build_e820_table(struct e820entry *e820,
  *
  * Note we just have one low memory entry and one high mmeory entry if
  * exists.
- *
- * But we may have relocated RAM to allocate sufficient MMIO previously
- * so low_mem_pgend would be changed over there. And here memory_map[]
- * records the original low/high memory, so if low_mem_end is less than
- * the original we need to revise low/high memory range firstly.
  */
 for ( i = 0; i < memory_map.nr_map; i++ )
 {
-map_start = memory_map.map[i].addr;
-map_size = memory_map.map[i].size;
-map_end = map_start + map_size;
-
-/* If we need to adjust lowmem. */
-if ( memory_map.map[i].type == E820_RAM &&
- low_mem_end > map_start && low_mem_end < map_end )
-{
-add_high_mem = map_end - low_mem_end;
-memory_map.map[i].size = low_mem_end - map_start;
-break;
-}
-}
-
-/* If we need to adjust highmem. */
-if ( add_high_mem )
-{
-/* Modify the existing highmem region if it exists. */
-for ( i = 0; i < memory_map.nr_map; i++ )
-{
-map_start = memory_map.map[i].addr;
-map_size = memory_map.map[i].size;
-map_end = map_start + map_size;
-
-if ( memory_map.map[i].type == E820_RAM &&
- map_start == ((uint64_t)1 << 32))
-{
-memory_map.map[i].size += add_high_mem;
-break;
-}
-}
-
-/* If there was no highmem region, just create one. */
-if ( i == memory_map.nr_map

[Xen-devel] [v11][PATCH 02/16] xen/vtd: create RMRR mapping

2015-07-21 Thread Tiejun Chen
RMRR reserved regions must be set up in the pfn space with an identity
mapping to the reported mfn. However, the existing code has problems
setting up the correct mapping when VT-d shares the EPT page table,
which leads to problems when assigning devices (e.g. GPU) with RMRRs
reported. Instead, this patch sets up the identity mapping in the p2m
layer, regardless of whether EPT is shared or not. And we still keep
creating the VT-d table.

We also need to introduce a pair of helpers to create/clear this sort
of identity mapping, as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If the space
is already occupied by the desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define a macro wrapping guest_physmap_remove_page(), returning
a value as necessary.
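
For reference, a hedged sketch (not part of this patch) of how a caller
such as rmrr_identity_mapping() is expected to use the two helpers over
an RMRR pfn range; the p2m_access_rw access type and the helper name
are assumptions:

    /* Hedged sketch: map or clear an identity range pfn by pfn. */
    static int rmrr_range_mapping(struct domain *d, unsigned long base_pfn,
                                  unsigned long end_pfn, bool_t map)
    {
        int ret = 0;

        while ( base_pfn < end_pfn )
        {
            int err = map ? set_identity_p2m_entry(d, base_pfn, p2m_access_rw)
                          : clear_identity_p2m_entry(d, base_pfn, 0);

            if ( err )
                ret = err;
            base_pfn++;
        }

        return ret;
    }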

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Acked-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v11:

* Nothing is changed.

v5:

* Fold our original patch #2 and #3 as this new

* Introduce a new helper, clear_identity_p2m_entry, which wraps
  guest_physmap_remove_page(). We use this to clear our identity
  mapping.

v4:

* Change that original condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mappings as we expected.

* To have

  if ( !paging_mode_translate(p2m->domain) )
return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c   | 40 +++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h   | 13 +---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6fe6387..1e763dc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long 
gfn, unsigned long mfn,
 p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
   unsigned long mfn, unsigned int page_order)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int rc;
 gfn_lock(p2m, gfn, page_order);
-p2m_remove_page(p2m, gfn, mfn, page_order);
+rc = p2m_remove_page(p2m, gfn, mfn, page_order);
 gfn_unlock(p2m, gfn, page_order);
+return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+   p2m_access_t p2ma)
+{
+p2m_type_t p2mt;
+p2m_access_t a;
+mfn_t mfn;
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int ret;
+
+if ( !paging_mode_translate(p2m->domain) )
+return 0;
+
+gfn_lock(p2m, gfn, 0);
+
+mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+p2m_mmio_direct, p2ma);
+else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+ret = 0;
+else
+{
+ret = -EBUSY;
+printk(XENLOG_G_WARNING
+   "Cannot setup identity map d%d:%lx,"
+   " gfn already mapped to %lx.\n",
+   d->domain_id, gfn, mfn_x(mfn));
+}
+
+gfn_unlock(p2m, gfn, 0);
+return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 9849d0e..5aa482f 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-if ( intel_iommu_unmap_page(d, base_pfn) )
+if ( clear_identity_p2m_entry(d, base_pfn, 0) )
 ret = -ENXIO;
 base_pfn++;
 }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-int err = intel_iommu_map_page(d, base_pfn, base_pfn

[Xen-devel] [v11][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-21 Thread Tiejun Chen
This patch introduces user-configurable parameters to specify RDM
resources and the associated policies:

Global RDM parameter:
rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The global RDM parameter, strategy, allows the user to specify reserved
regions explicitly. Currently 'host' includes all reserved regions
reported on this platform, which is useful to handle the hotplug
scenario. In the future this parameter may be further extended to allow
specifying arbitrary regions, e.g. even those belonging to another
platform, as a preparation for live migration with passthrough devices.
By default this isn't set, so we don't check all rdms. Instead, we just
check the rdm specific to a given device if you're assigning this kind
of device. Note this option is not recommended unless you can make sure
any conflict does exist.

The 'strict/relaxed' policy decides how to handle a conflict when
reserving RDM regions in pfn space. If a conflict exists, 'strict'
means an immediate error so the VM can't keep running, while 'relaxed'
allows moving forward with a warning message.

The default per-device RDM policy is the same as the default global RDM
policy, i.e. 'relaxed'. As with other options, the per-device policy
overrides the global policy.
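
For reference, a hedged sketch (not the actual libxl code) of how the
per-device policy is expected to fall back to the global one only when
it was left unset, so an explicit per-device setting always wins; the
LIBXL_RDM_RESERVE_POLICY_INVALID name and the field paths are
assumptions based on this series:

    /* Hedged sketch: per-device rdm_policy default/override. */
    static void rdm_policy_setdefault(libxl_domain_config *d_config,
                                      libxl_device_pci *pcidev)
    {
        if (pcidev->rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
            pcidev->rdm_policy = d_config->b_info.u.hvm.rdm.policy;
    }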

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v9 ~ v11:

* Nothing is changed.

v8:

* One minimal code style change

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

v6:

* Some rename to make our policy reasonable
  type -> strategy
  none -> ignore
* Don't expose ignore in xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device policy always overrides the global policy,
  and so cleanup some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since it's just zero
* Grab those changes to xl/libxlu to as a final patch

 docs/man/xl.cfg.pod.5| 81 
 docs/misc/vtd.txt| 24 +
 tools/libxl/libxl_create.c   |  7 
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c  |  9 +
 tools/libxl/libxl_types.idl  | 18 ++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 382f30b..e6e0f70 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -633,6 +633,79 @@ assigned slave device.
 
 =back
 
+=item B<rdm=RDM_RESERVATION_STRING>
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...]> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<strategy=STRING>
+
+Currently there is only one valid type:
+
+host means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using host includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure any conflict does 
exist.
+
+For example, you're trying to set memory = 2800 to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad80 end_address afff
+
+In this conflict case,
+
+#1. If B<strategy> is set to host, for example,
+
+rdm = "strategy=host,policy=strict" or rdm = "strategy=host,policy=relaxed"
+
+It means all conflicts will be handled according to the policy
+introduced by B<policy> as described below.
+
+#2. If B<strategy> is not set at all, but
+
+pci = [ 'sbdf_A, rdm_policy=x' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by B<rdm_policy=STRING> as described inside pci options.
+
+=item B<policy=STRING>

[Xen-devel] [v11][PATCH 04/16] xen: enable XENMEM_memory_map in hvm

2015-07-21 Thread Tiejun Chen
This patch enables XENMEM_memory_map in HVM, so hvmloader can
use it to set up the e820 mappings.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
---
v5 ~ v11:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c  | 6 --
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c07e3ef..d860579 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4855,7 +4855,6 @@ static long hvm_memory_op(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd & MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
@@ -4931,7 +4930,6 @@ static long hvm_memory_op_compat32(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd & MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 342414f..8c887d8 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-if ( is_hvm_domain(d) )
-{
-rcu_unlock_domain(d);
-return -EPERM;
-}
-
 e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
 if ( e820 == NULL )
 {
-- 
1.9.1




[Xen-devel] [v11][PATCH 00/16] Fix RMRR

2015-07-21 Thread Tiejun Chen
-arranged otherwise
it will result in a more scattered layout:
a) in highmem region (>4G)
b) in lowmem region, and below a predefined boundary (default 2G)
  a) is a new assumption not discussed before. From the VT-d spec this
is possible, but we have not observed it in the real world. So we can
make this reasonable assumption until there is real usage of it.

5. Extend XENMEM_set_memory_map to be usable for HVM guests, and then
have libxl use that hypercall to carry RDM information to hvmloader.
There is one difference from the original discussion: previously we
discussed introducing a new E820 type specifically for RDM entries.
After more thought we think it's OK to just tag them as E820_reserved;
hvmloader doesn't need to know whether the reserved entries come from
RDM or from other purposes.
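
For reference, a hedged sketch (illustrative names, not the actual libxl
change) of what point 5 amounts to: an RDM region is carried to
hvmloader as a plain E820_RESERVED entry.

    /* Hedged sketch: add one RDM region to the guest e820 map. */
    static void add_rdm_as_reserved(struct e820entry *e820, unsigned int *nr,
                                    uint64_t rdm_start, uint64_t rdm_size)
    {
        e820[*nr].addr = rdm_start;
        e820[*nr].size = rdm_size;
        e820[*nr].type = E820_RESERVED; /* hvmloader need not know it is an RDM */
        (*nr)++;
    }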

6. In hvmloader the change is generic to the XENMEM_memory_map change.
Given a predefined memory layout, hvmloader should avoid allocating
any reserved entry for other usages (opregion, mmio, etc.).

7. Extend existing device passthrough hypercall to carry conflict handling
policy.

8. Setup identity map in p2m layer for RMRRs reported for the given
device. And conflicts are handled according to specified policy in hypercall.

Current patch set contains core enhancements calling for comments.
There are still several tasks not implemented now. We'll include them
in final version after RFC is agreed:

- remove existing USB hack
- detect and fail assigning device which has a shared RMRR with another device
- add a config parameter to configure that memory boundary flexibly
- In the case of hotplug we also need to figure out a way to fix the policy
  conflict between the per-pci policy and the global policy, but first we think
  we'd better collect some good ideas on the next step in this RFC.

So here I made this as RFC to collect your any comments.


Jan Beulich (1):
  xen: introduce XENMEM_reserved_device_memory_map

Tiejun Chen (15):
  xen/vtd: create RMRR mapping
  xen/passthrough: extend hypercall to support rdm reservation policy
  xen: enable XENMEM_memory_map in hvm
  hvmloader: get guest memory map into memory_map[]
  hvmloader/pci: Try to avoid placing BARs in RMRRs
  hvmloader/e820: construct guest e820 table
  tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  tools: extend xc_assign_device() to support rdm reservation policy
  tools: introduce some new parameters to set rdm policy
  tools/libxl: detect and avoid conflicts with RDM
  tools: introduce a new parameter to set a predefined rdm boundary
  libxl: construct e820 map with RDM information for HVM guest
  xen/vtd: enable USB device assignment
  xen/vtd: prevent from assign the device with shared rmrr
  tools: parse to enable new rdm policy parameters

 docs/man/xl.cfg.pod.5   | 103 
 docs/misc/vtd.txt   |  24 ++
 tools/firmware/hvmloader/e820.c | 141 +-
 tools/firmware/hvmloader/e820.h |   7 +
 tools/firmware/hvmloader/hvmloader.c|   2 +
 tools/firmware/hvmloader/pci.c  |  65 +
 tools/firmware/hvmloader/util.c |  26 ++
 tools/firmware/hvmloader/util.h |  12 +
 tools/libxc/include/xenctrl.h   |  11 +-
 tools/libxc/xc_domain.c |  45 +++-
 tools/libxl/libxl.h |   6 +
 tools/libxl/libxl_arch.h|   7 +
 tools/libxl/libxl_arm.c |   8 +
 tools/libxl/libxl_create.c  |  13 +-
 tools/libxl/libxl_dm.c  | 274 
 tools/libxl/libxl_dom.c |  16 +-
 tools/libxl/libxl_internal.h|  16 +-
 tools/libxl/libxl_pci.c |  12 +-
 tools/libxl/libxl_types.idl |  26 ++
 tools/libxl/libxl_x86.c |  83 ++
 tools/libxl/libxlu_pci.c|  92 ++-
 tools/libxl/libxlutil.h |   4 +
 tools/libxl/xl_cmdimpl.c|  16 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c |  16 +-
 tools/python/xen/lowlevel/xc/xc.c   |  30 ++-
 xen/arch/x86/hvm/hvm.c  |   2 -
 xen/arch/x86/mm.c   |   6 -
 xen/arch/x86/mm/p2m.c   |  43 ++-
 xen/common/compat/memory.c  |  66 +
 xen/common/memory.c |  64 +
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   3 +-
 xen/drivers/passthrough/arm/smmu.c  |   2 +-
 xen/drivers/passthrough/device_tree.c   |   3 +-
 xen/drivers/passthrough/iommu.c |  10 +
 xen/drivers/passthrough/pci.c   |  15 +-
 xen/drivers/passthrough/vtd/dmar.c  |  32 +++
 xen/drivers/passthrough/vtd/dmar.h  |   1 -
 xen/drivers/passthrough/vtd/extern.h|   1 +
 xen

[Xen-devel] [v11][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-07-21 Thread Tiejun Chen
Currently we simply refuse to assign any device with a shared RMRR,
since the case of a shared RMRR is rare according to our previous
experience. Later we can group the devices which share an RMRR, and
then allow all devices within a group to be assigned to the same
domain.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v10 ~ v11:

* Nothing is changed.

v9:

* Correct one indentation issue

v8:

* Merge two if{} as one if{}

* Add to print RMRR range info when stop assign a group device

v5 ~ v7:

* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 8a8d763..ce5c295 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2293,13 +2293,37 @@ static int intel_iommu_assign_device(
 if ( list_empty(acpi_drhd_units) )
 return -ENODEV;
 
+seg = pdev->seg;
+bus = pdev->bus;
+/*
+ * In rare cases one given rmrr is shared by multiple devices but
+ * obviously this would put the security of a system at risk. So
+ * we should prevent from this sort of device assignment.
+ *
+ * TODO: in the future we can introduce group device assignment
+ * interface to make sure devices sharing RMRR are assigned to the
+ * same domain together.
+ */
+for_each_rmrr_device( rmrr, bdf, i )
+{
+if ( rmrr->segment == seg &&
+ PCI_BUS(bdf) == bus &&
+ PCI_DEVFN2(bdf) == devfn &&
+ rmrr->scope.devices_cnt > 1 )
+{
+printk(XENLOG_G_ERR VTDPREFIX
+   " cannot assign %04x:%02x:%02x.%u"
+   " with shared RMRR at %"PRIx64" for Dom%d.\n",
+   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+   rmrr->base_address, d->domain_id);
+return -EPERM;
+}
+}
+
 ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
 if ( ret )
 return ret;
 
-seg = pdev->seg;
-bus = pdev->bus;
-
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
 {
-- 
1.9.1




[Xen-devel] [v11][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs

2015-07-21 Thread Tiejun Chen
Try to avoid placing PCI BARs over RMRRs:

- If mmio_hole_size is not specified, and the existing MMIO range has
  RMRRs in it, and there is space to expand the hole in lowmem without
  moving more memory, then make the MMIO hole as large as possible.

- When placing RMRRs, find the next RMRR higher than the current base
  in the lowmem mmio hole.  If it overlaps, skip ahead of it and find
  the next one.

This certainly won't work in all cases, but it should work in a
significant number of cases.  Additionally, users should be able to
work around problems by setting mmio_hole_size larger in the guest
config.
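
For reference, a hedged sketch (not the literal hunk) of how the helpers
introduced below are expected to be used when placing a BAR;
skip_rmrrs() is a hypothetical name and bar_sz is assumed to be a power
of two:

    /* Hedged sketch: advance a candidate BAR base past overlapping RMRRs. */
    static uint64_t skip_rmrrs(uint64_t base, uint64_t bar_sz)
    {
        int next_rmrr = find_next_rmrr(base);

        while ( next_rmrr >= 0 &&
                check_overlap(base, bar_sz,
                              memory_map.map[next_rmrr].addr,
                              memory_map.map[next_rmrr].size) )
        {
            /* Realign the base just past the conflicting RMRR. */
            base = (memory_map.map[next_rmrr].addr +
                    memory_map.map[next_rmrr].size + bar_sz - 1) &
                   ~(bar_sz - 1);
            next_rmrr = find_next_rmrr(base);
        }

        return base;
    }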

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Reviewed-by: Jan Beulich jbeul...@suse.com
Signed-off-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v11:

* To find the lowest RMRR the _end_ of which is higher than base.

* Refine some code implementations 

v10:

* This is from George's draft patch which implements an acceptable solution in
  the current cycle. Here I just implemented check_overlap_all() and some cleanups.

v9:

* A little improvement to the code implementation, but again, this solution is
  still under discussion.

v8:

* Based on the current discussion it's hard to reshape the original mmio
  allocation mechanism, and we don't have a good, simple alternative in the
  short term. So instead of adding more complexity to intervene in that
  process, we still check for any conflicts and disable all associated devices.

v6 ~ v7:

* Nothing is changed.

v5:

* Rename the field is_64bar inside struct bars to flag, and
  extend it to also indicate whether this BAR is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need either to populate more RAM, or to expand more highmem. But note
  that only 64bit BARs can work with highmem, and as you mentioned we should
  also avoid expanding highmem where possible. So my implementation allocates
  32bit BARs and 64bit BARs in order.

  1. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2.

  2. The second allocation round to the remaining 32bit-bar

  We should can finish allocating all 32bit-bar in theory, then go to the third
  allocation round 3.

  3. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate reserved memory regions that are not aligned:

  We should skip all reserved device memory, but we also need to check whether
  other smaller BARs can be allocated if an mmio hole exists between
  resource->base and the reserved device memory. If such a hole exists, simply
  move on and try to allocate the next BAR, since all BARs are in descending
  order of size. If not, we need to move resource->base to reserved_end and
  reallocate this BAR.

 tools/firmware/hvmloader/pci.c | 65 ++
 1 file changed, 65 insertions(+)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..74fc080 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,46 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+/* Check if the specified range conflicts with any reserved device memory. */
+static bool check_overlap_all(uint64_t start, uint64_t size)
+{
+unsigned int i;
+
+for ( i = 0; i < memory_map.nr_map; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED &&
+ check_overlap(start, size,
+   memory_map.map[i].addr,
+   memory_map.map[i].size) )
+return true;
+}
+
+return false;
+}
+
+/* Find the lowest RMRR higher than base. */
+static int find_next_rmrr(uint32_t base)
+{
+unsigned int i;
+int next_rmrr = -1;
+uint64_t end, min_end = (1ull << 32);
+
+for ( i = 0; i < memory_map.nr_map ; i

[Xen-devel] [v11][PATCH 05/16] hvmloader: get guest memory map into memory_map[]

2015-07-21 Thread Tiejun Chen
Now we get this map layout by calling XENMEM_memory_map, then save it
into one global variable, memory_map[]. It should include the lowmem
range, the rdm range and the highmem range. Note the rdm range and the
highmem range may not exist in some cases.

And here we need to check whether any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END).
This range is used to allocate memory at the hvmloader level, and we
make hvmloader fail in case of conflict, since this is another rare
possibility in the real world.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: George Dunlap george.dun...@eu.citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
---
v10 ~ v11:

* Nothing is changed.

v9:

* Correct [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END]
  -> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END) in
  the patch head description;
  Merge two if{} as one if{};

v8:

* Actually we should check this range started from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  So correct this and sync the patch head description.

v5 ~ v7:

* Nothing is changed.

v4:

* Move some codes related to e820 to that specific file, e820.c.

* Consolidate printf()+BUG() and BUG_ON()

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c  | 32 
 tools/firmware/hvmloader/e820.h  |  7 +++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c  | 26 ++
 tools/firmware/hvmloader/util.h  | 12 
 5 files changed, 79 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..7a414ab 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,38 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+unsigned int nr_entries = E820MAX, i;
+int rc;
+uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START;
+uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+if ( rc || !nr_entries )
+{
+printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+BUG();
+}
+
+memory_map.nr_map = nr_entries;
+
+for ( i = 0; i < nr_entries; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED &&
+ check_overlap(alloc_addr, alloc_size,
+   memory_map.map[i].addr, memory_map.map[i].size) )
+{
+printf("Fail to setup memory map due to conflict");
+printf(" on dynamic reserved memory range.\n");
+BUG();
+}
+}
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
 uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
 uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX 128
+
+struct e820map {
+unsigned int nr_map;
+struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c 
b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
 init_hypercalls();
 
+memory_map_setup();
+
 xenbus_setup();
 
 bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+   uint64_t reserved_start, uint64_t reserved_size)
+{
+return (start + size > reserved_start) &&
+(start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
 asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
 *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+int rc;
+struct xen_memory_map memmap = {
+.nr_entries = *max_entries
+};
+
+set_xen_guest_handle(memmap.buffer, entries);
+
+rc

[Xen-devel] [v11][PATCH 14/16] xen/vtd: enable USB device assignment

2015-07-21 Thread Tiejun Chen
A USB RMRR may conflict with the guest BIOS region. In such a case,
identity mapping setup was simply skipped in the previous
implementation. Now we can handle this scenario cleanly with the new
policy mechanism, so the previous hack can be removed.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v5 ~ v11:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++-
 xen/drivers/passthrough/vtd/utils.c |  7 ---
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h 
b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {\
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index a2f3a66..8a8d763 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2242,11 +2242,9 @@ static int reassign_device_ownership(
 /*
  * If the device belongs to the hardware domain, and it has RMRR, don't
  * remove it from the hardware domain, because BIOS may use RMRR at
- * booting time. Also account for the special casing of USB below (in
- * intel_iommu_assign_device()).
+ * booting time.
  */
-if ( !is_hardware_domain(source) &&
- !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+if ( !is_hardware_domain(source) )
 {
 const struct acpi_rmrr_unit *rmrr;
 u16 bdf;
@@ -2299,13 +2297,8 @@ static int intel_iommu_assign_device(
 if ( ret )
 return ret;
 
-/* FIXME: Because USB RMRR conflicts with guest bios region,
- * ignore USB RMRR temporarily.
- */
 seg = pdev->seg;
 bus = pdev->bus;
-if ( is_usb_device(seg, bus, pdev->devfn) )
-return 0;
 
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c 
b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include extern.h
 #include asm/io_apic.h
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-PCI_CLASS_DEVICE);
-return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1




[Xen-devel] [v11][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-07-21 Thread Tiejun Chen
This patch extends the existing hypercall to support an rdm reservation
policy. We return an error or just throw out a warning message,
depending on whether the policy is strict or relaxed, when reserving
RDM regions in pfn space. Note that in some special cases, e.g. adding
a device to the hardware domain or removing a device from a user
domain, 'relaxed' is good enough since this is always safe for the
hardware domain.
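
For reference, a hedged sketch (not part of this hypervisor patch) of
how a toolstack caller is expected to build the flag;
XEN_DOMCTL_DEV_RDM_RELAXED is the bit added here, while the extended
xc_assign_device() comes from a later patch in this series and
relaxed/machine_sbdf are illustrative names:

    /* Hedged sketch: strict is the default (0), relaxed sets the bit. */
    uint32_t flag = relaxed ? XEN_DOMCTL_DEV_RDM_RELAXED : 0;
    rc = xc_assign_device(ctx->xch, domid, machine_sbdf, flag);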

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
CC: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Stefano Stabellini stefano.stabell...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: George Dunlap george.dun...@eu.citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
---
v10 ~ v11:

* Nothing is changed.

v9:

* Correct one check condition of XEN_DOMCTL_DEV_RDM_RELAXED

v8:

* Force passing 0 (strict) when adding or moving a device in the hardware
  domain, and improve some associated code comments.

v6 ~ v7:

* Nothing is changed.

v5:

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
  0 means strict and 1 means relaxed.

* So make DT device ignore the flag field

* Improve the code comments

v4:

* Add code comments to describe why we force a policy flag in some
  cases, like adding a device to the hardware domain and removing a device
  from a user domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one judging condition:
  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add to range check the flag passed to make future extensions possible
  (and to avoid ambiguity on what out of range values would mean).

 xen/arch/x86/mm/p2m.c   |  7 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c  |  2 +-
 xen/drivers/passthrough/device_tree.c   |  3 ++-
 xen/drivers/passthrough/pci.c   | 15 
 xen/drivers/passthrough/vtd/iommu.c | 37 ++---
 xen/include/asm-x86/p2m.h   |  2 +-
 xen/include/public/domctl.h |  3 +++
 xen/include/xen/iommu.h |  2 +-
 9 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 1e763dc..89616b7 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, 
mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-   p2m_access_t p2ma)
+   p2m_access_t p2ma, unsigned int flag)
 {
 p2m_type_t p2mt;
 p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long 
gfn,
 ret = 0;
 else
 {
-ret = -EBUSY;
+if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
+ret = 0;
+else
+ret = -EBUSY;
 printk(XENLOG_G_WARNING
"Cannot setup identity map d%d:%lx,"
" gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct 
domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-   struct pci_dev *pdev)
+   struct pci_dev *pdev,
+   u32 flag)
 {
 struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
 int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct 
iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-  struct device *dev)
+  struct device *dev, u32 flag)
 {
struct iommu_domain *domain;
struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 5d3842a..7ff79f8 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct 
dt_device_node *dev)
 goto fail;
 }
 
-rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+/* The flag field doesn't

[Xen-devel] [v11][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-21 Thread Tiejun Chen
While building a VM, the HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields hvmloader uses to
construct the guest e820 table, low_mem_pgend and high_mem_pgend, so we
should check them to fix any conflict with RDM.

RMRR regions can theoretically reside in address space beyond 4G, but we
have never seen this in the real world, so in order to avoid breaking the
highmem layout we don't resolve highmem conflicts. Note this means highmem
RMRRs can still be supported as long as there is no conflict.

In the lowmem case, however, RMRR regions may be scattered over the whole
RAM space, and multiple RMRR entries make the memory layout even more
complicated, so it's hard to extend hvm_info_table{} enough to let
hvmloader work this out. Instead we aim for a simple solution that avoids
breaking the existing layout. When a conflict occurs,

#1. Above a predefined boundary (2G)
- move lowmem_end below the reserved region to solve the conflict;

#2. Below a predefined boundary (2G)
- check the strict/relaxed policy.
'strict' fails libxl. Note that when both policies are specified
on a given region, 'strict' is always preferred.
'relaxed' issues a warning message and also marks this entry INVALID
to indicate we shouldn't expose this entry to hvmloader.

Note later we need to provide a parameter to set that predefined boundary
dynamically.
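
As a rough illustration only (not part of this patch), the decision above
can be sketched as below; the helper and the boundary constant are
hypothetical names.

    /* Hypothetical sketch of the lowmem RDM conflict handling policy. */
    #define RDM_BOUNDARY (2ULL << 30)           /* predefined 2G boundary */

    static int handle_rdm_conflict(uint64_t rdm_start, uint64_t *lowmem_end,
                                   bool relaxed)
    {
        if ( rdm_start >= RDM_BOUNDARY )
        {
            /* #1: above the boundary - shrink lowmem below the region. */
            *lowmem_end = rdm_start;
            return 0;
        }
        /* #2: below the boundary - apply the strict/relaxed policy. */
        if ( !relaxed )
            return -1;      /* strict: fail libxl */
        /* relaxed: warn and mark the entry INVALID for hvmloader. */
        return 1;
    }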

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
---
v11:

* Use GCNEW_ARRAY to replace libxl__malloc()

* #define pfn_to_paddr(x) was missing safety () around x; also move
  this into libxl_internal.h

* Rename set_rdm_entries() to add_rdm_entry() and put the
  increment at the end so that the assignments are
  to ->rdms[d_config->num_rdms].

* Simply make it so that if there are any rdms specified
  in the domain config, they are used instead of the
  automatically gathered information (from strategy and
  devices). So just return if d_config->rdms is valid.

* Shorten some code comments.

v9 ~ v10:

* Nothing is changed.

v8:

* Introduce pfn_to_paddr(x) -> ((uint64_t)x << XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out the current code.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Fix some code style issues
* Refine libxl__xc_device_get_rdm()

v5:

* A little change to make sure the per-device policy always overrides the
  global policy, and correct its associated code comments.
* Fix one typo in the patch head description
* Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace
  malloc() with libxl__malloc(), and finally cleanup this fallout.
* libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out 
parameter.

v4:

* Consistent to use term RDM.
* Unconditionally set *nr_entries to 0
* Move all the stuff that provides a parameter to set our predefined
  boundary dynamically into a separate patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c   | 274 +++
 tools/libxl/libxl_dom.c  |  17 ++-
 tools/libxl/libxl_internal.h |  14 ++-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 311 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 7c884c4..5b57062 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -407,7 +407,7 @@ int libxl__domain_build(libxl__gc *gc,
 
 switch (info-type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-ret = libxl__build_hvm(gc, domid, info, state);
+ret = libxl__build_hvm(gc, domid, d_config, state);
 if (ret)
 goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 634b8d2..29476fc 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -92,6 +92,280 @@ const char *libxl__domain_device_model(libxl__gc *gc,
 return dm;
 }
 
+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+ uint32_t flag,
+ uint16_t seg,
+ uint8_t bus,
+ uint8_t devfn,
+ unsigned int *nr_entries,
+ struct xen_reserved_device_memory **xrdm)
+{
+int rc = 0, r;
+
+/*
+ * We really can't presume how many entries we can get in advance.
+ */
+*nr_entries = 0;
+r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+  NULL, nr_entries);
+assert(r <= 0);
+/* 0 means we have no any rdm entry. */
+if (!r) goto out;
+
+if (errno != ENOBUFS) {
+rc = ERROR_FAIL;
+goto out;
+}
+
+GCNEW_ARRAY

[Xen-devel] [v11][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-21 Thread Tiejun Chen
Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
  hypervisor map
* Add in the hypervisor supplied regions
* Adjust the lowmem and highmem regions if we've had to relocate
  memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.
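
As a concrete (made-up) numeric example of the lowmem/highmem adjustment
mentioned above: if the toolstack recorded lowmem RAM as [1M, 3G) but MMIO
relocation later lowered low_mem_pgend to 2G, the 1G that no longer fits
below 4G is re-added above 4G, roughly:

    /* Illustrative values only, mirroring the adjustment in this patch. */
    uint64_t map_start    = 0x100000;              /* 1M */
    uint64_t map_end      = 0xc0000000;            /* 3G, as recorded by toolstack */
    uint32_t low_mem_end  = 0x80000000;            /* 2G, after MMIO relocation */
    uint32_t add_high_mem = map_end - low_mem_end; /* 1G, appended above 4G */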

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Reviewed-by: George Dunlap george.dun...@eu.citrix.com
Reviewed-by: Jan Beulich jbeul...@suse.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v11:

* Check/sync memory_map.map[] before copying it into e820, since
  ultimately this makes sure hvm_info, memory_map.map[] and e820
  stay consistent.

* Refine some code implementations

v10:

* Instead of correcting e820, I'd like to correct memory_map.map[]
  and then copy them into e820 directly. I think this can make sure
  hvm_info, memory_map.map[] and e820 are on the same page.

v9:

* Refine that chunk of codes to check/modify highmem

v8:

* define low_mem_end as uint32_t

* Correct those two wrong loops, memory_map.nr_map -> nr,
  when we're trying to revise low/high memory e820 entries.

* Improve code comments and the patch head description

* Add one check if highmem is just populated by hvmloader itself

v5 ~ v7:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.
 
 tools/firmware/hvmloader/e820.c | 109 +++-
 1 file changed, 96 insertions(+), 13 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 7a414ab..a6cacdf 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,11 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+uint32_t add_high_mem = 0;
+uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT;
+uint64_t map_start, map_size, map_end;
 
 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA0000;
@@ -149,13 +153,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;
 
-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-e820[nr].addr = 0x100000;
-e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +188,102 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }
 
+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end < (2u << 20));
 
-if ( hvm_info-high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ *
+ * Note we just have one low memory entry and one high memory entry if
+ * exists.
+ *
+ * But we may have relocated RAM to allocate sufficient MMIO previously
+ * so low_mem_pgend would be changed over there. And here memory_map[]
+ * records the original low/high memory, so if low_mem_end is less than
+ * the original we need to revise low/high memory range firstly.
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1 << 32);
-e820[nr].size =
-((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
+map_start = memory_map.map[i].addr;
+map_size = memory_map.map[i].size;
+map_end = map_start + map_size;
+
+/* If we need to adjust lowmem. */
+if ( memory_map.map[i].type == E820_RAM &&
+ low_mem_end > map_start && low_mem_end < map_end )
+{
+add_high_mem = map_end - low_mem_end;
+memory_map.map[i].size = low_mem_end - map_start;
+break;
+}
+}
+
+/* If we need to adjust highmem. */
+if ( add_high_mem )
+{
+/* Modify the existing highmem

[Xen-devel] [v11][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy

2015-07-21 Thread Tiejun Chen
This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also brings some fallout to the python usage of xc_assign_device().
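
A minimal usage sketch of the extended call (illustrative only; error
handling is abbreviated, and the BDF encoding shown is the one already
used in libxl_pci.c):

    /* Assign a device to domid with the relaxed RDM policy. */
    uint32_t machine_sbdf = pcidev_encode_bdf(pcidev);
    int rc = xc_assign_device(ctx->xch, domid, machine_sbdf,
                              XEN_DOMCTL_DEV_RDM_RELAXED);
    if (rc < 0)
        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");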

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
CC: David Scott dave.sc...@eu.citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v11:

* Nothing is changed.

v5:

* Fix the flag field as 0 to DT device

v4:

* In the patch head description, add an explanation of why we need to
  sync the xc.c file

 tools/libxc/include/xenctrl.h   |  3 ++-
 tools/libxc/xc_domain.c |  9 -
 tools/libxl/libxl_pci.c |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 
 tools/python/xen/lowlevel/xc/xc.c   | 30 --
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 2991333..5c535c4 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2067,7 +2067,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
  uint32_t domid,
- uint32_t machine_sbdf);
+ uint32_t machine_sbdf,
+ uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
  uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 298b3b5..69e6d8f 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
 xc_interface *xch,
 uint32_t domid,
-uint32_t machine_sbdf)
+uint32_t machine_sbdf,
+uint32_t flag)
 {
 DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
 domctl.domain = domid;
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
 domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.flag = flag;
 
 return do_domctl(xch, domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
 domctl.u.assign_device.u.dt.size = size;
+/*
+ * DT doesn't own any RDM so actually DT has nothing to do
+ * for any flag and here just fix that as 0.
+ */
+domctl.u.assign_device.flag = 0;
 set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
 rc = do_domctl(xch, domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 FILE *f;
 unsigned long long start, end, flags, size;
 int irq, i, rc, hvm = 0;
+uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
 if (type == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 
 out:
 if (!libxl_is_stubdom(ctx, domid, NULL)) {
-rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
 if (rc < 0 && (hvm || errno != ENOSYS)) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
 return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c 
b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value 
xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+value rflag)
 {
-   CAMLparam3(xch, domid, desc);
+   CAMLparam4(xch, domid, desc, rflag);
int ret;
int domain, bus, dev, func;
-   uint32_t sbdf;
+   uint32_t sbdf, flag;
 
domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, 
value domid, value desc)
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);
 
-   ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+   ret = Int_val(Field(rflag, 0));
+   flag = domain_assign_device_rdm_flag_table[ret];
+
+   ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
if (ret  0

[Xen-devel] [v11][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map

2015-07-21 Thread Tiejun Chen
From: Jan Beulich jbeul...@suse.com

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich jbeul...@suse.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Jan Beulich jbeul...@suse.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v7 ~ v11:

* Nothing is changed.

v6:

* Add a comments to the nr_entries field inside xen_reserved_device_memory_map

v5 ~ v4:

* Nothing is changed.

 xen/common/compat/memory.c   | 66 
 xen/common/memory.c  | 64 ++
 xen/drivers/passthrough/iommu.c  | 10 ++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h  | 37 +++-
 xen/include/xen/iommu.h  | 10 ++
 xen/include/xen/pci.h|  2 ++
 xen/include/xlat.lst |  3 +-
 10 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct compat_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+struct compat_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+return -ERANGE;
+
+if ( __copy_to_compat_offset(grdm->map.buffer,
+ grdm->used_entries,
+ rdm,
+ 1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1;
+}
+
+return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
 int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, 
XEN_GUEST_HANDLE_PARAM(void) compat)
 break;
 }
 
+#ifdef HAS_PASSTHROUGH
+case XENMEM_reserved_device_memory_map:
+{
+struct get_reserved_device_memory grdm;
+
+if ( copy_from_guest(grdm.map, compat, 1) ||
+ !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+return -EFAULT;
+
+grdm.used_entries = 0;
+rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+  grdm);
+
+if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+rc = -ENOBUFS;
+
+grdm.map.nr_entries = grdm.used_entries;
+if ( grdm.map.nr_entries )
+{
+if ( __copy_to_guest(compat, grdm.map, 1) )
+rc = -EFAULT;
+}
+
+return rc;
+}
+#endif
+
 default:
 return compat_arch_memory_op(cmd, compat);
 }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index e5d49d8..2fa45d0 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
 return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct xen_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+struct xen_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+if ( __copy_to_guest_offset(grdm->map.buffer,
+grdm->used_entries,
+rdm,
+1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1

[Xen-devel] [v10][PATCH 00/16] Fix RMRR

2015-07-20 Thread Tiejun Chen
 passthrough hypercall to carry conflict handling
policy.

8. Setup identity map in p2m layer for RMRRs reported for the given
device. And conflicts are handled according to specified policy in hypercall.

Current patch set contains core enhancements calling for comments.
There are still several tasks not implemented now. We'll include them
in final version after RFC is agreed:

- remove existing USB hack
- detect and fail assigning device which has a shared RMRR with another device
- add a config parameter to configure that memory boundary flexibly
- In the case of hotplug we also need to figure out a way to resolve the
  policy conflict between the per-PCI policy and the global policy, but
  first we think we'd better collect good ideas on the next step in this RFC.

So here I am posting this as an RFC to collect any comments.


Jan Beulich (1):
  xen: introduce XENMEM_reserved_device_memory_map

Tiejun Chen (15):
  xen/vtd: create RMRR mapping
  xen/passthrough: extend hypercall to support rdm reservation policy
  xen: enable XENMEM_memory_map in hvm
  hvmloader: get guest memory map into memory_map[]
  hvmloader/pci: skip reserved ranges
  hvmloader/e820: construct guest e820 table
  tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  tools: extend xc_assign_device() to support rdm reservation policy
  tools: introduce some new parameters to set rdm policy
  tools/libxl: detect and avoid conflicts with RDM
  tools: introduce a new parameter to set a predefined rdm boundary
  libxl: construct e820 map with RDM information for HVM guest
  xen/vtd: enable USB device assignment
  xen/vtd: prevent from assign the device with shared rmrr
  tools: parse to enable new rdm policy parameters

 docs/man/xl.cfg.pod.5   | 103 
 docs/misc/vtd.txt   |  24 ++
 tools/firmware/hvmloader/e820.c | 140 +-
 tools/firmware/hvmloader/e820.h |   7 +
 tools/firmware/hvmloader/hvmloader.c|   2 +
 tools/firmware/hvmloader/pci.c  |  62 +
 tools/firmware/hvmloader/util.c |  26 ++
 tools/firmware/hvmloader/util.h |  12 +
 tools/libxc/include/xenctrl.h   |  11 +-
 tools/libxc/xc_domain.c |  45 +++-
 tools/libxl/libxl.h |   6 +
 tools/libxl/libxl_arch.h|   7 +
 tools/libxl/libxl_arm.c |   8 +
 tools/libxl/libxl_create.c  |  13 +-
 tools/libxl/libxl_dm.c  | 273 
 tools/libxl/libxl_dom.c |  16 +-
 tools/libxl/libxl_internal.h|  13 +-
 tools/libxl/libxl_pci.c |  12 +-
 tools/libxl/libxl_types.idl |  26 ++
 tools/libxl/libxl_x86.c |  83 ++
 tools/libxl/libxlu_pci.c|  92 ++-
 tools/libxl/libxlutil.h |   4 +
 tools/libxl/xl_cmdimpl.c|  16 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c |  16 +-
 tools/python/xen/lowlevel/xc/xc.c   |  30 ++-
 xen/arch/x86/hvm/hvm.c  |   2 -
 xen/arch/x86/mm.c   |   6 -
 xen/arch/x86/mm/p2m.c   |  43 ++-
 xen/common/compat/memory.c  |  66 +
 xen/common/memory.c |  64 +
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   3 +-
 xen/drivers/passthrough/arm/smmu.c  |   2 +-
 xen/drivers/passthrough/device_tree.c   |   3 +-
 xen/drivers/passthrough/iommu.c |  10 +
 xen/drivers/passthrough/pci.c   |  15 +-
 xen/drivers/passthrough/vtd/dmar.c  |  32 +++
 xen/drivers/passthrough/vtd/dmar.h  |   1 -
 xen/drivers/passthrough/vtd/extern.h|   1 +
 xen/drivers/passthrough/vtd/iommu.c |  82 --
 xen/drivers/passthrough/vtd/utils.c |   7 -
 xen/include/asm-x86/p2m.h   |  13 +-
 xen/include/public/domctl.h |   3 +
 xen/include/public/memory.h |  37 ++-
 xen/include/xen/iommu.h |  12 +-
 xen/include/xen/pci.h   |   2 +
 xen/include/xlat.lst|   3 +-
 46 files changed, 1367 insertions(+), 87 deletions(-)

Thanks
Tiejun



[Xen-devel] [v10][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary

2015-07-20 Thread Tiejun Chen
Previously we always fixed that predefined boundary as 2G to handle
conflicts between memory and RDM, but now this predefined boundary
can be changed with the parameter rdm_mem_boundary in the .cfg file.
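
For example (an illustrative guest config fragment), the boundary can be
raised from the default like this:

    # raise the RDM check boundary from the default 2048 MB to 4096 MB
    rdm_mem_boundary = 4096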

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8 ~ v10:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable rdm_mem_boundary_memkb specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5   | 22 ++
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_create.c  |  4 
 tools/libxl/libxl_dom.c |  8 +---
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c|  3 +++
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 6c55a8b..23068ec 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -867,6 +867,28 @@ More information about Xen gfx_passthru feature is 
available
 on the XenVGAPassthrough Lhttp://wiki.xen.org/wiki/XenVGAPassthrough
 wiki page.
 
+=item B<rdm_mem_boundary=MBYTES>
+
+Number of megabytes to set a boundary for checking RDM conflicts.
+
+When RDM conflicts with RAM, RDM regions may be scattered over the whole
+RAM space, and multiple RDM entries make the memory layout even more
+complicated. So here we're trying to provide a simple solution that
+avoids breaking the existing layout. When a conflict occurs,
+
+#1. Above a predefined boundary
+- move lowmem_end below the reserved region to solve the conflict;
+
+#2. Below a predefined boundary
+- check the strict/relaxed policy.
+A strict policy leads to a libxl failure. Note that when both policies
+are specified on a given region, 'strict' is always preferred.
+A relaxed policy issues a warning message and also masks this
+entry INVALID to indicate we shouldn't expose this entry to
+hvmloader.
+
+Here the default is 2G.
+
 =item Bdtdev=[ DTDEV_PATH, DTDEV_PATH, ... ]
 
 Specifies the host device tree nodes to passthrough to this guest. Each
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..6f157c9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
 uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c8a32d5..3de86a6 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, 
libxl_domain_build_info *b_info)
 {
 if (b_info-u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
 b_info-u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+
+if (b_info-u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+b_info-u.hvm.rdm_mem_boundary_memkb =
+LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 80fa17d..e41d54a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 int ret, rc = ERROR_FAIL;
 uint64_t mmio_start, lowmem_end, highmem_end;
 libxl_domain_build_info *const info = d_config->b_info;
-/*
- * Currently we fix this as 2G to guarantee how to handle
- * our rdm policy. But we'll provide a parameter to set
- * this dynamically.
- */
-uint64_t rdm_mem_boundary = 0x8000;
 
 memset(args, 0, sizeof(struct xc_hvm_build_args));
 /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 args.mmio_start = mmio_start;
 
 rc = libxl__domain_device_construct_rdm(gc, d_config,
-rdm_mem_boundary,
+info->u.hvm.rdm_mem_boundary_memkb*1024,
 args);
 if (rc) {
 LOG(ERROR, checking reserved device memory failed);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a3ad8d1..4eb4f8a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl

[Xen-devel] [v10][PATCH 05/16] hvmloader: get guest memory map into memory_map[]

2015-07-20 Thread Tiejun Chen
Now we get this memory layout by calling XENMEM_memory_map and then
save it into one global variable, memory_map[]. It should include the
lowmem range, rdm ranges and the highmem range. Note the rdm ranges
and the highmem range may not exist in some cases.

And here we need to check whether any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END).
This range is used to allocate memory at the hvmloader level, so we
just let hvmloader fail in case of a conflict, since this is another
rare possibility in the real world.
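
The conflict check itself boils down to a simple interval-overlap test;
a minimal sketch mirroring the check_overlap() helper added by this patch:

    /* True if [start, start+size) overlaps [rsvd_start, rsvd_start+rsvd_size). */
    static bool overlaps(uint64_t start, uint64_t size,
                         uint64_t rsvd_start, uint64_t rsvd_size)
    {
        return (start + size > rsvd_start) && (start < rsvd_start + rsvd_size);
    }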

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: George Dunlap george.dun...@eu.citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
---
v10:

* Nothing is changed.

v9:

* Correct [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END]
  -> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END) in
  the patch head description;
  Merge two if{} as one if{};

v8:

* Actually we should check this range started from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  So correct this and sync the patch head description.

v5 ~ v7:

* Nothing is changed.

v4:

* Move some code related to e820 into that specific file, e820.c.

* Consolidate printf()+BUG() and BUG_ON()

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c  | 32 
 tools/firmware/hvmloader/e820.h  |  7 +++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c  | 26 ++
 tools/firmware/hvmloader/util.h  | 12 
 5 files changed, 79 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..7a414ab 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,38 @@
 #include config.h
 #include util.h
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+unsigned int nr_entries = E820MAX, i;
+int rc;
+uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START;
+uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+rc = get_mem_mapping_layout(memory_map.map, nr_entries);
+
+if ( rc || !nr_entries )
+{
+printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+BUG();
+}
+
+memory_map.nr_map = nr_entries;
+
+for ( i = 0; i  nr_entries; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED &&
+ check_overlap(alloc_addr, alloc_size,
+   memory_map.map[i].addr, memory_map.map[i].size) )
+{
+printf("Fail to setup memory map due to conflict");
+printf(" on dynamic reserved memory range.\n");
+BUG();
+}
+}
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
 uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
 uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX128
+
+struct e820map {
+unsigned int nr_map;
+struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c 
b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
 init_hypercalls();
 
+memory_map_setup();
+
 xenbus_setup();
 
 bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include xen/memory.h
 #include xen/sched.h
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+   uint64_t reserved_start, uint64_t reserved_size)
+{
+return (start + size > reserved_start) &&
+(start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
 asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
 *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+int rc;
+struct xen_memory_map memmap = {
+.nr_entries = *max_entries
+};
+
+set_xen_guest_handle(memmap.buffer, entries);
+
+rc

[Xen-devel] [v10][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-20 Thread Tiejun Chen
While building a VM, the HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields hvmloader uses to
construct the guest e820 table, low_mem_pgend and high_mem_pgend, so we
should check them to fix any conflict with RDM.

RMRR regions can theoretically reside in address space beyond 4G, but we
have never seen this in the real world, so in order to avoid breaking the
highmem layout we don't resolve highmem conflicts. Note this means highmem
RMRRs can still be supported as long as there is no conflict.

In the lowmem case, however, RMRR regions may be scattered over the whole
RAM space, and multiple RMRR entries make the memory layout even more
complicated, so it's hard to extend hvm_info_table{} enough to let
hvmloader work this out. Instead we aim for a simple solution that avoids
breaking the existing layout. When a conflict occurs,

#1. Above a predefined boundary (2G)
- move lowmem_end below the reserved region to solve the conflict;

#2. Below a predefined boundary (2G)
- check the strict/relaxed policy.
'strict' fails libxl. Note that when both policies are specified
on a given region, 'strict' is always preferred.
'relaxed' issues a warning message and also marks this entry INVALID
to indicate we shouldn't expose this entry to hvmloader.

Note later we need to provide a parameter to set that predefined boundary
dynamically.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
---
v9 ~ v10:

* Nothing is changed.

v8:

* Introduce pfn_to_paddr(x) -> ((uint64_t)x << XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out the current code.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Fix some code style issues
* Refine libxl__xc_device_get_rdm()

v5:

* A little change to make sure the per-device policy always overrides the
  global policy, and correct its associated code comments.
* Fix one typo in the patch head description
* Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace
  malloc() with libxl__malloc(), and finally cleanup this fallout.
* libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out 
parameter.

v4:

* Consistent to use term RDM.
* Unconditionally set *nr_entries to 0
* Move all the stuff that provides a parameter to set our predefined
  boundary dynamically into a separate patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c   | 273 +++
 tools/libxl/libxl_dom.c  |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 307 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f75d4f1..c8a32d5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -459,7 +459,7 @@ int libxl__domain_build(libxl__gc *gc,
 
 switch (info-type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-ret = libxl__build_hvm(gc, domid, info, state);
+ret = libxl__build_hvm(gc, domid, d_config, state);
 if (ret)
 goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 317a8eb..692258b 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,279 @@ const char *libxl__domain_device_model(libxl__gc *gc,
 return dm;
 }
 
+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+ uint32_t flag,
+ uint16_t seg,
+ uint8_t bus,
+ uint8_t devfn,
+ unsigned int *nr_entries,
+ struct xen_reserved_device_memory **xrdm)
+{
+int rc = 0, r;
+
+/*
+ * We really can't presume how many entries we can get in advance.
+ */
+*nr_entries = 0;
+r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+  NULL, nr_entries);
+assert(r <= 0);
+/* 0 means we have no any rdm entry. */
+if (!r) goto out;
+
+if (errno != ENOBUFS) {
+rc = ERROR_FAIL;
+goto out;
+}
+
+*xrdm = libxl__malloc(gc,
+  *nr_entries * sizeof(xen_reserved_device_memory_t));
+r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+  *xrdm, nr_entries);
+if (r)
+rc = ERROR_FAIL;
+
+ out:
+if (rc) {
+*nr_entries = 0;
+*xrdm = NULL;
+LOG(ERROR, "Could not get reserved device memory maps.\n");
+}
+return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else

[Xen-devel] [v10][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-20 Thread Tiejun Chen
This patch introduces user-configurable parameters to specify RDM
resources and the corresponding policies,

Global RDM parameter:
rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The global RDM parameter, strategy, allows the user to specify reserved
regions explicitly. Currently, 'host' includes all reserved regions
reported on this platform, which is useful for handling the hotplug
scenario. In the future this parameter may be further extended to allow
specifying arbitrary regions, e.g. even those belonging to another
platform, as a preparation for live migration with passthrough devices.
By default this isn't set, so we don't check all rdms; instead, we just
check the rdm specific to a given device if you're assigning this kind
of device. Note this option is not recommended unless you can make sure
any conflict does exist.

The 'strict/relaxed' policy decides how to handle a conflict when
reserving RDM regions in pfn space. If a conflict exists, 'strict' means
an immediate error so the VM can't keep running, while 'relaxed' allows
moving forward with a warning message thrown out.

The default per-device RDM policy is the same as the default global RDM
policy, namely 'relaxed'. And the per-device policy overrides the global
policy, like other per-device options.
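
Putting the two together, a guest config might look like this (the BDF
below is made up for illustration):

    rdm = "strategy=host,policy=relaxed"
    pci = [ '01:00.0,rdm_policy=strict' ]

Here the per-device 'strict' would override the global 'relaxed' for
that one device.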

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v9 ~ v10:

* Nothing is changed.

v8:

* One minimal code style change

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

v6:

* Some rename to make our policy reasonable
  type -> strategy
  none -> ignore
* Don't expose ignore in xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device policy always overrides the global policy,
  and so clean up some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since it's just zero
* Move those changes to xl/libxlu into a final patch

 docs/man/xl.cfg.pod.5| 81 
 docs/misc/vtd.txt| 24 +
 tools/libxl/libxl_create.c   |  7 
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c  |  9 +
 tools/libxl/libxl_types.idl  | 18 ++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..6c55a8b 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item B<rdm=RDM_RESERVATION_STRING>
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...]> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<strategy=STRING>
+
+Currently there is only one valid type:
+
+host means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using host includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure any conflict does 
exist.
+
+For example, you're trying to set memory = 2800 to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad80 end_address afff
+
+In this conflict case,
+
+#1. If B<strategy> is set to host, for example,
+
+rdm = "strategy=host,policy=strict" or rdm = "strategy=host,policy=relaxed"
+
+It means all conflicts will be handled according to the policy
+introduced by B<policy> as described below.
+
+#2. If B<strategy> is not set at all, but
+
+pci = [ 'sbdf_A, rdm_policy=x' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by B<rdm_policy=STRING> as described inside pci options.
+
+=item B<policy=STRING>

[Xen-devel] [v10][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-20 Thread Tiejun Chen
Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
  hypervisor map
* Add in the hypervisor supplied regions
* Adjust the lowmem and highmem regions if we've had to relocate
  memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v10:

* Instead of correcting e820, I'd like to correct memory_map.map[]
  and then copy them into e820 directly. I think this can make sure
  hvm_info, memory_map.map[] and e820 are on the same page.

v9:

* Refine that chunk of codes to check/modify highmem

v8:

* define low_mem_end as uint32_t

* Correct those two wrong loops, memory_map.nr_map -> nr,
  when we're trying to revise low/high memory e820 entries.

* Improve code comments and the patch head description

* Add one check if highmem is just populated by hvmloader itself

v5 ~ v7:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.
 
 tools/firmware/hvmloader/e820.c | 108 +++-
 1 file changed, 95 insertions(+), 13 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 7a414ab..ca794ad 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,11 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+uint32_t add_high_mem = 0;
+uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT;
+uint64_t map_start, map_size, map_end;
 
 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA0000;
@@ -149,13 +153,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;
 
-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-e820[nr].addr = 0x100000;
-e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +188,101 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }
 
+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end < (2u << 20));
 
-if ( hvm_info-high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ *
+ * Note we just have one low memory entry and one high memory entry if
+ * exists.
+ *
+ * But we may have relocated RAM to allocate sufficient MMIO previously
+ * so low_mem_pgend would be changed over there. And here memory_map[]
+ * records the original low/high memory, so if low_mem_end is less than
+ * the original we need to revise low/high memory range firstly.
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1 << 32);
-e820[nr].size =
-((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
+map_start = memory_map.map[i].addr;
+map_size = memory_map.map[i].size;
+map_end = map_start + map_size;
+
+/* If we need to adjust lowmem. */
+if ( memory_map.map[i].type == E820_RAM &&
+ low_mem_end > map_start && low_mem_end < map_end )
+{
+add_high_mem = map_end - low_mem_end;
+memory_map.map[i].size = low_mem_end - map_start;
+break;
+}
+}
+
+/* If we need to adjust highmem. */
+if ( add_high_mem )
+{
+/* Modify the existing highmem region if it exists. */
+for ( i = 0; i  memory_map.nr_map; i++ )
+{
+map_start = memory_map.map[i].addr;
+map_size = memory_map.map[i].size;
+map_end = map_start + map_size;
+
+if ( memory_map.map[i].type == E820_RAM

[Xen-devel] [v10][PATCH 16/16] tools: parse to enable new rdm policy parameters

2015-07-20 Thread Tiejun Chen
This patch adds parsing to enable the user-configurable parameters that
specify RDM resources and the corresponding policies defined previously,

Global RDM parameter:
rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The default per-device RDM policy is the same as the default global RDM
policy, namely 'relaxed'. And the per-device policy overrides the global
policy, like other per-device options.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v9 ~ v10:

* Nothing is changed.

v8:

* Clean some codes style issues.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Just sync those renames introduced by patch #10.

v5:

* Need a rebase after we make all rdm variables specific to .hvm.
* Like other pci option, the per-device policy always follows
  the global policy by default.

v4:

* Separated from the current patch #11 to parse/enable our rdm policy
  parameters, since this makes a lot of sense and these bits are specific
  to xl/libxlu.

 tools/libxl/libxlu_pci.c | 92 +++-
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 13 +++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..026413b 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, 
unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE  9
+#define STATE_RDM_STRATEGY  10
+#define STATE_RESERVE_POLICY11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char 
*str)
 {
 unsigned state = STATE_DOMAIN;
@@ -143,7 +146,18 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci 
*pcidev, const char *str
 pcidev->permissive = atoi(tok);
 }else if ( !strcmp(optkey, "seize") ) {
 pcidev->seize = atoi(tok);
-}else{
+} else if (!strcmp(optkey, "rdm_policy")) {
+if (!strcmp(tok, "strict")) {
+pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+} else if (!strcmp(tok, "relaxed")) {
+pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+} else {
+XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
+   " policy: 'strict' or 'relaxed'.",
+ tok);
+goto parse_error;
+}
+} else {
 XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
 }
 tok = ptr + 1;
@@ -167,6 +181,82 @@ parse_error:
 return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+unsigned state = STATE_TYPE;
+char *buf2, *tok, *ptr, *end;
+
+if (NULL == (buf2 = ptr = strdup(str)))
+return ERROR_NOMEM;
+
+for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr  end; ptr++) {
+switch(state) {
+case STATE_TYPE:
+if (*ptr == '=') {
+state = STATE_RDM_STRATEGY;
+*ptr = '\0';
+if (strcmp(tok, "strategy")) {
+XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RDM_STRATEGY:
+if (*ptr == '\0' || *ptr == ',') {
+state = STATE_RESERVE_POLICY;
+*ptr = '\0';
+if (!strcmp(tok, "host")) {
+rdm-strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST;
+} else {
+XLU__PCI_ERR(cfg, "Unknown RDM strategy option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RESERVE_POLICY:
+if (*ptr == '=') {
+state = STATE_OPTIONS_V;
+*ptr = '\0';
+if (strcmp(tok, "policy")) {
+XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_OPTIONS_V:
+if (*ptr == ',' || *ptr == '\0') {
+state = STATE_TERMINAL;
+*ptr = '\0';
+if (!strcmp(tok, "strict")) {
+rdm-policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+} else if (!strcmp(tok

[Xen-devel] [v10][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map

2015-07-20 Thread Tiejun Chen
From: Jan Beulich jbeul...@suse.com

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich jbeul...@suse.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Jan Beulich jbeul...@suse.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v7 ~ v10:

* Nothing is changed.

v6:

* Add a comments to the nr_entries field inside xen_reserved_device_memory_map

v5 ~ v4:

* Nothing is changed.

 xen/common/compat/memory.c   | 66 
 xen/common/memory.c  | 64 ++
 xen/drivers/passthrough/iommu.c  | 10 ++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h  | 37 +++-
 xen/include/xen/iommu.h  | 10 ++
 xen/include/xen/pci.h|  2 ++
 xen/include/xlat.lst |  3 +-
 10 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct compat_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+struct compat_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+return -ERANGE;
+
+if ( __copy_to_compat_offset(grdm->map.buffer,
+ grdm->used_entries,
+ rdm,
+ 1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1;
+}
+
+return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
 int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, 
XEN_GUEST_HANDLE_PARAM(void) compat)
 break;
 }
 
+#ifdef HAS_PASSTHROUGH
+case XENMEM_reserved_device_memory_map:
+{
+struct get_reserved_device_memory grdm;
+
+if ( copy_from_guest(grdm.map, compat, 1) ||
+ !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+return -EFAULT;
+
+grdm.used_entries = 0;
+rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+  grdm);
+
+if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+rc = -ENOBUFS;
+
+grdm.map.nr_entries = grdm.used_entries;
+if ( grdm.map.nr_entries )
+{
+if ( __copy_to_guest(compat, grdm.map, 1) )
+rc = -EFAULT;
+}
+
+return rc;
+}
+#endif
+
 default:
 return compat_arch_memory_op(cmd, compat);
 }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
 return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct xen_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+struct xen_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+if ( __copy_to_guest_offset(grdm->map.buffer,
+grdm->used_entries,
+rdm,
+1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1

[Xen-devel] [v10][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map

2015-07-20 Thread Tiejun Chen
We introduce the hypercall wrapper xc_reserved_device_memory_map
to libxc. This helps us get rdm entry info according to different
parameters: if flag == PCI_DEV_RDM_ALL, all entries are exposed;
otherwise we just expose the rdm entries specific to one SBDF.
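
A rough usage sketch (mirroring how libxl calls it later in this series):
the wrapper is typically called twice, first with a NULL buffer to learn
how many entries exist, then again to fetch them. Error handling is
abbreviated here.

    uint32_t nr = 0;
    struct xen_reserved_device_memory *rdm = NULL;

    /* First call: nr is updated and errno is ENOBUFS if entries exist. */
    r = xc_reserved_device_memory_map(xch, PCI_DEV_RDM_ALL, 0, 0, 0,
                                      NULL, &nr);
    if (r < 0 && errno == ENOBUFS) {
        rdm = calloc(nr, sizeof(*rdm));
        r = xc_reserved_device_memory_map(xch, PCI_DEV_RDM_ALL, 0, 0, 0,
                                          rdm, &nr);
    }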

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v4 ~ v10:

* Nothing is changed.

 tools/libxc/include/xenctrl.h |  8 
 tools/libxc/xc_domain.c   | 36 
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..9160623 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
   uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..0951291 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
 return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries)
+{
+int rc;
+struct xen_reserved_device_memory_map xrdmmap = {
+.flag = flag,
+.seg = seg,
+.bus = bus,
+.devfn = devfn,
+.nr_entries = *max_entries
+};
+DECLARE_HYPERCALL_BOUNCE(entries,
+ sizeof(struct xen_reserved_device_memory) *
+ *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+if ( xc_hypercall_bounce_pre(xch, entries) )
+return -1;
+
+set_xen_guest_handle(xrdmmap.buffer, entries);
+
+rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+  xrdmmap, sizeof(xrdmmap));
+
+xc_hypercall_bounce_post(xch, entries);
+
+*max_entries = xrdmmap.nr_entries;
+
+return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries)
-- 
1.9.1




[Xen-devel] [v10][PATCH 04/16] xen: enable XENMEM_memory_map in hvm

2015-07-20 Thread Tiejun Chen
This patch enables XENMEM_memory_map for HVM guests, so hvmloader can
use it to set up the e820 mappings.
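
With this in place, hvmloader can retrieve its own memory map roughly as
below; this mirrors the get_mem_mapping_layout() helper added later in
this series, and the hypercall wrapper shown is the one hvmloader already
uses for memory ops.

    struct e820entry entries[E820MAX];
    uint32_t nr = E820MAX;
    struct xen_memory_map memmap = { .nr_entries = nr };
    int rc;

    set_xen_guest_handle(memmap.buffer, entries);
    rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
    if ( !rc )
        nr = memmap.nr_entries;   /* number of entries actually filled in */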

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
---
v5 ~ v10:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c  | 6 --
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd  MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd  MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-if ( is_hvm_domain(d) )
-{
-rcu_unlock_domain(d);
-return -EPERM;
-}
-
 e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
 if ( e820 == NULL )
 {
-- 
1.9.1




[Xen-devel] [v10][PATCH 02/16] xen/vtd: create RMRR mapping

2015-07-20 Thread Tiejun Chen
RMRR reserved regions must be set up in the pfn space with an identity
mapping to the reported mfn. However, the existing code fails to set up
the correct mapping when VT-d shares the EPT page table, which leads to
problems when assigning devices (e.g. a GPU) with an RMRR reported. So
instead, this patch aims to set up the identity mapping in the p2m layer,
regardless of whether EPT is shared or not. And we still keep creating
the VT-d table.

We also need to introduce a pair of helpers to create/clear this sort of
identity mapping, as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If space
is already occupied by desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define a macro wrapping guest_physmap_remove_page() so that it
returns a value as necessary.
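
As a usage illustration (a sketch, not part of this patch), an RMRR range
would be walked pfn by pfn with this pair of helpers, much like the change
to rmrr_identity_mapping() in the VT-d hunk further down:

    /* Sketch: establish or tear down a 1:1 mapping for [base_pfn, end_pfn). */
    static int rmrr_map_range(struct domain *d, unsigned long base_pfn,
                              unsigned long end_pfn, bool_t map)
    {
        int ret = 0;

        while ( base_pfn < end_pfn )
        {
            if ( map )
            {
                ret = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
                if ( ret )
                    return ret;   /* gfn already occupied by something else */
            }
            else if ( clear_identity_p2m_entry(d, base_pfn, 0) )
                ret = -ENXIO;
            base_pfn++;
        }

        return ret;
    }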

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Acked-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v10:

* Nothing is changed.

v5:

* Fold our original patch #2 and #3 as this new

* Introduce a new, clear_identity_p2m_entry, which can wrapper
  guest_physmap_remove_page(). And we use this to clean our
  identity mapping. 

v4:

* Change that orginal condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mapping as we expected.

* To have

  if ( !paging_mode_translate(p2m->domain) )
return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c   | 40 +++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h   | 13 +---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long 
gfn, unsigned long mfn,
 p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
   unsigned long mfn, unsigned int page_order)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int rc;
 gfn_lock(p2m, gfn, page_order);
-p2m_remove_page(p2m, gfn, mfn, page_order);
+rc = p2m_remove_page(p2m, gfn, mfn, page_order);
 gfn_unlock(p2m, gfn, page_order);
+return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+   p2m_access_t p2ma)
+{
+p2m_type_t p2mt;
+p2m_access_t a;
+mfn_t mfn;
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int ret;
+
+if ( !paging_mode_translate(p2m->domain) )
+return 0;
+
+gfn_lock(p2m, gfn, 0);
+
+mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+p2m_mmio_direct, p2ma);
+else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+ret = 0;
+else
+{
+ret = -EBUSY;
+printk(XENLOG_G_WARNING
+   "Cannot setup identity map d%d:%lx,"
+   " gfn already mapped to %lx.\n",
+   d->domain_id, gfn, mfn_x(mfn));
+}
+
+gfn_unlock(p2m, gfn, 0);
+return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-if ( intel_iommu_unmap_page(d, base_pfn) )
+if ( clear_identity_p2m_entry(d, base_pfn, 0) )
 ret = -ENXIO;
 base_pfn++;
 }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-int err = intel_iommu_map_page(d, base_pfn, base_pfn

[Xen-devel] [v10][PATCH 06/16] hvmloader/pci: Try to avoid placing BARs in RMRRs

2015-07-20 Thread Tiejun Chen
Try to avoid placing PCI BARs over RMRRs:

- If mmio_hole_size is not specified, and the existing MMIO range has
  RMRRs in it, and there is space to expand the hole in lowmem without
  moving more memory, then make the MMIO hole as large as possible.

- When placing RMRRs, find the next RMRR higher than the current base
  in the lowmem mmio hole.  If it overlaps, skip ahead of it and find
  the next one.

This certainly won't work in all cases, but it should work in a
significant number of cases.  Additionally, users should be able to
work around problems by setting mmio_hole_size larger in the guest
config.
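
The two helpers added below are then used by the BAR allocator roughly as
in this sketch (simplified; the real pci_setup() loop also re-checks the
candidate base against the MMIO hole boundaries):

    /* Sketch: move a candidate BAR base past any RMRR it would overlap. */
    static uint32_t skip_rmrrs(uint32_t base, uint64_t bar_sz)
    {
        int next_rmrr = find_next_rmrr(base);

        while ( next_rmrr >= 0 &&
                check_overlap(base, bar_sz,
                              memory_map.map[next_rmrr].addr,
                              memory_map.map[next_rmrr].size) )
        {
            /* Jump just past the conflicting region, keeping alignment. */
            base = memory_map.map[next_rmrr].addr +
                   memory_map.map[next_rmrr].size;
            base = (base + bar_sz - 1) & ~(bar_sz - 1);
            next_rmrr = find_next_rmrr(base);
        }

        return base;
    }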

Signed-off-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v10:

* This is from George' draft patch which implements an acceptable solution in
  current cycle. Here I just implemented check_overlap_all() and some cleanups.

v9:

* A little improvement to code implementation but again, its still argued about
  this solution.

v8:

* Based on the current discussion it's hard to reshape the original mmio
  allocation mechanism, and we don't have a good and simple way to do so in
  the short term. So instead, we don't add more complexity to intervene in
  that process but still check for any conflicts and disable all associated
  devices.

v6 ~ v7:

* Nothing is changed.

v5:

* Rename that field, is_64bar, inside struct bars with flag, and
  then extend to also indicate if this bar is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either of populating more RAM, or of expanding more highmem. But
  we should know just 64bit-bar can work with highmem, and as you mentioned we
  also should avoid expanding highmem as possible. So my implementation is to 
  allocate 32bit-bar and 64bit-bar orderly.

  1. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2.

  2. The second allocation round to the remaining 32bit-bar

  We should be able to finish allocating all 32bit-bars in theory, then go to
  the third allocation round.

  3. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the not aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check if other
  smaller bars can be allocated if a mmio hole exists between resource-base and
  reserved device memory. If a hole exists between base and reserved device
  memory, lets go out simply to try allocate for next bar since all bars are in
  descending order of size. If not, we need to move resource-base to 
reserved_end
  just to reallocate this bar.

 tools/firmware/hvmloader/pci.c | 62 ++
 1 file changed, 62 insertions(+)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..f229a91 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,43 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+/* Check if any conflicts with all reserved device memory. */
+static bool check_overlap_all(uint64_t start, uint64_t size)
+{
+unsigned int i;
+
+for ( i = 0; i < memory_map.nr_map; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED &&
+ check_overlap(start, size,
+   memory_map.map[i].addr,
+   memory_map.map[i].size) )
+return true;
+}
+
+return false;
+}
+
+/* Find the lowest RMRR higher than base. */
+static int find_next_rmrr(uint32_t base)
+{
+unsigned int i;
+int next_rmrr = -1;
+uint64_t min_base = (1ull << 32);
+
+for ( i = 0; i < memory_map.nr_map ; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED &&
+ memory_map.map[i].addr > base &&
+ memory_map.map[i].addr < min_base )
+{
+next_rmrr = i;
+min_base = memory_map.map[i].addr;
+}
+}
+return next_rmrr;
+}
+
 void pci_setup(void)
 {
 uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -46,6 +83,7 @@ void pci_setup(void)
 uint32_t vga_devfn = 256;
 uint16_t class

[Xen-devel] [v10][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest

2015-07-20 Thread Tiejun Chen
Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist, and hvmloader would need this info
later.

Note this guest e820 table would be the same as before if the
platform has no RDM or we disable RDM (which is the default).
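
For reference, once the e820[] array built below has been filled (one lowmem
entry, one reserved entry per RDM, plus an optional highmem entry), applying
it boils down to a single libxc call. A sketch of the per-RDM entry and the
final call (the start/size field names of the rdms[] records are assumed
here; only the policy field appears in the hunk below):

    /* Sketch: one reserved e820 entry per RDM record, then apply the map. */
    e820[nr].addr = d_config->rdms[i].start;
    e820[nr].size = d_config->rdms[i].size;
    e820[nr].type = E820_RESERVED;
    nr++;

    /* ... lowmem and highmem entries are filled similarly ... */

    rc = xc_domain_set_memory_map(CTX->xch, domid, e820, e820_entries);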

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8 ~ v10:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable rdm_mem_boundary_memkb specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 tools/libxl/libxl_arch.h |  7 
 tools/libxl/libxl_arm.c  |  8 +
 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_x86.c  | 83 
 4 files changed, 103 insertions(+)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d04871c..939178a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -49,4 +49,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
 _hidden
 int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
 
+/* arch specific to construct memory mapping function */
+_hidden
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args);
+
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index f09c860..1526467 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -926,6 +926,14 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 return xc_domain_bind_pt_spi_irq(CTX-xch, domid, irq, irq);
 }
 
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args)
+{
+return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e41d54a..a8c6aa9 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 goto out;
 }
 
+if (libxl__arch_domain_construct_memmap(gc, d_config, domid, args)) {
+LOG(ERROR, setting domain memory map failed);
+goto out;
+}
+
 ret = hvm_build_set_params(ctx-xch, domid, info, state-store_port,
state-store_mfn, state-console_port,
state-console_mfn, state-store_domid,
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..66b3d7f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args)
+{
+int rc = 0;
+unsigned int nr = 0, i;
+/* We always own at least one lowmem entry. */
+unsigned int e820_entries = 1;
+struct e820entry *e820 = NULL;
+uint64_t highmem_size =
+args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+/* Add all rdm entries. */
+for (i = 0; i < d_config->num_rdms; i++)
+if (d_config->rdms[i].policy != LIBXL_RDM_RESERVE_POLICY_INVALID)
+e820_entries++;
+
+
+/* If we should have a highmem range. */
+if (highmem_size)
+e820_entries++;
+
+if (e820_entries >= E820MAX) {
+LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+rc = ERROR_INVAL;
+goto out;
+}
+
+e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+/* Low

[Xen-devel] [v10][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-07-20 Thread Tiejun Chen
This patch extends the existing hypercall to support rdm reservation policy.
We return error or just throw out a warning message depending on whether
the policy is strict or relaxed when reserving RDM regions in pfn space.
Note in some special cases, e.g. add a device to hwdomain, and remove a
device from user domain, 'relaxed' is fine enough since this is always safe
to hwdomain.

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
CC: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Stefano Stabellini stefano.stabell...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: George Dunlap george.dun...@eu.citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
---
v10:

* Nothing is changed.

v9:

* Correct one check condition of XEN_DOMCTL_DEV_RDM_RELAXED

v8:

* Force to pass 0(strict) when add or move a device in hardware domain,
  and improve some associated code comments.

v6 ~ v7:

* Nothing is changed.

v5:

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
  0 means strict and 1 means relaxed.

* So make DT device ignore the flag field

* Improve the code comments

v4:

* Add code comments to describer why we fix to set a policy flag in some
  cases like adding a device to hwdomain, and removing a device from user 
domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one judging condition
  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add to range check the flag passed to make future extensions possible
  (and to avoid ambiguity on what out of range values would mean).

 xen/arch/x86/mm/p2m.c   |  7 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c  |  2 +-
 xen/drivers/passthrough/device_tree.c   |  3 ++-
 xen/drivers/passthrough/pci.c   | 15 
 xen/drivers/passthrough/vtd/iommu.c | 37 ++---
 xen/include/asm-x86/p2m.h   |  2 +-
 xen/include/public/domctl.h |  3 +++
 xen/include/xen/iommu.h |  2 +-
 9 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 99a26ca..47785dc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, 
mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-   p2m_access_t p2ma)
+   p2m_access_t p2ma, unsigned int flag)
 {
 p2m_type_t p2mt;
 p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long 
gfn,
 ret = 0;
 else
 {
-ret = -EBUSY;
+if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
+ret = 0;
+else
+ret = -EBUSY;
 printk(XENLOG_G_WARNING
"Cannot setup identity map d%d:%lx,"
 " gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct 
domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-   struct pci_dev *pdev)
+   struct pci_dev *pdev,
+   u32 flag)
 {
 struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
 int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct 
iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-  struct device *dev)
+  struct device *dev, u32 flag)
 {
struct iommu_domain *domain;
struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 5d3842a..7ff79f8 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct 
dt_device_node *dev)
 goto fail;
 }
 
-rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+/* The flag field doesn't matter

[Xen-devel] [v10][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy

2015-07-20 Thread Tiejun Chen
This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also brings some fallout to the python usage of xc_assign_device().
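
A hypothetical caller now looks like this (xch, domid, machine_sbdf and the
permissive flag are placeholders; per the hypervisor patch, flag 0 selects
the strict policy and XEN_DOMCTL_DEV_RDM_RELAXED the relaxed one):

    /* Sketch: assign a PCI device, choosing the RDM conflict policy. */
    uint32_t flag = permissive ? XEN_DOMCTL_DEV_RDM_RELAXED : 0 /* strict */;

    if ( xc_assign_device(xch, domid, machine_sbdf, flag) < 0 )
        fprintf(stderr, "xc_assign_device failed: %d\n", errno);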

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
CC: David Scott dave.sc...@eu.citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v10:

* Nothing is changed.

v5:

* Fix the flag field as 0 to DT device

v4:

* In the patch head description, I add to explain why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h   |  3 ++-
 tools/libxc/xc_domain.c |  9 -
 tools/libxl/libxl_pci.c |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 
 tools/python/xen/lowlevel/xc/xc.c   | 30 --
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
  uint32_t domid,
- uint32_t machine_sbdf);
+ uint32_t machine_sbdf,
+ uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
  uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
 xc_interface *xch,
 uint32_t domid,
-uint32_t machine_sbdf)
+uint32_t machine_sbdf,
+uint32_t flag)
 {
 DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
 domctl.domain = domid;
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
 domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.flag = flag;
 
 return do_domctl(xch, domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
 domctl.u.assign_device.u.dt.size = size;
+/*
+ * DT doesn't own any RDM so actually DT has nothing to do
+ * for any flag and here just fix that as 0.
+ */
+domctl.u.assign_device.flag = 0;
 set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
 rc = do_domctl(xch, domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 FILE *f;
 unsigned long long start, end, flags, size;
 int irq, i, rc, hvm = 0;
+uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
 if (type == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 
 out:
 if (!libxl_is_stubdom(ctx, domid, NULL)) {
-rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), 
flag);
 if (rc < 0 && (hvm || errno != ENOSYS)) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
 return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c 
b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value 
xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+value rflag)
 {
-   CAMLparam3(xch, domid, desc);
+   CAMLparam4(xch, domid, desc, rflag);
int ret;
int domain, bus, dev, func;
-   uint32_t sbdf;
+   uint32_t sbdf, flag;
 
domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, 
value domid, value desc)
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);
 
-   ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+   ret = Int_val(Field(rflag, 0));
+   flag = domain_assign_device_rdm_flag_table[ret];
+
+   ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
if (ret < 0

[Xen-devel] [v10][PATCH 14/16] xen/vtd: enable USB device assignment

2015-07-20 Thread Tiejun Chen
USB RMRR may conflict with guest BIOS region. In such case, identity
mapping setup is simply skipped in previous implementation. Now we
can handle this scenario cleanly with new policy mechanism so previous
hack code can be removed now.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v5 ~ v10:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++-
 xen/drivers/passthrough/vtd/utils.c |  7 ---
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h 
b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {\
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index b5d658e..c8b0455 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2243,11 +2243,9 @@ static int reassign_device_ownership(
 /*
  * If the device belongs to the hardware domain, and it has RMRR, don't
  * remove it from the hardware domain, because BIOS may use RMRR at
- * booting time. Also account for the special casing of USB below (in
- * intel_iommu_assign_device()).
+ * booting time.
  */
-if ( !is_hardware_domain(source) &&
- !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+if ( !is_hardware_domain(source) )
 {
 const struct acpi_rmrr_unit *rmrr;
 u16 bdf;
@@ -2300,13 +2298,8 @@ static int intel_iommu_assign_device(
 if ( ret )
 return ret;
 
-/* FIXME: Because USB RMRR conflicts with guest bios region,
- * ignore USB RMRR temporarily.
- */
 seg = pdev->seg;
 bus = pdev->bus;
-if ( is_usb_device(seg, bus, pdev->devfn) )
-return 0;
 
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c 
b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include extern.h
 #include asm/io_apic.h
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-PCI_CLASS_DEVICE);
-return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1




[Xen-devel] [v10][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-07-20 Thread Tiejun Chen
Currently we simply refuse to assign this kind of device with a
shared RMRR, since a shared RMRR is a rare case according to our
previous experience. Later we can group the devices which share an
RMRR, and then allow all devices within a group to be assigned to
the same domain.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v10:

* Nothing is changed.

v9:

* Correct one indentation issue

v8:

* Merge two if{} as one if{}

* Add to print RMRR range info when stop assign a group device

v5 ~ v7:

* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c8b0455..770e484 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2294,13 +2294,37 @@ static int intel_iommu_assign_device(
 if ( list_empty(acpi_drhd_units) )
 return -ENODEV;
 
+seg = pdev->seg;
+bus = pdev->bus;
+/*
+ * In rare cases one given rmrr is shared by multiple devices but
+ * obviously this would put the security of a system at risk. So
+ * we should prevent from this sort of device assignment.
+ *
+ * TODO: in the future we can introduce group device assignment
+ * interface to make sure devices sharing RMRR are assigned to the
+ * same domain together.
+ */
+for_each_rmrr_device( rmrr, bdf, i )
+{
+if ( rmrr->segment == seg &&
+ PCI_BUS(bdf) == bus &&
+ PCI_DEVFN2(bdf) == devfn &&
+ rmrr->scope.devices_cnt > 1 )
+{
+printk(XENLOG_G_ERR VTDPREFIX
+" cannot assign %04x:%02x:%02x.%u"
+" with shared RMRR at %"PRIx64" for Dom%d.\n",
+   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+   rmrr->base_address, d->domain_id);
+return -EPERM;
+}
+}
+
 ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
 if ( ret )
 return ret;
 
-seg = pdev-seg;
-bus = pdev-bus;
-
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
 {
-- 
1.9.1




[Xen-devel] [v8][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy

2015-07-16 Thread Tiejun Chen
This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also brings some fallout to the python usage of xc_assign_device().

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
CC: David Scott dave.sc...@eu.citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v8:

* Nothing is changed.

v5:

* Fix the flag field as 0 to DT device

v4:

* In the patch head description, I add to explain why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h   |  3 ++-
 tools/libxc/xc_domain.c |  9 -
 tools/libxl/libxl_pci.c |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 
 tools/python/xen/lowlevel/xc/xc.c   | 30 --
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
  uint32_t domid,
- uint32_t machine_sbdf);
+ uint32_t machine_sbdf,
+ uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
  uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
 xc_interface *xch,
 uint32_t domid,
-uint32_t machine_sbdf)
+uint32_t machine_sbdf,
+uint32_t flag)
 {
 DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
 domctl.domain = domid;
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
 domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.flag = flag;
 
 return do_domctl(xch, domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
 domctl.u.assign_device.u.dt.size = size;
+/*
+ * DT doesn't own any RDM so actually DT has nothing to do
+ * for any flag and here just fix that as 0.
+ */
+domctl.u.assign_device.flag = 0;
 set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
 rc = do_domctl(xch, domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 FILE *f;
 unsigned long long start, end, flags, size;
 int irq, i, rc, hvm = 0;
+uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
 if (type == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 
 out:
 if (!libxl_is_stubdom(ctx, domid, NULL)) {
-rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), 
flag);
 if (rc < 0 && (hvm || errno != ENOSYS)) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
 return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c 
b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value 
xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+value rflag)
 {
-   CAMLparam3(xch, domid, desc);
+   CAMLparam4(xch, domid, desc, rflag);
int ret;
int domain, bus, dev, func;
-   uint32_t sbdf;
+   uint32_t sbdf, flag;
 
domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, 
value domid, value desc)
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);
 
-   ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+   ret = Int_val(Field(rflag, 0));
+   flag = domain_assign_device_rdm_flag_table[ret];
+
+   ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
if (ret < 0

[Xen-devel] [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-16 Thread Tiejun Chen
Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
  hypervisor map
* Add in the hypervisor regions
* Adjust the lowmem and highmem regions if we've had to relocate
  memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8:

* define low_mem_end as uint32_t

* Correct those two wrong loops, memory_map.nr_map -> nr,
  when we're trying to revise low/high memory e820 entries.

* Improve code comments and the patch head description

* Add one check if highmem is just populated by hvmloader itself

v5 ~ v7:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.
 
 
 tools/firmware/hvmloader/e820.c | 92 +
 1 file changed, 83 insertions(+), 9 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index b72baa5..aa678a7 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint64_t add_high_mem = 0;
+uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
 
 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA0000;
@@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;
 
-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-e820[nr].addr = 0x100000;
-e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -194,9 +189,73 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }
 
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ */
+for ( i = 0; i < memory_map.nr_map; i++ )
+{
+e820[nr] = memory_map.map[i];
+nr++;
+}
+
+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end < (2u << 20));
 
-if ( hvm_info->high_mem_pgend )
+/*
+ * It's possible to relocate RAM to allocate sufficient MMIO previously
+ * so low_mem_pgend would be changed over there. And here memory_map[]
+ * records the original low/high memory, so if low_mem_end is less than
+ * the original we need to revise low/high memory range in e820.
+ */
+for ( i = 0; i < nr; i++ )
 {
+uint64_t end = e820[i].addr + e820[i].size;
+if ( e820[i].type == E820_RAM &&
+ low_mem_end > e820[i].addr && low_mem_end < end )
+{
+add_high_mem = end - low_mem_end;
+e820[i].size = low_mem_end - e820[i].addr;
+}
+}
+
+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+for ( i = 0; i < nr; i++ )
+{
+if ( e820[i].type == E820_RAM &&
+ e820[i].addr == (1ull << 32))
+{
+e820[i].size += add_high_mem;
+add_high_mem = 0;
+break;
+}
+}
+}
+
+/* Or this is just populated by hvmloader itself. */
+if ( add_high_mem )
+{
+/*
+ * hvmloader should always update hvm_info->high_mem_pgend
+ * when it relocates RAM anywhere.
+ */
+BUG_ON( !hvm_info->high_mem_pgend );
+
 e820[nr].addr = ((uint64_t)1 << 32);
 e820[nr].size =
 ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
@@ -204,6 +263,21 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }
 
+/* Finally we need to sort all e820 entries. */
+for ( j = 0; j < nr-1; j++ )
+{
+for ( i = j+1; i < nr; i++ )
+{
+if ( e820[j].addr > e820[i].addr )
+{
+struct e820entry tmp

[Xen-devel] [v8][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-07-16 Thread Tiejun Chen
Currently we simply refuse to assign this kind of device with a
shared RMRR, since a shared RMRR is a rare case according to our
previous experience. Later we can group the devices which share an
RMRR, and then allow all devices within a group to be assigned to
the same domain.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v8:

* Merge two if{} as one if{}

* Add to print RMRR range info when stop assign a group device

v5 ~ v7:

* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c8b0455..8b7e18f 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2294,13 +2294,37 @@ static int intel_iommu_assign_device(
 if ( list_empty(acpi_drhd_units) )
 return -ENODEV;
 
+seg = pdev->seg;
+bus = pdev->bus;
+/*
+ * In rare cases one given rmrr is shared by multiple devices but
+ * obviously this would put the security of a system at risk. So
+ * we should prevent from this sort of device assignment.
+ *
+ * TODO: in the future we can introduce group device assignment
+ * interface to make sure devices sharing RMRR are assigned to the
+ * same domain together.
+ */
+for_each_rmrr_device( rmrr, bdf, i )
+{
+if ( rmrr->segment == seg &&
+ PCI_BUS(bdf) == bus &&
+ PCI_DEVFN2(bdf) == devfn &&
+ rmrr->scope.devices_cnt > 1 )
+{
+printk(XENLOG_G_ERR VTDPREFIX
+" cannot assign %04x:%02x:%02x.%u"
+" with shared RMRR at %"PRIx64" for Dom%d.\n",
+   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+   rmrr->base_address, d->domain_id);
+return -EPERM;
+}
+}
+
 ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
 if ( ret )
 return ret;
 
-seg = pdev-seg;
-bus = pdev-bus;
-
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
 {
-- 
1.9.1




[Xen-devel] [v8][PATCH 14/16] xen/vtd: enable USB device assignment

2015-07-16 Thread Tiejun Chen
USB RMRR may conflict with guest BIOS region. In such case, identity
mapping setup is simply skipped in previous implementation. Now we
can handle this scenario cleanly with new policy mechanism so previous
hack code can be removed now.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v5 ~ v8:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++-
 xen/drivers/passthrough/vtd/utils.c |  7 ---
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h 
b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {\
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index b5d658e..c8b0455 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2243,11 +2243,9 @@ static int reassign_device_ownership(
 /*
  * If the device belongs to the hardware domain, and it has RMRR, don't
  * remove it from the hardware domain, because BIOS may use RMRR at
- * booting time. Also account for the special casing of USB below (in
- * intel_iommu_assign_device()).
+ * booting time.
  */
-if ( !is_hardware_domain(source) &&
- !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+if ( !is_hardware_domain(source) )
 {
 const struct acpi_rmrr_unit *rmrr;
 u16 bdf;
@@ -2300,13 +2298,8 @@ static int intel_iommu_assign_device(
 if ( ret )
 return ret;
 
-/* FIXME: Because USB RMRR conflicts with guest bios region,
- * ignore USB RMRR temporarily.
- */
 seg = pdev->seg;
 bus = pdev->bus;
-if ( is_usb_device(seg, bus, pdev->devfn) )
-return 0;
 
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c 
b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include extern.h
 #include asm/io_apic.h
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-PCI_CLASS_DEVICE);
-return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1




[Xen-devel] [v8][PATCH 04/16] xen: enable XENMEM_memory_map in hvm

2015-07-16 Thread Tiejun Chen
This patch enables XENMEM_memory_map in hvm. So hvmloader can
use it to setup the e820 mappings.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
---
v5 ~ v8:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c  | 6 --
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd  MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd  MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-if ( is_hvm_domain(d) )
-{
-rcu_unlock_domain(d);
-return -EPERM;
-}
-
 e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
 if ( e820 == NULL )
 {
-- 
1.9.1




[Xen-devel] [v8][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map

2015-07-16 Thread Tiejun Chen
We will introduce the hypercall xc_reserved_device_memory_map
approach to libxc. This helps us get rdm entry info according to
different parameters. If flag == PCI_DEV_RDM_ALL, all entries
should be exposed. Otherwise we just expose the rdm entry specific to
one SBDF.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v4 ~ v8:

* Nothing is changed.

 tools/libxc/include/xenctrl.h |  8 
 tools/libxc/xc_domain.c   | 36 
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..9160623 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
   uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..0951291 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
 return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries)
+{
+int rc;
+struct xen_reserved_device_memory_map xrdmmap = {
+.flag = flag,
+.seg = seg,
+.bus = bus,
+.devfn = devfn,
+.nr_entries = *max_entries
+};
+DECLARE_HYPERCALL_BOUNCE(entries,
+ sizeof(struct xen_reserved_device_memory) *
+ *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+if ( xc_hypercall_bounce_pre(xch, entries) )
+return -1;
+
+set_xen_guest_handle(xrdmmap.buffer, entries);
+
+rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+  &xrdmmap, sizeof(xrdmmap));
+
+xc_hypercall_bounce_post(xch, entries);
+
+*max_entries = xrdmmap.nr_entries;
+
+return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries)
-- 
1.9.1




[Xen-devel] [v8][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-07-16 Thread Tiejun Chen
This patch extends the existing hypercall to support rdm reservation policy.
We return error or just throw out a warning message depending on whether
the policy is strict or relaxed when reserving RDM regions in pfn space.
Note in some special cases, e.g. add a device to hwdomain, and remove a
device from user domain, 'relaxed' is fine enough since this is always safe
to hwdomain.

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
CC: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Stefano Stabellini stefano.stabell...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8:

* Force to pass 0(strict) when add or move a device in hardware domain,
  and improve some associated code comments.

v6 ~ v7:

* Nothing is changed.

v5:

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
  0 means strict and 1 means relaxed.

* So make DT device ignore the flag field

* Improve the code comments

v4:

* Add code comments to describer why we fix to set a policy flag in some
  cases like adding a device to hwdomain, and removing a device from user 
domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one judging condition
  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add to range check the flag passed to make future extensions possible
  (and to avoid ambiguity on what out of range values would mean).

 xen/arch/x86/mm/p2m.c   |  7 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c  |  2 +-
 xen/drivers/passthrough/device_tree.c   |  3 ++-
 xen/drivers/passthrough/pci.c   | 15 
 xen/drivers/passthrough/vtd/iommu.c | 37 ++---
 xen/include/asm-x86/p2m.h   |  2 +-
 xen/include/public/domctl.h |  3 +++
 xen/include/xen/iommu.h |  2 +-
 9 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 99a26ca..47785dc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, 
mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-   p2m_access_t p2ma)
+   p2m_access_t p2ma, unsigned int flag)
 {
 p2m_type_t p2mt;
 p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long 
gfn,
 ret = 0;
 else
 {
-ret = -EBUSY;
+if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
+ret = 0;
+else
+ret = -EBUSY;
 printk(XENLOG_G_WARNING
"Cannot setup identity map d%d:%lx,"
 " gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct 
domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-   struct pci_dev *pdev)
+   struct pci_dev *pdev,
+   u32 flag)
 {
 struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
 int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct 
iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-  struct device *dev)
+  struct device *dev, u32 flag)
 {
struct iommu_domain *domain;
struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 5d3842a..7ff79f8 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct 
dt_device_node *dev)
 goto fail;
 }
 
-rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+/* The flag field doesn't matter to DT device. */
+rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0);
 
 if ( rc )
 goto fail;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough

[Xen-devel] [v8][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest

2015-07-16 Thread Tiejun Chen
Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist, and hvmloader would need this info
later.

Note this guest e820 table would be the same as before if the
platform has no RDM or we disable RDM (which is the default).

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8:

* make that core construction function as arch-specific to make sure
  we don't break ARM at this point.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Rephrase patch's short log
* Make libxl__domain_construct_e820() hidden

v4:

* Use goto style error handling.
* Instead of NOGC, we shoud use libxl__malloc(gc,XXX) to allocate local e820.


 tools/libxl/libxl_arch.h |  7 
 tools/libxl/libxl_arm.c  |  8 +
 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_x86.c  | 83 
 4 files changed, 103 insertions(+)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d04871c..939178a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -49,4 +49,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
 _hidden
 int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
 
+/* arch specific to construct memory mapping function */
+_hidden
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args);
+
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index f09c860..1526467 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -926,6 +926,14 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 return xc_domain_bind_pt_spi_irq(CTX-xch, domid, irq, irq);
 }
 
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args)
+{
+return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e41d54a..a8c6aa9 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 goto out;
 }
 
+if (libxl__arch_domain_construct_memmap(gc, d_config, domid, args)) {
+LOG(ERROR, setting domain memory map failed);
+goto out;
+}
+
 ret = hvm_build_set_params(ctx-xch, domid, info, state-store_port,
state-store_mfn, state-console_port,
state-console_mfn, state-store_domid,
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..66b3d7f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args)
+{
+int rc = 0;
+unsigned int nr = 0, i;
+/* We always own at least one lowmem entry. */
+unsigned int e820_entries = 1;
+struct e820entry *e820 = NULL;
+uint64_t highmem_size =
+args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+/* Add all rdm entries. */
+for (i = 0; i < d_config->num_rdms; i++)
+if (d_config->rdms[i].policy != LIBXL_RDM_RESERVE_POLICY_INVALID)
+e820_entries++;
+
+
+/* If we should have a highmem range. */
+if (highmem_size)
+e820_entries++;
+
+if (e820_entries >= E820MAX) {
+LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+rc = ERROR_INVAL;
+goto out

[Xen-devel] [v8][PATCH 05/16] hvmloader: get guest memory map into memory_map[]

2015-07-16 Thread Tiejun Chen
Now we get this map layout by calling XENMEM_memory_map and then
save it into one global variable, memory_map[]. It should
include the lowmem range, rdm range and highmem range. Note the
rdm range and highmem range may not exist in some cases.

And here we need to check if any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory at the hvmloader level, and
we simply make hvmloader fail in case of a conflict, since such a
conflict is just another rare possibility in the real world.
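
As a rough illustration of what the saved map enables (a minimal sketch, not
part of this patch; the helper name is hypothetical, only memory_map,
E820_RESERVED and check_overlap() come from this series):

/* Hypothetical helper, for illustration only: report whether
 * [start, start + size) touches any reserved entry of the saved map.
 * The actual conflict check in this patch is done once at setup time,
 * against the hvmloader dynamic allocation window. */
static int range_hits_reserved(uint64_t start, uint64_t size)
{
    unsigned int i;

    for ( i = 0; i < memory_map.nr_map; i++ )
    {
        if ( memory_map.map[i].type != E820_RESERVED )
            continue;
        if ( check_overlap(start, size,
                           memory_map.map[i].addr, memory_map.map[i].size) )
            return 1;
    }
    return 0;
}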

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
---
v8:

* Actually we should check this range started from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  So correct this and sync the patch head description.

v5 ~ v7:

* Nothing is changed.

v4:

* Move some codes related to e820 to that specific file, e820.c.

* Consolidate printf()+BUG() and BUG_ON()

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c  | 35 +++
 tools/firmware/hvmloader/e820.h  |  7 +++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c  | 26 ++
 tools/firmware/hvmloader/util.h  | 12 
 5 files changed, 82 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..b72baa5 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,41 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+unsigned int nr_entries = E820MAX, i;
+int rc;
+uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START;
+uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+if ( rc || !nr_entries )
+{
+printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+BUG();
+}
+
+memory_map.nr_map = nr_entries;
+
+for ( i = 0; i < nr_entries; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED )
+{
+if ( check_overlap(alloc_addr, alloc_size,
+   memory_map.map[i].addr,
+   memory_map.map[i].size) )
+{
+printf("Fail to setup memory map due to conflict");
+printf(" on dynamic reserved memory range.\n");
+BUG();
+}
+}
+}
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
 uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
 uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX 128
+
+struct e820map {
+unsigned int nr_map;
+struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c 
b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
 init_hypercalls();
 
+memory_map_setup();
+
 xenbus_setup();
 
 bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+   uint64_t reserved_start, uint64_t reserved_size)
+{
+return (start + size > reserved_start) &&
+(start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
 asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
 *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+int rc;
+struct xen_memory_map memmap = {
+.nr_entries = *max_entries
+};
+
+set_xen_guest_handle(memmap.buffer, entries);
+
+rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+*max_entries = memmap.nr_entries;
+
+return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
 static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools

[Xen-devel] [v8][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-16 Thread Tiejun Chen
While building a VM, HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields to construct guest
e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RDM.

RMRR can reside in address space beyond 4G theoretically, but we never
see this in real world. So in order to avoid breaking highmem layout
we don't solve highmem conflict. Note this means highmem rmrr could still
be supported if no conflict.

But in the case of lowmem, RMRR regions may be scattered over the whole RAM
space. Multiple RMRR entries would make this worse and lead to a complicated
memory layout, and then it is hard to extend hvm_info_table{} to let
hvmloader work that out. So here we're trying to figure out a simple solution
to avoid breaking the existing layout. So when a conflict occurs,

#1. Above a predefined boundary (2G)
- move lowmem_end below reserved region to solve conflict;

#2. Below a predefined boundary (2G)
- Check strict/relaxed policy.
strict policy leads to failing libxl. Note when both policies
are specified on a given region, 'strict' is always preferred.
relaxed policy issues a warning message and also masks this entry
INVALID to indicate we shouldn't expose this entry to hvmloader.

Note later we need to provide a parameter to set that predefined boundary
dynamically.
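
In code terms the above boils down to something like the following (purely
illustrative sketch with simplified names and return values; the real logic
lives in libxl__domain_device_construct_rdm()):

/* Illustrative only: handle one RDM entry below 4G against the current
 * lowmem end, given the predefined boundary (2G) and the policy. */
static int handle_rdm_conflict(uint64_t rdm_start, uint64_t *lowmem_end,
                               uint64_t boundary, bool strict)
{
    if ( rdm_start >= *lowmem_end )
        return 0;                /* no conflict with lowmem RAM */

    if ( rdm_start >= boundary )
    {
        *lowmem_end = rdm_start; /* #1: shrink lowmem below the RDM */
        return 0;
    }

    if ( strict )
        return -1;               /* #2, strict: fail the domain build */

    /* #2, relaxed: warn and mark the entry INVALID (not shown here). */
    return 0;
}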

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
---
v8:

* Introduce pfn_to_paddr(x) - ((uint64_t)x  XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out current codes.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* fix some code stypes
* Refine libxl__xc_device_get_rdm()

v5:

* A little change to make sure the per-device policy always override the global
  policy and correct its associated code comments.
* Fix one typo in the patch head description
* Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace
  malloc() with libxl__malloc(), and finally cleanup this fallout.
* libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out 
parameter.

v4:

* Consistent to use term RDM.
* Unconditionally set *nr_entries to 0
* Grab to all sutffs to provide a parameter to set our predefined boundary
  dynamically to as a separated patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c   | 273 +++
 tools/libxl/libxl_dom.c  |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 307 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f75d4f1..c8a32d5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -459,7 +459,7 @@ int libxl__domain_build(libxl__gc *gc,
 
 switch (info-type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-ret = libxl__build_hvm(gc, domid, info, state);
+ret = libxl__build_hvm(gc, domid, d_config, state);
 if (ret)
 goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 317a8eb..692258b 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,279 @@ const char *libxl__domain_device_model(libxl__gc *gc,
 return dm;
 }
 
+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+ uint32_t flag,
+ uint16_t seg,
+ uint8_t bus,
+ uint8_t devfn,
+ unsigned int *nr_entries,
+ struct xen_reserved_device_memory **xrdm)
+{
+int rc = 0, r;
+
+/*
+ * We really can't presume how many entries we can get in advance.
+ */
+*nr_entries = 0;
+r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+  NULL, nr_entries);
+assert(r <= 0);
+/* 0 means we have no rdm entry at all. */
+if (!r) goto out;
+
+if (errno != ENOBUFS) {
+rc = ERROR_FAIL;
+goto out;
+}
+
+*xrdm = libxl__malloc(gc,
+  *nr_entries * sizeof(xen_reserved_device_memory_t));
+r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+  *xrdm, nr_entries);
+if (r)
+rc = ERROR_FAIL;
+
+ out:
+if (rc) {
+*nr_entries = 0;
+*xrdm = NULL;
+LOG(ERROR, "Could not get reserved device memory maps.\n");
+}
+return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+static bool

[Xen-devel] [v8][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map

2015-07-16 Thread Tiejun Chen
From: Jan Beulich jbeul...@suse.com

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich jbeul...@suse.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Jan Beulich jbeul...@suse.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v7 ~ v8:

* Nothing is changed.

v6:

* Add a comments to the nr_entries field inside xen_reserved_device_memory_map

v5 ~ v4:

* Nothing is changed.

 xen/common/compat/memory.c   | 66 
 xen/common/memory.c  | 64 ++
 xen/drivers/passthrough/iommu.c  | 10 ++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h  | 37 +++-
 xen/include/xen/iommu.h  | 10 ++
 xen/include/xen/pci.h|  2 ++
 xen/include/xlat.lst |  3 +-
 10 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct compat_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+struct compat_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+return -ERANGE;
+
+if ( __copy_to_compat_offset(grdm->map.buffer,
+ grdm->used_entries,
+ &rdm,
+ 1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1;
+}
+
+return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
 int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, 
XEN_GUEST_HANDLE_PARAM(void) compat)
 break;
 }
 
+#ifdef HAS_PASSTHROUGH
+case XENMEM_reserved_device_memory_map:
+{
+struct get_reserved_device_memory grdm;
+
+if ( copy_from_guest(&grdm.map, compat, 1) ||
+ !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+return -EFAULT;
+
+grdm.used_entries = 0;
+rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+  &grdm);
+
+if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+rc = -ENOBUFS;
+
+grdm.map.nr_entries = grdm.used_entries;
+if ( grdm.map.nr_entries )
+{
+if ( __copy_to_guest(compat, &grdm.map, 1) )
+rc = -EFAULT;
+}
+
+return rc;
+}
+#endif
+
 default:
 return compat_arch_memory_op(cmd, compat);
 }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
 return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct xen_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+struct xen_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+if ( __copy_to_guest_offset(grdm->map.buffer,
+grdm->used_entries,
+&rdm,
+1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1

[Xen-devel] [v8][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-16 Thread Tiejun Chen
This patch introduces user configurable parameters to specify RDM
resource and according policies,

Global RDM parameter:
rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The global RDM parameter, strategy, allows the user to specify reserved regions
explicitly. Currently, 'host' includes all reserved regions reported
on this platform, which is good for handling the hotplug scenario. In the future
this parameter may be further extended to allow specifying random regions,
e.g. even those belonging to another platform as a preparation for live
migration with passthrough devices. By default this isn't set so we don't
check all rdms. Instead, we just check rdm specific to a given device if
you're assigning this kind of device. Note this option is not recommended
unless you can make sure any conflict does exist.

'strict/relaxed' policy decides how to handle conflict when reserving RDM
regions in pfn space. If conflict exists, 'strict' means an immediate error
so VM can't keep running, while 'relaxed' allows moving forward with a
warning message thrown out.

Default per-device RDM policy is same as default global RDM policy as being
'relaxed'. And the per-device policy would override the global policy like
others.
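
The override behaviour can be pictured with a tiny sketch (illustrative only;
field names follow the types introduced in this patch, the exact
default-setting site is not quoted here):

/* Illustrative only: a per-device policy left at its unset/INVALID
 * value inherits the global policy, so an explicit per-device setting
 * always wins over the global one. */
if (pcidev->rdm_policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
    pcidev->rdm_policy = b_info->u.hvm.rdm.policy;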

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8:

* One minimal code style change

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

v6:

* Some rename to make our policy reasonable
  type - strategy
  none - ignore
* Don't expose ignore in xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device plicy always override the global policy,
  and so cleanup some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since its just zero
* Grab those changes to xl/libxlu to as a final patch

 docs/man/xl.cfg.pod.5| 81 
 docs/misc/vtd.txt| 24 +
 tools/libxl/libxl_create.c   |  7 
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c  |  9 +
 tools/libxl/libxl_types.idl  | 18 ++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..6c55a8b 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item B<rdm=RDM_RESERVATION_STRING>
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...]> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<strategy=STRING>
+
+Currently there is only one valid type:
+
+host means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using host includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure any conflict does 
exist.
+
+For example, you're trying to set memory = 2800 to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad80 end_address afff
+
+In this conflict case,
+
+#1. If B<strategy> is set to host, for example,
+
+rdm = "strategy=host,policy=strict" or rdm = "strategy=host,policy=relaxed"
+
+It means all conflicts will be handled according to the policy
+introduced by B<policy> as described below.
+
+#2. If B<strategy> is not set at all, but
+
+pci = [ 'sbdf_A, rdm_policy=x' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by B<rdm_policy=STRING> as described inside pci options.
+
+=item B<policy=STRING>
+
+Specifies how to deal

[Xen-devel] [v8][PATCH 16/16] tools: parse to enable new rdm policy parameters

2015-07-16 Thread Tiejun Chen
This patch parses to enable user configurable parameters to specify
RDM resource and according policies which are defined previously,

Global RDM parameter:
rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

Default per-device RDM policy is same as default global RDM policy as being
'relaxed'. And the per-device policy would override the global policy like
others.
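
For reference, a minimal usage sketch of the new parser (illustrative; the
exact call site in xl_cmdimpl.c is not quoted here, only the
xlu_cfg_get_string() and xlu_rdm_parse() signatures are relied on):

/* Illustrative only: feed the raw "rdm" config string to the parser
 * and store the result in the domain build info. */
const char *buf;
libxl_rdm_reserve rdm_parsed;

if (!xlu_cfg_get_string(config, "rdm", &buf, 0)) {
    if (xlu_rdm_parse(config, &rdm_parsed, buf))
        exit(EXIT_FAILURE);
    b_info->u.hvm.rdm.strategy = rdm_parsed.strategy;
    b_info->u.hvm.rdm.policy = rdm_parsed.policy;
}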

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8:

* Clean some codes style issues.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Just sync those renames introduced by patch #10.

v5:

* Need a rebase after we make all rdm variables specific to .hvm.
* Like other pci option, the per-device policy always follows
  the global policy by default.

v4:

* Separated from current patch #11 to parse/enable our rdm policy parameters
  since its make a lot sense and these stuffs are specific to xl/libxlu.

 tools/libxl/libxlu_pci.c | 92 +++-
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 13 +++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..026413b 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, 
unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE  9
+#define STATE_RDM_STRATEGY  10
+#define STATE_RESERVE_POLICY11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char 
*str)
 {
 unsigned state = STATE_DOMAIN;
@@ -143,7 +146,18 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci 
*pcidev, const char *str
 pcidev->permissive = atoi(tok);
 }else if ( !strcmp(optkey, "seize") ) {
 pcidev->seize = atoi(tok);
-}else{
+} else if (!strcmp(optkey, "rdm_policy")) {
+if (!strcmp(tok, "strict")) {
+pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+} else if (!strcmp(tok, "relaxed")) {
+pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+} else {
+XLU__PCI_ERR(cfg, "%s is not a valid PCI RDM property"
+   " policy: 'strict' or 'relaxed'.",
+ tok);
+goto parse_error;
+}
+} else {
 XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
 }
 tok = ptr + 1;
@@ -167,6 +181,82 @@ parse_error:
 return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+unsigned state = STATE_TYPE;
+char *buf2, *tok, *ptr, *end;
+
+if (NULL == (buf2 = ptr = strdup(str)))
+return ERROR_NOMEM;
+
+for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+switch(state) {
+case STATE_TYPE:
+if (*ptr == '=') {
+state = STATE_RDM_STRATEGY;
+*ptr = '\0';
+if (strcmp(tok, "strategy")) {
+XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RDM_STRATEGY:
+if (*ptr == '\0' || *ptr == ',') {
+state = STATE_RESERVE_POLICY;
+*ptr = '\0';
+if (!strcmp(tok, "host")) {
+rdm->strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST;
+} else {
+XLU__PCI_ERR(cfg, "Unknown RDM strategy option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RESERVE_POLICY:
+if (*ptr == '=') {
+state = STATE_OPTIONS_V;
+*ptr = '\0';
+if (strcmp(tok, "policy")) {
+XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_OPTIONS_V:
+if (*ptr == ',' || *ptr == '\0') {
+state = STATE_TERMINAL;
+*ptr = '\0';
+if (!strcmp(tok, "strict")) {
+rdm->policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+} else if (!strcmp(tok, "relaxed")) {
+rdm

[Xen-devel] [v8][PATCH 00/16] Fix RMRR

2015-07-16 Thread Tiejun Chen
 to handle those devices which share same RMRR.

v2:

* Instead of that fixed predefined rdm memory boundary, we'd like to
  introduce a parameter, rdm_mem_boundary, to set this threshold value.

* Remove that existing USB hack.

* Make sure the MMIO regions all fit in the available resource window

* Rename our policy, force/try -> strict/relaxed

* Indeed, Wei and Jan gave me more and more comments to refine codes
  * Code style
  * Better and reasonable code implementation
  * Correct or improve code comments.

* A little bit to work well with ARM.

Open:

* We should fail assigning device which has a shared RMRR with
another device. We can only do group assignment when RMRR is shared
among devices.

We need more time to figure a good policy/way out because something
is not clear to me.

As you know all devices are owned by Dom0 firstly before we create any
DomU, right? Do we allow Dom0 still own a group device while assign another
device in the same group?

Really appreciate any comments to policy.


v1:

RMRR is an acronym for Reserved Memory Region Reporting, expected to
be used for legacy usages (such as USB, UMA Graphics, etc.) requiring
reserved memory. Special treatment is required in system software to
setup those reserved regions in IOMMU translation structures, otherwise
passing through a device with RMRR reported may not work correctly.

This patch set tries to enhance existing Xen RMRR implementation to fix
various reported and theoretical problems. Most noteworthy changes are
to setup identity mapping in p2m layer and handle possible conflicts between
reported regions and gfn space. Initial proposal can be found at:
http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg00524.html
and after a long discussion a summarized agreement is here:
http://lists.xen.org/archives/html/xen-devel/2015-01/msg01580.html

Below is a key summary of this patch set according to agreed proposal:

1. Use RDM (Reserved Device Memory) name in user space as a general 
description instead of using ACPI RMRR name directly.

2. Introduce configuration parameters to allow user control both per-device 
and global RDM resources along with desired policies upon a detected conflict.

3. Introduce a new hypercall to query global and per-device RDM resources.

4. Extend libxl to be a central place to manage RDM resources and handle 
potential conflicts between reserved regions and gfn space. One simplification
goal is made to keep existing lowmem / mmio / highmem layout which is
passed around various function blocks. So a reasonable assumption
is made, that conflicts falling into below areas are not re-arranged otherwise
it will result in a more scattered layout:
a) in highmem region (>4G)
b) in lowmem region, and below a predefined boundary (default 2G)
  a) is a new assumption not discussed before. From VT-d spec this is 
possible but no such observation in real-world. So we can make this
reasonable assumption until there's real usage on it.

5. Extend XENMEM_set_memory_map usable for HVM guest, and then have
libxl to use that hypercall to carry RDM information to hvmloader. There
is one difference from original discussion. Previously we discussed to
introduce a new E820 type specifically for RDM entries. After more thought
we think it's OK to just tag them as E820_reserved. Actually hvmloader
doesn't need to know whether the reserved entries come from RDM or
from other purposes. 

6. Then in hvmloader the change is generic for XENMEM_memory_map
change. Given a predefined memory layout, hvmloader should avoid
allocating all reserved entries for other usages (opregion, mmio, etc.)

7. Extend existing device passthrough hypercall to carry conflict handling
policy.

8. Setup identity map in p2m layer for RMRRs reported for the given
device. And conflicts are handled according to specified policy in hypercall.

Current patch set contains core enhancements calling for comments.
There are still several tasks not implemented now. We'll include them
in final version after RFC is agreed:

- remove existing USB hack
- detect and fail assigning device which has a shared RMRR with another device
- add a config parameter to configure that memory boundary flexibly
- In the case of hotplug we also need to figure out a way to fix that policy
  conflict between the per-pci policy and the global policy but firstly we think
  we'd better collect some good or correct ideas to step next in RFC. 

So I'm sending this as an RFC to collect any comments you may have.
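
To make point 5 above concrete, here is a minimal sketch of tagging an RDM
range as an ordinary reserved e820 entry (illustrative only, not code from
this series):

/* Illustrative only: append one RDM range to a guest e820 under
 * construction.  The entry is plain E820_RESERVED -- hvmloader does
 * not need to know the reservation came from an RMRR. */
static void add_rdm_entry(struct e820entry *e820, unsigned int *nr,
                          uint64_t rdm_start, uint64_t rdm_size)
{
    e820[*nr].addr = rdm_start;
    e820[*nr].size = rdm_size;
    e820[*nr].type = E820_RESERVED;
    (*nr)++;
}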


Jan Beulich (1):
  xen: introduce XENMEM_reserved_device_memory_map

Tiejun Chen (15):
  xen/vtd: create RMRR mapping
  xen/passthrough: extend hypercall to support rdm reservation policy
  xen: enable XENMEM_memory_map in hvm
  hvmloader: get guest memory map into memory_map[]
  hvmloader/pci: skip reserved ranges
  hvmloader/e820: construct guest e820 table
  tools/libxc: Expose new

[Xen-devel] [v8][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary

2015-07-16 Thread Tiejun Chen
Previously we always fixed that predefined boundary as 2G to handle
conflicts between memory and rdm, but now this predefined boundary
can be changed with the parameter rdm_mem_boundary in the .cfg file.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable rdm_mem_boundary_memkb specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5   | 22 ++
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_create.c  |  4 
 tools/libxl/libxl_dom.c |  8 +---
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c|  3 +++
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 6c55a8b..23068ec 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -867,6 +867,28 @@ More information about Xen gfx_passthru feature is 
available
 on the XenVGAPassthrough Lhttp://wiki.xen.org/wiki/XenVGAPassthrough
 wiki page.
 
+=item Brdm_mem_boundary=MBYTES
+
+Number of megabytes to set a boundary for checking rdm conflict.
+
+When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
+Especially multiple RDM entries would worsen this to lead a complicated
+memory layout. So here we're trying to figure out a simple solution to
+avoid breaking existing layout. So when a conflict occurs,
+
+#1. Above a predefined boundary
+- move lowmem_end below reserved region to solve conflict;
+
+#2. Below a predefined boundary
+- Check strict/relaxed policy.
+strict policy leads to fail libxl. Note when both policies
+are specified on a given region, 'strict' is always preferred.
+relaxed policy issue a warning message and also mask this
+entry INVALID to indicate we shouldn't expose this entry to
+hvmloader.
+
+Here the default is 2G.
+
 =item Bdtdev=[ DTDEV_PATH, DTDEV_PATH, ... ]
 
 Specifies the host device tree nodes to passthrough to this guest. Each
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..6f157c9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
 uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c8a32d5..3de86a6 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, 
libxl_domain_build_info *b_info)
 {
 if (b_info->u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
 b_info->u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+
+if (b_info->u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+b_info->u.hvm.rdm_mem_boundary_memkb =
+LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 80fa17d..e41d54a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 int ret, rc = ERROR_FAIL;
 uint64_t mmio_start, lowmem_end, highmem_end;
 libxl_domain_build_info *const info = d_config->b_info;
-/*
- * Currently we fix this as 2G to guarantee how to handle
- * our rdm policy. But we'll provide a parameter to set
- * this dynamically.
- */
-uint64_t rdm_mem_boundary = 0x80000000;
 
 memset(&args, 0, sizeof(struct xc_hvm_build_args));
 /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 args.mmio_start = mmio_start;
 
 rc = libxl__domain_device_construct_rdm(gc, d_config,
-rdm_mem_boundary,
+info->u.hvm.rdm_mem_boundary_memkb*1024,
 &args);
 if (rc) {
 LOG(ERROR, "checking reserved device memory failed");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a3ad8d1..4eb4f8a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl

[Xen-devel] [v8][PATCH 02/16] xen/vtd: create RMRR mapping

2015-07-16 Thread Tiejun Chen
RMRR reserved regions must be setup in the pfn space with an identity
mapping to the reported mfn. However, the existing code has problems setting
up the correct mapping when VT-d shares the EPT page table, which leads to
problems when assigning devices (e.g. GPU) with RMRR reported. So instead, this
patch aims to setup identity mapping in p2m layer, regardless of
whether EPT is shared or not. And we still keep creating VT-d table.

And we also need to introduce a pair of helper to create/clear this
sort of identity mapping as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If space
is already occupied by desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define macro to wrapper guest_physmap_remove_page() with
a returning value as necessary.
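
The intended caller pattern looks roughly like this (a sketch simplified from
rmrr_identity_mapping() in the VT-d diff further down; error handling
trimmed):

/* Sketch only: map every pfn of an RMRR with a 1:1 p2m entry, or tear
 * the mapping down again, depending on the direction of the operation. */
while ( base_pfn < end_pfn )
{
    int err = map ? set_identity_p2m_entry(d, base_pfn, p2m_access_rw)
                  : clear_identity_p2m_entry(d, base_pfn, 0);

    if ( err )
        ret = err;
    base_pfn++;
}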

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Acked-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v8:

* Nothing is changed.

v5:

* Fold our original patch #2 and #3 as this new

* Introduce a new, clear_identity_p2m_entry, which can wrapper
  guest_physmap_remove_page(). And we use this to clean our
  identity mapping. 

v4:

* Change that orginal condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mapping as we expected.

* To have

  if ( !paging_mode_translate(p2m-domain) )
return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c   | 40 +++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h   | 13 +---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long 
gfn, unsigned long mfn,
 p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
   unsigned long mfn, unsigned int page_order)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int rc;
 gfn_lock(p2m, gfn, page_order);
-p2m_remove_page(p2m, gfn, mfn, page_order);
+rc = p2m_remove_page(p2m, gfn, mfn, page_order);
 gfn_unlock(p2m, gfn, page_order);
+return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+   p2m_access_t p2ma)
+{
+p2m_type_t p2mt;
+p2m_access_t a;
+mfn_t mfn;
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int ret;
+
+if ( !paging_mode_translate(p2m->domain) )
+return 0;
+
+gfn_lock(p2m, gfn, 0);
+
+mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+p2m_mmio_direct, p2ma);
+else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+ret = 0;
+else
+{
+ret = -EBUSY;
+printk(XENLOG_G_WARNING
+   "Cannot setup identity map d%d:%lx,"
+   " gfn already mapped to %lx.\n",
+   d->domain_id, gfn, mfn_x(mfn));
+}
+
+gfn_unlock(p2m, gfn, 0);
+return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-if ( intel_iommu_unmap_page(d, base_pfn) )
+if ( clear_identity_p2m_entry(d, base_pfn, 0) )
 ret = -ENXIO;
 base_pfn++;
 }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-int err = intel_iommu_map_page(d, base_pfn, base_pfn

[Xen-devel] [v9][PATCH 04/16] xen: enable XENMEM_memory_map in hvm

2015-07-16 Thread Tiejun Chen
This patch enables XENMEM_memory_map in hvm. So hvmloader can
use it to setup the e820 mappings.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
---
v5 ~ v9:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c  | 6 --
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd  MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd  MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-if ( is_hvm_domain(d) )
-{
-rcu_unlock_domain(d);
-return -EPERM;
-}
-
 e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
 if ( e820 == NULL )
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v9][PATCH 00/16] Fix RMRR

2015-07-16 Thread Tiejun Chen
- add a config parameter to configure that memory boundary flexibly
- In the case of hotplug we also need to figure out a way to fix that policy
  conflict between the per-pci policy and the global policy but firstly we think
  we'd better collect some good or correct ideas to step next in RFC. 

So here I made this as RFC to collect your any comments.


Jan Beulich (1):
  xen: introduce XENMEM_reserved_device_memory_map

Tiejun Chen (15):
  xen/vtd: create RMRR mapping
  xen/passthrough: extend hypercall to support rdm reservation policy
  xen: enable XENMEM_memory_map in hvm
  hvmloader: get guest memory map into memory_map[]
  hvmloader/pci: skip reserved ranges
  hvmloader/e820: construct guest e820 table
  tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  tools: extend xc_assign_device() to support rdm reservation policy
  tools: introduce some new parameters to set rdm policy
  tools/libxl: detect and avoid conflicts with RDM
  tools: introduce a new parameter to set a predefined rdm boundary
  libxl: construct e820 map with RDM information for HVM guest
  xen/vtd: enable USB device assignment
  xen/vtd: prevent from assign the device with shared rmrr
  tools: parse to enable new rdm policy parameters

Jan Beulich (1):
  xen: introduce XENMEM_reserved_device_memory_map

 docs/man/xl.cfg.pod.5   | 103 
 docs/misc/vtd.txt   |  24 ++
 tools/firmware/hvmloader/e820.c | 131 +-
 tools/firmware/hvmloader/e820.h |   7 +
 tools/firmware/hvmloader/hvmloader.c|   2 +
 tools/firmware/hvmloader/pci.c  |  81 ++
 tools/firmware/hvmloader/util.c |  26 ++
 tools/firmware/hvmloader/util.h |  12 +
 tools/libxc/include/xenctrl.h   |  11 +-
 tools/libxc/xc_domain.c |  45 +++-
 tools/libxl/libxl.h |   6 +
 tools/libxl/libxl_arch.h|   7 +
 tools/libxl/libxl_arm.c |   8 +
 tools/libxl/libxl_create.c  |  13 +-
 tools/libxl/libxl_dm.c  | 273 
 tools/libxl/libxl_dom.c |  16 +-
 tools/libxl/libxl_internal.h|  13 +-
 tools/libxl/libxl_pci.c |  12 +-
 tools/libxl/libxl_types.idl |  26 ++
 tools/libxl/libxl_x86.c |  83 ++
 tools/libxl/libxlu_pci.c|  92 ++-
 tools/libxl/libxlutil.h |   4 +
 tools/libxl/xl_cmdimpl.c|  16 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c |  16 +-
 tools/python/xen/lowlevel/xc/xc.c   |  30 ++-
 xen/arch/x86/hvm/hvm.c  |   2 -
 xen/arch/x86/mm.c   |   6 -
 xen/arch/x86/mm/p2m.c   |  43 ++-
 xen/common/compat/memory.c  |  66 +
 xen/common/memory.c |  64 +
 xen/drivers/passthrough/amd/pci_amd_iommu.c |   3 +-
 xen/drivers/passthrough/arm/smmu.c  |   2 +-
 xen/drivers/passthrough/device_tree.c   |   3 +-
 xen/drivers/passthrough/iommu.c |  10 +
 xen/drivers/passthrough/pci.c   |  15 +-
 xen/drivers/passthrough/vtd/dmar.c  |  32 +++
 xen/drivers/passthrough/vtd/dmar.h  |   1 -
 xen/drivers/passthrough/vtd/extern.h|   1 +
 xen/drivers/passthrough/vtd/iommu.c |  82 --
 xen/drivers/passthrough/vtd/utils.c |   7 -
 xen/include/asm-x86/p2m.h   |  13 +-
 xen/include/public/domctl.h |   3 +
 xen/include/public/memory.h |  37 ++-
 xen/include/xen/iommu.h |  12 +-
 xen/include/xen/pci.h   |   2 +
 xen/include/xlat.lst|   3 +-
 46 files changed, 1376 insertions(+), 88 deletions(-)

Thanks
Tiejun

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v9][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary

2015-07-16 Thread Tiejun Chen
Previously we always fixed that predefined boundary as 2G to handle
conflicts between memory and rdm, but now this predefined boundary
can be changed with the parameter rdm_mem_boundary in the .cfg file.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8 ~ v9:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable rdm_mem_boundary_memkb specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5   | 22 ++
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_create.c  |  4 
 tools/libxl/libxl_dom.c |  8 +---
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c|  3 +++
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 6c55a8b..23068ec 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -867,6 +867,28 @@ More information about Xen gfx_passthru feature is 
available
 on the XenVGAPassthrough Lhttp://wiki.xen.org/wiki/XenVGAPassthrough
 wiki page.
 
+=item Brdm_mem_boundary=MBYTES
+
+Number of megabytes to set a boundary for checking rdm conflict.
+
+When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
+Especially multiple RDM entries would worsen this to lead a complicated
+memory layout. So here we're trying to figure out a simple solution to
+avoid breaking existing layout. So when a conflict occurs,
+
+#1. Above a predefined boundary
+- move lowmem_end below reserved region to solve conflict;
+
+#2. Below a predefined boundary
+- Check strict/relaxed policy.
+strict policy leads to fail libxl. Note when both policies
+are specified on a given region, 'strict' is always preferred.
+relaxed policy issue a warning message and also mask this
+entry INVALID to indicate we shouldn't expose this entry to
+hvmloader.
+
+Here the default is 2G.
+
 =item Bdtdev=[ DTDEV_PATH, DTDEV_PATH, ... ]
 
 Specifies the host device tree nodes to passthrough to this guest. Each
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..6f157c9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
 uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c8a32d5..3de86a6 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, 
libxl_domain_build_info *b_info)
 {
 if (b_info->u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
 b_info->u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+
+if (b_info->u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+b_info->u.hvm.rdm_mem_boundary_memkb =
+LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 80fa17d..e41d54a 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 int ret, rc = ERROR_FAIL;
 uint64_t mmio_start, lowmem_end, highmem_end;
 libxl_domain_build_info *const info = d_config->b_info;
-/*
- * Currently we fix this as 2G to guarantee how to handle
- * our rdm policy. But we'll provide a parameter to set
- * this dynamically.
- */
-uint64_t rdm_mem_boundary = 0x80000000;
 
 memset(&args, 0, sizeof(struct xc_hvm_build_args));
 /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 args.mmio_start = mmio_start;
 
 rc = libxl__domain_device_construct_rdm(gc, d_config,
-rdm_mem_boundary,
+info->u.hvm.rdm_mem_boundary_memkb*1024,
 &args);
 if (rc) {
 LOG(ERROR, "checking reserved device memory failed");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a3ad8d1..4eb4f8a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl

[Xen-devel] [v9][PATCH 06/16] hvmloader/pci: disable all pci devices conflicting with rdm

2015-07-16 Thread Tiejun Chen
When allocating mmio address for PCI bars, mmio may overlap with
reserved regions. Currently we just disable these associated devices
to avoid conflicts, but we will reshape the current mmio
allocation mechanism to fix this completely.
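
The disabling step itself amounts to clearing the command register's decode
bits once a conflicting BAR is found; a minimal sketch (illustrative only,
the full devfn/BAR scan is in the diff below):

/* Illustrative only: turn off memory and I/O decoding for a device
 * whose BAR overlaps a reserved region, so it cannot claim that range. */
if ( is_conflict )
{
    cmd = pci_readw(devfn, PCI_COMMAND);
    pci_writew(devfn, PCI_COMMAND,
               cmd & ~(PCI_COMMAND_MEMORY | PCI_COMMAND_IO));
    printf("pci dev %02x:%x: BAR overlaps RDM, device disabled\n",
           devfn >> 3, devfn & 7);
}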

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v9:

* A little improvement to code implementation but again, its still argued about
  this solution.

v8:

* Based on the current discussion it's hard to reshape the original mmio
  allocation mechanism, but we don't have a good and simple way to do so in the
  short term. So instead, we don't add more complexity to intervene in that
  process, but still check for any conflicts to disable all associated devices.

v6 ~ v7:

* Nothing is changed.

v5:

* Rename that field, is_64bar, inside struct bars with flag, and
  then extend to also indicate if this bar is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either of populating more RAM, or of expanding more highmem. But
  we should know just 64bit-bar can work with highmem, and as you mentioned we
  also should avoid expanding highmem as possible. So my implementation is to 
  allocate 32bit-bar and 64bit-bar orderly.

  1. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2.

  2. The second allocation round to the remaining 32bit-bar

  We should can finish allocating all 32bit-bar in theory, then go to the third
  allocation round 3.

  3. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the not aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check if other
  smaller bars can be allocated if a mmio hole exists between resource-base and
  reserved device memory. If a hole exists between base and reserved device
  memory, lets go out simply to try allocate for next bar since all bars are in
  descending order of size. If not, we need to move resource-base to 
reserved_end
  just to reallocate this bar.

 tools/firmware/hvmloader/pci.c | 81 ++
 1 file changed, 81 insertions(+)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..15ed9b2 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,84 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+/*
+ * We should check if all valid bars conflict with RDM.
+ *
+ * Here we just need to check mmio bars in the case of non-highmem
+ * since the hypervisor can make sure RDM doesn't involve highmem.
+ */
+static void disable_conflicting_devices(void)
+{
+uint8_t is_64bar;
+uint32_t devfn, bar_reg, cmd, bar_data;
+uint16_t vendor_id, device_id;
+unsigned int bar, i;
+uint64_t bar_sz;
+bool is_conflict = false;
+
+for ( devfn = 0; devfn < 256; devfn++ )
+{
+vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
+device_id = pci_readw(devfn, PCI_DEVICE_ID);
+if ( (vendor_id == 0xffff) && (device_id == 0xffff) )
+continue;
+
+/* Check all bars */
+for ( bar = 0; bar < 7 && !is_conflict; bar++ )
+{
+bar_reg = PCI_BASE_ADDRESS_0 + 4*bar;
+if ( bar == 6 )
+bar_reg = PCI_ROM_ADDRESS;
+
+bar_data = pci_readl(devfn, bar_reg);
+bar_data &= PCI_BASE_ADDRESS_MEM_MASK;
+if ( !bar_data )
+continue;
+
+if ( bar_reg != PCI_ROM_ADDRESS )
+is_64bar = !!((bar_data & (PCI_BASE_ADDRESS_SPACE |
+ PCI_BASE_ADDRESS_MEM_TYPE_MASK)) ==
+ (PCI_BASE_ADDRESS_SPACE_MEMORY |
+ PCI_BASE_ADDRESS_MEM_TYPE_64));
+
+/* Until here we never conflict high memory. */
+if ( is_64bar

[Xen-devel] [v9][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-16 Thread Tiejun Chen
Now use the hypervisor-supplied memory map to build our final e820 table:
* Add regions for BIOS ranges and other special mappings not in the
  hypervisor map
* Add in the hypervisor regions
* Adjust the lowmem and highmem regions if we've had to relocate
  memory (adding a highmem region if necessary)
* Sort all the ranges so that they appear in memory order.
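
The sorting step is cut off in the quoted diff; what it amounts to is a simple
in-place ordering of the assembled entries by start address, e.g.
(illustrative sketch only, hvmloader has no qsort):

/* Illustrative only: order the assembled e820[] entries by address. */
for ( i = 0; i < nr; i++ )
{
    for ( j = i + 1; j < nr; j++ )
    {
        if ( e820[j].addr < e820[i].addr )
        {
            struct e820entry tmp = e820[i];

            e820[i] = e820[j];
            e820[j] = tmp;
        }
    }
}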

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v9:

* Refine that chunk of codes to check/modify highmem

v8:

* define low_mem_end as uint32_t

* Correct those two wrong loops, memory_map.nr_map - nr
  when we're trying to revise low/high memory e820 entries.

* Improve code comments and the patch head description

* Add one check if highmem is just populated by hvmloader itself

v5 ~ v7:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.
 
 tools/firmware/hvmloader/e820.c | 99 +++--
 1 file changed, 85 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 7a414ab..49d420a 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
+uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT;
+uint64_t add_high_mem = 0;
 
 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA0000;
@@ -149,13 +152,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;
 
-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-e820[nr].addr = 0x100000;
-e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }
 
-
-if ( hvm_info->high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ */
+for ( i = 0; i < memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1 << 32);
-e820[nr].size =
-((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
+e820[nr] = memory_map.map[i];
 nr++;
 }
 
+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end < (2u << 20));
+
+/*
+ * Its possible to relocate RAM to allocate sufficient MMIO previously
+ * so low_mem_pgend would be changed over there. And here memory_map[]
+ * records the original low/high memory, so if low_mem_end is less than
+ * the original we need to revise low/high memory range in e820.
+ */
+for ( i = 0; i < nr; i++ )
+{
+uint64_t end = e820[i].addr + e820[i].size;
+if ( e820[i].type == E820_RAM &&
+ low_mem_end > e820[i].addr && low_mem_end < end )
+{
+add_high_mem = end - low_mem_end;
+e820[i].size = low_mem_end - e820[i].addr;
+}
+}
+
+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+/* Modify the existing highmem region if it exists. */
+for ( i = 0; i < nr; i++ )
+{
+if ( e820[i].type == E820_RAM &&
+ e820[i].addr == ((uint64_t)1 << 32))
+{
+e820[i].size += add_high_mem;
+break;
+}
+}
+
+/* If there was no highmem region, just create one. */
+if ( i == nr )
+{
+e820[nr].addr = ((uint64_t)1 << 32);
+e820[nr].size = add_high_mem;
+e820[nr].type = E820_RAM;
+nr++;
+}
+
+/* A sanity check if high memory is broken. */
+BUG_ON( high_mem_end != e820[i].addr + e820[i].size);
+}
+
+/* Finally we

[Xen-devel] [v9][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-07-16 Thread Tiejun Chen
This patch extends the existing hypercall to support rdm reservation policy.
We return error or just throw out a warning message depending on whether
the policy is strict or relaxed when reserving RDM regions in pfn space.
Note that in some special cases, e.g. adding a device to the hardware domain,
or removing a device from a user domain, 'relaxed' is fine since this is
always safe for the hardware domain.
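
Since only one flag bit is defined, the hypercall side can range-check it up
front; a sketch of that check (illustrative only, the exact placement in the
domctl handler is not quoted here; see also the v4 changelog note below):

/* Illustrative only: reject undefined flag bits so future extensions
 * remain unambiguous. */
if ( domctl->u.assign_device.flag & ~XEN_DOMCTL_DEV_RDM_RELAXED )
{
    ret = -EINVAL;
    break;
}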

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
CC: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Stefano Stabellini stefano.stabell...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: George Dunlap george.dun...@eu.citrix.com
---
v9:

* Correct one check condition of XEN_DOMCTL_DEV_RDM_RELAXED

v8:

* Force passing 0 (strict) when adding or moving a device in the hardware
  domain, and improve some associated code comments.

v6 ~ v7:

* Nothing is changed.

v5:

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
  0 means strict and 1 means relaxed.

* So make DT device ignore the flag field

* Improve the code comments

v4:

* Add code comments to describe why we always set a policy flag in some
  cases like adding a device to the hardware domain, and removing a device
  from a user domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one judging condition
  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add to range check the flag passed to make future extensions possible
  (and to avoid ambiguity on what out of range values would mean).

 xen/arch/x86/mm/p2m.c   |  7 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c  |  2 +-
 xen/drivers/passthrough/device_tree.c   |  3 ++-
 xen/drivers/passthrough/pci.c   | 15 
 xen/drivers/passthrough/vtd/iommu.c | 37 ++---
 xen/include/asm-x86/p2m.h   |  2 +-
 xen/include/public/domctl.h |  3 +++
 xen/include/xen/iommu.h |  2 +-
 9 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 99a26ca..47785dc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, 
mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-   p2m_access_t p2ma)
+   p2m_access_t p2ma, unsigned int flag)
 {
 p2m_type_t p2mt;
 p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long 
gfn,
 ret = 0;
 else
 {
-ret = -EBUSY;
+if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
+ret = 0;
+else
+ret = -EBUSY;
 printk(XENLOG_G_WARNING
   "Cannot setup identity map d%d:%lx,"
   " gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct 
domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-   struct pci_dev *pdev)
+   struct pci_dev *pdev,
+   u32 flag)
 {
 struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
 int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct 
iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-  struct device *dev)
+  struct device *dev, u32 flag)
 {
struct iommu_domain *domain;
struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 5d3842a..7ff79f8 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct 
dt_device_node *dev)
 goto fail;
 }
 
-rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+/* The flag field doesn't matter to DT device. */
+rc = hd->platform_ops->assign_device(d, 0

[Xen-devel] [v9][PATCH 16/16] tools: parse to enable new rdm policy parameters

2015-07-16 Thread Tiejun Chen
This patch adds parsing to enable the user-configurable parameters that
specify RDM resources and the corresponding policies defined previously:

Global RDM parameter:
rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The default per-device RDM policy is the same as the default global RDM
policy, 'relaxed'. And the per-device policy overrides the global policy,
as with other per-device options.
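
As a usage illustration only (the real call site lives in xl_cmdimpl.c and is
not reproduced here; 'config' stands for the XLU_Config handle and error
handling is simplified):

    /* Hypothetical sketch: parse an xl.cfg "rdm=" value with the new helper. */
    libxl_rdm_reserve rdm;

    memset(&rdm, 0, sizeof(rdm));   /* stand-in for proper libxl type init */
    if (xlu_rdm_parse(config, &rdm, "strategy=host,policy=strict"))
        fprintf(stderr, "invalid rdm= string\n");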

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v9:

* Nothing is changed.

v8:

* Clean some codes style issues.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Just sync those renames introduced by patch #10.

v5:

* Need a rebase after we make all rdm variables specific to .hvm.
* Like other pci option, the per-device policy always follows
  the global policy by default.

v4:

* Separated from the current patch #11 to parse/enable our rdm policy
  parameters, since that makes a lot of sense and this stuff is specific to xl/libxlu.

 tools/libxl/libxlu_pci.c | 92 +++-
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 13 +++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..026413b 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, 
unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE  9
+#define STATE_RDM_STRATEGY  10
+#define STATE_RESERVE_POLICY11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char 
*str)
 {
 unsigned state = STATE_DOMAIN;
@@ -143,7 +146,18 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci 
*pcidev, const char *str
 pcidev->permissive = atoi(tok);
 }else if ( !strcmp(optkey, "seize") ) {
 pcidev->seize = atoi(tok);
-}else{
+} else if (!strcmp(optkey, "rdm_policy")) {
+if (!strcmp(tok, "strict")) {
+pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+} else if (!strcmp(tok, "relaxed")) {
+pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+} else {
+XLU__PCI_ERR(cfg, "%s is not a valid PCI RDM property"
+   " policy: 'strict' or 'relaxed'.",
+ tok);
+goto parse_error;
+}
+} else {
 XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
 }
 tok = ptr + 1;
@@ -167,6 +181,82 @@ parse_error:
 return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+unsigned state = STATE_TYPE;
+char *buf2, *tok, *ptr, *end;
+
+if (NULL == (buf2 = ptr = strdup(str)))
+return ERROR_NOMEM;
+
+for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+switch(state) {
+case STATE_TYPE:
+if (*ptr == '=') {
+state = STATE_RDM_STRATEGY;
+*ptr = '\0';
+if (strcmp(tok, "strategy")) {
+XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RDM_STRATEGY:
+if (*ptr == '\0' || *ptr == ',') {
+state = STATE_RESERVE_POLICY;
+*ptr = '\0';
+if (!strcmp(tok, "host")) {
+rdm->strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST;
+} else {
+XLU__PCI_ERR(cfg, "Unknown RDM strategy option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RESERVE_POLICY:
+if (*ptr == '=') {
+state = STATE_OPTIONS_V;
+*ptr = '\0';
+if (strcmp(tok, "policy")) {
+XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_OPTIONS_V:
+if (*ptr == ',' || *ptr == '\0') {
+state = STATE_TERMINAL;
+*ptr = '\0';
+if (!strcmp(tok, "strict")) {
+rdm->policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+} else if (!strcmp(tok, "relaxed

[Xen-devel] [v9][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-16 Thread Tiejun Chen
While building a VM, the HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields that hvmloader uses to
construct the guest e820 table, low_mem_pgend and high_mem_pgend, so we
should check them to fix any conflict with RDM.

RMRR can theoretically reside in the address space beyond 4G, but we never
see this in the real world. So, in order to avoid breaking the highmem
layout, we don't resolve highmem conflicts. Note this means a highmem RMRR
could still be supported if there is no conflict.

But in the case of lowmem, RMRRs may be scattered over the whole RAM space,
and multiple RMRR entries would worsen this into a complicated memory
layout, making it hard to extend hvm_info_table{} so that hvmloader can
work it out. So here we're trying to figure out a simple solution that
avoids breaking the existing layout. When a conflict occurs,

#1. Above a predefined boundary (2G)
- move lowmem_end below the reserved region to solve the conflict;

#2. Below a predefined boundary (2G)
- Check the strict/relaxed policy.
"strict" policy leads libxl to fail. Note when both policies
are specified on a given region, 'strict' is always preferred.
"relaxed" policy issues a warning message and also marks this entry
INVALID to indicate we shouldn't expose this entry to hvmloader.

Note that later we need to provide a parameter to set that predefined boundary
dynamically.
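
A rough sketch of the decision described above (pseudocode only; names such as
rdm_start, lowmem_end, policy and entry are illustrative, not the exact
identifiers used in libxl):

    /* Hypothetical sketch of the per-RDM-entry conflict handling. */
    #define RDM_BOUNDARY (2ULL << 30)            /* the predefined 2G boundary */

    if (rdm_start < lowmem_end) {                /* conflicts with lowmem RAM */
        if (rdm_start >= RDM_BOUNDARY)
            lowmem_end = rdm_start;              /* #1: shrink lowmem below the RDM */
        else if (policy == LIBXL_RDM_RESERVE_POLICY_STRICT)
            return ERROR_FAIL;                   /* #2: strict -> fail the build */
        else {
            LOG(WARN, "RDM conflicts with guest RAM below 2G");
            entry->policy = LIBXL_RDM_RESERVE_POLICY_INVALID; /* hide from hvmloader */
        }
    }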

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
---
v9:

* Nothing is changed.

v8:

* Introduce pfn_to_paddr(x) -> ((uint64_t)x << XC_PAGE_SHIFT)
  and set_rdm_entries() to factor out the current code.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Fix some code style issues
* Refine libxl__xc_device_get_rdm()

v5:

* A little change to make sure the per-device policy always overrides the
  global policy, and correct its associated code comments.
* Fix one typo in the patch head description
* Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace
  malloc() with libxl__malloc(), and finally cleanup this fallout.
* libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out 
parameter.

v4:

* Consistent to use term RDM.
* Unconditionally set *nr_entries to 0
* Move all the stuff providing a parameter to set our predefined boundary
  dynamically into a separate patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c   | 273 +++
 tools/libxl/libxl_dom.c  |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 307 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f75d4f1..c8a32d5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -459,7 +459,7 @@ int libxl__domain_build(libxl__gc *gc,
 
 switch (info-type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-ret = libxl__build_hvm(gc, domid, info, state);
+ret = libxl__build_hvm(gc, domid, d_config, state);
 if (ret)
 goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 317a8eb..692258b 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,279 @@ const char *libxl__domain_device_model(libxl__gc *gc,
 return dm;
 }
 
+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+ uint32_t flag,
+ uint16_t seg,
+ uint8_t bus,
+ uint8_t devfn,
+ unsigned int *nr_entries,
+ struct xen_reserved_device_memory **xrdm)
+{
+int rc = 0, r;
+
+/*
+ * We really can't presume how many entries we can get in advance.
+ */
+*nr_entries = 0;
+r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+  NULL, nr_entries);
+assert(r <= 0);
+/* 0 means we have no any rdm entry. */
+if (!r) goto out;
+
+if (errno != ENOBUFS) {
+rc = ERROR_FAIL;
+goto out;
+}
+
+*xrdm = libxl__malloc(gc,
+  *nr_entries * sizeof(xen_reserved_device_memory_t));
+r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+  *xrdm, nr_entries);
+if (r)
+rc = ERROR_FAIL;
+
+ out:
+if (rc) {
+*nr_entries = 0;
+*xrdm = NULL;
+LOG(ERROR, "Could not get reserved device memory maps.\n");
+}
+return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns

[Xen-devel] [v9][PATCH 14/16] xen/vtd: enable USB device assignment

2015-07-16 Thread Tiejun Chen
USB RMRR may conflict with the guest BIOS region. In such a case, identity
mapping setup was simply skipped in the previous implementation. Now we
can handle this scenario cleanly with the new policy mechanism, so the
previous hack code can be removed.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v5 ~ v9:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++-
 xen/drivers/passthrough/vtd/utils.c |  7 ---
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h 
b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {\
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index b5d658e..c8b0455 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2243,11 +2243,9 @@ static int reassign_device_ownership(
 /*
  * If the device belongs to the hardware domain, and it has RMRR, don't
  * remove it from the hardware domain, because BIOS may use RMRR at
- * booting time. Also account for the special casing of USB below (in
- * intel_iommu_assign_device()).
+ * booting time.
  */
-if ( !is_hardware_domain(source) &&
- !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+if ( !is_hardware_domain(source) )
 {
 const struct acpi_rmrr_unit *rmrr;
 u16 bdf;
@@ -2300,13 +2298,8 @@ static int intel_iommu_assign_device(
 if ( ret )
 return ret;
 
-/* FIXME: Because USB RMRR conflicts with guest bios region,
- * ignore USB RMRR temporarily.
- */
 seg = pdev->seg;
 bus = pdev->bus;
-if ( is_usb_device(seg, bus, pdev->devfn) )
-return 0;
 
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c 
b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include extern.h
 #include asm/io_apic.h
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-PCI_CLASS_DEVICE);
-return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1




[Xen-devel] [v9][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map

2015-07-16 Thread Tiejun Chen
From: Jan Beulich jbeul...@suse.com

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.
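
For context, a consumer of the new memory op is expected to use the usual
two-step sizing pattern; the sketch below is illustrative only and relies on
the libxc wrapper this series adds around the hypercall:

    /* Hypothetical sketch: first ask for the entry count, then fetch them. */
    unsigned int nr = 0;
    xen_reserved_device_memory_t *buf = NULL;
    int r;

    r = xc_reserved_device_memory_map(xch, PCI_DEV_RDM_ALL, 0, 0, 0, NULL, &nr);
    if ( r && errno == ENOBUFS )      /* nr now holds the required entry count */
    {
        buf = calloc(nr, sizeof(*buf));
        r = xc_reserved_device_memory_map(xch, PCI_DEV_RDM_ALL, 0, 0, 0, buf, &nr);
    }
    /* r == 0 with nr == 0 means the platform reports no RDM at all. */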

CC: Jan Beulich jbeul...@suse.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Jan Beulich jbeul...@suse.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v7 ~ v9:

* Nothing is changed.

v6:

* Add a comment to the nr_entries field inside xen_reserved_device_memory_map

v5 ~ v4:

* Nothing is changed.

 xen/common/compat/memory.c   | 66 
 xen/common/memory.c  | 64 ++
 xen/drivers/passthrough/iommu.c  | 10 ++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h  | 37 +++-
 xen/include/xen/iommu.h  | 10 ++
 xen/include/xen/pci.h|  2 ++
 xen/include/xlat.lst |  3 +-
 10 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct compat_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+struct compat_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+return -ERANGE;
+
+if ( __copy_to_compat_offset(grdm->map.buffer,
+ grdm->used_entries,
+ rdm,
+ 1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1;
+}
+
+return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
 int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, 
XEN_GUEST_HANDLE_PARAM(void) compat)
 break;
 }
 
+#ifdef HAS_PASSTHROUGH
+case XENMEM_reserved_device_memory_map:
+{
+struct get_reserved_device_memory grdm;
+
+if ( copy_from_guest(&grdm.map, compat, 1) ||
+ !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+return -EFAULT;
+
+grdm.used_entries = 0;
+rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+  &grdm);
+
+if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+rc = -ENOBUFS;
+
+grdm.map.nr_entries = grdm.used_entries;
+if ( grdm.map.nr_entries )
+{
+if ( __copy_to_guest(compat, &grdm.map, 1) )
+rc = -EFAULT;
+}
+
+return rc;
+}
+#endif
+
 default:
 return compat_arch_memory_op(cmd, compat);
 }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
 return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct xen_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+struct xen_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+if ( __copy_to_guest_offset(grdm->map.buffer,
+grdm->used_entries,
+rdm,
+1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1

[Xen-devel] [v9][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-16 Thread Tiejun Chen
This patch introduces user-configurable parameters to specify RDM
resources and the corresponding policies,

Global RDM parameter:
rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The global RDM parameter, strategy, allows the user to specify reserved
regions explicitly. Currently, 'host' includes all reserved regions reported
on this platform, which is good for handling the hotplug scenario. In the
future this parameter may be further extended to allow specifying random
regions, e.g. even those belonging to another platform, as a preparation for
live migration with passthrough devices. By default this isn't set, so we
don't check all RDMs. Instead, we just check the RDM specific to a given
device if you're assigning this kind of device. Note this option is not
recommended unless you can make sure any conflict does exist.

The 'strict/relaxed' policy decides how to handle a conflict when reserving
RDM regions in pfn space. If a conflict exists, 'strict' means an immediate
error so the VM can't keep running, while 'relaxed' allows moving forward
with a warning message thrown out.

The default per-device RDM policy is the same as the default global RDM
policy, 'relaxed'. And the per-device policy overrides the global policy,
as with other per-device options.
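
As an illustration (the sbdf values are hypothetical), the two levels can be
combined so that one device gets a stricter policy than the global default:

rdm = "strategy=host,policy=relaxed"
pci = [ '01:00.0, rdm_policy=strict', '02:00.0' ]

Here a conflict on 01:00.0's RDM regions fails the build, while 02:00.0 and
any other host RDM regions follow the global 'relaxed' policy.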

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v9:

* Nothing is changed.

v8:

* One minimal code style change

v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

v6:

* Some renames to make our policy reasonable
  type -> strategy
  none -> ignore
* Don't expose ignore at the xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device policy always overrides the global policy,
  and so clean up some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since it's just zero
* Move those changes to xl/libxlu into a final patch

 docs/man/xl.cfg.pod.5| 81 
 docs/misc/vtd.txt| 24 +
 tools/libxl/libxl_create.c   |  7 
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c  |  9 +
 tools/libxl/libxl_types.idl  | 18 ++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..6c55a8b 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item B<rdm=RDM_RESERVATION_STRING>
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B<RDM_RESERVE_STRING> has the form C<[KEY=VALUE,KEY=VALUE,...]> where:
+
+=over 4
+
+=item B<KEY=VALUE>
+
+Possible B<KEY>s are:
+
+=over 4
+
+=item B<strategy=STRING>
+
+Currently there is only one valid type:
+
+host means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows user to specify reserved regions explicitly, and using host includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set so we don't check all rdms. Instead, we just check
+rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure any conflict does 
exist.
+
+For example, you're trying to set memory = 2800 to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad800000 end_address afffffff
+
+In this conflict case,
+
+#1. If B<strategy> is set to host, for example,
+
+rdm = "strategy=host,policy=strict" or rdm = "strategy=host,policy=relaxed"
+
+It means all conflicts will be handled according to the policy
+introduced by B<policy> as described below.
+
+#2. If B<strategy> is not set at all, but
+
+pci = [ 'sbdf_A, rdm_policy=x' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by B<rdm_policy=STRING> as described inside pci options.
+
+=item B<policy=STRING>

[Xen-devel] [v9][PATCH 05/16] hvmloader: get guest memory map into memory_map[]

2015-07-16 Thread Tiejun Chen
Now we get this map layout by calling XENMEM_memory_map, then
save it into one global variable, memory_map[]. It should
include the lowmem range, rdm range and highmem range. Note the
rdm range and highmem range may not exist in some cases.

And here we need to check if any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END).
This range is used to allocate memory at the hvmloader level, and
we make hvmloader fail in case of conflict, since this is
another rare possibility in the real world.
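
For example (hypothetical numbers): if the dynamic reserved range were
[0xFC000000, 0xFE000000), a reserved e820 entry starting at 0xFD000000 with
size 0x100000 would trigger the conflict check, because
0xFD000000 + 0x100000 > 0xFC000000 and 0xFD000000 < 0xFE000000, and hvmloader
would BUG(); an entry lying wholly below 0xFC000000 or at/above 0xFE000000
passes.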

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: George Dunlap george.dun...@eu.citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
---
v9:

* Correct [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END]
-> [RESERVED_MEMORY_DYNAMIC_START, RESERVED_MEMORY_DYNAMIC_END) in
  the patch head description;
  Merge two if{} as one if{};

v8:

* Actually we should check this range started from
  RESERVED_MEMORY_DYNAMIC_START, not RESERVED_MEMORY_DYNAMIC_START - 1.
  So correct this and sync the patch head description.

v5 ~ v7:

* Nothing is changed.

v4:

* Move some codes related to e820 to that specific file, e820.c.

* Consolidate printf()+BUG() and BUG_ON()

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c  | 32 
 tools/firmware/hvmloader/e820.h  |  7 +++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c  | 26 ++
 tools/firmware/hvmloader/util.h  | 12 
 5 files changed, 79 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..7a414ab 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,38 @@
 #include config.h
 #include util.h
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+unsigned int nr_entries = E820MAX, i;
+int rc;
+uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START;
+uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+if ( rc || !nr_entries )
+{
+printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+BUG();
+}
+
+memory_map.nr_map = nr_entries;
+
+for ( i = 0; i < nr_entries; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED &&
+ check_overlap(alloc_addr, alloc_size,
+   memory_map.map[i].addr, memory_map.map[i].size) )
+{
+printf("Fail to setup memory map due to conflict");
+printf(" on dynamic reserved memory range.\n");
+BUG();
+}
+}
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
 uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
 uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX128
+
+struct e820map {
+unsigned int nr_map;
+struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c 
b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
 init_hypercalls();
 
+memory_map_setup();
+
 xenbus_setup();
 
 bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include xen/memory.h
 #include xen/sched.h
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+   uint64_t reserved_start, uint64_t reserved_size)
+{
+return (start + size > reserved_start) &&
+(start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
 asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
 *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+int rc;
+struct xen_memory_map memmap = {
+.nr_entries = *max_entries
+};
+
+set_xen_guest_handle(memmap.buffer, entries);
+
+rc = hypercall_memory_op(XENMEM_memory_map, &memmap

[Xen-devel] [v9][PATCH 02/16] xen/vtd: create RMRR mapping

2015-07-16 Thread Tiejun Chen
RMRR reserved regions must be set up in the pfn space with an identity
mapping to the reported mfn. However, the existing code has problems
setting up the correct mapping when VT-d shares the EPT page table, which
leads to problems when assigning devices (e.g. GPU) with an RMRR reported.
So instead, this patch aims to set up the identity mapping in the p2m
layer, regardless of whether EPT is shared or not. And we still keep
creating the VT-d table.

And we also need to introduce a pair of helpers to create/clear this
sort of identity mapping as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If the space
is already occupied by the desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define a macro to wrap guest_physmap_remove_page() with
a return value as necessary.
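
To illustrate how these helpers are meant to be used (a sketch only; the real
caller is rmrr_identity_mapping() in the VT-d code and its error handling is
more involved):

    /* Hypothetical sketch: identity-map or unmap an RMRR's pfn range. */
    base_pfn = rmrr->base_address >> PAGE_SHIFT_4K;
    end_pfn  = PAGE_ALIGN_4K(rmrr->end_address) >> PAGE_SHIFT_4K;

    while ( base_pfn < end_pfn )
    {
        if ( map )
            ret = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
        else if ( clear_identity_p2m_entry(d, base_pfn, 0) )
            ret = -ENXIO;
        base_pfn++;
    }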

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Acked-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v9:

* Nothing is changed.

v5:

* Fold our original patch #2 and #3 as this new

* Introduce a new helper, clear_identity_p2m_entry, which wraps
  guest_physmap_remove_page(). And we use this to clean up our
  identity mapping.

v4:

* Change that original condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mapping as we expected.

* To have

  if ( !paging_mode_translate(p2m->domain) )
return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c   | 40 +++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h   | 13 +---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long 
gfn, unsigned long mfn,
  p2m-default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
   unsigned long mfn, unsigned int page_order)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int rc;
 gfn_lock(p2m, gfn, page_order);
-p2m_remove_page(p2m, gfn, mfn, page_order);
+rc = p2m_remove_page(p2m, gfn, mfn, page_order);
 gfn_unlock(p2m, gfn, page_order);
+return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+   p2m_access_t p2ma)
+{
+p2m_type_t p2mt;
+p2m_access_t a;
+mfn_t mfn;
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int ret;
+
+if ( !paging_mode_translate(p2m->domain) )
+return 0;
+
+gfn_lock(p2m, gfn, 0);
+
+mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+p2m_mmio_direct, p2ma);
+else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+ret = 0;
+else
+{
+ret = -EBUSY;
+printk(XENLOG_G_WARNING
+   "Cannot setup identity map d%d:%lx,"
+   " gfn already mapped to %lx.\n",
+   d->domain_id, gfn, mfn_x(mfn));
+}
+
+gfn_unlock(p2m, gfn, 0);
+return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-if ( intel_iommu_unmap_page(d, base_pfn) )
+if ( clear_identity_p2m_entry(d, base_pfn, 0) )
 ret = -ENXIO;
 base_pfn++;
 }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-int err = intel_iommu_map_page(d, base_pfn, base_pfn

[Xen-devel] [v9][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest

2015-07-16 Thread Tiejun Chen
Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist, and hvmloader would need this info
later.

Note this guest e820 table will be the same as before if the
platform has no RDM or we disable RDM (the default).
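
For illustration, with a single RDM region the guest e820 handed over via
XENMEM_set_memory_map might look like this (hypothetical addresses, echoing
the RMRR example used elsewhere in this series):

    [0x00100000, 0xac6d3000)   E820_RAM        lowmem (ends below the RDM)
    [0xac6d3000, 0xac6e7000)   E820_RESERVED   RDM region
    [0x100000000, 0x140000000) E820_RAM        highmem, if any

With no RDM, or with RDM disabled, only the lowmem entry (plus highmem if
present) is emitted, matching the previous behaviour.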

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v8 ~ v9:

* Nothing is changed.

v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable rdm_mem_boundary_memkb specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.


 tools/libxl/libxl_arch.h |  7 
 tools/libxl/libxl_arm.c  |  8 +
 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_x86.c  | 83 
 4 files changed, 103 insertions(+)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d04871c..939178a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -49,4 +49,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
 _hidden
 int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
 
+/* arch specific to construct memory mapping function */
+_hidden
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args);
+
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index f09c860..1526467 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -926,6 +926,14 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 return xc_domain_bind_pt_spi_irq(CTX-xch, domid, irq, irq);
 }
 
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args)
+{
+return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e41d54a..a8c6aa9 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 goto out;
 }
 
+if (libxl__arch_domain_construct_memmap(gc, d_config, domid, args)) {
+LOG(ERROR, setting domain memory map failed);
+goto out;
+}
+
 ret = hvm_build_set_params(ctx-xch, domid, info, state-store_port,
state-store_mfn, state-console_port,
state-console_mfn, state-store_domid,
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..66b3d7f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: The regions below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__arch_domain_construct_memmap(libxl__gc *gc,
+libxl_domain_config *d_config,
+uint32_t domid,
+struct xc_hvm_build_args *args)
+{
+int rc = 0;
+unsigned int nr = 0, i;
+/* We always own at least one lowmem entry. */
+unsigned int e820_entries = 1;
+struct e820entry *e820 = NULL;
+uint64_t highmem_size =
+args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+/* Add all rdm entries. */
+for (i = 0; i < d_config->num_rdms; i++)
+if (d_config->rdms[i].policy != LIBXL_RDM_RESERVE_POLICY_INVALID)
+e820_entries++;
+
+
+/* If we should have a highmem range. */
+if (highmem_size)
+e820_entries++;
+
+if (e820_entries >= E820MAX) {
+LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+rc = ERROR_INVAL;
+goto out;
+}
+
+e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+/* Low

[Xen-devel] [v9][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy

2015-07-16 Thread Tiejun Chen
This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also brings some fallout to the python usage of xc_assign_device().
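
For example, after this change a caller passes the policy explicitly (a
sketch; domid and pcidev are placeholders, and this mirrors the libxl hunk
below):

    /* Hypothetical sketch: assign a device with the relaxed RDM policy. */
    uint32_t sbdf = pcidev_encode_bdf(pcidev);
    int rc = xc_assign_device(ctx->xch, domid, sbdf, XEN_DOMCTL_DEV_RDM_RELAXED);
    if (rc < 0)
        LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");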

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
CC: David Scott dave.sc...@eu.citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v9:

* Nothing is changed.

v5:

* Fix the flag field as 0 to DT device

v4:

* In the patch head description, add an explanation of why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h   |  3 ++-
 tools/libxc/xc_domain.c |  9 -
 tools/libxl/libxl_pci.c |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 
 tools/python/xen/lowlevel/xc/xc.c   | 30 --
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
  uint32_t domid,
- uint32_t machine_sbdf);
+ uint32_t machine_sbdf,
+ uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
  uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
 xc_interface *xch,
 uint32_t domid,
-uint32_t machine_sbdf)
+uint32_t machine_sbdf,
+uint32_t flag)
 {
 DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
 domctl.domain = domid;
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
 domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.flag = flag;
 
 return do_domctl(xch, &domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
 domctl.u.assign_device.u.dt.size = size;
+/*
+ * DT doesn't own any RDM, so actually DT has nothing to do
+ * with any flag; here we just fix it as 0.
+ */
+domctl.u.assign_device.flag = 0;
 set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
 rc = do_domctl(xch, &domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 FILE *f;
 unsigned long long start, end, flags, size;
 int irq, i, rc, hvm = 0;
+uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
 if (type == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 
 out:
 if (!libxl_is_stubdom(ctx, domid, NULL)) {
-rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), 
flag);
 if (rc < 0 && (hvm || errno != ENOSYS)) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
 return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c 
b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value 
xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+value rflag)
 {
-   CAMLparam3(xch, domid, desc);
+   CAMLparam4(xch, domid, desc, rflag);
int ret;
int domain, bus, dev, func;
-   uint32_t sbdf;
+   uint32_t sbdf, flag;
 
domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, 
value domid, value desc)
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);
 
-   ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+   ret = Int_val(Field(rflag, 0));
+   flag = domain_assign_device_rdm_flag_table[ret];
+
+   ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
	if (ret < 0

[Xen-devel] [v9][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-07-16 Thread Tiejun Chen
Currently we simply decline to assign a device with a shared RMRR,
since the case of a shared RMRR is rare according to our previous
experience. But later we can group the devices which share an RMRR,
and then allow all devices within a group to be assigned to the
same domain.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v9:

* Correct one indentation issue

v8:

* Merge two if{} as one if{}

* Add to print RMRR range info when stop assign a group device

v5 ~ v7:

* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c8b0455..770e484 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2294,13 +2294,37 @@ static int intel_iommu_assign_device(
 if ( list_empty(acpi_drhd_units) )
 return -ENODEV;
 
+seg = pdev->seg;
+bus = pdev->bus;
+/*
+ * In rare cases one given rmrr is shared by multiple devices but
+ * obviously this would put the security of a system at risk. So
+ * we should prevent from this sort of device assignment.
+ *
+ * TODO: in the future we can introduce group device assignment
+ * interface to make sure devices sharing RMRR are assigned to the
+ * same domain together.
+ */
+for_each_rmrr_device( rmrr, bdf, i )
+{
+if ( rmrr->segment == seg &&
+ PCI_BUS(bdf) == bus &&
+ PCI_DEVFN2(bdf) == devfn &&
+ rmrr->scope.devices_cnt > 1 )
+{
+printk(XENLOG_G_ERR VTDPREFIX
+   " cannot assign %04x:%02x:%02x.%u"
+   " with shared RMRR at %"PRIx64" for Dom%d.\n",
+   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+   rmrr->base_address, d->domain_id);
+return -EPERM;
+}
+}
+
 ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
 if ( ret )
 return ret;
 
-seg = pdev->seg;
-bus = pdev->bus;
-
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
 {
-- 
1.9.1




[Xen-devel] [v6][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-07-08 Thread Tiejun Chen
Currently we simply decline to assign a device with a shared RMRR,
since the case of a shared RMRR is rare according to our previous
experience. But later we can group the devices which share an RMRR,
and then allow all devices within a group to be assigned to the
same domain.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v6:

* Nothing is changed.

v5:
 
* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c833290..095fb1d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2297,13 +2297,39 @@ static int intel_iommu_assign_device(
 if ( list_empty(acpi_drhd_units) )
 return -ENODEV;
 
+seg = pdev->seg;
+bus = pdev->bus;
+/*
+ * In rare cases one given rmrr is shared by multiple devices but
+ * obviously this would put the security of a system at risk. So
+ * we should prevent from this sort of device assignment.
+ *
+ * TODO: in the future we can introduce group device assignment
+ * interface to make sure devices sharing RMRR are assigned to the
+ * same domain together.
+ */
+for_each_rmrr_device( rmrr, bdf, i )
+{
+if ( rmrr->segment == seg &&
+ PCI_BUS(bdf) == bus &&
+ PCI_DEVFN2(bdf) == devfn )
+{
+if ( rmrr->scope.devices_cnt > 1 )
+{
+printk(XENLOG_G_ERR VTDPREFIX
+   " cannot assign %04x:%02x:%02x.%u"
+   " with shared RMRR for Dom%d.\n",
+   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+   d->domain_id);
+return -EPERM;
+}
+}
+}
+
 ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
 if ( ret )
 return ret;
 
-seg = pdev->seg;
-bus = pdev->bus;
-
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
 {
-- 
1.9.1




[Xen-devel] [v6][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-08 Thread Tiejun Chen
When allocating mmio address for PCI bars, we need to make
sure they don't overlap with reserved regions.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Nothing is changed.

v5:

* Rename the is_64bar field inside struct bars to flag, and
  then extend it to also indicate whether this bar is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need either to populate more RAM, or to expand more highmem. But
  we should know that only a 64bit BAR can work with highmem, and as you
  mentioned we should also avoid expanding highmem where possible. So my
  implementation is to allocate the 32bit BARs and 64bit BARs in order.

  1. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2.

  2. The second allocation round to the remaining 32bit-bar

  We should be able to finish allocating all 32bit-bar in theory, then go to the third
  allocation round 3.

  3. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the non-aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check whether
  other, smaller bars can be allocated if an mmio hole exists between
  resource->base and the reserved device memory. If a hole exists between base
  and the reserved device memory, let's simply move on and try to allocate the
  next bar, since all bars are in descending order of size. If not, we need to
  move resource->base to reserved_end just to reallocate this bar.
 tools/firmware/hvmloader/pci.c | 194 ++---
 1 file changed, 164 insertions(+), 30 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..397f3b7 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,31 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+static void relocate_ram_for_pci_memory(unsigned long cur_pci_mem_start)
+{
+struct xen_add_to_physmap xatp;
+unsigned int nr_pages = min_t(
+unsigned int,
+hvm_info->low_mem_pgend - (cur_pci_mem_start >> PAGE_SHIFT),
+(1u << 16) - 1);
+if ( hvm_info->high_mem_pgend == 0 )
+hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
+hvm_info->low_mem_pgend -= nr_pages;
+printf("Relocating 0x%x pages from "PRIllx" to "PRIllx"\
+for lowmem MMIO hole\n",
+   nr_pages,
+   PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
+   PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
+xatp.domid = DOMID_SELF;
+xatp.space = XENMAPSPACE_gmfn_range;
+xatp.idx   = hvm_info-low_mem_pgend;
+xatp.gpfn  = hvm_info-high_mem_pgend;
+xatp.size  = nr_pages;
+if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+BUG();
+hvm_info->high_mem_pgend += nr_pages;
+}
+
 void pci_setup(void)
 {
 uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -50,17 +75,22 @@ void pci_setup(void)
 /* Resources assignable to PCI devices via BARs. */
 struct resource {
 uint64_t base, max;
-} *resource, mem_resource, high_mem_resource, io_resource;
+} *resource, mem_resource, high_mem_resource, io_resource, 
exp_mem_resource;
 
 /* Create a list of device BARs in descending order of size. */
 struct bars {
-uint32_t is_64bar;
+#define PCI_BAR_IS_64BIT0x1
+#define PCI_BAR_IS_ALLOCATED0x2
+uint32_t flag;
 uint32_t devfn;
 uint32_t bar_reg;
 uint64_t bar_sz;
 } *bars = (struct bars *)scratch_start;
-unsigned int i, nr_bars = 0;
-uint64_t mmio_hole_size = 0;
+unsigned int i, j, n, nr_bars = 0;
+uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size;
+bool bar32_allocating = 0;
+uint64_t mmio32_unallocated_total = 0;
+unsigned long

[Xen-devel] [v6][PATCH 14/16] xen/vtd: enable USB device assignment

2015-07-08 Thread Tiejun Chen
USB RMRR may conflict with the guest BIOS region. In such a case, identity
mapping setup was simply skipped in the previous implementation. Now we
can handle this scenario cleanly with the new policy mechanism, so the
previous hack code can be removed.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++-
 xen/drivers/passthrough/vtd/utils.c |  7 ---
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h 
b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {\
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 56f5911..c833290 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2245,11 +2245,9 @@ static int reassign_device_ownership(
 /*
  * If the device belongs to the hardware domain, and it has RMRR, don't
  * remove it from the hardware domain, because BIOS may use RMRR at
- * booting time. Also account for the special casing of USB below (in
- * intel_iommu_assign_device()).
+ * booting time.
  */
-if ( !is_hardware_domain(source) &&
- !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+if ( !is_hardware_domain(source) )
 {
 const struct acpi_rmrr_unit *rmrr;
 u16 bdf;
@@ -2303,13 +2301,8 @@ static int intel_iommu_assign_device(
 if ( ret )
 return ret;
 
-/* FIXME: Because USB RMRR conflicts with guest bios region,
- * ignore USB RMRR temporarily.
- */
 seg = pdev->seg;
 bus = pdev->bus;
-if ( is_usb_device(seg, bus, pdev->devfn) )
-return 0;
 
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c 
b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include extern.h
 #include asm/io_apic.h
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-PCI_CLASS_DEVICE);
-return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1




[Xen-devel] [v6][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Tiejun Chen
This patch introduces user-configurable parameters to specify RDM
resources and the corresponding policies,

Global RDM parameter:
rdm = "strategy=host,reserve=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

The global RDM parameter, strategy, allows the user to specify reserved
regions explicitly. Currently, 'host' includes all reserved regions reported
on this platform, which is good for handling the hotplug scenario. In the
future this parameter may be further extended to allow specifying random
regions, e.g. even those belonging to another platform, as a preparation for
live migration with passthrough devices. By default this isn't set, so we
don't check all RDMs. Instead, we just check the RDM specific to a given
device if you're assigning this kind of device. Note this option is not
recommended unless you can make sure any conflict does exist.

The 'strict/relaxed' policy decides how to handle a conflict when reserving
RDM regions in pfn space. If a conflict exists, 'strict' means an immediate
error so the VM can't keep running, while 'relaxed' allows moving forward
with a warning message thrown out.

The default per-device RDM policy is the same as the default global RDM
policy, 'relaxed'. And the per-device policy overrides the global policy,
as with other per-device options.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Some renames to make our policy reasonable
  type -> strategy
  none -> ignore
* Don't expose ignore at the xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device plicy always override the global policy,
  and so cleanup some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since it's just zero
* Move those changes to xl/libxlu into a final patch

 docs/man/xl.cfg.pod.5| 81 
 docs/misc/vtd.txt| 24 +
 tools/libxl/libxl_create.c   |  7 
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c  |  9 +
 tools/libxl/libxl_types.idl  | 18 ++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..091e80d 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item Brdm=RDM_RESERVATION_STRING
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+BRDM_RESERVE_STRING has the form C[KEY=VALUE,KEY=VALUE,... where:
+
+=over 4
+
+=item BKEY=VALUE
+
+Possible BKEYs are:
+
+=over 4
+
+=item Bstrategy=STRING
+
+Currently there is only one valid type:
+
+host means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows the user to specify reserved regions explicitly, and using host includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set, so we don't check all rdms. Instead, we just check
+the rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure such a conflict exists.
+
+For example, suppose you're trying to set memory = 2800 to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad800000 end_address afffffff
+
+In this conflict case,
+
+#1. If Bstrategy is set to host, for example,
+
+rdm = strategy=host,reserve=strict or rdm = strategy=host,reserve=relaxed
+
+It means all conflicts will be handled according to the policy
+introduced by Breserve as described below.
+
+#2. If Bstrategy is not set at all, but
+
+pci = [ 'sbdf_A, rdm_reserve=x' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by Brdm_reserve=STRING as described inside pci options.
+
+=item Breserve=STRING
+
+Specifies how to deal with conflicts when reserving reserved device
+memory in guest address space.
+
+When that conflict is unresolved,
+
+strict means the VM can't be created, or the associated device can't be
+attached in the case of hotplug.
+
+relaxed allows the VM to be created but may cause the VM to crash if a
+pass-through device accesses RDM. For example, the Windows IGD GFX driver
+always accesses RDM regions, so it leads to a VM crash.
+
+Note this may be overridden

[Xen-devel] [v6][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest

2015-07-08 Thread Tiejun Chen
Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist, and hvmloader would need this info
later.

Note this guest e820 table would be the same as before if the
platform has no RDM or we disable RDM (by default).

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Nothing is changed.

v5:

* Rephrase patch's short log
* Make libxl__domain_construct_e820() hidden

v4:

* Use goto style error handling.
* Instead of NOGC, we should use libxl__malloc(gc,XXX) to allocate local e820.

 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_internal.h | 24 +
 tools/libxl/libxl_x86.c  | 83 
 3 files changed, 112 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 62ef120..41da479 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 goto out;
 }
 
+if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
+LOG(ERROR, "setting domain memory map failed");
+goto out;
+}
+
ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
&state->store_mfn, state->console_port,
&state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b4d8419..a50449a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3794,6 +3794,30 @@ static inline void libxl__update_config_vtpm(libxl__gc 
*gc,
  */
 void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
 const libxl_bitmap *sptr);
+
+/*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+_hidden int libxl__domain_construct_e820(libxl__gc *gc,
+ libxl_domain_config *d_config,
+ uint32_t domid,
+ struct xc_hvm_build_args *args);
+
 #endif
 
 /*
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..be297b2 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x10
+int libxl__domain_construct_e820(libxl__gc *gc,
+ libxl_domain_config *d_config,
+ uint32_t domid,
+ struct xc_hvm_build_args *args)
+{
+int rc = 0;
+unsigned int nr = 0, i;
+/* We always own at least one lowmem entry. */
+unsigned int e820_entries = 1;
+struct e820entry *e820 = NULL;
+uint64_t highmem_size =
+args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+/* Add all rdm entries. */
+for (i = 0; i < d_config->num_rdms; i++)
+if (d_config->rdms[i].flag != LIBXL_RDM_RESERVE_FLAG_INVALID)
+e820_entries++;
+
+
+/* If we should have a highmem range. */
+if (highmem_size)
+e820_entries++;
+
+if (e820_entries >= E820MAX) {
+LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+rc = ERROR_INVAL;
+goto out;
+}
+
+e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+/* Low memory */
+e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+e820[nr].type

[Xen-devel] [v6][PATCH 02/16] xen/vtd: create RMRR mapping

2015-07-08 Thread Tiejun Chen
RMRR reserved regions must be setup in the pfn space with an identity
mapping to the reported mfn. However, the existing code has a problem setting up
the correct mapping when VT-d shares the EPT page table, which leads to problems
when assigning devices (e.g. GPU) with RMRR reported. So instead, this
patch aims to setup the identity mapping in the p2m layer, regardless of
whether EPT is shared or not. And we still keep creating the VT-d table.

And we also need to introduce a pair of helpers to create/clear this
sort of identity mapping as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If space
is already occupied by desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define a macro to wrap guest_physmap_remove_page() with
a return value as necessary.
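
As an illustration only (not part of the patch), a caller such as the VT-d
RMRR code could drive the two helpers over a pfn range roughly as sketched
below; the p2m_access_rw argument and the simplified error handling are
assumptions made for this sketch:

static int identity_map_range(struct domain *d, unsigned long base_pfn,
                              unsigned long end_pfn)
{
    while ( base_pfn < end_pfn )
    {
        /* Returns -EBUSY if the gfn is already occupied by something else. */
        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);

        if ( err )
            return err;
        base_pfn++;
    }
    return 0;
}

static void identity_unmap_range(struct domain *d, unsigned long base_pfn,
                                 unsigned long end_pfn)
{
    while ( base_pfn < end_pfn )
    {
        /* Wraps guest_physmap_remove_page(); order 0 means one 4K page. */
        clear_identity_p2m_entry(d, base_pfn, 0);
        base_pfn++;
    }
}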

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Acked-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Nothing is changed.

v5:

* Fold our original patch #2 and #3 as this new

* Introduce a new helper, clear_identity_p2m_entry, which wraps
  guest_physmap_remove_page(). And we use this to clean up our
  identity mapping.

v4:

* Change that original condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mapping as we expected.

* To have

  if ( !paging_mode_translate(p2m-domain) )
return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c   | 40 +++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h   | 13 +---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long 
gfn, unsigned long mfn,
 p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
   unsigned long mfn, unsigned int page_order)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int rc;
 gfn_lock(p2m, gfn, page_order);
-p2m_remove_page(p2m, gfn, mfn, page_order);
+rc = p2m_remove_page(p2m, gfn, mfn, page_order);
 gfn_unlock(p2m, gfn, page_order);
+return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+   p2m_access_t p2ma)
+{
+p2m_type_t p2mt;
+p2m_access_t a;
+mfn_t mfn;
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int ret;
+
+if ( !paging_mode_translate(p2m->domain) )
+return 0;
+
+gfn_lock(p2m, gfn, 0);
+
+mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+p2m_mmio_direct, p2ma);
+else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+ret = 0;
+else
+{
+ret = -EBUSY;
+printk(XENLOG_G_WARNING
+   "Cannot setup identity map d%d:%lx,"
+" gfn already mapped to %lx.\n",
+   d->domain_id, gfn, mfn_x(mfn));
+}
+
+gfn_unlock(p2m, gfn, 0);
+return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
while ( base_pfn < end_pfn )
 {
-if ( intel_iommu_unmap_page(d, base_pfn) )
+if ( clear_identity_p2m_entry(d, base_pfn, 0) )
 ret = -ENXIO;
 base_pfn++;
 }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
while ( base_pfn < end_pfn )
 {
-int err = intel_iommu_map_page(d, base_pfn, base_pfn

[Xen-devel] [v6][PATCH 04/16] xen: enable XENMEM_memory_map in hvm

2015-07-08 Thread Tiejun Chen
This patch enables XENMEM_memory_map in hvm. So hvmloader can
use it to setup the e820 mappings.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c  | 6 --
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd  MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd  MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-if ( is_hvm_domain(d) )
-{
-rcu_unlock_domain(d);
-return -EPERM;
-}
-
 e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
 if ( e820 == NULL )
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v6][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy

2015-07-08 Thread Tiejun Chen
This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also brings some fallout to the python usage of xc_assign_device().
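
For illustration only, a toolstack caller would now pick the flag and pass it
through; the assign_with_policy() wrapper below is hypothetical, and it assumes
0 keeps the default 'strict' behaviour while XEN_DOMCTL_DEV_RDM_RELAXED asks
for the relaxed one:

#include <xenctrl.h>

static int assign_with_policy(xc_interface *xch, uint32_t domid,
                              uint32_t machine_sbdf, int relaxed)
{
    /* 0 == strict (default); XEN_DOMCTL_DEV_RDM_RELAXED == relaxed. */
    uint32_t flag = relaxed ? XEN_DOMCTL_DEV_RDM_RELAXED : 0;

    return xc_assign_device(xch, domid, machine_sbdf, flag);
}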

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
CC: David Scott dave.sc...@eu.citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Nothing is changed.

v5:

* Fix the flag field as 0 to DT device

v4:

* In the patch head description, I added an explanation of why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h   |  3 ++-
 tools/libxc/xc_domain.c |  9 -
 tools/libxl/libxl_pci.c |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 
 tools/python/xen/lowlevel/xc/xc.c   | 30 --
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
  uint32_t domid,
- uint32_t machine_sbdf);
+ uint32_t machine_sbdf,
+ uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
  uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
 xc_interface *xch,
 uint32_t domid,
-uint32_t machine_sbdf)
+uint32_t machine_sbdf,
+uint32_t flag)
 {
 DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
 domctl.domain = domid;
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
 domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.flag = flag;
 
 return do_domctl(xch, domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
 domctl.u.assign_device.u.dt.size = size;
+/*
+ * DT doesn't own any RDM so actually DT has nothing to do
+ * for any flag and here just fix that as 0.
+ */
+domctl.u.assign_device.flag = 0;
 set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
 rc = do_domctl(xch, domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 FILE *f;
 unsigned long long start, end, flags, size;
 int irq, i, rc, hvm = 0;
+uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
 if (type == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 
 out:
 if (!libxl_is_stubdom(ctx, domid, NULL)) {
-rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), 
flag);
 if (rc < 0 && (hvm || errno != ENOSYS)) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, xc_assign_device failed);
 return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c 
b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value 
xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+value rflag)
 {
-   CAMLparam3(xch, domid, desc);
+   CAMLparam4(xch, domid, desc, rflag);
int ret;
int domain, bus, dev, func;
-   uint32_t sbdf;
+   uint32_t sbdf, flag;
 
domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, 
value domid, value desc)
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);
 
-   ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+   ret = Int_val(Field(rflag, 0));
+   flag = domain_assign_device_rdm_flag_table[ret];
+
+   ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
if (ret  0

[Xen-devel] [v6][PATCH 05/16] hvmloader: get guest memory map into memory_map[]

2015-07-08 Thread Tiejun Chen
Now we get this map layout by calling XENMEM_memory_map, then
save it into one global variable, memory_map[]. It should
include the lowmem range, rdm range and highmem range. Note the
rdm range and highmem range may not exist in some cases.

And here we need to check if any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory at the hvmloader level, and
we make hvmloader fail in case of a conflict since this is
another rare possibility in the real world.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Move some codes related to e820 to that specific file, e820.c.

* Consolidate printf()+BUG() and BUG_ON()

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c  | 35 +++
 tools/firmware/hvmloader/e820.h  |  7 +++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c  | 26 ++
 tools/firmware/hvmloader/util.h  | 12 
 5 files changed, 82 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..3e53c47 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,41 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+unsigned int nr_entries = E820MAX, i;
+int rc;
+uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
+uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+if ( rc || !nr_entries )
+{
+printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+BUG();
+}
+
+memory_map.nr_map = nr_entries;
+
+for ( i = 0; i < nr_entries; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED )
+{
+if ( check_overlap(alloc_addr, alloc_size,
+   memory_map.map[i].addr,
+   memory_map.map[i].size) )
+{
+printf("Fail to setup memory map due to conflict");
+printf(" on dynamic reserved memory range.\n");
+BUG();
+}
+}
+}
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
 uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
 uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX 128
+
+struct e820map {
+unsigned int nr_map;
+struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c 
b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
 init_hypercalls();
 
+memory_map_setup();
+
 xenbus_setup();
 
 bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+   uint64_t reserved_start, uint64_t reserved_size)
+{
+return (start + size > reserved_start) &&
+(start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
 asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
 *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+int rc;
+struct xen_memory_map memmap = {
+.nr_entries = *max_entries
+};
+
+set_xen_guest_handle(memmap.buffer, entries);
+
+rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+*max_entries = memmap.nr_entries;
+
+return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
 static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index f99c0f19..1100a3b 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10

[Xen-devel] [v6][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-08 Thread Tiejun Chen
Now we can use that memory map to build our final
e820 table but it may need to reorder all e820
entries.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6:

* Nothing is changed.

v5:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.

 tools/firmware/hvmloader/e820.c | 80 +
 1 file changed, 66 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 3e53c47..aa2569f 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint64_t add_high_mem = 0;
+uint64_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
 
 if ( !lowmem_reserved_base )
lowmem_reserved_base = 0xA0000;
@@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;
 
-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-e820[nr].addr = 0x100000;
-e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -194,16 +189,73 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }
 
-
-if ( hvm_info->high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ */
+for ( i = 0; i < memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1 << 32);
-e820[nr].size =
-((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
+e820[nr] = memory_map.map[i];
 nr++;
 }
 
+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end < (2u << 20));
+
+/*
+ * We may need to adjust real lowmem end since we may
+ * populate RAM to get enough MMIO previously.
+ */
+for ( i = 0; i < memory_map.nr_map; i++ )
+{
+uint64_t end = e820[i].addr + e820[i].size;
+if ( e820[i].type == E820_RAM &&
+ low_mem_end > e820[i].addr && low_mem_end < end )
+{
+add_high_mem = end - low_mem_end;
+e820[i].size = low_mem_end - e820[i].addr;
+}
+}
+
+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+for ( i = 0; i < memory_map.nr_map; i++ )
+{
+if ( e820[i].type == E820_RAM &&
+ e820[i].addr > (1ull << 32))
+e820[i].size += add_high_mem;
+}
+}
+
+/* Finally we need to reorder all e820 entries. */
+for ( j = 0; j < nr-1; j++ )
+{
+for ( i = j+1; i < nr; i++ )
+{
+if ( e820[j].addr > e820[i].addr )
+{
+struct e820entry tmp;
+tmp = e820[j];
+e820[j] = e820[i];
+e820[i] = tmp;
+}
+}
+}
+
 return nr;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v7][PATCH 01/16] xen: introduce XENMEM_reserved_device_memory_map

2015-07-08 Thread Tiejun Chen
From: Jan Beulich jbeul...@suse.com

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich jbeul...@suse.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Jan Beulich jbeul...@suse.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v7:

* Nothing is changed.

v6:

* Add a comments to the nr_entries field inside xen_reserved_device_memory_map

v5 ~ v4:

* Nothing is changed.

 xen/common/compat/memory.c   | 66 
 xen/common/memory.c  | 64 ++
 xen/drivers/passthrough/iommu.c  | 10 ++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h  | 37 +++-
 xen/include/xen/iommu.h  | 10 ++
 xen/include/xen/pci.h|  2 ++
 xen/include/xlat.lst |  3 +-
 10 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct compat_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+struct compat_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+return -ERANGE;
+
+if ( __copy_to_compat_offset(grdm->map.buffer,
+ grdm->used_entries,
+ &rdm,
+ 1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1;
+}
+
+return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
 int split, op = cmd  MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, 
XEN_GUEST_HANDLE_PARAM(void) compat)
 break;
 }
 
+#ifdef HAS_PASSTHROUGH
+case XENMEM_reserved_device_memory_map:
+{
+struct get_reserved_device_memory grdm;
+
+if ( copy_from_guest(&grdm.map, compat, 1) ||
+ !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+return -EFAULT;
+
+grdm.used_entries = 0;
+rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+  &grdm);
+
+if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+rc = -ENOBUFS;
+
+grdm.map.nr_entries = grdm.used_entries;
+if ( grdm.map.nr_entries )
+{
+if ( __copy_to_guest(compat, &grdm.map, 1) )
+rc = -EFAULT;
+}
+
+return rc;
+}
+#endif
+
 default:
 return compat_arch_memory_op(cmd, compat);
 }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
 return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct xen_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+struct xen_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+if ( __copy_to_guest_offset(grdm->map.buffer,
+grdm->used_entries,
+&rdm,
+1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1

[Xen-devel] [v7][PATCH 03/16] xen/passthrough: extend hypercall to support rdm reservation policy

2015-07-08 Thread Tiejun Chen
This patch extends the existing hypercall to support the rdm reservation policy.
We return an error or just throw out a warning message, depending on whether
the policy is strict or relaxed, when reserving RDM regions in pfn space.
Note in some special cases, e.g. adding a device to hwdomain, and removing a
device from a user domain, 'relaxed' is fine enough since this is always safe
to hwdomain.

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Suravee Suthikulpanit suravee.suthikulpa...@amd.com
CC: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Stefano Stabellini stefano.stabell...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v7:

* Nothing is changed.

v5:

* Just leave one bit XEN_DOMCTL_DEV_RDM_RELAXED as our flag, so
  0 means strict and 1 means relaxed.

* So make DT device ignore the flag field

* Improve the code comments

v4:

* Add code comments to describe why we fix the policy flag in some
  cases like adding a device to hwdomain, and removing a device from a user 
domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one judging condition
  domctl-u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  - domctl-u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add to range check the flag passed to make future extensions possible
  (and to avoid ambiguity on what out of range values would mean).

 xen/arch/x86/mm/p2m.c   |  7 +++--
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c  |  2 +-
 xen/drivers/passthrough/device_tree.c   |  3 ++-
 xen/drivers/passthrough/pci.c   | 15 ---
 xen/drivers/passthrough/vtd/iommu.c | 40 +++--
 xen/include/asm-x86/p2m.h   |  2 +-
 xen/include/public/domctl.h |  3 +++
 xen/include/xen/iommu.h |  2 +-
 9 files changed, 58 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 99a26ca..47785dc 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, 
mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-   p2m_access_t p2ma)
+   p2m_access_t p2ma, unsigned int flag)
 {
 p2m_type_t p2mt;
 p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long 
gfn,
 ret = 0;
 else
 {
-ret = -EBUSY;
+if ( flag & XEN_DOMCTL_DEV_RDM_RELAXED )
+ret = 0;
+else
+ret = -EBUSY;
 printk(XENLOG_G_WARNING
"Cannot setup identity map d%d:%lx,"
" gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct 
domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-   struct pci_dev *pdev)
+   struct pci_dev *pdev,
+   u32 flag)
 {
 struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
 int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct 
iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-  struct device *dev)
+  struct device *dev, u32 flag)
 {
struct iommu_domain *domain;
struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 5d3842a..7ff79f8 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct 
dt_device_node *dev)
 goto fail;
 }
 
-rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+/* The flag field doesn't matter to DT device. */
+rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev), 0);
 
 if ( rc )
 goto fail;
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e30be43..6e23fc6 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1335,7

[Xen-devel] [v7][PATCH 10/16] tools: introduce some new parameters to set rdm policy

2015-07-08 Thread Tiejun Chen
This patch introduces user configurable parameters to specify RDM
resources and the corresponding policies,

Global RDM parameter:
rdm = strategy=host,policy=strict/relaxed
Per-device RDM parameter:
pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The global RDM parameter, strategy, allows the user to specify reserved regions
explicitly. Currently, using 'host' includes all reserved regions reported
on this platform, which is good for handling the hotplug scenario. In the future
this parameter may be further extended to allow specifying random regions,
e.g. even those belonging to another platform, as a preparation for live
migration with passthrough devices. By default this isn't set, so we don't
check all rdms. Instead, we just check the rdm specific to a given device if
you're assigning this kind of device. Note this option is not recommended
unless you can make sure such a conflict exists.

The 'strict/relaxed' policy decides how to handle a conflict when reserving RDM
regions in pfn space. If a conflict exists, 'strict' means an immediate error
so the VM can't keep running, while 'relaxed' allows moving forward with a
warning message.

The default per-device RDM policy is the same as the default global RDM policy,
i.e. 'relaxed'. The per-device policy overrides the global policy, as with
other per-device options.

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Acked-by: Ian Jackson ian.jack...@eu.citrix.com
Acked-by: Ian Campbell ian.campb...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v7:

* Need to rename some parameters:
  In the xl rdm config parsing, `reserve=' should be `policy='.
  In the xl pci config parsing, `rdm_reserve=' should be `rdm_policy='.
  The type `libxl_rdm_reserve_flag' should be `libxl_rdm_policy'.
  The field name `reserve' in `libxl_rdm_reserve' should be `policy'.

v6:

* Some rename to make our policy reasonable
  type - strategy
  none - ignore
* Don't expose ignore in xl level and just keep that as a default.
  And then sync docs and the patch head description

v5:

* Just make sure the per-device policy always overrides the global policy,
  and so cleanup some associated comments and the patch head description.
* A little change to follow one bit, XEN_DOMCTL_DEV_RDM_RELAXED.
* Improve all descriptions in doc.
* Make all rdm variables specific to .hvm

v4:

* No need to define init_val for libxl_rdm_reserve_type since it's just zero
* Grab those changes to xl/libxlu as a final patch

 docs/man/xl.cfg.pod.5| 81 
 docs/misc/vtd.txt| 24 +
 tools/libxl/libxl_create.c   |  7 
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c  |  9 +
 tools/libxl/libxl_types.idl  | 18 ++
 6 files changed, 141 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..6c55a8b 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,79 @@ assigned slave device.
 
 =back
 
+=item Brdm=RDM_RESERVATION_STRING
+
+(HVM/x86 only) Specifies information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+BRDM_RESERVE_STRING has the form C[KEY=VALUE,KEY=VALUE,... where:
+
+=over 4
+
+=item BKEY=VALUE
+
+Possible BKEYs are:
+
+=over 4
+
+=item Bstrategy=STRING
+
+Currently there is only one valid type:
+
+host means all reserved device memory on this platform should be checked to
+reserve regions in this VM's guest address space. This global rdm parameter
+allows the user to specify reserved regions explicitly, and using host includes
+all reserved regions reported on this platform, which is useful when doing
+hotplug.
+
+By default this isn't set, so we don't check all rdms. Instead, we just check
+the rdm specific to a given device if you're assigning this kind of device. Note
+this option is not recommended unless you can make sure such a conflict exists.
+
+For example, suppose you're trying to set memory = 2800 to allocate memory to one
+given VM but the platform owns two RDM regions like,
+
+Device A [sbdf_A]: RMRR region_A: base_addr ac6d3000 end_address ac6e6fff
+Device B [sbdf_B]: RMRR region_B: base_addr ad800000 end_address afffffff
+
+In this conflict case,
+
+#1. If Bstrategy is set to host, for example,
+
+rdm = strategy=host,policy=strict or rdm = strategy=host,policy=relaxed
+
+It means all conflicts will be handled according to the policy
+introduced by Bpolicy as described below.
+
+#2. If Bstrategy is not set at all, but
+
+pci = [ 'sbdf_A, rdm_policy=x' ]
+
+It means only one conflict of region_A will be handled according to the policy
+introduced by Brdm_policy=STRING as described inside pci options.
+
+=item Bpolicy=STRING
+
+Specifies how to deal

[Xen-devel] [v7][PATCH 08/16] tools/libxc: Expose new hypercall xc_reserved_device_memory_map

2015-07-08 Thread Tiejun Chen
We will introduce the hypercall wrapper xc_reserved_device_memory_map
to libxc. This helps us get rdm entry info according to
different parameters. If flag == PCI_DEV_RDM_ALL, all entries
should be exposed. Otherwise we just expose the rdm entries specific to
a SBDF.
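
For illustration only, a possible caller might look like the hypothetical
get_all_rdms() helper below: it first probes for the number of entries and then
retries with a large enough buffer. The probe-then-retry convention relying on
ENOBUFS (which the hypercall in patch 01 returns when the buffer is too small)
is an assumption made for this sketch:

#include <errno.h>
#include <stdlib.h>
#include <xenctrl.h>

static struct xen_reserved_device_memory *
get_all_rdms(xc_interface *xch, uint32_t *nr)
{
    struct xen_reserved_device_memory *entries = NULL;
    uint32_t nr_entries = 0;

    /* PCI_DEV_RDM_ALL: ask for every RDM entry on the platform. */
    int rc = xc_reserved_device_memory_map(xch, PCI_DEV_RDM_ALL, 0, 0, 0,
                                           NULL, &nr_entries);

    if ( rc < 0 && errno == ENOBUFS && nr_entries )
    {
        entries = calloc(nr_entries, sizeof(*entries));
        if ( entries &&
             xc_reserved_device_memory_map(xch, PCI_DEV_RDM_ALL, 0, 0, 0,
                                           entries, &nr_entries) )
        {
            free(entries);
            entries = NULL;
        }
    }

    *nr = entries ? nr_entries : 0;
    return entries;
}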

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v4 ~ v7:

* Nothing is changed.

 tools/libxc/include/xenctrl.h |  8 
 tools/libxc/xc_domain.c   | 36 
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..9160623 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
   uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..0951291 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
 return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries)
+{
+int rc;
+struct xen_reserved_device_memory_map xrdmmap = {
+.flag = flag,
+.seg = seg,
+.bus = bus,
+.devfn = devfn,
+.nr_entries = *max_entries
+};
+DECLARE_HYPERCALL_BOUNCE(entries,
+ sizeof(struct xen_reserved_device_memory) *
+ *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+if ( xc_hypercall_bounce_pre(xch, entries) )
+return -1;
+
+set_xen_guest_handle(xrdmmap.buffer, entries);
+
+rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+  &xrdmmap, sizeof(xrdmmap));
+
+xc_hypercall_bounce_post(xch, entries);
+
+*max_entries = xrdmmap.nr_entries;
+
+return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries)
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v7][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary

2015-07-08 Thread Tiejun Chen
Previously we always fixed that predefined boundary as 2G to handle
conflicts between memory and rdm, but now this predefined boundary
can be changed with the parameter rdm_mem_boundary in the .cfg file.
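
Schematically the boundary drives the decision described in the xl.cfg.pod.5
hunk below. This is a simplified sketch only: handle_rdm_conflict() is
hypothetical, and the real logic lives in libxl__domain_device_construct_rdm(),
which deals with multiple RDM entries and the actual layout adjustment:

#include <stdint.h>

static int handle_rdm_conflict(uint64_t rdm_start, uint64_t boundary,
                               uint64_t *lowmem_end, int relaxed)
{
    if ( rdm_start >= boundary )
    {
        /* #1: above the boundary - shrink lowmem below the reserved region. */
        if ( *lowmem_end > rdm_start )
            *lowmem_end = rdm_start;
        return 0;
    }

    /* #2: below the boundary - fall back to the strict/relaxed policy. */
    if ( relaxed )
        return 0;   /* warn and mark the entry INVALID for hvmloader */

    return -1;      /* strict: fail domain creation */
}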

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Make this variable rdm_mem_boundary_memkb specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5   | 22 ++
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_create.c  |  4 
 tools/libxl/libxl_dom.c |  8 +---
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c|  3 +++
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 6c55a8b..23068ec 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -867,6 +867,28 @@ More information about Xen gfx_passthru feature is 
available
 on the XenVGAPassthrough Lhttp://wiki.xen.org/wiki/XenVGAPassthrough
 wiki page.
 
+=item Brdm_mem_boundary=MBYTES
+
+Number of megabytes to set a boundary for checking rdm conflicts.
+
+When RDM conflicts with RAM, RDM is probably scattered over the whole RAM space.
+Multiple RDM entries would especially worsen this and lead to a complicated
+memory layout. So here we're trying to figure out a simple solution to
+avoid breaking the existing layout. So when a conflict occurs,
+
+#1. Above a predefined boundary
+- move lowmem_end below the reserved region to solve the conflict;
+
+#2. Below a predefined boundary
+- Check the strict/relaxed policy.
+strict policy leads to a libxl failure. Note when both policies
+are specified on a given region, 'strict' is always preferred.
+relaxed policy issues a warning message and also masks this
+entry INVALID to indicate we shouldn't expose this entry to
+hvmloader.
+
+Here the default is 2G.
+
 =item Bdtdev=[ DTDEV_PATH, DTDEV_PATH, ... ]
 
 Specifies the host device tree nodes to passthrough to this guest. Each
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..6f157c9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
 uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index c8a32d5..3de86a6 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, 
libxl_domain_build_info *b_info)
 {
 if (b_info->u.hvm.rdm.policy == LIBXL_RDM_RESERVE_POLICY_INVALID)
 b_info->u.hvm.rdm.policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+
+if (b_info->u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+b_info->u.hvm.rdm_mem_boundary_memkb =
+LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index f3c39a0..62ef120 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 int ret, rc = ERROR_FAIL;
 uint64_t mmio_start, lowmem_end, highmem_end;
 libxl_domain_build_info *const info = d_config-b_info;
-/*
- * Currently we fix this as 2G to guarantte how to handle
- * our rdm policy. But we'll provide a parameter to set
- * this dynamically.
- */
-uint64_t rdm_mem_boundary = 0x80000000;
 
 memset(&args, 0, sizeof(struct xc_hvm_build_args));
 /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 args.mmio_start = mmio_start;
 
 ret = libxl__domain_device_construct_rdm(gc, d_config,
- rdm_mem_boundary,
+ 
info->u.hvm.rdm_mem_boundary_memkb*1024,
  args);
 if (ret) {
 LOG(ERROR, checking reserved device memory failed);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index a3ad8d1..4eb4f8a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -484,6 +484,7 @@ libxl_domain_build_info = Struct(domain_build_info

[Xen-devel] [v7][PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-07-08 Thread Tiejun Chen
Currently we're handling this kind of device with a shared RMRR
simply by refusing the assignment, since the case of shared RMRR is
a rare case according to our previous experience. But
later we can group these devices which share an rmrr, and
then allow all devices within a group to be assigned to the
same domain.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v5 ~ v7:

* Nothing is changed.

v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index c833290..095fb1d 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2297,13 +2297,39 @@ static int intel_iommu_assign_device(
 if ( list_empty(acpi_drhd_units) )
 return -ENODEV;
 
+seg = pdev->seg;
+bus = pdev->bus;
+/*
+ * In rare cases one given rmrr is shared by multiple devices but
+ * obviously this would put the security of a system at risk. So
+ * we should prevent from this sort of device assignment.
+ *
+ * TODO: in the future we can introduce group device assignment
+ * interface to make sure devices sharing RMRR are assigned to the
+ * same domain together.
+ */
+for_each_rmrr_device( rmrr, bdf, i )
+{
+if ( rmrr->segment == seg &&
+ PCI_BUS(bdf) == bus &&
+ PCI_DEVFN2(bdf) == devfn )
+{
+if ( rmrr->scope.devices_cnt > 1 )
+{
+printk(XENLOG_G_ERR VTDPREFIX
+"cannot assign %04x:%02x:%02x.%u"
+" with shared RMRR for Dom%d.\n",
+   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+   d->domain_id);
+return -EPERM;
+}
+}
+}
+
 ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
 if ( ret )
 return ret;
 
-seg = pdev->seg;
-bus = pdev->bus;
-
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v7][PATCH 05/16] hvmloader: get guest memory map into memory_map[]

2015-07-08 Thread Tiejun Chen
Now we get this map layout by calling XENMEM_memory_map, then
save it into one global variable, memory_map[]. It should
include the lowmem range, rdm range and highmem range. Note the
rdm range and highmem range may not exist in some cases.

And here we need to check if any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory at the hvmloader level, and
we make hvmloader fail in case of a conflict since this is
another rare possibility in the real world.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
---
v5 ~ v7:

* Nothing is changed.

v4:

* Move some codes related to e820 to that specific file, e820.c.

* Consolidate printf()+BUG() and BUG_ON()

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c  | 35 +++
 tools/firmware/hvmloader/e820.h  |  7 +++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c  | 26 ++
 tools/firmware/hvmloader/util.h  | 12 
 5 files changed, 82 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..3e53c47 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,41 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+unsigned int nr_entries = E820MAX, i;
+int rc;
+uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
+uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+if ( rc || !nr_entries )
+{
+printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+BUG();
+}
+
+memory_map.nr_map = nr_entries;
+
+for ( i = 0; i < nr_entries; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED )
+{
+if ( check_overlap(alloc_addr, alloc_size,
+   memory_map.map[i].addr,
+   memory_map.map[i].size) )
+{
+printf("Fail to setup memory map due to conflict");
+printf(" on dynamic reserved memory range.\n");
+BUG();
+}
+}
+}
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
 uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
 uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX 128
+
+struct e820map {
+unsigned int nr_map;
+struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c 
b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
 init_hypercalls();
 
+memory_map_setup();
+
 xenbus_setup();
 
 bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include <xen/memory.h>
 #include <xen/sched.h>
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+   uint64_t reserved_start, uint64_t reserved_size)
+{
+return (start + size > reserved_start) &&
+(start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
 asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
 *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+int rc;
+struct xen_memory_map memmap = {
+.nr_entries = *max_entries
+};
+
+set_xen_guest_handle(memmap.buffer, entries);
+
+rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+*max_entries = memmap.nr_entries;
+
+return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
 static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index f99c0f19..1100a3b 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10 @@
 #include <stdarg.h>
 #include

[Xen-devel] [v7][PATCH 09/16] tools: extend xc_assign_device() to support rdm reservation policy

2015-07-08 Thread Tiejun Chen
This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also brings some fallout to the python usage of xc_assign_device().

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
CC: David Scott dave.sc...@eu.citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v7:

* Nothing is changed.

v5:

* Fix the flag field as 0 to DT device

v4:

* In the patch head description, I added an explanation of why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h   |  3 ++-
 tools/libxc/xc_domain.c |  9 -
 tools/libxl/libxl_pci.c |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 16 
 tools/python/xen/lowlevel/xc/xc.c   | 30 --
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
  uint32_t domid,
- uint32_t machine_sbdf);
+ uint32_t machine_sbdf,
+ uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
  uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..ef41228 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
 xc_interface *xch,
 uint32_t domid,
-uint32_t machine_sbdf)
+uint32_t machine_sbdf,
+uint32_t flag)
 {
 DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
 domctl.domain = domid;
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
 domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.flag = flag;
 
 return do_domctl(xch, domctl);
 }
@@ -1792,6 +1794,11 @@ int xc_assign_dt_device(
 
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
 domctl.u.assign_device.u.dt.size = size;
+/*
+ * DT doesn't own any RDM so actually DT has nothing to do
+ * for any flag and here just fix that as 0.
+ */
+domctl.u.assign_device.flag = 0;
 set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
 rc = do_domctl(xch, domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 FILE *f;
 unsigned long long start, end, flags, size;
 int irq, i, rc, hvm = 0;
+uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
 if (type == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 
 out:
 if (!libxl_is_stubdom(ctx, domid, NULL)) {
-rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), flag);
 if (rc < 0 && (hvm || errno != ENOSYS)) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
 return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c 
b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..b7de615 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,17 @@ CAMLprim value stub_xc_domain_test_assign_device(value 
xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+XEN_DOMCTL_DEV_RDM_RELAXED,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+value rflag)
 {
-   CAMLparam3(xch, domid, desc);
+   CAMLparam4(xch, domid, desc, rflag);
int ret;
int domain, bus, dev, func;
-   uint32_t sbdf;
+   uint32_t sbdf, flag;
 
domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
@@ -1185,7 +1190,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, 
value domid, value desc)
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);
 
-   ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+   ret = Int_val(Field(rflag, 0));
+   flag = domain_assign_device_rdm_flag_table[ret];
+
+   ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
if (ret < 0

[Xen-devel] [v7][PATCH 00/16] Fix RMRR

2015-07-08 Thread Tiejun Chen
/xen-devel/2015-01/msg01580.html

Below is a key summary of this patch set according to agreed proposal:

1. Use RDM (Reserved Device Memory) name in user space as a general 
description instead of using ACPI RMRR name directly.

2. Introduce configuration parameters to allow user control both per-device 
and global RDM resources along with desired policies upon a detected conflict.

3. Introduce a new hypercall to query global and per-device RDM resources.

4. Extend libxl to be a central place to manage RDM resources and handle
potential conflicts between reserved regions and gfn space. One simplification
goal is to keep the existing lowmem / mmio / highmem layout, which is
passed around various function blocks. So a reasonable assumption is made:
conflicts falling into the areas below are not re-arranged, otherwise the
result would be a more scattered layout (a small illustrative sketch follows
this list):
a) in the highmem region (>4G)
b) in the lowmem region, below a predefined boundary (default 2G)
  a) is a new assumption not discussed before. From the VT-d spec this is
possible, but it has not been observed in the real world. So we can make this
reasonable assumption until there's real usage of it.

5. Make XENMEM_set_memory_map usable for HVM guests, and then have
libxl use that hypercall to carry RDM information to hvmloader. There
is one difference from the original discussion. Previously we discussed
introducing a new E820 type specifically for RDM entries. After more thought
we think it's OK to just tag them as E820_reserved. Actually hvmloader
doesn't need to know whether the reserved entries come from RDM or
serve other purposes.

6. Then in hvmloader the change is generic for XENMEM_memory_map
change. Given a predefined memory layout, hvmloader should avoid
allocating all reserved entries for other usages (opregion, mmio, etc.)

7. Extend existing device passthrough hypercall to carry conflict handling
policy.

8. Setup identity map in p2m layer for RMRRs reported for the given
device. And conflicts are handled according to specified policy in hypercall.
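
To make the conflict handling described in point 4 concrete, below is a
minimal sketch. The function names, the LOWMEM_START / RDM_BOUNDARY constants
and the plain bool for the policy are assumptions made for illustration; the
real logic lives in libxl__domain_device_construct_rdm().

    /* Illustrative sketch only -- not the libxl implementation. */
    #include <stdint.h>
    #include <stdbool.h>

    #define RDM_BOUNDARY (2ULL << 30)       /* the predefined 2G boundary */
    #define LOWMEM_START 0x100000ULL        /* guest low RAM starts at 1MB */

    static bool ranges_overlap(uint64_t start, uint64_t size,
                               uint64_t rdm_start, uint64_t rdm_size)
    {
        return start + size > rdm_start && start < rdm_start + rdm_size;
    }

    /*
     * Returns the (possibly lowered) lowmem_end, or 0 when a strict-policy
     * conflict below the boundary should fail domain creation.  Under the
     * relaxed policy a conflicting entry below the boundary is only warned
     * about and masked invalid, so lowmem_end is left untouched.
     */
    static uint64_t handle_rdm_conflict(uint64_t lowmem_end,
                                        uint64_t rdm_start, uint64_t rdm_size,
                                        bool strict)
    {
        if (!ranges_overlap(LOWMEM_START, lowmem_end - LOWMEM_START,
                            rdm_start, rdm_size))
            return lowmem_end;              /* no conflict with guest low RAM */

        if (rdm_start >= RDM_BOUNDARY)
            return rdm_start;               /* shrink lowmem below the RDM */

        return strict ? 0 : lowmem_end;     /* strict: fail; relaxed: warn only */
    }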

Current patch set contains core enhancements calling for comments.
There are still several tasks not implemented now. We'll include them
in final version after RFC is agreed:

- remove existing USB hack
- detect and fail assigning device which has a shared RMRR with another device
- add a config parameter to configure that memory boundary flexibly
- In the case of hotplug we also need to figure out a way to resolve the
  policy conflict between the per-pci policy and the global policy, but first
  we think we'd better collect some good ideas in this RFC before taking the
  next step.

So here I send this as an RFC to collect any comments.


Jan Beulich (1):
  xen: introduce XENMEM_reserved_device_memory_map

Tiejun Chen (15):
  xen/vtd: create RMRR mapping
  xen/passthrough: extend hypercall to support rdm reservation policy
  xen: enable XENMEM_memory_map in hvm
  hvmloader: get guest memory map into memory_map[]
  hvmloader/pci: skip reserved ranges
  hvmloader/e820: construct guest e820 table
  tools/libxc: Expose new hypercall xc_reserved_device_memory_map
  tools: extend xc_assign_device() to support rdm reservation policy
  tools: introduce some new parameters to set rdm policy
  tools/libxl: detect and avoid conflicts with RDM
  tools: introduce a new parameter to set a predefined rdm boundary
  libxl: construct e820 map with RDM information for HVM guest
  xen/vtd: enable USB device assignment
  xen/vtd: prevent from assign the device with shared rmrr
  tools: parse to enable new rdm policy parameters

Jan Beulich (1):
  xen: introduce XENMEM_reserved_device_memory_map

 docs/man/xl.cfg.pod.5   | 103 
 docs/misc/vtd.txt   |  24 ++
 tools/firmware/hvmloader/e820.c | 115 +++--
 tools/firmware/hvmloader/e820.h |   7 +
 tools/firmware/hvmloader/hvmloader.c|   2 +
 tools/firmware/hvmloader/pci.c  | 194 +++---
 tools/firmware/hvmloader/util.c |  26 ++
 tools/firmware/hvmloader/util.h |  12 +
 tools/libxc/include/xenctrl.h   |  11 +-
 tools/libxc/xc_domain.c |  45 +++-
 tools/libxl/libxl.h |   6 +
 tools/libxl/libxl_create.c  |  13 +-
 tools/libxl/libxl_dm.c  | 264 
 tools/libxl/libxl_dom.c |  16 +-
 tools/libxl/libxl_internal.h|  37 ++-
 tools/libxl/libxl_pci.c |  12 +-
 tools/libxl/libxl_types.idl |  26 ++
 tools/libxl/libxl_x86.c |  83 ++
 tools/libxl/libxlu_pci.c|  90 +++
 tools/libxl/libxlutil.h |   4 +
 tools/libxl/xl_cmdimpl.c|  16 ++
 tools/ocaml/libs/xc/xenctrl_stubs.c

[Xen-devel] [v7][PATCH 13/16] libxl: construct e820 map with RDM information for HVM guest

2015-07-08 Thread Tiejun Chen
Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist, and hvmloader would need this info
later.

Note this guest e820 table would be the same as before if the
platform has no RDM or we disable RDM (the default).
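
As a rough picture of the table this builds, here is a sketch under assumed
struct and field names (not the actual libxl types); the real code is in
libxl__domain_construct_e820() below:

    /* Illustrative sketch only -- field names are assumptions. */
    #include <stdint.h>

    struct rdm_entry { uint64_t start, size; int invalid; };
    struct e820ent   { uint64_t addr, size; uint32_t type; };
    enum { E820_RAM = 1, E820_RESERVED = 2 };

    static unsigned build_guest_e820(struct e820ent *e820,
                                     uint64_t lowmem_end, uint64_t highmem_size,
                                     const struct rdm_entry *rdms, unsigned num_rdms)
    {
        unsigned nr = 0, i;

        /* #1 low RAM, starting at 1MB */
        e820[nr++] = (struct e820ent){ 0x100000, lowmem_end - 0x100000, E820_RAM };

        /* #2 RDM regions become E820_RESERVED entries */
        for (i = 0; i < num_rdms; i++)
            if (!rdms[i].invalid)
                e820[nr++] = (struct e820ent){ rdms[i].start, rdms[i].size,
                                               E820_RESERVED };

        /* #3 high RAM, if any, starts at 4GB */
        if (highmem_size)
            e820[nr++] = (struct e820ent){ 1ULL << 32, highmem_size, E820_RAM };

        return nr;
    }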

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Nothing is changed.

v5:

* Rephrase patch's short log
* Make libxl__domain_construct_e820() hidden

v4:

* Use goto style error handling.
* Instead of NOGC, we should use libxl__malloc(gc,XXX) to allocate local e820.

 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_internal.h | 24 +
 tools/libxl/libxl_x86.c  | 83 
 3 files changed, 112 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 62ef120..41da479 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 goto out;
 }
 
+if (libxl__domain_construct_e820(gc, d_config, domid, args)) {
+LOG(ERROR, "setting domain memory map failed");
+goto out;
+}
+
 ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
&state->store_mfn, state->console_port,
&state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index b4d8419..a50449a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3794,6 +3794,30 @@ static inline void libxl__update_config_vtpm(libxl__gc 
*gc,
  */
 void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
 const libxl_bitmap *sptr);
+
+/*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+_hidden int libxl__domain_construct_e820(libxl__gc *gc,
+ libxl_domain_config *d_config,
+ uint32_t domid,
+ struct xc_hvm_build_args *args);
+
 #endif
 
 /*
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..68bd1d2 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x100000
+int libxl__domain_construct_e820(libxl__gc *gc,
+ libxl_domain_config *d_config,
+ uint32_t domid,
+ struct xc_hvm_build_args *args)
+{
+int rc = 0;
+unsigned int nr = 0, i;
+/* We always own at least one lowmem entry. */
+unsigned int e820_entries = 1;
+struct e820entry *e820 = NULL;
+uint64_t highmem_size =
+args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+/* Add all rdm entries. */
+for (i = 0; i < d_config->num_rdms; i++)
+if (d_config->rdms[i].policy != LIBXL_RDM_RESERVE_POLICY_INVALID)
+e820_entries++;
+
+
+/* If we should have a highmem range. */
+if (highmem_size)
+e820_entries++;
+
+if (e820_entries >= E820MAX) {
+LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+rc = ERROR_INVAL;
+goto out;
+}
+
+e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+/* Low memory */
+e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT

[Xen-devel] [v7][PATCH 16/16] tools: parse to enable new rdm policy parameters

2015-07-08 Thread Tiejun Chen
This patch parses to enable user configurable parameters to specify
RDM resource and according policies,

Global RDM parameter:
rdm = "strategy=host,policy=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_policy=strict/relaxed' ]

The default per-device RDM policy is the same as the default global RDM policy,
'relaxed'. And, as with other options, the per-device policy overrides the
global policy.
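
For example, a guest config using both the global and the per-device form
might look like this (the BDFs are made up for illustration):

    rdm = "strategy=host,policy=relaxed"
    pci = [ '02:00.0,rdm_policy=strict', '03:00.0' ]

Here the second device inherits the global 'relaxed' policy while the first
one overrides it with 'strict'.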

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* Just sync those renames introduced by patch #10.

v5:

* Need a rebase after we make all rdm variables specific to .hvm.
* Like other pci option, the per-device policy always follows
  the global policy by default.

v4:

* Separated from current patch #11 to parse/enable our rdm policy parameters
  since it makes a lot of sense and this stuff is specific to xl/libxlu.

 tools/libxl/libxlu_pci.c | 90 
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 13 +++
 3 files changed, 107 insertions(+)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..b8933d2 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, 
unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE  9
+#define STATE_RDM_STRATEGY  10
+#define STATE_RESERVE_POLICY11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char 
*str)
 {
 unsigned state = STATE_DOMAIN;
@@ -143,6 +146,17 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci 
*pcidev, const char *str
 pcidev->permissive = atoi(tok);
 }else if ( !strcmp(optkey, "seize") ) {
 pcidev->seize = atoi(tok);
+}else if ( !strcmp(optkey, "rdm_policy") ) {
+if ( !strcmp(tok, "strict") ) {
+pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+} else if ( !strcmp(tok, "relaxed") ) {
+pcidev->rdm_policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+} else {
+XLU__PCI_ERR(cfg, "%s is not a valid PCI RDM property"
+   " policy: 'strict' or 'relaxed'.",
+ tok);
+goto parse_error;
+}
 }else{
 XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
 }
@@ -167,6 +181,82 @@ parse_error:
 return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+unsigned state = STATE_TYPE;
+char *buf2, *tok, *ptr, *end;
+
+if (NULL == (buf2 = ptr = strdup(str)))
+return ERROR_NOMEM;
+
+for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+switch(state) {
+case STATE_TYPE:
+if (*ptr == '=') {
+state = STATE_RDM_STRATEGY;
+*ptr = '\0';
+if (strcmp(tok, "strategy")) {
+XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RDM_STRATEGY:
+if (*ptr == '\0' || *ptr == ',') {
+state = STATE_RESERVE_POLICY;
+*ptr = '\0';
+if (!strcmp(tok, "host")) {
+rdm->strategy = LIBXL_RDM_RESERVE_STRATEGY_HOST;
+} else {
+XLU__PCI_ERR(cfg, "Unknown RDM strategy option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RESERVE_POLICY:
+if (*ptr == '=') {
+state = STATE_OPTIONS_V;
+*ptr = '\0';
+if (strcmp(tok, "policy")) {
+XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_OPTIONS_V:
+if (*ptr == ',' || *ptr == '\0') {
+state = STATE_TERMINAL;
+*ptr = '\0';
+if (!strcmp(tok, "strict")) {
+rdm->policy = LIBXL_RDM_RESERVE_POLICY_STRICT;
+} else if (!strcmp(tok, "relaxed")) {
+rdm->policy = LIBXL_RDM_RESERVE_POLICY_RELAXED;
+} else {
+XLU__PCI_ERR(cfg, Unknown RDM property policy

[Xen-devel] [v7][PATCH 14/16] xen/vtd: enable USB device assignment

2015-07-08 Thread Tiejun Chen
USB RMRR may conflict with the guest BIOS region. In such a case, identity
mapping setup was simply skipped in the previous implementation. Now we
can handle this scenario cleanly with the new policy mechanism, so the previous
hack code can be removed.

CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Acked-by: Kevin Tian kevin.t...@intel.com
---
v5 ~ v7:

* Nothing is changed.

v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++-
 xen/drivers/passthrough/vtd/utils.c |  7 ---
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h 
b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {\
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 56f5911..c833290 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2245,11 +2245,9 @@ static int reassign_device_ownership(
 /*
  * If the device belongs to the hardware domain, and it has RMRR, don't
  * remove it from the hardware domain, because BIOS may use RMRR at
- * booting time. Also account for the special casing of USB below (in
- * intel_iommu_assign_device()).
+ * booting time.
  */
-if ( !is_hardware_domain(source) &&
- !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+if ( !is_hardware_domain(source) )
 {
 const struct acpi_rmrr_unit *rmrr;
 u16 bdf;
@@ -2303,13 +2301,8 @@ static int intel_iommu_assign_device(
 if ( ret )
 return ret;
 
-/* FIXME: Because USB RMRR conflicts with guest bios region,
- * ignore USB RMRR temporarily.
- */
 seg = pdev->seg;
 bus = pdev->bus;
-if ( is_usb_device(seg, bus, pdev->devfn) )
-return 0;
 
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c 
b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include extern.h
 #include asm/io_apic.h
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-PCI_CLASS_DEVICE);
-return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v7][PATCH 04/16] xen: enable XENMEM_memory_map in hvm

2015-07-08 Thread Tiejun Chen
This patch enables XENMEM_memory_map in hvm. So hvmloader can
use it to setup the e820 mappings.
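
For context, the consumer side in hvmloader (added by patch 05/16, which is
not shown in this mail) presumably looks something like the sketch below; the
struct shape, the E820MAX limit and the BUG() on failure are assumptions made
for illustration:

    /* Sketch only: fetch the toolstack-provided map into memory_map[]. */
    static struct {
        unsigned int nr_map;
        struct e820entry map[E820MAX];
    } memory_map;

    void memory_map_setup(void)
    {
        uint32_t nr_entries = E820MAX;

        if ( get_mem_mapping_layout(memory_map.map, &nr_entries) != 0 )
            BUG();  /* without a memory map we cannot build the e820 table */

        memory_map.nr_map = nr_entries;
    }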

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Reviewed-by: Kevin Tian kevin.t...@intel.com
Acked-by: Jan Beulich jbeul...@suse.com
Acked-by: George Dunlap george.dun...@eu.citrix.com
---
v5 ~ v7:

* Nothing is changed.

v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c  | 6 --
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..638daee 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4741,7 +4741,6 @@ static long hvm_memory_op(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd & MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
@@ -4817,7 +4816,6 @@ static long hvm_memory_op_compat32(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd & MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index fd151c6..92eccd0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-if ( is_hvm_domain(d) )
-{
-rcu_unlock_domain(d);
-return -EPERM;
-}
-
 e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
 if ( e820 == NULL )
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v7][PATCH 11/16] tools/libxl: detect and avoid conflicts with RDM

2015-07-08 Thread Tiejun Chen
While building a VM, HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields to construct guest
e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RDM.

RMRR can reside in address space beyond 4G theoretically, but we never
see this in real world. So in order to avoid breaking highmem layout
we don't solve highmem conflict. Note this means highmem rmrr could still
be supported if no conflict.

But in the case of lowmem, RMRRs may be scattered over the whole RAM space.
Multiple RMRR entries especially would worsen this and lead to a complicated
memory layout, and then it's hard to extend hvm_info_table{} so that
hvmloader can work it out. So here we're trying to figure out a simple
solution to avoid breaking the existing layout. So when a conflict occurs,

#1. Above a predefined boundary (2G)
- move lowmem_end below reserved region to solve conflict;

#2. Below a predefined boundary (2G)
- Check strict/relaxed policy.
strict policy makes libxl fail. Note when both policies
are specified on a given region, 'strict' is always preferred.
relaxed policy issues a warning message and also masks this entry
INVALID to indicate we shouldn't expose this entry to hvmloader.

Note later we need to provide a parameter to set that predefined boundary
dynamically.
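
As a concrete (made-up) illustration with the default 2G boundary:

    RDM at [0xa0000000, 0xa0010000)  -- 2.5G, above the boundary:
        lowmem_end is lowered to 0xa0000000 and the displaced RAM is
        accounted to highmem instead.
    RDM at [0x40000000, 0x40010000)  -- 1G, below the boundary:
        strict  -> domain creation fails;
        relaxed -> warn and mark the entry invalid, so it is not
                   exposed to hvmloader.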

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Acked-by: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
Reviewed-by: Kevin Tian kevint.t...@intel.com
---
v7:

* Just sync with the fallout of renaming parameters from patch #10.

v6:

* fix some code styles
* Refine libxl__xc_device_get_rdm()

v5:

* A little change to make sure the per-device policy always override the global
  policy and correct its associated code comments.
* Fix one typo in the patch head description
* Rename xc_device_get_rdm() with libxl__xc_device_get_rdm(), and then replace
  malloc() with libxl__malloc(), and finally cleanup this fallout.
* libxl__xc_device_get_rdm() should return proper libxl error code, ERROR_FAIL.
  Then instead, the allocated RDM entries would be returned with an out 
parameter.

v4:

* Consistent to use term RDM.
* Unconditionally set *nr_entries to 0
* Move all the stuff providing a parameter to set our predefined boundary
  dynamically out into a separate patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c   | 264 +++
 tools/libxl/libxl_dom.c  |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 298 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index f75d4f1..c8a32d5 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -459,7 +459,7 @@ int libxl__domain_build(libxl__gc *gc,
 
 switch (info-type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-ret = libxl__build_hvm(gc, domid, info, state);
+ret = libxl__build_hvm(gc, domid, d_config, state);
 if (ret)
 goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 317a8eb..54b67ee 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,270 @@ const char *libxl__domain_device_model(libxl__gc *gc,
 return dm;
 }
 
+static int
+libxl__xc_device_get_rdm(libxl__gc *gc,
+ uint32_t flag,
+ uint16_t seg,
+ uint8_t bus,
+ uint8_t devfn,
+ unsigned int *nr_entries,
+ struct xen_reserved_device_memory **xrdm)
+{
+int rc = 0, r;
+
+/*
+ * We really can't presume how many entries we can get in advance.
+ */
+*nr_entries = 0;
+r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+  NULL, nr_entries);
+assert(r <= 0);
+/* 0 means we have no any rdm entry. */
+if (!r) goto out;
+
+if (errno != ENOBUFS) {
+rc = ERROR_FAIL;
+goto out;
+}
+
+*xrdm = libxl__malloc(gc,
+  *nr_entries * sizeof(xen_reserved_device_memory_t));
+r = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+  *xrdm, nr_entries);
+if (r)
+rc = ERROR_FAIL;
+
+ out:
+if (rc) {
+*nr_entries = 0;
+*xrdm = NULL;
+LOG(ERROR, "Could not get reserved device memory maps.\n");
+}
+return rc;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+ uint64_t rdm_start, uint64_t rdm_size

[Xen-devel] [v7][PATCH 02/16] xen/vtd: create RMRR mapping

2015-07-08 Thread Tiejun Chen
RMRR reserved regions must be set up in the pfn space with an identity
mapping to the reported mfn. However the existing code has problems setting
up the correct mapping when VT-d shares the EPT page table, which leads to
problems when assigning devices (e.g. GPU) with RMRR reported. So instead, this
patch aims to set up the identity mapping in the p2m layer, regardless of
whether EPT is shared or not. And we still keep creating the VT-d table.

And we also need to introduce a pair of helpers to create/clear this
sort of identity mapping as follows:

set_identity_p2m_entry():

If the gfn space is unoccupied, we just set the mapping. If space
is already occupied by desired identity mapping, do nothing.
Otherwise, failure is returned.

clear_identity_p2m_entry():

We just define a macro wrapping guest_physmap_remove_page() with
a return value as necessary.
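
With these helpers in place, the map side of rmrr_identity_mapping() (the
corresponding hunk is cut off below) presumably reduces to a loop like the
sketch here; the p2m_access_rw argument is an assumption for illustration:

    /* Sketch: establish the identity mapping for one RMRR, pfn by pfn. */
    while ( base_pfn < end_pfn )
    {
        int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);

        if ( err )
            return err;
        base_pfn++;
    }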

CC: Tim Deegan t...@xen.org
CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Yang Zhang yang.z.zh...@intel.com
CC: Kevin Tian kevin.t...@intel.com
Reviewed-by: Kevin Tian kevin.t...@intel.com
Reviewed-by: Tim Deegan t...@xen.org
Acked-by: George Dunlap george.dun...@eu.citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v7:

* Nothing is changed.

v5:

* Fold our original patch #2 and #3 as this new

* Introduce a new, clear_identity_p2m_entry, which can wrapper
  guest_physmap_remove_page(). And we use this to clean our
  identity mapping. 

v4:

* Change that orginal condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mapping as we expected.

* To have

  if ( !paging_mode_translate(p2m->domain) )
return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/arch/x86/mm/p2m.c   | 40 +++--
 xen/drivers/passthrough/vtd/iommu.c |  5 ++---
 xen/include/asm-x86/p2m.h   | 13 +---
 3 files changed, 50 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6b39733..99a26ca 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long 
gfn, unsigned long mfn,
  p2m-default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
   unsigned long mfn, unsigned int page_order)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int rc;
 gfn_lock(p2m, gfn, page_order);
-p2m_remove_page(p2m, gfn, mfn, page_order);
+rc = p2m_remove_page(p2m, gfn, mfn, page_order);
 gfn_unlock(p2m, gfn, page_order);
+return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+   p2m_access_t p2ma)
+{
+p2m_type_t p2mt;
+p2m_access_t a;
+mfn_t mfn;
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int ret;
+
+if ( !paging_mode_translate(p2m->domain) )
+return 0;
+
+gfn_lock(p2m, gfn, 0);
+
+mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+p2m_mmio_direct, p2ma);
+else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+ret = 0;
+else
+{
+ret = -EBUSY;
+printk(XENLOG_G_WARNING
+   "Cannot setup identity map d%d:%lx,"
+   " gfn already mapped to %lx.\n",
+   d->domain_id, gfn, mfn_x(mfn));
+}
+
+gfn_unlock(p2m, gfn, 0);
+return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..8415958 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-if ( intel_iommu_unmap_page(d, base_pfn) )
+if ( clear_identity_p2m_entry(d, base_pfn, 0) )
 ret = -ENXIO;
 base_pfn++;
 }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-int err = intel_iommu_map_page(d, base_pfn, base_pfn

[Xen-devel] [v7][PATCH 06/16] hvmloader/pci: skip reserved ranges

2015-07-08 Thread Tiejun Chen
When allocating mmio address for PCI bars, we need to make
sure they don't overlap with reserved regions.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v6 ~ v7:

* Nothing is changed.

v5:

* Rename that field, is_64bar, inside struct bars with flag, and
  then extend to also indicate if this bar is already allocated.

v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either populate more RAM, or expand more highmem. But
  we should know that only a 64bit-bar can work with highmem, and as you
  mentioned we should also avoid expanding highmem where possible. So my
  implementation is to allocate 32bit-bars and 64bit-bars in order.

  1. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2.

  2. The second allocation round to the remaining 32bit-bar

  We should be able to finish allocating all 32bit-bars in theory, then go to
  the third allocation round 3.

  3. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the not aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check whether
  other smaller bars can be allocated if an mmio hole exists between
  resource->base and reserved device memory. If a hole exists between base and
  reserved device memory, simply move on and try to allocate the next bar,
  since all bars are in descending order of size. If not, we need to move
  resource->base to reserved_end just to reallocate this bar.
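
The skip/realign step in #3.2 can be pictured with this small sketch,
simplified to a single reserved range; the names are illustrative and not the
hvmloader code:

    #include <stdint.h>

    struct rsvd { uint64_t start, end; };   /* one reserved range, [start, end) */

    /* Pick a base for a BAR of size bar_sz (a power of two), skipping r. */
    static uint64_t place_bar(uint64_t base, uint64_t bar_sz, struct rsvd r)
    {
        uint64_t aligned = (base + bar_sz - 1) & ~(bar_sz - 1);

        /* The hole before the reserved range is big enough, or we are past it. */
        if ( aligned + bar_sz <= r.start || aligned >= r.end )
            return aligned;

        /* Otherwise restart the search just past reserved_end. */
        return (r.end + bar_sz - 1) & ~(bar_sz - 1);
    }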

 tools/firmware/hvmloader/pci.c | 194 ++---
 1 file changed, 164 insertions(+), 30 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..397f3b7 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,31 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+static void relocate_ram_for_pci_memory(unsigned long cur_pci_mem_start)
+{
+struct xen_add_to_physmap xatp;
+unsigned int nr_pages = min_t(
+unsigned int,
+hvm_info->low_mem_pgend - (cur_pci_mem_start >> PAGE_SHIFT),
+(1u << 16) - 1);
+if ( hvm_info->high_mem_pgend == 0 )
+hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
+hvm_info->low_mem_pgend -= nr_pages;
+printf("Relocating 0x%x pages from "PRIllx" to "PRIllx"\
+for lowmem MMIO hole\n",
+   nr_pages,
+   PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
+   PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
+xatp.domid = DOMID_SELF;
+xatp.space = XENMAPSPACE_gmfn_range;
+xatp.idx   = hvm_info->low_mem_pgend;
+xatp.gpfn  = hvm_info->high_mem_pgend;
+xatp.size  = nr_pages;
+if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+BUG();
+hvm_info->high_mem_pgend += nr_pages;
+}
+
 void pci_setup(void)
 {
 uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -50,17 +75,22 @@ void pci_setup(void)
 /* Resources assignable to PCI devices via BARs. */
 struct resource {
 uint64_t base, max;
-} *resource, mem_resource, high_mem_resource, io_resource;
+} *resource, mem_resource, high_mem_resource, io_resource, 
exp_mem_resource;
 
 /* Create a list of device BARs in descending order of size. */
 struct bars {
-uint32_t is_64bar;
+#define PCI_BAR_IS_64BIT0x1
+#define PCI_BAR_IS_ALLOCATED0x2
+uint32_t flag;
 uint32_t devfn;
 uint32_t bar_reg;
 uint64_t bar_sz;
 } *bars = (struct bars *)scratch_start;
-unsigned int i, nr_bars = 0;
-uint64_t mmio_hole_size = 0;
+unsigned int i, j, n, nr_bars = 0;
+uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size;
+bool bar32_allocating = 0;
+uint64_t mmio32_unallocated_total = 0;
+unsigned long

[Xen-devel] [v7][PATCH 07/16] hvmloader/e820: construct guest e820 table

2015-07-08 Thread Tiejun Chen
Now we can use that memory map to build our final
e820 table but it may need to reorder all e820
entries.

CC: Keir Fraser k...@xen.org
CC: Jan Beulich jbeul...@suse.com
CC: Andrew Cooper andrew.coop...@citrix.com
CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v5 ~ v7:

* Nothing is changed.

v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.
 
 tools/firmware/hvmloader/e820.c | 80 +
 1 file changed, 66 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 3e53c47..aa2569f 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint64_t add_high_mem = 0;
+uint64_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
 
 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA0000;
@@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;
 
-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-e820[nr].addr = 0x100000;
-e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -194,16 +189,73 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }
 
-
-if ( hvm_info->high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1 << 32);
-e820[nr].size =
-((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
+e820[nr] = memory_map.map[i];
 nr++;
 }
 
+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end < (2u << 20));
+
+/*
+ * We may need to adjust real lowmem end since we may
+ * populate RAM to get enough MMIO previously.
+ */
+for ( i = 0; i  memory_map.nr_map; i++ )
+{
+uint64_t end = e820[i].addr + e820[i].size;
+if ( e820[i].type == E820_RAM &&
+ low_mem_end > e820[i].addr && low_mem_end < end )
+{
+add_high_mem = end - low_mem_end;
+e820[i].size = low_mem_end - e820[i].addr;
+}
+}
+
+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+for ( i = 0; i  memory_map.nr_map; i++ )
+{
+if ( e820[i].type == E820_RAM &&
+ e820[i].addr > (1ull << 32))
+e820[i].size += add_high_mem;
+}
+}
+
+/* Finally we need to reorder all e820 entries. */
+for ( j = 0; j  nr-1; j++ )
+{
+for ( i = j+1; i  nr; i++ )
+{
+if ( e820[j].addr > e820[i].addr )
+{
+struct e820entry tmp;
+tmp = e820[j];
+e820[j] = e820[i];
+e820[i] = tmp;
+}
+}
+}
+
 return nr;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [v5][PATCH 12/16] tools: introduce a new parameter to set a predefined rdm boundary

2015-07-07 Thread Tiejun Chen
Previously we always fixed that predefined boundary as 2G to handle
conflicts between memory and rdm, but now this predefined boundary
can be changed with the rdm_mem_boundary parameter in the .cfg file.
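
For example, to raise the boundary from the default 2048 MB to 3 GB one would
write something like this (illustrative config fragment only):

    rdm_mem_boundary = 3072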

CC: Ian Jackson ian.jack...@eu.citrix.com
CC: Stefano Stabellini stefano.stabell...@eu.citrix.com
CC: Ian Campbell ian.campb...@citrix.com
CC: Wei Liu wei.l...@citrix.com
Signed-off-by: Tiejun Chen tiejun.c...@intel.com
---
v5:

* Make this variable rdm_mem_boundary_memkb specific to .hvm 

v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5   | 22 ++
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_create.c  |  4 
 tools/libxl/libxl_dom.c |  8 +---
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c|  3 +++
 6 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a2dc343..d1ba590 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -870,6 +870,28 @@ More information about Xen gfx_passthru feature is 
available
 on the XenVGAPassthrough Lhttp://wiki.xen.org/wiki/XenVGAPassthrough
 wiki page.
 
+=item Brdm_mem_boundary=MBYTES
+
+Number of megabytes to set a boundary for checking rdm conflict.
+
+When RDM conflicts with RAM, RDM entries may be scattered over the whole
+RAM space. Multiple RDM entries especially would worsen this and lead to a
+complicated memory layout. So here we're trying to figure out a simple
+solution to avoid breaking the existing layout. So when a conflict occurs,
+
+#1. Above a predefined boundary
+- move lowmem_end below reserved region to solve conflict;
+
+#2. Below a predefined boundary
+- Check strict/relaxed policy.
+strict policy makes libxl fail. Note when both policies
+are specified on a given region, 'strict' is always preferred.
+relaxed policy issues a warning message and also masks this
+entry INVALID to indicate we shouldn't expose this entry to
+hvmloader.
+
+Here the default is 2G.
+
 =item Bdtdev=[ DTDEV_PATH, DTDEV_PATH, ... ]
 
 Specifies the host device tree nodes to passthrough to this guest. Each
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index a1c5d15..6f157c9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -863,6 +863,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
 uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 7a0c57d..38a8c3a 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, 
libxl_domain_build_info *b_info)
 {
 if (b_info->u.hvm.rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
 b_info->u.hvm.rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+
+if (b_info->u.hvm.rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+b_info->u.hvm.rdm_mem_boundary_memkb =
+LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index f3c39a0..62ef120 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 int ret, rc = ERROR_FAIL;
 uint64_t mmio_start, lowmem_end, highmem_end;
 libxl_domain_build_info *const info = d_config->b_info;
-/*
- * Currently we fix this as 2G to guarantte how to handle
- * our rdm policy. But we'll provide a parameter to set
- * this dynamically.
- */
-uint64_t rdm_mem_boundary = 0x8000;
 
 memset(args, 0, sizeof(struct xc_hvm_build_args));
 /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 args.mmio_start = mmio_start;
 
 ret = libxl__domain_device_construct_rdm(gc, d_config,
- rdm_mem_boundary,
+ info->u.hvm.rdm_mem_boundary_memkb*1024,
  args);
 if (ret) {
 LOG(ERROR, "checking reserved device memory failed");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 1a8a180..a1b95db 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -484,6 +484,7 @@ libxl_domain_build_info = Struct(domain_build_info,[
(ms_vm_genid,  libxl_ms_vm_genid),
(serial_list

  1   2   >