Re: [Xen-devel] [PATCH v1] tools/hotplug: convert proc-xen.mount to proc-xen.service

2017-11-08 Thread Olaf Hering
On Wed, Nov 08, Wei Liu wrote:

> But is there really no way to ask nicely to see if systemd would accept
> a change in behaviour? That is, to make proc-xen.mount (or any attempt
> to mount API fs) a nop when xenfs is added to API file system.

I have considered that as well. If the failing unit is "proc-xen.mount"
and /proc/xen exists, just ignore the error. I will check if and how
that can be done.


Olaf




Re: [Xen-devel] [PATCH v1] tools/hotplug: convert proc-xen.mount to proc-xen.service

2017-11-08 Thread Olaf Hering
On Thu, Oct 26, Olaf Hering wrote:

> > If not, then out-of-tree packages are going to have compatibility
> > problems with this change.
> Only if they use Requires=proc-xen.mount.

Any other objections to this change?

How to proceed with this?

Olaf




Re: [Xen-devel] [PATCH v1] tools/hotplug: convert proc-xen.mount to proc-xen.service

2017-10-26 Thread Olaf Hering
On Thu, Oct 26, Andrew Cooper wrote:

> I've never really understood why xenfs exists in the first place
> (although I expect the answer is "Because that is how someone did it in
> the past"), and I'm not aware of any other project which needs its own
> custom filesystem driver for device nodes.

Perhaps in the early days, before udev, new nodes would not magically
appear in /dev. It was likely easy to be compatible that way, just like
claiming /dev/hda to please existing installation programs.

> Is it possible to express a dependency on proc-xen.mount ||
> proc-xen.service?

As an ordering dependency, yes: an additional After=proc-xen.service
line is needed. An existing Requires=proc-xen.mount cannot be used
anymore, although I have not verified that.
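
A minimal sketch of a unit snippet that tolerates both variants (unit
names as in this patch; After= is ordering-only, so referencing a unit
that does not exist should be harmless):

    [Unit]
    After=proc-xen.mount
    After=proc-xen.service
    # Instead of a hard Requires= on one specific unit, gate on the
    # effect of the mount, as the units in this patch do:
    ConditionPathExists=/proc/xen/capabilities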

> If not, then out-of-tree packages are going to have compatibility
> problems with this change.

Only if they use Requires=proc-xen.mount.

> Right, but ISTR that Systemd deals with /etc/fstab by auto-generating
> *.mount targets, and from what is said here, it is the proc-xen.mount
> target which is now broken by the change in systemd behaviour.

No, existing fstab entries will continue to work. /dev/shm is
automounted, and my own fstab entry for /dev/shm always worked.

Olaf




Re: [Xen-devel] [PATCH v1] tools/hotplug: convert proc-xen.mount to proc-xen.service

2017-10-26 Thread Olaf Hering
On Thu, Oct 26, Andrew Cooper wrote:

> Can't all information be obtained from /sys/hypervisor?  If not, how
> hard would it be to make happen?

Likely not that hard. Not sure why that was not added in the first place.

> What happens to all the software which currently has a dependency on
> proc-xen.mount ?

All software gets converted by this change.

> Independently, how does this interact with having a xenfs entries in
> /etc/fstab, which might plausibly still exist for compatibility with
> other init systems?

mount(1) will continue to consider them.


Olaf




[Xen-devel] [PATCH v1] tools/hotplug: convert proc-xen.mount to proc-xen.service

2017-10-26 Thread Olaf Hering
An upcoming change in systemd will mount xenfs right away, along with
all other system mounts. This improves the detection of the
virtualization environment, which is currently racy. Some parts of
systemd rely on the presence of /proc/xen/capabilities, which will only
exist if xenfs is mounted. Since xenfs is mounted by the proc-xen.mount
unit, it will be processed very late. Other units may be processed
earlier, and if they make use of ConditionVirtualization*=, failures may
occur.

Unfortunately mounting xenfs by systemd as an API filesystem will lead
to errors when proc-xen.mount is processed. Since that mount point
already exists, the unit is considered failed, and other units that
depend on proc-xen.mount will not start. To avoid this, the existing
proc-xen.mount will be converted into proc-xen.service, which just
mounts xenfs manually. All dependencies are updated by this change.

The existing conditionals in proc-xen.mount will prevent failures with
existing systemd-based installations:
ConditionPathExists=!/proc/xen/capabilities will prevent execution with
a new systemd that mounts xenfs. And this conditional, in combination
with ConditionPathExists=/proc/xen, will trigger execution with an old
systemd.
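
For reference, the condition block (unchanged context in the diff
below) reads:

    [Unit]
    ConditionPathExists=/proc/xen
    ConditionPathExists=!/proc/xen/capabilities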

An absolute path to the mount binary has to be used. /bin/mount is
expected to be universally available; nowadays it is a symlink to
/usr/bin/mount.

Signed-off-by: Olaf Hering 
---

based on 4.10.0-rc2
Please run autogen.sh:

 tools/configure.ac| 2 +-
 tools/hotplug/Linux/systemd/Makefile  | 6 +++---
 .../Linux/systemd/{proc-xen.mount.in => proc-xen.service.in}  | 8 
 tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in| 4 ++--
 tools/hotplug/Linux/systemd/xen-init-dom0.service.in  | 4 ++--
 tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in | 4 ++--
 tools/hotplug/Linux/systemd/xen-watchdog.service.in   | 4 ++--
 tools/hotplug/Linux/systemd/xenconsoled.service.in| 4 ++--
 tools/hotplug/Linux/systemd/xendomains.service.in | 4 ++--
 tools/hotplug/Linux/systemd/xendriverdomain.service.in| 4 ++--
 tools/hotplug/Linux/systemd/xenstored.service.in  | 6 +++---
 11 files changed, 25 insertions(+), 25 deletions(-)
 rename tools/hotplug/Linux/systemd/{proc-xen.mount.in => proc-xen.service.in} (60%)

diff --git a/tools/configure.ac b/tools/configure.ac
index d1a3a78d87..7b18421fa0 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -441,7 +441,7 @@ AX_AVAILABLE_SYSTEMD()
 
 AS_IF([test "x$systemd" = "xy"], [
 AC_CONFIG_FILES([
-hotplug/Linux/systemd/proc-xen.mount
+hotplug/Linux/systemd/proc-xen.service
 hotplug/Linux/systemd/var-lib-xenstored.mount
 hotplug/Linux/systemd/xen-init-dom0.service
 hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service
diff --git a/tools/hotplug/Linux/systemd/Makefile b/tools/hotplug/Linux/systemd/Makefile
index a5d41d86ef..855ff3747f 100644
--- a/tools/hotplug/Linux/systemd/Makefile
+++ b/tools/hotplug/Linux/systemd/Makefile
@@ -3,10 +3,10 @@ include $(XEN_ROOT)/tools/Rules.mk
 
 XEN_SYSTEMD_MODULES = xen.conf
 
-XEN_SYSTEMD_MOUNT =  proc-xen.mount
-XEN_SYSTEMD_MOUNT += var-lib-xenstored.mount
+XEN_SYSTEMD_MOUNT  = var-lib-xenstored.mount
 
-XEN_SYSTEMD_SERVICE  = xenstored.service
+XEN_SYSTEMD_SERVICE  = proc-xen.service
+XEN_SYSTEMD_SERVICE += xenstored.service
 XEN_SYSTEMD_SERVICE += xenconsoled.service
 XEN_SYSTEMD_SERVICE += xen-qemu-dom0-disk-backend.service
 XEN_SYSTEMD_SERVICE += xendomains.service
diff --git a/tools/hotplug/Linux/systemd/proc-xen.mount.in b/tools/hotplug/Linux/systemd/proc-xen.service.in
similarity index 60%
rename from tools/hotplug/Linux/systemd/proc-xen.mount.in
rename to tools/hotplug/Linux/systemd/proc-xen.service.in
index 64ebe7f9b1..76f0097b75 100644
--- a/tools/hotplug/Linux/systemd/proc-xen.mount.in
+++ b/tools/hotplug/Linux/systemd/proc-xen.service.in
@@ -4,7 +4,7 @@ ConditionPathExists=/proc/xen
 ConditionPathExists=!/proc/xen/capabilities
 RefuseManualStop=true
 
-[Mount]
-What=xenfs
-Where=/proc/xen
-Type=xenfs
+[Service]
+Type=oneshot
+RemainAfterExit=true
+ExecStart=/bin/mount -t xenfs xenfs /proc/xen
diff --git a/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in b/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in
index 11a7d50edc..5d171f82e8 100644
--- a/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in
+++ b/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in
@@ -1,7 +1,7 @@
 [Unit]
 Description=mount xenstore file system
-Requires=proc-xen.mount
-After=proc-xen.mount
+Requires=proc-xen.service
+After=proc-xen.service
 ConditionPathExists=/proc/xen/capabilities
 RefuseManualStop=true
 
diff --git a/tools/hotplug/Linux/systemd/xen-init-dom0.service.in b/tools/hotplug/Linux/systemd/xen-init-dom0.service.in
index 3befadcea3..c560fbe1b7 100644
--- a/tools

Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest

2017-10-11 Thread Olaf Hering
On Wed, Oct 11, Olaf Hering wrote:

> -#define MAX_BATCH_SIZE 1024   /* up to 1024 pages (4MB) at a time */
> +#define MAX_BATCH_SIZE SUPERPAGE_1GB_NR_PFNS   /* up to 1GB at a time */

Actually the error is something else, I missed this in the debug output:

xc: error: Failed to get types for pfn batch (7 = Argument list too long): Internal error

write_batch() should probably split the requests when filling types[] because
Xen has "1024" hardcoded in XEN_DOMCTL_getpageframeinfo3...
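
A sketch of such splitting (hypothetical helper, for libxc-internal
use; xc_get_pfn_type_batch() is the existing wrapper around
XEN_DOMCTL_getpageframeinfo3 and is declared in xc_private.h):

    #include "xc_private.h"

    #define TYPE_BATCH_SIZE 1024 /* limit hardcoded in the hypercall */

    /* Query page types in chunks the hypercall will accept. */
    static int get_types_chunked(xc_interface *xch, uint32_t domid,
                                 unsigned count, xen_pfn_t *arr)
    {
        unsigned i, n;

        for ( i = 0; i < count; i += n )
        {
            n = (count - i) < TYPE_BATCH_SIZE ? (count - i) : TYPE_BATCH_SIZE;
            if ( xc_get_pfn_type_batch(xch, domid, n, &arr[i]) )
                return -1;
        }
        return 0;
    }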


Olaf




[Xen-devel] [PATCH v10 0/3] tools/libxc: use superpages

2017-10-11 Thread Olaf Hering
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v10:
 coding style in xc_sr_bitmap API
 reset bitmap size on free
 check for empty bitmap in xc_sr_bitmap API
 add comment to struct x86_hvm_sp, keep the short name
 style and type changes in x86_hvm_punch_hole
 do not mark VGA hole as busy in x86_hvm_setup
 call decrease_reservation once for all pfns
 rename variable in x86_hvm_populate_pfns
 call decrease_reservation in 2MB chunks if possible

v9:
 update hole checking in x86_hvm_populate_pfns
 add out of bounds check to xc_sr_test_and_set/clear_bit
v8:
 remove double check of 1G/2M idx in x86_hvm_populate_pfns
v7:
 cover holes that span multiple superpages
v6:
 handle freeing of partly populated superpages correctly
 more DPRINTFs
v5:
 send correct version, rebase was not fully finished
v4:
 restore trailing "_bit" in bitmap function names
 keep track of gaps between previous and current batch
 split alloc functionality in x86_hvm_allocate_pfn
v3:
 clear pointer in xc_sr_bitmap_free
 some coding style changes
 use getdomaininfo.max_pages to avoid Over-allocation check
 trim bitmap function names, drop trailing "_bit"
 add some comments
v2:
 split into individual commits

based on staging c39cf093fc ("x86/asm: add .file directives")


Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c|   5 -
 tools/libxc/xc_private.h|   5 +
 tools/libxc/xc_sr_common.c  |  41 +++
 tools/libxc/xc_sr_common.h  | 103 ++-
 tools/libxc/xc_sr_restore.c | 141 +-
 tools/libxc/xc_sr_restore_x86_hvm.c | 536 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 -
 7 files changed, 755 insertions(+), 148 deletions(-)




[Xen-devel] [PATCH v10 3/3] tools/libxc: use superpages during restore of HVM guest

2017-10-11 Thread Olaf Hering
During creation of an HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost; everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.h  |  30 +-
 tools/libxc/xc_sr_restore.c |  75 +
 tools/libxc/xc_sr_restore_x86_hvm.c | 536 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 -
 4 files changed, 635 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a728c93e53..0477c20617 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,15 @@ struct xc_sr_restore_ops
  */
 int (*setup)(struct xc_sr_context *ctx);
 
+/**
+ * Populate PFNs
+ *
+ * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+ * unpopulated subset.
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t *types);
+
 /**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +233,8 @@ struct xc_sr_context
 
 int send_back_fd;
 unsigned long p2m_size;
+unsigned long max_pages;
+unsigned long tot_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
 /* From Image Header. */
@@ -336,6 +347,17 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
+xen_pfn_t idx1G_prev, idx2M_prev;
+
+/* List of PFNs for decrease_reservation */
+xen_pfn_t *extents;
+unsigned long max_extents;
+unsigned long nr_extents;
 } restore;
 };
 } x86_hvm;
@@ -460,14 +482,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset.  If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-*pfns = malloc(count * sizeof(*pfns));
-unsigned i, nr_pfns = 0;
-int rc = -1;
-
-if ( !mfns || !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
- !pfn_is_populated(ctx, original_pfns[i]) )
-{
-rc = pfn_set_populated(ctx, original_pfns[i]);
-if ( rc )
-goto err;
-pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-++nr_pfns;
-}
-}
-
-if ( nr_pfns )
-{
-rc = xc_domain_populate_physmap_exact(
-xch, ctx->domid, nr_pfns, 0, 0, mfns);
-if ( rc )
-{
-PERROR("Failed to populate physmap");
-goto err;
-}
-
-for ( i = 0; i < nr_pfns; ++i )
-{
-if ( mfns[i] == INVALID_MFN )
-{
-ERROR("Popu

[Xen-devel] [PATCH v10 2/3] tools/libxc: add API for bitmap access for restore

2017-10-11 Thread Olaf Hering
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap.
Switch the existing populated_pfns to this API.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
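
A minimal usage sketch of the xc_sr_bitmap API added below
(illustration only, not part of the patch; do_something() is a
placeholder):

    struct xc_sr_bitmap bm = { NULL, 0 };

    if ( !xc_sr_set_bit(42, &bm) )  /* resizes the bitmap on demand */
        return -1;                  /* allocation failure */
    if ( xc_sr_test_bit(42, &bm) )  /* bounds-checked read */
        do_something();
    xc_sr_bitmap_free(&bm);         /* frees storage, resets size */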
 tools/libxc/xc_sr_common.c  | 41 +
 tools/libxc/xc_sr_common.h  | 73 +++--
 tools/libxc/xc_sr_restore.c | 66 ++--
 3 files changed, 115 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..28c7be2b15 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
 BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the nearest
+ * power of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if ( bits > bm->bits )
+{
+size_t new_max;
+size_t old_sz, new_sz;
+void *p;
+
+/* Round up to the nearest power of two larger than bit, less 1. */
+new_max = bits;
+new_max |= new_max >> 1;
+new_max |= new_max >> 2;
+new_max |= new_max >> 4;
+new_max |= new_max >> 8;
+new_max |= new_max >> 16;
+#ifdef __x86_64__
+new_max |= new_max >> 32;
+#endif
+
+old_sz = bitmap_size(bm->bits + 1);
+new_sz = bitmap_size(new_max + 1);
+p = realloc(bm->p, new_sz);
+if ( !p )
+return false;
+
+if (bm->p)
+memset(p + old_sz, 0, new_sz - old_sz);
+else
+memset(p, 0, new_sz);
+
+bm->p = p;
+bm->bits = new_max;
+}
+return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..a728c93e53 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
 size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+void *p;
+unsigned long bits;
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
 domid_t  xenstore_domid,  console_domid;
 
 /* Bitmap of currently populated PFNs during restore. */
-unsigned long *populated_pfns;
-xen_pfn_t max_populated_pfn;
+struct xc_sr_bitmap populated_pfns;
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
@@ -343,6 +348,70 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if ( bits > bm->bits )
+return _xc_sr_bitmap_resize(bm, bits);
+return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+free( bm->p );
+bm->bits = 0;
+bm->p = NULL;
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if ( !xc_sr_bitmap_resize(bm, bit) )
+return false;
+
+set_bit(bit, bm->p);
+return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if ( bit > bm->bits || !bm->bits )
+return false;
+return !!test_bit(bit, bm->p);
+}
+
+static inline bool xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if ( bit > bm->bits || !bm->bits )
+return false;
+return !!test_and_clear_bit(bit, bm->p);
+}
+
+static inline bool xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if ( bit > bm->bits || !bm->bits )
+return false;
+return !!test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+xc_interface *xch = ctx->xch;
+
+if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) )
+{
+ERROR("Failed to realloc populated_pfns bitmap");
+errno = ENOMEM;
+return -1;
+}
+return 0;
+}
+
 struct xc_sr_record
 {
 uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;

[Xen-devel] [PATCH v10 1/3] tools/libxc: move SUPERPAGE macros to common header

2017-10-11 Thread Olaf Hering
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT   9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT   18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK



Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest

2017-10-11 Thread Olaf Hering
On Fri, Sep 08, Olaf Hering wrote:

> A related question: is it safe to increase MAX_BATCH_SIZE from 1024 to
> (256*1024) to transfer a whole gigabyte at a time? That way it will be
> easier to handle holes within a 1GB superpage.

To answer my own question:

This change leads to this error:

-#define MAX_BATCH_SIZE 1024   /* up to 1024 pages (4MB) at a time */
+#define MAX_BATCH_SIZE SUPERPAGE_1GB_NR_PFNS   /* up to 1GB at a time */

...
xc: info: Found x86 HVM domain from Xen 4.10
xc: detail: dom 9 p2m_size fee01 max_pages 100100
xc: info: Restoring domain
xc: error: Failed to read Record Header from stream (0 = Success): Internal error
xc: error: Restore failed (0 = Success): Internal error
...

Olaf




Re: [Xen-devel] [PULL 1/2] xen-disk: use g_new0 to fix build

2017-09-25 Thread Olaf Hering
On Wed, Sep 20, Stefano Stabellini wrote:

> From: Olaf Hering 
> g_malloc0_n is available since glib-2.24. To allow build with older glib
> versions use the generic g_new0, which is already used in many other
> places in the code.
> Fixes commit 3284fad728 ("xen-disk: add support for multi-page shared rings")

In case this missed the release, please backport to the relevant stable
branches as well. Many thanks.

Olaf




Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest

2017-09-08 Thread Olaf Hering
On Wed, Sep 06, Andrew Cooper wrote:

> The stream has always been in-order for the first pass (even in the
> legacy days), and I don't forsee that changing.  Reliance on the order
> was suggested by both myself and Jan during the early design.

A related question: is it safe to increase MAX_BATCH_SIZE from 1024 to
(256*1024) to transfer a whole gigabyte at a time? That way it will be
easier to handle holes within a 1GB superpage.

Olaf




Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest

2017-09-06 Thread Olaf Hering
On Wed, Sep 06, Andrew Cooper wrote:

> If a PVH guest has got MTRRs disabled, then it genuinely can run on an
> unshattered 1G superpage at 0.

Ok, the code will detect the holes and will release memory as needed. I
will drop these two lines.

Olaf





Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest

2017-09-06 Thread Olaf Hering
On Wed, Sep 06, Andrew Cooper wrote:

> I still fail to understand why you need the bitmaps at all?  You can
> calculate everything you need from the pfn list alone, which will also
> let you spot the presence or absence of the VGA hole.

These bitmaps track whether a range has already been allocated as a
superpage. Without them, a pfn that falls within an already-allocated
1G or 2M range might trigger a double allocation of a 1G or 2M page.
This is not related to the VGA hole. These two lines are just hints
that no superpage can be allocated in this range.

> You need to track which pfns you've see so far in the stream, and which
> pfns have been populated.  When you find holes in the pfns in the
> stream, you need to undo the prospective superpage allocation.  Unless
> I've missed something?

This is what's happening: holes are created as soon as they are seen
in the stream.

> Also, please take care to use 2M decrease reservations wherever
> possible, or you will end up shattering the host superpage as part of
> trying to remove the memory.

This is what Wei suggested: build a list of pfns instead of releasing
each pfn individually. I think with this new code it should be possible
to decrease in 2MB steps as needed.
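
A sketch of such a 2MB release (assuming a fully unpopulated,
2MB-aligned run of pfns; xc_domain_decrease_reservation_exact() takes
an extent order, so one order-9 extent replaces 512 order-0 extents):

    /* Punch a 2MB hole with a single order-9 extent instead of 512
     * single pages, which would shatter the host superpage.
     * 'base' is assumed to be 2MB-aligned. */
    static int punch_2mb_hole(xc_interface *xch, uint32_t domid,
                              xen_pfn_t base)
    {
        return xc_domain_decrease_reservation_exact(
            xch, domid, 1 /* nr_extents */, SUPERPAGE_2MB_SHIFT, &base);
    }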

Olaf




Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest

2017-09-06 Thread Olaf Hering
On Wed, Sep 06, Andrew Cooper wrote:

> On 01/09/17 17:08, Olaf Hering wrote:
> > +/* No superpage in 1st 2MB due to VGA hole */
> > +xc_sr_set_bit(0, &ctx->x86_hvm.restore.attempted_1g);
> > +xc_sr_set_bit(0, &ctx->x86_hvm.restore.attempted_2m);
> This is false for PVH guests.

How can I detect a PVH guest?

Olaf




Re: [Xen-devel] [PATCH v9 2/3] tools/libxc: add API for bitmap access for restore

2017-09-06 Thread Olaf Hering
On Wed, Sep 06, Andrew Cooper wrote:

> On 01/09/17 17:08, Olaf Hering wrote:
> > +static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
> > +static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
> Why are these moved?  They are still restore specific.

There is no tools/libxc/xc_sr_restore.h, should I create one?

Olaf




Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest

2017-09-06 Thread Olaf Hering
On Wed, 6 Sep 2017 12:34:10 +0100, Wei Liu wrote:

> > +struct x86_hvm_sp {  
> Forgot to ask: what does sp stand for?

superpage. I will check if there is room to expand this string.

> > + * Try to allocate superpages.
> > + * This works without memory map only if the pfns arrive in incremental order.
> > + */  
> I have said several times, one way or another, I don't want to make
> assumption on the stream of pfns. So I'm afraid I can't ack a patch like
> this.

It will work with any order, I think. Just with incremental order the 
superpages will not be split once they are allocated.

Thanks for the review. I will send another series shortly.

Olaf




[Xen-devel] [PATCH] libxc/bitops: correct comment for bitmap_size

2017-09-05 Thread Olaf Hering
The returned value now represents units of bytes instead of longs.

Fixes commit 11d0044a16 ("tools/libxc: Modify bitmap operations to take void pointers")

Signed-off-by: Olaf Hering 
---
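
For illustration (not part of the patch), the returned unit is bytes:

    bitmap_size(64); /* == 8, i.e. 8 bytes -- not 1 long */
    bitmap_size(10); /* == 2, ten bits round up to two bytes */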
 tools/libxc/xc_bitops.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxc/xc_bitops.h b/tools/libxc/xc_bitops.h
index 3e7a544c9d..0951e8267d 100644
--- a/tools/libxc/xc_bitops.h
+++ b/tools/libxc/xc_bitops.h
@@ -13,7 +13,7 @@
 #define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr) / 8]
 #define BITMAP_SHIFT(_nr) ((_nr) % 8)
 
-/* calculate required space for number of longs needed to hold nr_bits */
+/* calculate required space for number of bytes needed to hold nr_bits */
 static inline int bitmap_size(int nr_bits)
 {
 return (nr_bits + 7) / 8;



[Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest

2017-09-01 Thread Olaf Hering
During creation of an HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost; everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.h  |  26 ++-
 tools/libxc/xc_sr_restore.c |  75 +---
 tools/libxc/xc_sr_restore_x86_hvm.c | 341 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +++-
 4 files changed, 436 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 734320947a..93141a6e25 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
  */
 int (*setup)(struct xc_sr_context *ctx);
 
+/**
+ * Populate PFNs
+ *
+ * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+ * unpopulated subset.
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t *types);
+
+
 /**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +234,8 @@ struct xc_sr_context
 
 int send_back_fd;
 unsigned long p2m_size;
+unsigned long max_pages;
+unsigned long tot_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
 /* From Image Header. */
@@ -336,6 +348,12 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
+xen_pfn_t idx1G_prev, idx2M_prev;
 } restore;
 };
 } x86_hvm;
@@ -459,14 +477,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset.  If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-*pfns = malloc(count * sizeof(*pfns));
-unsigned i, nr_pfns = 0;
-int rc = -1;
-
-if ( !mfns || !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
- !pfn_is_populated(ctx, original_pfns[i]) )
-{
-rc = pfn_set_populated(ctx, original_pfns[i]);
-if ( rc )
-goto err;
-pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-++nr_pfns;
-}
-}
-
-if ( nr_pfns )
-{
-rc = xc_domain_populate_physmap_exact(
-xch, ctx->domid, nr_pfns, 0, 0, mfns);
-if ( rc )
-{
-PERROR("Failed to populate physmap");
-goto err;
-}
-
-for ( i = 0; i < nr_pfns; ++i )
-{
-if ( mfns[i] == INVALID_MFN )
-{
-ERROR("Populate physmap failed for pfn %u", i);
-rc = -1;
-goto err;
-}
-
-ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-}
-}
-

[Xen-devel] [PATCH v9 1/3] tools/libxc: move SUPERPAGE macros to common header

2017-09-01 Thread Olaf Hering
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT   9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT   18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK



[Xen-devel] [PATCH v9 0/3] tools/libxc: use superpages

2017-09-01 Thread Olaf Hering
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v9:
 update hole checking in x86_hvm_populate_pfns
 add out of bounds check to xc_sr_test_and_set/clear_bit
v8:
 remove double check of 1G/2M idx in x86_hvm_populate_pfns
v7:
 cover holes that span multiple superpages
v6:
 handle freeing of partly populated superpages correctly
 more DPRINTFs
v5:
 send correct version, rebase was not fully finished
v4:
 restore trailing "_bit" in bitmap function names
 keep track of gaps between previous and current batch
 split alloc functionality in x86_hvm_allocate_pfn
v3:
 clear pointer in xc_sr_bitmap_free
 some coding style changes
 use getdomaininfo.max_pages to avoid Over-allocation check
 trim bitmap function names, drop trailing "_bit"
 add some comments
v2:
 split into individual commits

based on staging c39cf093fc ("x86/asm: add .file directives")


Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c|   5 -
 tools/libxc/xc_private.h|   5 +
 tools/libxc/xc_sr_common.c  |  41 +
 tools/libxc/xc_sr_common.h  |  98 +--
 tools/libxc/xc_sr_restore.c | 141 +--
 tools/libxc/xc_sr_restore_x86_hvm.c | 341 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +++-
 7 files changed, 555 insertions(+), 148 deletions(-)




[Xen-devel] [PATCH v9 2/3] tools/libxc: add API for bitmap access for restore

2017-09-01 Thread Olaf Hering
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap.
Switch the existing populated_pfns to this API.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_sr_common.c  | 41 ++
 tools/libxc/xc_sr_common.h  | 72 +++--
 tools/libxc/xc_sr_restore.c | 66 ++---
 3 files changed, 114 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..4d221ca90c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
 BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the nearest
+ * power of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+{
+size_t new_max;
+size_t old_sz, new_sz;
+void *p;
+
+/* Round up to the nearest power of two larger than bit, less 1. */
+new_max = bits;
+new_max |= new_max >> 1;
+new_max |= new_max >> 2;
+new_max |= new_max >> 4;
+new_max |= new_max >> 8;
+new_max |= new_max >> 16;
+#ifdef __x86_64__
+new_max |= new_max >> 32;
+#endif
+
+old_sz = bitmap_size(bm->bits + 1);
+new_sz = bitmap_size(new_max + 1);
+p = realloc(bm->p, new_sz);
+if (!p)
+return false;
+
+if (bm->p)
+memset(p + old_sz, 0, new_sz - old_sz);
+else
+memset(p, 0, new_sz);
+
+bm->p = p;
+bm->bits = new_max;
+}
+return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..734320947a 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
 size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+void *p;
+unsigned long bits;
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
 domid_t  xenstore_domid,  console_domid;
 
 /* Bitmap of currently populated PFNs during restore. */
-unsigned long *populated_pfns;
-xen_pfn_t max_populated_pfn;
+struct xc_sr_bitmap populated_pfns;
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
@@ -343,6 +348,69 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+return _xc_sr_bitmap_resize(bm, bits);
+return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+free(bm->p);
+bm->p = NULL;
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (!xc_sr_bitmap_resize(bm, bit))
+return false;
+
+set_bit(bit, bm->p);
+return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bits)
+return false;
+return !!test_bit(bit, bm->p);
+}
+
+static inline bool xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bits)
+return false;
+return !!test_and_clear_bit(bit, bm->p);
+}
+
+static inline bool xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bits)
+return false;
+return !!test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+xc_interface *xch = ctx->xch;
+
+if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) )
+{
+ERROR("Failed to realloc populated_pfns bitmap");
+errno = ENOMEM;
+return -1;
+}
+return 0;
+}
+
 struct xc_sr_record
 {
 uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const st

Re: [Xen-devel] [PATCH v8 3/3] tools/libxc: use superpages during restore of HVM guest

2017-09-01 Thread Olaf Hering
On Fri, Sep 01, Olaf Hering wrote:

> +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,

> +/*
> + * If this next pfn is within another 1GB superpage it is required
> + * to scan the entire previous superpage because there might be
> + * holes between max_pfn and the end of the superpage.
> + */
> +if ( idx1G_prev != idx1G )
> +{
> +order = SUPERPAGE_1GB_SHIFT;
> +max_pfn = (((max_pfn >> order) + 1) << order) - 1;
> +}
> +if ( x86_hvm_punch_hole(ctx, max_pfn) == false )


And thinking about this part: with this variant it is still possible
that Over-allocation happens. If the previous pfn was within a 2MB
range, and this pfn is in another 2MB range, then the hole after max_pfn
would not be covered. This part needs an 'else' with
SUPERPAGE_2MB_SHIFT.

This "reset to max" may trigger a bug in xc_sr_test_and_clear_bit(). It
has to check the size of the bitmap, just as xc_sr_test_bit() does.
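
A sketch of the missing bounds check, mirroring xc_sr_test_bit() (v9
of this series adopts exactly this):

    static inline bool xc_sr_test_and_clear_bit(unsigned long bit,
                                                struct xc_sr_bitmap *bm)
    {
        /* Bits beyond the bitmap were never set; report them as clear. */
        if ( bit > bm->bits || !bm->bits )
            return false;
        return !!test_and_clear_bit(bit, bm->p);
    }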

Olaf




[Xen-devel] [PATCH v8 2/3] tools/libxc: add API for bitmap access for restore

2017-09-01 Thread Olaf Hering
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap.
Switch the existing populated_pfns to this API.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_sr_common.c  | 41 +++
 tools/libxc/xc_sr_common.h  | 68 +++--
 tools/libxc/xc_sr_restore.c | 66 ++-
 3 files changed, 110 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..4d221ca90c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
 BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the nearest
+ * power of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+{
+size_t new_max;
+size_t old_sz, new_sz;
+void *p;
+
+/* Round up to the nearest power of two larger than bit, less 1. */
+new_max = bits;
+new_max |= new_max >> 1;
+new_max |= new_max >> 2;
+new_max |= new_max >> 4;
+new_max |= new_max >> 8;
+new_max |= new_max >> 16;
+#ifdef __x86_64__
+new_max |= new_max >> 32;
+#endif
+
+old_sz = bitmap_size(bm->bits + 1);
+new_sz = bitmap_size(new_max + 1);
+p = realloc(bm->p, new_sz);
+if (!p)
+return false;
+
+if (bm->p)
+memset(p + old_sz, 0, new_sz - old_sz);
+else
+memset(p, 0, new_sz);
+
+bm->p = p;
+bm->bits = new_max;
+}
+return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..da2691ba79 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
 size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+void *p;
+unsigned long bits;
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
 domid_t  xenstore_domid,  console_domid;
 
 /* Bitmap of currently populated PFNs during restore. */
-unsigned long *populated_pfns;
-xen_pfn_t max_populated_pfn;
+struct xc_sr_bitmap populated_pfns;
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
@@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+return _xc_sr_bitmap_resize(bm, bits);
+return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+free(bm->p);
+bm->p = NULL;
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (!xc_sr_bitmap_resize(bm, bit))
+return false;
+
+set_bit(bit, bm->p);
+return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bits)
+return false;
+return !!test_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_clear_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+xc_interface *xch = ctx->xch;
+
+if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) )
+{
+ERROR("Failed to realloc populated_pfns bitmap");
+errno = ENOMEM;
+return -1;
+}
+return 0;
+}
+
 struct xc_sr_record
 {
 uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
-{
-if ( pfn > ctx->restore.max_populated_pfn )
-

[Xen-devel] [PATCH v8 3/3] tools/libxc: use superpages during restore of HVM guest

2017-09-01 Thread Olaf Hering
During creation of an HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost; everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.h  |  25 ++-
 tools/libxc/xc_sr_restore.c |  75 +---
 tools/libxc/xc_sr_restore_x86_hvm.c | 337 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +++-
 4 files changed, 431 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index da2691ba79..0fa0fbea4d 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
  */
 int (*setup)(struct xc_sr_context *ctx);
 
+/**
+ * Populate PFNs
+ *
+ * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+ * unpopulated subset.
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t *types);
+
+
 /**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +234,8 @@ struct xc_sr_context
 
 int send_back_fd;
 unsigned long p2m_size;
+unsigned long max_pages;
+unsigned long tot_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
 /* From Image Header. */
@@ -336,6 +348,11 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
 } restore;
 };
 } x86_hvm;
@@ -455,14 +472,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset.  If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-*pfns = malloc(count * sizeof(*pfns));
-unsigned i, nr_pfns = 0;
-int rc = -1;
-
-if ( !mfns || !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
- !pfn_is_populated(ctx, original_pfns[i]) )
-{
-rc = pfn_set_populated(ctx, original_pfns[i]);
-if ( rc )
-goto err;
-pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-++nr_pfns;
-}
-}
-
-if ( nr_pfns )
-{
-rc = xc_domain_populate_physmap_exact(
-xch, ctx->domid, nr_pfns, 0, 0, mfns);
-if ( rc )
-{
-PERROR("Failed to populate physmap");
-goto err;
-}
-
-for ( i = 0; i < nr_pfns; ++i )
-{
-if ( mfns[i] == INVALID_MFN )
-{
-ERROR("Populate physmap failed for pfn %u", i);
-rc = -1;
-goto err;
-}
-
-ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-}
-}
-
-rc = 0;
-
- err:
-free(pfns);
-free(mfns);

[Xen-devel] [PATCH v8 0/3] tools/libxc: use superpages

2017-09-01 Thread Olaf Hering
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v8:
 remove double check of 1G/2M idx in x86_hvm_populate_pfns
v7:
 cover holes that span multiple superpages
v6:
 handle freeing of partly populated superpages correctly
 more DPRINTFs
v5:
 send correct version, rebase was not fully finished
v4:
 restore trailing "_bit" in bitmap function names
 keep track of gaps between previous and current batch
 split alloc functionality in x86_hvm_allocate_pfn
v3:
 clear pointer in xc_sr_bitmap_free
 some coding style changes
 use getdomaininfo.max_pages to avoid Over-allocation check
 trim bitmap function names, drop trailing "_bit"
 add some comments
v2:
 split into individual commits

based on staging c39cf093fc ("x86/asm: add .file directives")

Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c|   5 -
 tools/libxc/xc_private.h|   5 +
 tools/libxc/xc_sr_common.c  |  41 +
 tools/libxc/xc_sr_common.h  |  93 --
 tools/libxc/xc_sr_restore.c | 141 +--
 tools/libxc/xc_sr_restore_x86_hvm.c | 337 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +++-
 7 files changed, 546 insertions(+), 148 deletions(-)




[Xen-devel] [PATCH v8 1/3] tools/libxc: move SUPERPAGE macros to common header

2017-09-01 Thread Olaf Hering
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT   9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT   18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK



[Xen-devel] [PATCH v7 2/3] tools/libxc: add API for bitmap access for restore

2017-08-31 Thread Olaf Hering
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap.
Switch the existing populated_pfns to this API.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_sr_common.c  | 41 +++
 tools/libxc/xc_sr_common.h  | 68 +++--
 tools/libxc/xc_sr_restore.c | 66 ++-
 3 files changed, 110 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..4d221ca90c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
 BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the nearest
+ * power of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+{
+size_t new_max;
+size_t old_sz, new_sz;
+void *p;
+
+/* Round up to the nearest power of two larger than bit, less 1. */
+new_max = bits;
+new_max |= new_max >> 1;
+new_max |= new_max >> 2;
+new_max |= new_max >> 4;
+new_max |= new_max >> 8;
+new_max |= new_max >> 16;
+#ifdef __x86_64__
+new_max |= new_max >> 32;
+#endif
+
+old_sz = bitmap_size(bm->bits + 1);
+new_sz = bitmap_size(new_max + 1);
+p = realloc(bm->p, new_sz);
+if (!p)
+return false;
+
+if (bm->p)
+memset(p + old_sz, 0, new_sz - old_sz);
+else
+memset(p, 0, new_sz);
+
+bm->p = p;
+bm->bits = new_max;
+}
+return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..da2691ba79 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
 size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+void *p;
+unsigned long bits;
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
 domid_t  xenstore_domid,  console_domid;
 
 /* Bitmap of currently populated PFNs during restore. */
-unsigned long *populated_pfns;
-xen_pfn_t max_populated_pfn;
+struct xc_sr_bitmap populated_pfns;
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
@@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+return _xc_sr_bitmap_resize(bm, bits);
+return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+free(bm->p);
+bm->p = NULL;
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (!xc_sr_bitmap_resize(bm, bit))
+return false;
+
+set_bit(bit, bm->p);
+return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bits)
+return false;
+return !!test_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_clear_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+xc_interface *xch = ctx->xch;
+
+if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) )
+{
+ERROR("Failed to realloc populated_pfns bitmap");
+errno = ENOMEM;
+return -1;
+}
+return 0;
+}
+
 struct xc_sr_record
 {
 uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
-{
-if ( pfn > ctx->restore.max_populated_pfn )
-
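For reference, the round-up in _xc_sr_bitmap_resize() works by smearing the
highest set bit into all lower positions, which yields the smallest value of
the form 2^n - 1 that covers the requested bit count. A minimal standalone
demo of just that step (hypothetical program, not part of the patch):

/* Demo of the bit-smearing round-up used by _xc_sr_bitmap_resize(). */
#include <stdio.h>

static unsigned long round_up_pow2_minus_1(unsigned long new_max)
{
    new_max |= new_max >> 1;
    new_max |= new_max >> 2;
    new_max |= new_max >> 4;
    new_max |= new_max >> 8;
    new_max |= new_max >> 16;
#ifdef __x86_64__
    new_max |= new_max >> 32;
#endif
    return new_max;
}

int main(void)
{
    /* 1000 -> 1023, 1024 -> 2047: the bitmap at most doubles per realloc. */
    printf("%lu %lu\n", round_up_pow2_minus_1(1000),
           round_up_pow2_minus_1(1024));
    return 0;
}

Growing to a power of two keeps the number of realloc() calls logarithmic in
the largest pfn ever passed in.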

[Xen-devel] [PATCH v7 1/3] tools/libxc: move SUPERPAGE macros to common header

2017-08-31 Thread Olaf Hering
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT   9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT   18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
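The shifts are counted in 4k pages, so SUPERPAGE_2MB_NR_PFNS is 512 and
SUPERPAGE_1GB_NR_PFNS is 262144. A small illustration of the arithmetic
(demo only, assumes the usual 4k page size):

/* What the superpage macros expand to, relative to 4k pages. */
#include <stdio.h>

#define PAGE_SHIFT_4K         12
#define SUPERPAGE_2MB_SHIFT   9
#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
#define SUPERPAGE_1GB_SHIFT   18
#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)

int main(void)
{
    /* 512 pfns * 4k = 2M, 262144 pfns * 4k = 1G */
    printf("2M: %lu pfns = %lu bytes\n", SUPERPAGE_2MB_NR_PFNS,
           SUPERPAGE_2MB_NR_PFNS << PAGE_SHIFT_4K);
    printf("1G: %lu pfns = %lu bytes\n", SUPERPAGE_1GB_NR_PFNS,
           SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT_4K);
    return 0;
}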


[Xen-devel] [PATCH v7 0/3] tools/libxc: use superpages

2017-08-31 Thread Olaf Hering
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v7:
 cover holes that span multiple superpages
v6:
 handle freeing of partly populated superpages correctly
 more DPRINTFs
v5:
 send correct version, rebase was not fully finished
v4:
 restore trailing "_bit" in bitmap function names
 keep track of gaps between previous and current batch
 split alloc functionality in x86_hvm_allocate_pfn
v3:
 clear pointer in xc_sr_bitmap_free
 some coding style changes
 use getdomaininfo.max_pages to avoid Over-allocation check
 trim bitmap function names, drop trailing "_bit"
 add some comments
v2:
 split into individual commits


Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c|   5 -
 tools/libxc/xc_private.h|   5 +
 tools/libxc/xc_sr_common.c  |  41 +
 tools/libxc/xc_sr_common.h  |  93 --
 tools/libxc/xc_sr_restore.c | 141 +--
 tools/libxc/xc_sr_restore_x86_hvm.c | 340 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +++-
 7 files changed, 549 insertions(+), 148 deletions(-)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v7 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-31 Thread Olaf Hering
During creation of a HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost; everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.h  |  25 ++-
 tools/libxc/xc_sr_restore.c |  75 +---
 tools/libxc/xc_sr_restore_x86_hvm.c | 340 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +++-
 4 files changed, 434 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index da2691ba79..0fa0fbea4d 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
  */
 int (*setup)(struct xc_sr_context *ctx);
 
+/**
+ * Populate PFNs
+ *
+ * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+ * unpopulated subset.
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t 
*types);
+
+
 /**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +234,8 @@ struct xc_sr_context
 
 int send_back_fd;
 unsigned long p2m_size;
+unsigned long max_pages;
+unsigned long tot_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
 /* From Image Header. */
@@ -336,6 +348,11 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
 } restore;
 };
 } x86_hvm;
@@ -455,14 +472,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset.  If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-*pfns = malloc(count * sizeof(*pfns));
-unsigned i, nr_pfns = 0;
-int rc = -1;
-
-if ( !mfns || !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
- !pfn_is_populated(ctx, original_pfns[i]) )
-{
-rc = pfn_set_populated(ctx, original_pfns[i]);
-if ( rc )
-goto err;
-pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-++nr_pfns;
-}
-}
-
-if ( nr_pfns )
-{
-rc = xc_domain_populate_physmap_exact(
-xch, ctx->domid, nr_pfns, 0, 0, mfns);
-if ( rc )
-{
-PERROR("Failed to populate physmap");
-goto err;
-}
-
-for ( i = 0; i < nr_pfns; ++i )
-{
-if ( mfns[i] == INVALID_MFN )
-{
-ERROR("Populate physmap failed for pfn %u", i);
-rc = -1;
-goto err;
-}
-
-ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-}
-}
-
-rc = 0;
-
- err:
-free(pfns);
-free(mfns);
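For orientation, the order selection done per incoming pfn in the patch above
boils down to trying the largest aligned range that has not been attempted
yet. A condensed sketch (field names as in the patch; the bitmap resizing,
the max_pages check and the actual populate call are omitted here):

/* Condensed sketch: pick the allocation order for one pfn.  Each 1G/2M
 * range is attempted at most once, tracked by the attempted_* bitmaps. */
static unsigned choose_order(struct xc_sr_context *ctx, xen_pfn_t pfn)
{
    if ( !xc_sr_test_and_set_bit(pfn >> SUPERPAGE_1GB_SHIFT,
                                 &ctx->x86_hvm.restore.attempted_1g) )
        return SUPERPAGE_1GB_SHIFT;   /* first touch of this 1G range */

    if ( !xc_sr_test_and_set_bit(pfn >> SUPERPAGE_2MB_SHIFT,
                                 &ctx->x86_hvm.restore.attempted_2m) )
        return SUPERPAGE_2MB_SHIFT;   /* first touch of this 2M range */

    return 0;                         /* fall back to a single 4k page */
}

Pfns that land in an already-attempted range are either covered by the
earlier superpage or get populated as single pages.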

[Xen-devel] ballooning specific PFNs in a HVM domU

2017-08-31 Thread Olaf Hering
Does the Linux kernel provide an API to claim specific pages? Right now
it just does alloc_page(), which I think returns any random page that
happens to be unused.

I want to create a specific memory layout with holes to verify my
migration patches.

Olaf


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
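One way to get such a layout without touching the kernel is to punch the
holes from dom0 instead; a rough sketch using the existing libxc call, where
domid and the gfn list are placeholders:

/* Rough sketch: hand selected frames of a domU back to Xen from dom0,
 * leaving holes in the physmap.  domid and gfns are placeholders. */
#include <xenctrl.h>

static int punch_holes(uint32_t domid, xen_pfn_t *gfns, unsigned long nr)
{
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    int rc;

    if ( !xch )
        return -1;
    rc = xc_domain_decrease_reservation_exact(xch, domid, nr, 0, gfns);
    xc_interface_close(xch);
    return rc;
}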


Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-30 Thread Olaf Hering
On Wed, Aug 30, Wei Liu wrote:

> > Can this actually happen with the available senders? If not, this is
> > again the missing memory map.
> Probably not now, but as said, you shouldn't rely on the structure of
> the stream unless it is stated in the spec.

Well, what can happen with today's implementation on the sender side is
the case of a ballooned guest with enough holes within a batch. These
will trigger 1G allocations before the releasing of memory happens. To
solve this, the releasing of memory has to happen more often, probably
after crossing each 2M boundary.


Olaf


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
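A sketch of the idea from the message above: remember which 2M range the
previous pfn fell into and release unpopulated memory whenever the boundary
is crossed. release_unpopulated() is a hypothetical stand-in for the freeing
pass of the patch:

/* Hypothetical sketch: free allocated-but-unpopulated pfns each time a
 * batch crosses into a new 2M range, not just once per batch. */
static void maybe_release(struct xc_sr_context *ctx, xen_pfn_t prev,
                          xen_pfn_t cur)
{
    if ( (prev >> SUPERPAGE_2MB_SHIFT) != (cur >> SUPERPAGE_2MB_SHIFT) )
        release_unpopulated(ctx, prev & ~(SUPERPAGE_2MB_NR_PFNS - 1),
                            SUPERPAGE_2MB_NR_PFNS);
}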


Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-30 Thread Olaf Hering
On Wed, Aug 30, Wei Liu wrote:

> As far as I can tell the algorithm in the patch can't handle:
> 
> 1. First pfn in a batch points to start of second 1G address space
> 2. Second pfn in a batch points to a page in the middle of first 1G
> 3. Guest can only use 1G ram

In which way does it not handle it? Over-allocation is supposed to be
handled by the "ctx->restore.tot_pages + sp->count >
ctx->restore.max_pages" checks. Do you mean the second 1G is allocated,
then max_pages is reached, and allocation in other areas is not possible
anymore?
Can this actually happen with the available senders? If not, this is
again the missing memory map.

Olaf


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
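The check quoted above is small; roughly this shape before any superpage
allocation, with order being the order about to be tried (sketch, not the
verbatim patch code):

/* Sketch: refuse a superpage if it would exceed the domain's allowance.
 * max_pages comes from getdomaininfo, tot_pages is what restore has
 * allocated so far. */
if ( ctx->restore.tot_pages + (1UL << order) > ctx->restore.max_pages )
    order = 0;   /* fall back to single pages */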


Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-29 Thread Olaf Hering
On Sat, Aug 26, Olaf Hering wrote:

> +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,

> +/*
> + * Scan the entire superpage because several batches will fit into
> + * a superpage, and it is unknown which pfn triggered the allocation.
> + */
> +order = SUPERPAGE_1GB_SHIFT;
> +pfn = min_pfn = (min_pfn >> order) << order;

Scanning an entire superpage again and again looked expensive, but with
the debug change below it turned out that the loop which peeks at each
single bit in populated_pfns is likely not a bottleneck.

Migrating a domU with a simple workload that touches pages to mark them
dirty will set the min_pfn/max_pfn to a large range anyway after the
first iteration. This large range may also happen with an idle domU. A
small domU takes 78 seconds to migrate, and just the freeing part takes
1.4 seconds. Similar for a large domain, the loop takes 1% of the time.

 78 seconds, 1.4 seconds, 2119 calls  (8GB, 12*512M memdirty)
695 seconds, 7.6 seconds, 18076 calls (72GB, 12*5G memdirty)
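
The accumulation behind these numbers follows the usual CLOCK_MONOTONIC
pattern around the freeing loop; roughly this usage, since the hunk at the
end of the debug patch below is cut off:

/* Sketch of the measurement site (reconstructed, not verbatim). */
struct timespec a, b, d;

clock_gettime(CLOCK_MONOTONIC, &a);
/* ... scan the superpage and free unpopulated pfns ... */
clock_gettime(CLOCK_MONOTONIC, &b);
diff_timespec(ctx, &a, &b, &d);
ctx->x86_hvm.restore.tv_nsec += d.tv_sec * 1000000000UL + d.tv_nsec;
ctx->x86_hvm.restore.iterations++;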

Olaf

track time spent if decrease_reservation is needed

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 0fa0fbea4d..5ec8b6fee6 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -353,6 +353,9 @@ struct xc_sr_context
 struct xc_sr_bitmap attempted_1g;
 struct xc_sr_bitmap attempted_2m;
 struct xc_sr_bitmap allocated_pfns;
+
+unsigned long tv_nsec;
+unsigned long iterations;
 } restore;
 };
 } x86_hvm;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 8cd9289d1a..f6aad329e2 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -769,6 +769,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom,
 {
 ctx.restore.ops = restore_ops_x86_hvm;
 if ( restore(&ctx) )
+;
 return -1;
 }
 else
diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c
index 2b0eca0c7c..11758b3f7d 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -1,5 +1,6 @@
 #include 
 #include 
+#include 
 
 #include "xc_sr_common_x86.h"
 
@@ -248,6 +249,12 @@ static int x86_hvm_stream_complete(struct xc_sr_context *ctx)
 
 static int x86_hvm_cleanup(struct xc_sr_context *ctx)
 {
+xc_interface *xch = ctx->xch;
+errno = 0;
+PERROR("tv_nsec %lu.%lu iterations %lu",
+ctx->x86_hvm.restore.tv_nsec / 1000000000UL,
+ctx->x86_hvm.restore.tv_nsec % 1000000000UL,
+ctx->x86_hvm.restore.iterations);
 free(ctx->x86_hvm.restore.context);
 xc_sr_bitmap_free(&ctx->x86_hvm.restore.attempted_1g);
 xc_sr_bitmap_free(&ctx->x86_hvm.restore.attempted_2m);
@@ -440,6 +447,28 @@ static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn)
 return rc;
 }
 
+static void diff_timespec(struct xc_sr_context *ctx, const struct timespec *old, const struct timespec *new, struct timespec *diff)
+{
+xc_interface *xch = ctx->xch;
+if (new->tv_sec == old->tv_sec && new->tv_nsec == old->tv_nsec)
+PERROR("%s: time did not move: %ld/%ld == %ld/%ld", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec);
+if ( (new->tv_sec < old->tv_sec) || (new->tv_sec == old->tv_sec && new->tv_nsec < old->tv_nsec) )
+{
+PERROR("%s: time went backwards: %ld/%ld -> %ld/%ld", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec);
+diff->tv_sec = diff->tv_nsec = 0;
+return;
+}
+if ((new->tv_nsec - old->tv_nsec) < 0) {
+diff->tv_sec = new->tv_sec - old->tv_sec - 1;
+diff->tv_nsec = new->tv_nsec - old->tv_nsec + 1000000000UL;
+} else {
+diff->tv_sec = new->tv_sec - old->tv_sec;
+diff->tv_nsec = new->tv_nsec - old->tv_nsec;
+}
+if (diff->tv_sec < 0)
+PERROR("%s: time diff broken. old: %ld/%ld new: %ld/%ld diff: %ld/%ld", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec, diff->tv_sec, diff->tv_nsec);
+}
+
 static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
  const xen_pfn_t *original_pfns,
  const uint32_t *types)
@@ -448,6 +477,7 @@ static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
 xen_pfn_t pfn, min_pfn = original_pfns[0], max_pfn = original_pfns[0];
 unsigned i, freed = 0, order;
 int rc = -1;
+struct timespec a, b, d;
 
 for ( i = 0; i < count; ++i )
   

[Xen-devel] [PATCH v6 0/3] tools/libxc: use superpages

2017-08-26 Thread Olaf Hering
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v6:
 handle freeing of partly populated superpages correctly
 more DPRINTFs
v5:
 send correct version, rebase was not fully finished
v4:
 restore trailing "_bit" in bitmap function names
 keep track of gaps between previous and current batch
 split alloc functionality in x86_hvm_allocate_pfn
v3:
 clear pointer in xc_sr_bitmap_free
 some coding style changes
 use getdomaininfo.max_pages to avoid Over-allocation check
 trim bitmap function names, drop trailing "_bit"
 add some comments
v2:
 split into individual commits


Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c|   5 -
 tools/libxc/xc_private.h|   5 +
 tools/libxc/xc_sr_common.c  |  41 +
 tools/libxc/xc_sr_common.h  |  93 ++--
 tools/libxc/xc_sr_restore.c | 141 ++
 tools/libxc/xc_sr_restore_x86_hvm.c | 288 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 -
 7 files changed, 497 insertions(+), 148 deletions(-)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v6 1/3] tools/libxc: move SUPERPAGE macros to common header

2017-08-26 Thread Olaf Hering
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT   9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT   18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v6 2/3] tools/libxc: add API for bitmap access for restore

2017-08-26 Thread Olaf Hering
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap.
Switch the existing populated_pfns to this API.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_sr_common.c  | 41 +++
 tools/libxc/xc_sr_common.h  | 68 +++--
 tools/libxc/xc_sr_restore.c | 66 ++-
 3 files changed, 110 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..4d221ca90c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
 BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the nearest power
+ * of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+{
+size_t new_max;
+size_t old_sz, new_sz;
+void *p;
+
+/* Round up to the nearest power of two larger than bit, less 1. */
+new_max = bits;
+new_max |= new_max >> 1;
+new_max |= new_max >> 2;
+new_max |= new_max >> 4;
+new_max |= new_max >> 8;
+new_max |= new_max >> 16;
+#ifdef __x86_64__
+new_max |= new_max >> 32;
+#endif
+
+old_sz = bitmap_size(bm->bits + 1);
+new_sz = bitmap_size(new_max + 1);
+p = realloc(bm->p, new_sz);
+if (!p)
+return false;
+
+if (bm->p)
+memset(p + old_sz, 0, new_sz - old_sz);
+else
+memset(p, 0, new_sz);
+
+bm->p = p;
+bm->bits = new_max;
+}
+return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..da2691ba79 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
 size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+void *p;
+unsigned long bits;
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
 domid_t  xenstore_domid,  console_domid;
 
 /* Bitmap of currently populated PFNs during restore. */
-unsigned long *populated_pfns;
-xen_pfn_t max_populated_pfn;
+struct xc_sr_bitmap populated_pfns;
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
@@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+return _xc_sr_bitmap_resize(bm, bits);
+return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+free(bm->p);
+bm->p = NULL;
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (!xc_sr_bitmap_resize(bm, bit))
+return false;
+
+set_bit(bit, bm->p);
+return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bits)
+return false;
+return !!test_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_clear_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+xc_interface *xch = ctx->xch;
+
+if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) )
+{
+ERROR("Failed to realloc populated_pfns bitmap");
+errno = ENOMEM;
+return -1;
+}
+return 0;
+}
+
 struct xc_sr_record
 {
 uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
-{
-if ( pfn > ctx->restore.max_populated_pfn )
-
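Typical use of the resulting API from restore code looks like this (sketch;
error handling elided):

/* Sketch: the bitmap grows on demand, no up-front sizing needed. */
struct xc_sr_bitmap bm = { NULL, 0 };

if ( !xc_sr_set_bit(42, &bm) )
    /* allocation failure, treat as ENOMEM */;
if ( xc_sr_test_bit(42, &bm) )
    /* bit is set */;
xc_sr_bitmap_free(&bm);   /* frees and clears the pointer */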

[Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-26 Thread Olaf Hering
During creation of a HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost; everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.h  |  25 +++-
 tools/libxc/xc_sr_restore.c |  75 +-
 tools/libxc/xc_sr_restore_x86_hvm.c | 288 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 -
 4 files changed, 382 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index da2691ba79..0fa0fbea4d 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
  */
 int (*setup)(struct xc_sr_context *ctx);
 
+/**
+ * Populate PFNs
+ *
+ * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+ * unpopulated subset.
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t *types);
+
+
 /**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +234,8 @@ struct xc_sr_context
 
 int send_back_fd;
 unsigned long p2m_size;
+unsigned long max_pages;
+unsigned long tot_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
 /* From Image Header. */
@@ -336,6 +348,11 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
 } restore;
 };
 } x86_hvm;
@@ -455,14 +472,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset.  If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-*pfns = malloc(count * sizeof(*pfns));
-unsigned i, nr_pfns = 0;
-int rc = -1;
-
-if ( !mfns || !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
- !pfn_is_populated(ctx, original_pfns[i]) )
-{
-rc = pfn_set_populated(ctx, original_pfns[i]);
-if ( rc )
-goto err;
-pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-++nr_pfns;
-}
-}
-
-if ( nr_pfns )
-{
-rc = xc_domain_populate_physmap_exact(
-xch, ctx->domid, nr_pfns, 0, 0, mfns);
-if ( rc )
-{
-PERROR("Failed to populate physmap");
-goto err;
-}
-
-for ( i = 0; i < nr_pfns; ++i )
-{
-if ( mfns[i] == INVALID_MFN )
-{
-ERROR("Populate physmap failed for pfn %u", i);
-rc = -1;
-goto err;
-}
-
-ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-}
-}
-
-rc = 0;
-
- err:
-free(pfns);
-free(mf

Re: [Xen-devel] [PATCH v5 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-25 Thread Olaf Hering
On Fri, Aug 25, Olaf Hering wrote:

> +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
> + const xen_pfn_t *original_pfns,
> + const uint32_t *types)
> +{

> +while ( min_pfn < max_pfn )

Besides this off-by-one error, there is still a bug in accounting
somewhere. Ballooned guests sometimes fail due to allocation errors.

Olaf


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 0/3] tools/libxc: use superpages

2017-08-25 Thread Olaf Hering
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v5:
 send correct version, rebase was not fully finished
v4:
 restore trailing "_bit" in bitmap function names
 keep track of gaps between previous and current batch
 split alloc functionality in x86_hvm_allocate_pfn
v3:
 clear pointer in xc_sr_bitmap_free
 some coding style changes
 use getdomaininfo.max_pages to avoid Over-allocation check
 trim bitmap function names, drop trailing "_bit"
 add some comments
v2:
 split into individual commits


Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c|   5 -
 tools/libxc/xc_private.h|   5 +
 tools/libxc/xc_sr_common.c  |  41 ++
 tools/libxc/xc_sr_common.h  |  94 ++--
 tools/libxc/xc_sr_restore.c | 141 ++
 tools/libxc/xc_sr_restore_x86_hvm.c | 276 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +-
 7 files changed, 486 insertions(+), 148 deletions(-)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v5 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-25 Thread Olaf Hering
During creation of a HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost; everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.h  |  26 ++--
 tools/libxc/xc_sr_restore.c |  75 +-
 tools/libxc/xc_sr_restore_x86_hvm.c | 276 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +-
 4 files changed, 371 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index da2691ba79..26526d8896 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
  */
 int (*setup)(struct xc_sr_context *ctx);
 
+/**
+ * Populate PFNs
+ *
+ * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+ * unpopulated subset.
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t *types);
+
+
 /**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +234,8 @@ struct xc_sr_context
 
 int send_back_fd;
 unsigned long p2m_size;
+unsigned long max_pages;
+unsigned long tot_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
 /* From Image Header. */
@@ -336,6 +348,12 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
+xen_pfn_t min_pfn;
 } restore;
 };
 } x86_hvm;
@@ -455,14 +473,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset.  If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-*pfns = malloc(count * sizeof(*pfns));
-unsigned i, nr_pfns = 0;
-int rc = -1;
-
-if ( !mfns || !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
- !pfn_is_populated(ctx, original_pfns[i]) )
-{
-rc = pfn_set_populated(ctx, original_pfns[i]);
-if ( rc )
-goto err;
-pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-++nr_pfns;
-}
-}
-
-if ( nr_pfns )
-{
-rc = xc_domain_populate_physmap_exact(
-xch, ctx->domid, nr_pfns, 0, 0, mfns);
-if ( rc )
-{
-PERROR("Failed to populate physmap");
-goto err;
-}
-
-for ( i = 0; i < nr_pfns; ++i )
-{
-if ( mfns[i] == INVALID_MFN )
-{
-ERROR("Populate physmap failed for pfn %u", i);
-rc = -1;
-goto err;
-}
-
-ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-}
-}
-
-  

[Xen-devel] [PATCH v5 2/3] tools/libxc: add API for bitmap access for restore

2017-08-25 Thread Olaf Hering
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap.
Switch the existing populated_pfns to this API.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_sr_common.c  | 41 +++
 tools/libxc/xc_sr_common.h  | 68 +++--
 tools/libxc/xc_sr_restore.c | 66 ++-
 3 files changed, 110 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..4d221ca90c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
 BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the nearest power
+ * of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+{
+size_t new_max;
+size_t old_sz, new_sz;
+void *p;
+
+/* Round up to the nearest power of two larger than bit, less 1. */
+new_max = bits;
+new_max |= new_max >> 1;
+new_max |= new_max >> 2;
+new_max |= new_max >> 4;
+new_max |= new_max >> 8;
+new_max |= new_max >> 16;
+#ifdef __x86_64__
+new_max |= new_max >> 32;
+#endif
+
+old_sz = bitmap_size(bm->bits + 1);
+new_sz = bitmap_size(new_max + 1);
+p = realloc(bm->p, new_sz);
+if (!p)
+return false;
+
+if (bm->p)
+memset(p + old_sz, 0, new_sz - old_sz);
+else
+memset(p, 0, new_sz);
+
+bm->p = p;
+bm->bits = new_max;
+}
+return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..da2691ba79 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
 size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+void *p;
+unsigned long bits;
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
 domid_t  xenstore_domid,  console_domid;
 
 /* Bitmap of currently populated PFNs during restore. */
-unsigned long *populated_pfns;
-xen_pfn_t max_populated_pfn;
+struct xc_sr_bitmap populated_pfns;
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
@@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+return _xc_sr_bitmap_resize(bm, bits);
+return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+free(bm->p);
+bm->p = NULL;
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (!xc_sr_bitmap_resize(bm, bit))
+return false;
+
+set_bit(bit, bm->p);
+return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bits)
+return false;
+return !!test_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_clear_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+xc_interface *xch = ctx->xch;
+
+if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) )
+{
+ERROR("Failed to realloc populated_pfns bitmap");
+errno = ENOMEM;
+return -1;
+}
+return 0;
+}
+
 struct xc_sr_record
 {
 uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
-{
-if ( pfn > ctx->restore.max_populated_pfn )
-

[Xen-devel] [PATCH v5 1/3] tools/libxc: move SUPERPAGE macros to common header

2017-08-25 Thread Olaf Hering
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT   9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT   18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 2/3] tools/libxc: add API for bitmap access for restore

2017-08-25 Thread Olaf Hering
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap.
Switch the existing populated_pfns to this API.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_sr_common.c  | 41 +++
 tools/libxc/xc_sr_common.h  | 68 +++--
 tools/libxc/xc_sr_restore.c | 66 ++-
 3 files changed, 110 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..4d221ca90c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
 BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the nearest power
+ * of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+{
+size_t new_max;
+size_t old_sz, new_sz;
+void *p;
+
+/* Round up to the nearest power of two larger than bit, less 1. */
+new_max = bits;
+new_max |= new_max >> 1;
+new_max |= new_max >> 2;
+new_max |= new_max >> 4;
+new_max |= new_max >> 8;
+new_max |= new_max >> 16;
+#ifdef __x86_64__
+new_max |= new_max >> 32;
+#endif
+
+old_sz = bitmap_size(bm->bits + 1);
+new_sz = bitmap_size(new_max + 1);
+p = realloc(bm->p, new_sz);
+if (!p)
+return false;
+
+if (bm->p)
+memset(p + old_sz, 0, new_sz - old_sz);
+else
+memset(p, 0, new_sz);
+
+bm->p = p;
+bm->bits = new_max;
+}
+return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..8901af112a 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
 size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+void *p;
+unsigned long bits;
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
 domid_t  xenstore_domid,  console_domid;
 
 /* Bitmap of currently populated PFNs during restore. */
-unsigned long *populated_pfns;
-xen_pfn_t max_populated_pfn;
+struct xc_sr_bitmap populated_pfns;
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
@@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+return _xc_sr_bitmap_resize(bm, bits);
+return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+free(bm->p);
+bm->p = NULL;
+}
+
+static inline bool xc_sr_set(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (!xc_sr_bitmap_resize(bm, bit))
+return false;
+
+set_bit(bit, bm->p);
+return true;
+}
+
+static inline bool xc_sr_test(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bits)
+return false;
+return !!test_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_clear(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_clear_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_set(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+return xc_sr_test(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+xc_interface *xch = ctx->xch;
+
+if ( !xc_sr_set(pfn, &ctx->restore.populated_pfns) )
+{
+ERROR("Failed to realloc populated_pfns bitmap");
+errno = ENOMEM;
+return -1;
+}
+return 0;
+}
+
 struct xc_sr_record
 {
 uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
-{
-if ( pfn > ctx->restore.max_populated_pfn )
-return false;
-retu

[Xen-devel] [PATCH v4 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-25 Thread Olaf Hering
During creation of a HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost; everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.h  |  26 ++--
 tools/libxc/xc_sr_restore.c |  75 +-
 tools/libxc/xc_sr_restore_x86_hvm.c | 274 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +-
 4 files changed, 369 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 8901af112a..4c99f3653e 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
  */
 int (*setup)(struct xc_sr_context *ctx);
 
+/**
+ * Populate PFNs
+ *
+ * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+ * unpopulated subset.
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t *types);
+
+
 /**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +234,8 @@ struct xc_sr_context
 
 int send_back_fd;
 unsigned long p2m_size;
+unsigned long max_pages;
+unsigned long tot_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
 /* From Image Header. */
@@ -336,6 +348,12 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
+xen_pfn_t min_pfn;
 } restore;
 };
 } x86_hvm;
@@ -455,14 +473,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset.  If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-*pfns = malloc(count * sizeof(*pfns));
-unsigned i, nr_pfns = 0;
-int rc = -1;
-
-if ( !mfns || !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
- !pfn_is_populated(ctx, original_pfns[i]) )
-{
-rc = pfn_set_populated(ctx, original_pfns[i]);
-if ( rc )
-goto err;
-pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-++nr_pfns;
-}
-}
-
-if ( nr_pfns )
-{
-rc = xc_domain_populate_physmap_exact(
-xch, ctx->domid, nr_pfns, 0, 0, mfns);
-if ( rc )
-{
-PERROR("Failed to populate physmap");
-goto err;
-}
-
-for ( i = 0; i < nr_pfns; ++i )
-{
-if ( mfns[i] == INVALID_MFN )
-{
-ERROR("Populate physmap failed for pfn %u", i);
-rc = -1;
-goto err;
-}
-
-ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-}
-}
-
-  

[Xen-devel] [PATCH v4 0/3] tools/libxc: use superpages

2017-08-25 Thread Olaf Hering
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v4:
 restore trailing "_bit" in bitmap function names
 keep track of gaps between previous and current batch
 split alloc functionality in x86_hvm_allocate_pfn

v3:
 clear pointer in xc_sr_bitmap_free
 some coding style changes
 use getdomaininfo.max_pages to avoid Over-allocation check
 trim bitmap function names, drop trailing "_bit"
 add some comments
v2:
 split into individual commits


Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c|   5 -
 tools/libxc/xc_private.h|   5 +
 tools/libxc/xc_sr_common.c  |  41 ++
 tools/libxc/xc_sr_common.h  |  94 +++--
 tools/libxc/xc_sr_restore.c | 141 ++-
 tools/libxc/xc_sr_restore_x86_hvm.c | 274 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +-
 7 files changed, 484 insertions(+), 148 deletions(-)


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 1/3] tools/libxc: move SUPERPAGE macros to common header

2017-08-25 Thread Olaf Hering
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering 
Acked-by: Wei Liu 
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT   9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT   18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-25 Thread Olaf Hering
On Fri, Aug 25, Olaf Hering wrote:

> I think with the new check of max_pages an overallocation can not happen
> anymore. If at some point the domU still has room for a superpage, it
> will be allocated. In case the batch does not fully fill the superpage,
> the holes will be freed. In the next batch no superpage can be allocated
> anymore, but single pages will be used.

There is one case where over-allocation will happen: assume
x86_hvm_populate_pfns gets a batch of pfns that triggers the
allocation of a 1G page. All pfns will fit into that partly populated
superpage. Then the guest has a hole right after the max_pfn of that
batch. The next batch will start in a new superpage. As a result the
freeing part of x86_hvm_populate_pfns will not consider the previous
superpage anymore. Now 512MB are allocated, but unpopulated.

To handle this case the min_pfn/max_pfn have to be global so that the
current batch can free allocated pfns from previous batches.

Olaf


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
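A sketch of that change: keep the window in the context instead of
recomputing it per batch, so a later batch can revisit superpages allocated
earlier. min_pfn appears as a field in a later revision of the patch; max_pfn
is assumed here:

/* Sketch: widen the global pfn window as batches arrive.  The freeing
 * pass then scans [min_pfn, max_pfn] rather than the current batch. */
if ( pfn < ctx->x86_hvm.restore.min_pfn )
    ctx->x86_hvm.restore.min_pfn = pfn;
if ( pfn > ctx->x86_hvm.restore.max_pfn )
    ctx->x86_hvm.restore.max_pfn = pfn;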


Re: [Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-25 Thread Olaf Hering
On Fri, Aug 25, Wei Liu wrote:

> Maybe a middle ground is to scan the batch to see if pfns can be fit
> into a whole super page? I don't think you can get a batch as big as 1G
> but there should be a lot of 2M batches?

I think with the new check of max_pages an overallocation can not happen
anymore. If at some point the domU still has room for a superpage, it
will be allocated. In case the batch does not fully fill the superpage,
the holes will be freed. In the next batch no superpage can be allocated
anymore, but single pages will be used.

This punching of holes might be inefficient; the win is the use of
superpages in case of contiguous pfns.

Olaf


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
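The hole punching itself is a scan over a superpage range that hands back
every allocated-but-unpopulated pfn. A condensed sketch, one frame per call
for clarity (the patch batches this):

/* Condensed sketch of the freeing pass over one 2M range. */
for ( pfn = start; pfn < start + SUPERPAGE_2MB_NR_PFNS; pfn++ )
{
    if ( xc_sr_test_bit(pfn, &ctx->x86_hvm.restore.allocated_pfns) &&
         !pfn_is_populated(ctx, pfn) )
    {
        xen_pfn_t gfn = pfn;

        xc_sr_test_and_clear_bit(pfn, &ctx->x86_hvm.restore.allocated_pfns);
        /* order 0: return a single 4k frame to Xen */
        xc_domain_decrease_reservation_exact(ctx->xch, ctx->domid, 1, 0, &gfn);
    }
}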


Re: [Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-25 Thread Olaf Hering
On Fri, Aug 25, Wei Liu wrote:

> I'm still unconvinced this works all the time because it still needs the
> assumption that the stream contains contiguous pfns.

This is how it is done today. If the pfns start to arrive in another
order the format has to be changed to send a memory layout in advance.

I will check if some sort of retry logic can be added.


Olaf


signature.asc
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 2/3] tools/libxc: add API for bitmap access for restore

2017-08-24 Thread Olaf Hering
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap.
Switch the existing populated_pfns to this API.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.c  | 41 +++
 tools/libxc/xc_sr_common.h  | 68 +++--
 tools/libxc/xc_sr_restore.c | 66 ++-
 3 files changed, 110 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..4d221ca90c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
 BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the nearest power
+ * of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+{
+size_t new_max;
+size_t old_sz, new_sz;
+void *p;
+
+/* Round up to the nearest power of two larger than bit, less 1. */
+new_max = bits;
+new_max |= new_max >> 1;
+new_max |= new_max >> 2;
+new_max |= new_max >> 4;
+new_max |= new_max >> 8;
+new_max |= new_max >> 16;
+#ifdef __x86_64__
+new_max |= new_max >> 32;
+#endif
+
+old_sz = bitmap_size(bm->bits + 1);
+new_sz = bitmap_size(new_max + 1);
+p = realloc(bm->p, new_sz);
+if (!p)
+return false;
+
+if (bm->p)
+memset(p + old_sz, 0, new_sz - old_sz);
+else
+memset(p, 0, new_sz);
+
+bm->p = p;
+bm->bits = new_max;
+}
+return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..8901af112a 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
 size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+void *p;
+unsigned long bits;
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
 domid_t  xenstore_domid,  console_domid;
 
 /* Bitmap of currently populated PFNs during restore. */
-unsigned long *populated_pfns;
-xen_pfn_t max_populated_pfn;
+struct xc_sr_bitmap populated_pfns;
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
@@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+return _xc_sr_bitmap_resize(bm, bits);
+return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+free(bm->p);
+bm->p = NULL;
+}
+
+static inline bool xc_sr_set(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (!xc_sr_bitmap_resize(bm, bit))
+return false;
+
+set_bit(bit, bm->p);
+return true;
+}
+
+static inline bool xc_sr_test(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bits)
+return false;
+return !!test_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_clear(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_clear_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_set(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+return test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+return xc_sr_test(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+xc_interface *xch = ctx->xch;
+
+if ( !xc_sr_set(pfn, &ctx->restore.populated_pfns) )
+{
+ERROR("Failed to realloc populated_pfns bitmap");
+errno = ENOMEM;
+return -1;
+}
+return 0;
+}
+
 struct xc_sr_record
 {
 uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
-{
-if ( pfn > ctx->restore.max_populated_pfn )
-return false;
-return test_bit(pfn, ctx-

[Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-24 Thread Olaf Hering
During creation of a HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost; everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.h  |  25 +++--
 tools/libxc/xc_sr_restore.c |  75 ++---
 tools/libxc/xc_sr_restore_x86_hvm.c | 202 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 -
 4 files changed, 296 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 8901af112a..bf2758e91a 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
  */
 int (*setup)(struct xc_sr_context *ctx);
 
+/**
+ * Populate PFNs
+ *
+ * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+ * unpopulated subset.
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t 
*types);
+
+
 /**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +234,8 @@ struct xc_sr_context
 
 int send_back_fd;
 unsigned long p2m_size;
+unsigned long max_pages;
+unsigned long tot_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
 /* From Image Header. */
@@ -336,6 +348,11 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
 } restore;
 };
 } x86_hvm;
@@ -455,14 +472,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset.  If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-*pfns = malloc(count * sizeof(*pfns));
-unsigned i, nr_pfns = 0;
-int rc = -1;
-
-if ( !mfns || !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
- !pfn_is_populated(ctx, original_pfns[i]) )
-{
-rc = pfn_set_populated(ctx, original_pfns[i]);
-if ( rc )
-goto err;
-pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-++nr_pfns;
-}
-}
-
-if ( nr_pfns )
-{
-rc = xc_domain_populate_physmap_exact(
-xch, ctx->domid, nr_pfns, 0, 0, mfns);
-if ( rc )
-{
-PERROR("Failed to populate physmap");
-goto err;
-}
-
-for ( i = 0; i < nr_pfns; ++i )
-{
-if ( mfns[i] == INVALID_MFN )
-{
-ERROR("Populate physmap failed for pfn %u", i);
-rc = -1;
-goto err;
-}
-
-ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-}
-}
-
-rc = 0;
-
- err:
-free(pfns);
-f

[Xen-devel] [PATCH v3 0/3] tools/libxc: use superpages

2017-08-24 Thread Olaf Hering
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v3:
 clear pointer in xc_sr_bitmap_free
 some coding style changes
 use getdomaininfo.max_pages to avoid Over-allocation check
 trim bitmap function names, drop trailing "_bit"
 add some comments
v2:
 split into individual commits

Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c|   5 -
 tools/libxc/xc_private.h|   5 +
 tools/libxc/xc_sr_common.c  |  41 
 tools/libxc/xc_sr_common.h  |  93 +++--
 tools/libxc/xc_sr_restore.c | 141 ++---
 tools/libxc/xc_sr_restore_x86_hvm.c | 202 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 -
 7 files changed, 411 insertions(+), 148 deletions(-)
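
(The first v3 changelog item above refers to this detail of the bitmap
helper, shown here verbatim from the series for illustration:)

    static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
    {
        free(bm->p);
        bm->p = NULL;   /* clear the pointer so a later free/resize is safe */
    }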




[Xen-devel] [PATCH v3 1/3] tools/libxc: move SUPERPAGE macros to common header

2017-08-24 Thread Olaf Hering
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT   9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT   18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK



Re: [Xen-devel] [PATCH v2 2/3] tools/libxc: add API for bitmap access for restore

2017-08-23 Thread Olaf Hering
On Thu, Aug 17, Olaf Hering wrote:

> Extend API for managing bitmaps. Each bitmap is now represented by a
> generic struct xc_sr_bitmap.

> +static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned 
> long bits)
> +static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)

> +static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
> +static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
> +static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct 
> xc_sr_bitmap *bm)
> +static inline int xc_sr_test_and_set_bit(unsigned long bit, struct 
> xc_sr_bitmap *bm)

Any objection to removing the trailing '_bit' from these four function names?
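
For illustration, a minimal sketch of how the trimmed names would read at
a call site (assuming the struct xc_sr_bitmap API from this series;
track_pfn() and handle_populated_pfn() are invented for the example, and
only the setter can fail because only the set path grows the bitmap):

    /* Sketch only, not part of the series. */
    static int track_pfn(struct xc_sr_bitmap *populated, unsigned long pfn)
    {
        if ( !xc_sr_set(pfn, populated) )    /* may realloc, can fail */
            return -1;
        if ( xc_sr_test(pfn, populated) )    /* read-only, never grows */
            handle_populated_pfn(pfn);
        return 0;
    }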

Olaf




Re: [Xen-devel] [PATCH v2 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-23 Thread Olaf Hering
On Wed, Aug 23, Olaf Hering wrote:

> The value of p2m_size does not represent the actual number of pages
> assigned to a domU. This info is stored in getdomaininfo.max_pages,
> which is currently not used by restore. I will see if using this value
> will avoid triggering the Over-allocation check.

This untested change on top of this series (done with git diff -w -b
base..HEAD) does some accounting to avoid Over-allocation:

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 26c45fdd6d..e0321ea224 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -234,6 +234,8 @@ struct xc_sr_context
 
 int send_back_fd;
 unsigned long p2m_size;
+unsigned long max_pages;
+unsigned long tot_pages;
 xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
 /* From Image Header. */
@@ -375,6 +377,7 @@ static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap 
*bm, unsigned long bi
 static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
 {
 free(bm->p);
+bm->p = NULL;
 }
 
 static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index 1f9fe25b8f..eff24d3805 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -758,6 +758,9 @@ int xc_domain_restore(xc_interface *xch, int io_fd, 
uint32_t dom,
 return -1;
 }
 
+/* See xc_domain_getinfo */
+ctx.restore.max_pages = ctx.dominfo.max_memkb >> (PAGE_SHIFT-10);
+ctx.restore.tot_pages = ctx.dominfo.nr_pages;
 ctx.restore.p2m_size = nr_pfns;
 
 if ( ctx.dominfo.hvm )
diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c 
b/tools/libxc/xc_sr_restore_x86_hvm.c
index 60454148db..f2932dafb7 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -278,7 +278,8 @@ static int pfn_set_allocated(struct xc_sr_context *ctx, 
xen_pfn_t pfn)
 static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn)
 {
 xc_interface *xch = ctx->xch;
-bool success = false;
+struct xc_sr_bitmap *bm;
+bool success = false, do_sp;
 int rc = -1, done;
 unsigned int order;
 unsigned long i;
@@ -303,15 +304,18 @@ static int x86_hvm_allocate_pfn(struct xc_sr_context 
*ctx, xen_pfn_t pfn)
 return -1;
 }
 DPRINTF("idx_1g %lu idx_2m %lu\n", idx_1g, idx_2m);
-if (!xc_sr_test_and_set_bit(idx_1g, &ctx->x86_hvm.restore.attempted_1g)) {
+
+bm = &ctx->x86_hvm.restore.attempted_1g;
 order = SUPERPAGE_1GB_SHIFT;
 count = 1UL << order;
+do_sp = ctx->restore.tot_pages + count <= ctx->restore.max_pages;
+if ( do_sp && !xc_sr_test_and_set_bit(idx_1g, bm) ) {
 base_pfn = (pfn >> order) << order;
 extnt = base_pfn;
 done = xc_domain_populate_physmap(xch, ctx->domid, 1, order, 0, 
&extnt);
 DPRINTF("1G base_pfn %" PRI_xen_pfn " done %d\n", base_pfn, done);
 if ( done > 0 ) {
-struct xc_sr_bitmap *bm = &ctx->x86_hvm.restore.attempted_2m;
+bm = &ctx->x86_hvm.restore.attempted_2m;
 success = true;
 stat_1g = done;
 for ( i = 0; i < (count >> SUPERPAGE_2MB_SHIFT); i++ )
@@ -319,9 +323,11 @@ static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, 
xen_pfn_t pfn)
 }
 }
 
-if (!xc_sr_test_and_set_bit(idx_2m, &ctx->x86_hvm.restore.attempted_2m)) {
+bm = &ctx->x86_hvm.restore.attempted_2m;
 order = SUPERPAGE_2MB_SHIFT;
 count = 1UL << order;
+do_sp = ctx->restore.tot_pages + count <= ctx->restore.max_pages;
+if ( do_sp && !xc_sr_test_and_set_bit(idx_2m, bm) ) {
 base_pfn = (pfn >> order) << order;
 extnt = base_pfn;
 done = xc_domain_populate_physmap(xch, ctx->domid, 1, order, 0, 
&extnt);
@@ -344,6 +350,7 @@ static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, 
xen_pfn_t pfn)
 if ( success == true ) {
 do {
 count--;
+ctx->restore.tot_pages++;
 rc = pfn_set_allocated(ctx, base_pfn + count);
 if ( rc )
 break;
@@ -396,6 +403,7 @@ static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, 
unsigned count,
 PERROR("Failed to release pfn %" PRI_xen_pfn, min_pfn);
 goto err;
 }
+ctx->restore.tot_pages--;
 }
 min_pfn++;
 }
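
To spell out the intent of the do_sp guard with invented numbers: a 1GB
superpage covers 1UL << SUPERPAGE_1GB_SHIFT == 262144 pfns, so close to
the domU's limit the guard skips the big allocation:

    /* Worked sketch of the guard above; the numbers are made up. */
    unsigned long max_pages = 1048576;           /* limit: 4 GiB in 4 KiB pages */
    unsigned long tot_pages = 917504;            /* already allocated: 3.5 GiB */
    unsigned long count     = 1UL << 18;         /* one 1 GiB extent: 262144 pages */
    bool do_sp = tot_pages + count <= max_pages; /* 1179648 > 1048576 -> false */
    /* do_sp == false: skip the 1 GiB attempt, fall through to 2 MiB pieces. */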

Olaf




Re: [Xen-devel] [PATCH v2 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-23 Thread Olaf Hering
On Wed, Aug 23, Wei Liu wrote:

> On Tue, Aug 22, 2017 at 05:53:25PM +0200, Olaf Hering wrote:
> > In my testing I have seen the case of over-allocation. That's why I
> > implemented the freeing of unpopulated parts. It would be nice to know
> > how many pages are actually coming. I think this info is not available.
> Not sure I follow. What do you mean by "how many pages are actually
> coming"?

This meant the expected number of pages to populate.

The value of p2m_size does not represent the actual number of pages
assigned to a domU. This info is stored in getdomaininfo.max_pages,
which is currently not used by restore. I will see if using this value
will avoid triggering the Over-allocation check.

> > On the other side, the first iteration sends the pfns linearly. This is
> > when the allocation actually happens. So the over-allocation will only
> > trigger near the end, if a 1G range is allocated but only a few pages
> > will be stored into this range.
> This could be making too many assumptions on the data stream.

With the usage of max_pages some assumptions can be avoided.
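
For reference, a sketch of the conversion this enables, assuming 4 KiB
pages (PAGE_SHIFT == 12); this is the line the follow-up accounting diff
adds to xc_domain_restore():

    /* max_memkb is in KiB; a page is 1 << (PAGE_SHIFT - 10) KiB, so
     * shifting right by PAGE_SHIFT - 10 turns KiB into pages. */
    ctx.restore.max_pages = ctx.dominfo.max_memkb >> (PAGE_SHIFT - 10);
    /* e.g. max_memkb = 4194304 (4 GiB) -> max_pages = 4194304 >> 2 = 1048576 */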

Olaf




Re: [Xen-devel] [PATCH v2 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-23 Thread Olaf Hering
On Tue, Aug 22, Olaf Hering wrote:

> In my testing I have seen the case of over-allocation. That's why I
> implemented the freeing of unpopulated parts. It would be nice to know
> how many pages are actually coming. I think this info is not available.

If the receiving dom0 recognizes an over-allocation it must know how
much memory a domU is supposed to have. Perhaps there is a way to
retrieve this info.

An interesting case is ballooning during migration. Is the new number of
pages per domU actually transferred to the receiving side? If the domU is
ballooned up, the other side may see the incoming domU as over-allocated.
If it is ballooned down, pages may be missing. Was this ever considered?


Olaf




Re: [Xen-devel] [PATCH v2 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-22 Thread Olaf Hering
On Tue, Aug 22, Wei Liu wrote:

> On Thu, Aug 17, 2017 at 07:01:33PM +0200, Olaf Hering wrote:
> > +/* No superpage in 1st 2MB due to VGA hole */
> > +xc_sr_set_bit(0, &ctx->x86_hvm.restore.attempted_1g);
> > +xc_sr_set_bit(0, &ctx->x86_hvm.restore.attempted_2m);
> I don't quite get this. What about other holes such as MMIO?

This just copies what meminit_hvm does. Is there a way to know where the
MMIO hole is? Maybe I just missed the MMIO part. In the worst case I
think a superpage is allocated, which is later split into single pages.

> One potential issue I can see with your algorithm is that, if the stream
> of page info contains pages from different superpages, the risk of going
> over the memory limit is high (hence failing the migration).
> 
> Is my concern unfounded?

In my testing I have seen the case of over-allocation. That's why I
implemented the freeing of unpopulated parts. It would be nice to know
how many pages are actually coming. I think this info is not available.

On the other side, the first iteration sends the pfns linearly. This is
when the allocation actually happens. So the over-allocation will only
trigger near the end, if a 1G range is allocated but only a few pages
will be stored into this range.
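
A simplified sketch of that split-and-free fallback: once a superpage has
been allocated, pfns the stream never populated are handed back, so a
hole does not stay over-allocated. punch_holes() is invented for
illustration; the actual patch additionally tracks an allocated_pfns
bitmap instead of relying on the populated state alone:

    static int punch_holes(struct xc_sr_context *ctx,
                           xen_pfn_t base, xen_pfn_t end)
    {
        xc_interface *xch = ctx->xch;
        xen_pfn_t pfn;

        for ( pfn = base; pfn < end; pfn++ )
        {
            if ( pfn_is_populated(ctx, pfn) )
                continue;
            /* Hand one unpopulated 4 KiB page (extent order 0) back to Xen. */
            if ( xc_domain_decrease_reservation_exact(xch, ctx->domid,
                                                      1, 0, &pfn) )
            {
                PERROR("Failed to release pfn %" PRI_xen_pfn, pfn);
                return -1;
            }
        }
        return 0;
    }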

Olaf




[Xen-devel] [PATCH v2 0/3] tools/libxc: use superpages

2017-08-17 Thread Olaf Hering
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v2:
 split into individual commits

Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c|   5 -
 tools/libxc/xc_private.h|   5 +
 tools/libxc/xc_sr_common.c  |  41 
 tools/libxc/xc_sr_common.h  |  82 +++-
 tools/libxc/xc_sr_restore.c | 136 +--
 tools/libxc/xc_sr_restore_x86_hvm.c | 180 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 ++-
 7 files changed, 381 insertions(+), 140 deletions(-)




[Xen-devel] [PATCH v2 3/3] tools/libxc: use superpages during restore of HVM guest

2017-08-17 Thread Olaf Hering
During creation of an HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost, everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.h  |  15 +++
 tools/libxc/xc_sr_restore.c |  70 +-
 tools/libxc/xc_sr_restore_x86_hvm.c | 180 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 ++-
 4 files changed, 267 insertions(+), 70 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 5d78f461af..26c45fdd6d 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
  */
 int (*setup)(struct xc_sr_context *ctx);
 
+/**
+ * Populate PFNs
+ *
+ * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+ * unpopulated subset.
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t 
*types);
+
+
 /**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
@@ -336,6 +346,11 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
 } restore;
 };
 } x86_hvm;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..1f9fe25b8f 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset.  If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-xc_interface *xch = ctx->xch;
-xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-*pfns = malloc(count * sizeof(*pfns));
-unsigned i, nr_pfns = 0;
-int rc = -1;
-
-if ( !mfns || !pfns )
-{
-ERROR("Failed to allocate %zu bytes for populating the physmap",
-  2 * count * sizeof(*mfns));
-goto err;
-}
-
-for ( i = 0; i < count; ++i )
-{
-if ( (!types || (types &&
- (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-  types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
- !pfn_is_populated(ctx, original_pfns[i]) )
-{
-rc = pfn_set_populated(ctx, original_pfns[i]);
-if ( rc )
-goto err;
-pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-++nr_pfns;
-}
-}
-
-if ( nr_pfns )
-{
-rc = xc_domain_populate_physmap_exact(
-xch, ctx->domid, nr_pfns, 0, 0, mfns);
-if ( rc )
-{
-PERROR("Failed to populate physmap");
-goto err;
-}
-
-for ( i = 0; i < nr_pfns; ++i )
-{
-if ( mfns[i] == INVALID_MFN )
-{
-ERROR("Populate physmap failed for pfn %u", i);
-rc = -1;
-goto err;
-}
-
-ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-}
-}
-
-rc = 0;
-
- err:
-free(pfns);
-free(mfns);
-
-return rc;
-}
-
 /*
  * Given a list of pfns, their types, and a block of page data from the
  * stream, populate and record their types, map the relevant subset and copy
@@ -161,7 +93,7 @@ static int process_page_data(struct xc_sr_context *ctx, 
unsigned count,
 goto err;
 }
 
-rc = populate_pfns(ctx, count, pfns, types);
+rc = ctx->restore.ops.populate_pfns(ctx, count, pfns, types);
 if ( rc )
 {
 ERROR("Failed to populate pfns for batch of %u pages", count);
diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c 
b/tools/libxc/xc_sr_restore_x86_hvm.c
index 1dca85354a..60454148db 100644
--- a/tools/libxc/xc_sr_restore_x86_hvm.c
+++ b/tools/libxc/xc_sr_restore_x86_hvm.c
@@ -135,6 +135,8 @@ static int x86_hvm_localise_p

[Xen-devel] [PATCH v2 1/3] tools/libxc: move SUPERPAGE macros to common header

2017-08-17 Thread Olaf Hering
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT   9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT   18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT   9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT   18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK



[Xen-devel] [PATCH v2 2/3] tools/libxc: add API for bitmap access for restore

2017-08-17 Thread Olaf Hering
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap.
Switch the existing populated_pfns to this API.

Signed-off-by: Olaf Hering 
---
 tools/libxc/xc_sr_common.c  | 41 +++
 tools/libxc/xc_sr_common.h  | 67 +++--
 tools/libxc/xc_sr_restore.c | 66 ++--
 3 files changed, 109 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..4d221ca90c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
 BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size increased to the nearest 
power
+ * of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+{
+size_t new_max;
+size_t old_sz, new_sz;
+void *p;
+
+/* Round up to the nearest power of two larger than bit, less 1. */
+new_max = bits;
+new_max |= new_max >> 1;
+new_max |= new_max >> 2;
+new_max |= new_max >> 4;
+new_max |= new_max >> 8;
+new_max |= new_max >> 16;
+#ifdef __x86_64__
+new_max |= new_max >> 32;
+#endif
+
+old_sz = bitmap_size(bm->bits + 1);
+new_sz = bitmap_size(new_max + 1);
+p = realloc(bm->p, new_sz);
+if (!p)
+return false;
+
+if (bm->p)
+memset(p + old_sz, 0, new_sz - old_sz);
+else
+memset(p, 0, new_sz);
+
+bm->p = p;
+bm->bits = new_max;
+}
+return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..5d78f461af 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
 size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+void *p;
+unsigned long bits;
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
 domid_t  xenstore_domid,  console_domid;
 
 /* Bitmap of currently populated PFNs during restore. */
-unsigned long *populated_pfns;
-xen_pfn_t max_populated_pfn;
+struct xc_sr_bitmap populated_pfns;
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
@@ -343,6 +348,64 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long 
bits)
+{
+if (bits > bm->bits)
+return _xc_sr_bitmap_resize(bm, bits);
+return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+free(bm->p);
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (!xc_sr_bitmap_resize(bm, bit))
+return false;
+
+set_bit(bit, bm->p);
+return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bits)
+return false;
+return !!test_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct 
xc_sr_bitmap *bm)
+{
+return test_and_clear_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_set_bit(unsigned long bit, struct 
xc_sr_bitmap *bm)
+{
+return test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+xc_interface *xch = ctx->xch;
+
+if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) )
+{
+ERROR("Failed to realloc populated_pfns bitmap");
+errno = ENOMEM;
+return -1;
+}
+return 0;
+}
+
 struct xc_sr_record
 {
 uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
 return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
-{
-if ( pfn > ctx->restore.max_populated_pfn )
-return false;
-return test_bit(pf

Re: [Xen-devel] [PATCH v1] tools/libxc: use superpages during restore of HVM guest

2017-08-04 Thread Olaf Hering
On Fri, Aug 04, Wei Liu wrote:

> Can you split this patch into several:
> 1. code movement
> 2. refactoring / introduction of new hooks
> 3. implementing the new algorithm

I tried that, and it did not work well. But I can try again if required.

Olaf




Re: [Xen-devel] [PATCH v1] tools/libxc: use superpages during restore of HVM guest

2017-08-03 Thread Olaf Hering
On Wed, Aug 02, Olaf Hering wrote:

> +++ b/tools/libxc/xc_sr_restore_x86_hvm.c

> +#define SUPERPAGE_2MB_SHIFT   9
> +#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
> +#define SUPERPAGE_1GB_SHIFT   18
> +#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)

I think these can be moved to a header file. xc_dom_x86.c and
xc_sr_restore_x86_hvm.c use xc_dom.h.

> +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
> + const xen_pfn_t *original_pfns,
> + const uint32_t *types)
> +{
> +xc_interface *xch = ctx->xch;
> +xen_pfn_t min_pfn = original_pfns[0], max_pfn = original_pfns[0];
> +unsigned i;
> +int rc = -1;
> +
> +for ( i = 0; i < count; ++i )
> +{
> +if (original_pfns[i] < min_pfn)
> +min_pfn = original_pfns[i];
> +if (original_pfns[i] > max_pfn)
> +max_pfn = original_pfns[i];
> +if ( (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
> +  types[i] != XEN_DOMCTL_PFINFO_BROKEN) &&
> + !pfn_is_populated(ctx, original_pfns[i]) )

Are these types used at all for an HVM domU? Otherwise this condition can
be simplified to just check the populated state.
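
Should the answer be that the types cannot occur for HVM, the quoted
condition could shrink to something like this (hypothetical
simplification, pending that answer):

    /* Hypothetical: ignore the type array for HVM, check only the
     * populated state. */
    for ( i = 0; i < count; ++i )
    {
        if ( !pfn_is_populated(ctx, original_pfns[i]) )
        {
            /* ... populate/allocate as in the patch ... */
        }
    }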

Olaf




Re: [Xen-devel] backport docs changes for Xen 4.9.1

2017-08-03 Thread Olaf Hering
On Thu, Aug 03, Jan Beulich wrote:

> >>> On 01.08.17 at 11:43,  wrote:
> > Please backport the following changes for docs/ for the Xen 4.9.1
> > release:
> > 
> > aa4eb460bc docs: add pod variant of xl-numa-placement
> > 458df9f374 docs: add pod variant of xl-network-configuration.5
> > 4359b86f31 docs: add pod variant of xen-pv-channel.7
> I'm not convinced these qualify for backporting. What's the
> justification?

Less paperwork for me, and it avoids maintaining three patches. It also
fixes the references within man pages for those who have no pandoc while
building Xen. Not sure if that is just SUSE.

Olaf




[Xen-devel] [PATCH v1] tools/libxc: use superpages during restore of HVM guest

2017-08-02 Thread Olaf Hering
During creation of an HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost, everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side it
must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering 
---

based on RELEASE-4.9.0


 tools/libxc/xc_sr_common.c  |  41 
 tools/libxc/xc_sr_common.h  |  79 +++-
 tools/libxc/xc_sr_restore.c | 135 +-
 tools/libxc/xc_sr_restore_x86_hvm.c | 183 
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +-
 5 files changed, 376 insertions(+), 134 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 48fa676f4e..9b68a064eb 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -156,6 +156,47 @@ static void __attribute__((unused)) build_assertions(void)
 }
 
 /*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size increased to the nearest 
power
+ * of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+if (bits > bm->bits)
+{
+size_t new_max;
+size_t old_sz, new_sz;
+void *p;
+
+/* Round up to the nearest power of two larger than bit, less 1. */
+new_max = bits;
+new_max |= new_max >> 1;
+new_max |= new_max >> 2;
+new_max |= new_max >> 4;
+new_max |= new_max >> 8;
+new_max |= new_max >> 16;
+#ifdef __x86_64__
+new_max |= new_max >> 32;
+#endif
+
+old_sz = bitmap_size(bm->bits + 1);
+new_sz = bitmap_size(new_max + 1);
+p = realloc(bm->p, new_sz);
+if (!p)
+return false;
+
+if (bm->p)
+memset(p + old_sz, 0, new_sz - old_sz);
+else
+memset(p, 0, new_sz);
+
+bm->p = p;
+bm->bits = new_max;
+}
+return true;
+}
+
+/*
  * Local variables:
  * mode: C
  * c-file-style: "BSD"
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..ad1a2e6e02 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -140,6 +140,13 @@ struct xc_sr_restore_ops
 int (*setup)(struct xc_sr_context *ctx);
 
 /**
+ * Populate PFNs
+ *
+ */
+int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+ const xen_pfn_t *original_pfns, const uint32_t 
*types);
+
+/**
  * Process an individual record from the stream.  The caller shall take
  * care of processing common records (e.g. END, PAGE_DATA).
  *
@@ -172,6 +179,12 @@ struct xc_sr_x86_pv_restore_vcpu
 size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+void *p;
+unsigned long bits;
+};
+
 struct xc_sr_context
 {
 xc_interface *xch;
@@ -255,8 +268,7 @@ struct xc_sr_context
 domid_t  xenstore_domid,  console_domid;
 
 /* Bitmap of currently populated PFNs during restore. */
-unsigned long *populated_pfns;
-xen_pfn_t max_populated_pfn;
+struct xc_sr_bitmap populated_pfns;
 
 /* Sender has invoked verify mode on the stream. */
 bool verify;
@@ -331,6 +343,11 @@ struct xc_sr_context
 /* HVM context blob. */
 void *context;
 size_t contextsz;
+
+/* Bitmap of currently allocated PFNs during restore. */
+struct xc_sr_bitmap attempted_1g;
+struct xc_sr_bitmap attempted_2m;
+struct xc_sr_bitmap allocated_pfns;
 } restore;
 };
 } x86_hvm;
@@ -343,6 +360,64 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long 
bits)
+{
+if (bits > bm->bits)
+return _xc_sr_bitmap_resize(bm, bits);
+return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+free(bm->p);
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (!xc_sr_bitmap_resize(bm, bit))
+return false;
+
+set_bit(bit, bm->p);
+return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+if (bit > bm->bit

Re: [Xen-devel] [PATCH] vtpmmgr: make inline functions static

2017-08-01 Thread Olaf Hering
Ping

On Fri, Jun 23, Olaf Hering wrote:

> gcc7 is more strict with functions marked as inline. They are not
> automatically inlined. Instead a function call is generated, but the
> actual code is not visible to the linker.
> 
> Do a mechanical change and mark every 'inline' as 'static inline'. For
> simpler review the static goes on a separate line.
> 
> Signed-off-by: Olaf Hering 
> ---
>  stubdom/vtpmmgr/marshal.h  | 76 
> ++
>  stubdom/vtpmmgr/tcg.h  | 14 
>  stubdom/vtpmmgr/tpm2_marshal.h | 58 
>  stubdom/vtpmmgr/tpmrsa.h   |  1 +
>  4 files changed, 149 insertions(+)
> 
> diff --git a/stubdom/vtpmmgr/marshal.h b/stubdom/vtpmmgr/marshal.h
> index d826f19d89..dce19c6439 100644
> --- a/stubdom/vtpmmgr/marshal.h
> +++ b/stubdom/vtpmmgr/marshal.h
> @@ -47,16 +47,19 @@ typedef enum UnpackPtr {
>   UNPACK_ALLOC
>  } UnpackPtr;
>  
> +static
>  inline BYTE* pack_BYTE(BYTE* ptr, BYTE t) {

...
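
A tiny standalone reconstruction of the failure mode (not code from
vtpmmgr; gcc7 defaults to -std=gnu11, where a plain 'inline' definition
emits no out-of-line symbol):

    /* demo.c -- under C99/C11 inline semantics this provides only an
     * inline definition.  If gcc decides not to inline the call, e.g.
     * at -O0, the link fails with "undefined reference to `twice'". */
    inline int twice(int x) { return 2 * x; }

    int main(void)
    {
        return twice(21);
    }

    /* The mechanical fix from the patch, with the static on its own line:
     *
     * static
     * inline int twice(int x) { return 2 * x; }
     */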




[Xen-devel] Stable Branch Maintainer in Wiki and MAINTAINERS

2017-08-01 Thread Olaf Hering
According to
https://wiki.xenproject.org/wiki/Xen_Project_Maintenance_Releases in the
"Stable Branch Maintainer" section someone is supposed to be added to
the MAINTAINERS file. Where in the staging-4.9 branch was this change
done? I guess an equivalent of 1f4ea16035 ("update Xen version to
4.8.1-pre") is missing.

Olaf




[Xen-devel] backport docs changes for Xen 4.9.1

2017-08-01 Thread Olaf Hering
Please backport the following changes for docs/ for the Xen 4.9.1
release:

aa4eb460bc docs: add pod variant of xl-numa-placement
458df9f374 docs: add pod variant of xl-network-configuration.5
4359b86f31 docs: add pod variant of xen-pv-channel.7
55924baf22 docs: correct paragraph indention in xen-tscmode
763267e315 docs: replace xm with xl in xen-tscmode

Thanks.

Olaf




[Xen-devel] [PATCH v2] xen-disk: use g_new0 to fix build

2017-07-28 Thread Olaf Hering
g_malloc0_n is available since glib-2.24. To allow building with older
glib versions, use the generic g_new0, which is already used in many
other places in the code.

Fixes commit 3284fad728 ("xen-disk: add support for multi-page shared rings")

Signed-off-by: Olaf Hering 
---
 hw/block/xen_disk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index d42ed7070d..536e2ee735 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -1232,7 +1232,7 @@ static int blk_connect(struct XenDevice *xendev)
 return -1;
 }
 
-domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t));
+domids = g_new0(uint32_t, blkdev->nr_ring_ref);
 for (i = 0; i < blkdev->nr_ring_ref; i++) {
 domids[i] = blkdev->xendev.dom;
 }



Re: [Xen-devel] [Qemu-devel] [PATCH] xen-disk: use g_malloc0 to fix build

2017-07-28 Thread Olaf Hering
On Fri, Jul 28, Eric Blake wrote:

> This version is prone to multiplication overflow (well, maybe not, but
> you have to audit for that).  Wouldn't it be better to use:

What could go wrong?
qemu will die either way, I think.
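
For the record, a small demonstration of the wraparound Eric means, with
invented numbers; in blk_connect() nr_ring_ref is tiny, so it cannot
trigger there in practice:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* On a 32-bit build gsize is 32 bits wide, so the product wraps
         * and far too little memory would be allocated. */
        uint32_t n = UINT32_MAX / 2 + 2;                  /* 2147483649 */
        uint32_t bytes = n * (uint32_t)sizeof(uint32_t);  /* wraps to 4 */

        printf("%u elements -> allocation size wraps to %u bytes\n", n, bytes);
        return 0;
    }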

Olaf




[Xen-devel] [PATCH] xen-disk: use g_malloc0 to fix build

2017-07-28 Thread Olaf Hering
g_malloc0_n is available since glib-2.24. To allow building with older
glib versions, use the generic g_malloc0, which is already used in many
other places in the code.

Fixes commit 3284fad728 ("xen-disk: add support for multi-page shared rings")

Signed-off-by: Olaf Hering 
---
 hw/block/xen_disk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index d42ed7070d..71deec17b0 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -1232,7 +1232,7 @@ static int blk_connect(struct XenDevice *xendev)
 return -1;
 }
 
-domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t));
+domids = g_malloc0(blkdev->nr_ring_ref * sizeof(uint32_t));
 for (i = 0; i < blkdev->nr_ring_ref; i++) {
 domids[i] = blkdev->xendev.dom;
 }



Re: [Xen-devel] [PULL 3/3] xen-disk: add support for multi-page shared rings

2017-07-26 Thread Olaf Hering
On Tue, Jun 27, Stefano Stabellini wrote:

> From: Paul Durrant 
> The blkif protocol has had provision for negotiation of multi-page shared
> rings for some time now and many guest OS have support in their frontend
> drivers.

> +++ b/hw/block/xen_disk.c

> +domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t));

According to [1] g_malloc0_n requires at least glib-2.24. As a result
compilation of qemu-2.10 fails on SLE11, which has only glib-2.22.

Olaf

[1] https://developer.gnome.org/glib/stable/glib-Memory-Allocation.html




[Xen-devel] [PATCH v3 0/3] docs: convert manpages to pod

2017-07-26 Thread Olaf Hering
To remove the buildtime dependency on pandoc/ghc, some manpages are
converted from markdown to pod format. This provides more of the
manpages which are referenced in xl(1) and xl.cfg(5).

This series does not cover xen-vbd-interface.7 because converting the
lists used in that manpage was not straightforward.

Olaf

v3:
 - add NAME/DESCRIPTION, minor formatting tweaks, whitespace
v2:
 fold each add/remove into a single commit

Cc: Ian Jackson 
Cc: Wei Liu 
To: xen-devel@lists.xen.org

Olaf Hering (3):
  docs: add pod variant of xen-pv-channel.7
  docs: add pod variant of xl-network-configuration.5
  docs: add pod variant of xl-numa-placement

 docs/man/xen-pv-channel.markdown.7 | 106 ---
 docs/man/xen-pv-channel.pod.7  | 188 
 ...n.markdown.5 => xl-network-configuration.pod.5} | 196 ++---
 ...lacement.markdown.7 => xl-numa-placement.pod.7} | 166 +++--
 4 files changed, 435 insertions(+), 221 deletions(-)
 delete mode 100644 docs/man/xen-pv-channel.markdown.7
 create mode 100644 docs/man/xen-pv-channel.pod.7
 rename docs/man/{xl-network-configuration.markdown.5 => 
xl-network-configuration.pod.5} (55%)
 rename docs/man/{xl-numa-placement.markdown.7 => xl-numa-placement.pod.7} (74%)




[Xen-devel] [PATCH v3 1/3] docs: add pod variant of xen-pv-channel.7

2017-07-26 Thread Olaf Hering
Convert source for xen-pv-channel.7 from markdown to pod.
This removes the buildtime requirement for pandoc, and subsequently the
need for ghc, in the chain for BuildRequires of xen.rpm.

Signed-off-by: Olaf Hering 
---
 docs/man/xen-pv-channel.markdown.7 | 106 -
 docs/man/xen-pv-channel.pod.7  | 188 +
 2 files changed, 188 insertions(+), 106 deletions(-)
 delete mode 100644 docs/man/xen-pv-channel.markdown.7
 create mode 100644 docs/man/xen-pv-channel.pod.7

diff --git a/docs/man/xen-pv-channel.markdown.7 
b/docs/man/xen-pv-channel.markdown.7
deleted file mode 100644
index 1c6149dae0..00
--- a/docs/man/xen-pv-channel.markdown.7
+++ /dev/null
@@ -1,106 +0,0 @@
-Xen PV Channels
-===
-
-A channel is a low-bandwidth private byte stream similar to a serial
-link. Typical uses of channels are
-
-  1. to provide initial configuration information to a VM on boot
- (example use: CloudStack's cloud-early-config service)
-  2. to signal/query an in-guest agent
- (example use: oVirt's guest agent)
-
-Channels are similar to virtio-serial devices and emulated serial links.
-Channels are intended to be used in the implementation of libvirt s
-when running on Xen.
-
-Note: if an application requires a high-bandwidth link then it should use
-vchan instead.
-
-How to use channels: an example

-
-Consider a cloud deployment where VMs are cloned from pre-made templates,
-and customised on first boot by an in-guest agent which sets the IP address,
-hostname, ssh keys etc. To install the system the cloud administrator would
-first:
-
-  1. Install a guest as normal (no channel configuration necessary)
-  2. Install the in-guest agent specific to the cloud software. This will
- prepare the guest to communicate over the channel, and also prepare
- the guest to be cloned safely (sometimes known as "sysprepping")
-  3. Shutdown the guest
-  4. Register the guest as a template with the cloud orchestration software
-  5. Install the cloud orchestration agent in dom0
-
-At runtime, when a cloud tenant requests that a VM is created from the 
template,
-the sequence of events would be: (assuming a Linux domU)
-
-  1. A VM is "cloned" from the template
-  2. A unique Unix domain socket path in dom0 is allocated
- (e.g. /my/cloud/software/talk/to/domain/)
-  3. Domain configuration is created for the VM, listing the channel
- name expected by the in-guest agent. In xl syntax this would be:
-
- channel = [ "connection=socket, name=org.my.cloud.software.agent.version1,
-  path = /my/cloud/software/talk/to/domain/" ]
-
-  4. The VM is started
-  5. In dom0 the cloud orchestration agent connects to the Unix domain
- socket, writes a handshake message and waits for a reply
-  6. Assuming the guest kernel has CONFIG_HVC_XEN_FRONTEND set then the console
- driver will generate a hotplug event
-  7. A udev rule is activated by the hotplug event.
-
- The udev rule would look something like:
-
- SUBSYSTEM=="xen", DEVPATH=="/devices/console-[0-9]", 
RUN+="xen-console-setup"
-
- where the "xen-console-setup" script would read the channel name and
- make a symlink in /dev/xen-channel/org.my.cloud.software.agent.version1
-
-  8. The in-guest agent uses inotify to see the creation of the 
/dev/xen-channel
- symlink and opens the device.
-  9. The in-guest agent completes the handshake with the dom0 agent
- 10. The dom0 agent transmits the unique VM configuration: hostname, IP
- address, ssh keys etc etc
- 11. The in-guest agent receives the configuration and applies it.
-
-Using channels avoids having to use a temporary disk device or network
-connection.
-
-Design recommendations and pitfalls

-
-It's necessary to install channel-specific software (an "agent") into the guest
-before you can use a channel. By default a channel will appear as a device
-which could be mistaken for a serial port or regular console. It is known
-that some software will proactively seek out serial ports and issue AT commands
-at them; make sure such software is disabled!
-
-Since channels are identified by names, application authors must ensure their
-channel names are unique to avoid clashes. We recommend that channel names
-include parts unique to the application such as a domain names. To assist
-prevent clashes we recommend authors add their names to our global channel
-registry at the end of this document.
-
-Limitations

-
-Hotplug and unplug of channels is not currently implemented.
-
-Channel name registry
--
-
-It is important that channel names are globally unique. To help ensure
-that no-one's name clashes with yours, please add yours to this list.
-
-Key:
-N: Name
-C: Contact
-  

[Xen-devel] [PATCH v3 2/3] docs: add pod variant of xl-network-configuration.5

2017-07-26 Thread Olaf Hering
Convert the source for xl-network-configuration.5 from markdown to pod.
This removes the buildtime requirement for pandoc, and consequently the
need for ghc, from the BuildRequires chain of xen.rpm.

Signed-off-by: Olaf Hering 
---
 ...n.markdown.5 => xl-network-configuration.pod.5} | 196 ++---
 1 file changed, 137 insertions(+), 59 deletions(-)
 rename docs/man/{xl-network-configuration.markdown.5 => 
xl-network-configuration.pod.5} (55%)

diff --git a/docs/man/xl-network-configuration.markdown.5 
b/docs/man/xl-network-configuration.pod.5
similarity index 55%
rename from docs/man/xl-network-configuration.markdown.5
rename to docs/man/xl-network-configuration.pod.5
index 84c2645ad8..e9ac3c5b9e 100644
--- a/docs/man/xl-network-configuration.markdown.5
+++ b/docs/man/xl-network-configuration.pod.5
@@ -1,6 +1,11 @@
-# XL Network Configuration
+=encoding utf8
 
-## Syntax Overview
+=head1 NAME
+
+xl-network-configuration - XL Network Configuration Syntax
+
+
+=head1 SYNTAX
 
 This document specifies the xl config file format vif configuration
 option.  It has the following form:
@@ -8,7 +13,7 @@ option.  It has the following form:
 vif = [ '', '', ... ]
 
 where each vifspec is in this form:
-
+
 [=|,]
 
 For example:
@@ -24,11 +29,13 @@ These might be specified in the domain config file like 
this:
 More formally, the string is a series of comma-separated keyword/value
 pairs. All keywords are optional.
 
-Each device has a `DEVID` which is its index within the vif list, starting 
from 0.
+Each device has a C which is its index within the vif list, starting 
from 0.
 
-## Keywords
 
-### mac
+=head1 Keywords
+
+
+=head2 mac
 
 If specified then this option specifies the MAC address inside the
 guest of this VIF device. The value is a 48-bit number represented as
@@ -36,89 +43,137 @@ six groups of two hexadecimal digits, separated by colons 
(:).
 
 The default if this keyword is not specified is to be automatically
 generate a MAC address inside the space assigned to Xen's
-[Organizationally Unique Identifier][oui] (00:16:3e).
+Lhttp://en.wikipedia.org/wiki/Organizationally_Unique_Identifier> 
(00:16:3e).
 
 If you are choosing a MAC address then it is strongly recommend to
 follow one of the following strategies:
 
-  * Generate a random sequence of 6 byte, set the locally administered
-bit (bit 2 of the first byte) and clear the multicast bit (bit 1
-of the first byte). In other words the first byte should have the
-bit pattern xx10 (where x is a randomly generated bit) and the
-remaining 5 bytes are randomly generated See
-[http://en.wikipedia.org/wiki/MAC_address] for more details the
-structure of a MAC address.
-  * Allocate an address from within the space defined by your
-organization's OUI (if you have one) following your organization's
-procedures for doing so.
-  * Allocate an address from within the space defined by Xen's OUI
-(00:16:3e). Taking care not to clash with other users of the
-physical network segment where this VIF will reside.
+=over
+
+=item *
+
+Generate a random sequence of 6 byte, set the locally administered
+bit (bit 2 of the first byte) and clear the multicast bit (bit 1
+of the first byte). In other words the first byte should have the
+bit pattern xx10 (where x is a randomly generated bit) and the
+remaining 5 bytes are randomly generated See
+[http://en.wikipedia.org/wiki/MAC_address] for more details the
+structure of a MAC address.
+
+
+=item *
+
+Allocate an address from within the space defined by your
+organization's OUI (if you have one) following your organization's
+procedures for doing so.
+
+
+=item *
+
+Allocate an address from within the space defined by Xen's OUI
+(00:16:3e). Taking care not to clash with other users of the
+physical network segment where this VIF will reside.
+
+
+=back
 
 If you have an OUI for your own use then that is the preferred
 strategy. Otherwise in general you should prefer to generate a random
 MAC and set the locally administered bit since this allows for more
 bits of randomness than using the Xen OUI.
 
-### bridge
+
+=head2 bridge
 
 Specifies the name of the network bridge which this VIF should be
-added to. The default is `xenbr0`. The bridge must be configured using
-your distribution's network configuration tools. See the [wiki][net]
+added to. The default is C. The bridge must be configured using
+your distribution's network configuration tools. See the 
Lhttp://wiki.xen.org/wiki/HostConfiguration/Networking>
 for guidance and examples.
 
-### gatewaydev
+
+=head2 gatewaydev
 
 Specifies the name of the network interface which has an IP and which
 is in the network the VIF should communicate with. This is used in the host
-by the vif-route hotplug script. See [wiki][vifroute] for guidance and
+by the vif-route hotplug script. See 
Lhttp://wiki.xen.org/wiki/Vif-rou

[Xen-devel] [PATCH v3 3/3] docs: add pod variant of xl-numa-placement

2017-07-26 Thread Olaf Hering
Convert source for xl-numa-placement.7 from markdown to pod.
This removes the buildtime requirement for pandoc, and subsequently the
need for ghc, in the chain for BuildRequires of xen.rpm.

Signed-off-by: Olaf Hering 
---
 ...lacement.markdown.7 => xl-numa-placement.pod.7} | 166 ++---
 1 file changed, 110 insertions(+), 56 deletions(-)
 rename docs/man/{xl-numa-placement.markdown.7 => xl-numa-placement.pod.7} (74%)

diff --git a/docs/man/xl-numa-placement.markdown.7 
b/docs/man/xl-numa-placement.pod.7
similarity index 74%
rename from docs/man/xl-numa-placement.markdown.7
rename to docs/man/xl-numa-placement.pod.7
index f863492093..54a444172e 100644
--- a/docs/man/xl-numa-placement.markdown.7
+++ b/docs/man/xl-numa-placement.pod.7
@@ -1,6 +1,12 @@
-# Guest Automatic NUMA Placement in libxl and xl #
+=encoding utf8
 
-## Rationale ##
+=head1 NAME
+
+Guest Automatic NUMA Placement in libxl and xl
+
+=head1 DESCRIPTION
+
+=head2 Rationale
 
 NUMA (which stands for Non-Uniform Memory Access) means that the memory
 accessing times of a program running on a CPU depends on the relative
@@ -17,13 +23,14 @@ running memory-intensive workloads on a shared host. In 
fact, the cost
 of accessing non node-local memory locations is very high, and the
 performance degradation is likely to be noticeable.
 
-For more information, have a look at the [Xen NUMA Introduction][numa_intro]
+For more information, have a look at the Lhttp://wiki.xen.org/wiki/Xen_NUMA_Introduction>
 page on the Wiki.
 
-## Xen and NUMA machines: the concept of _node-affinity_ ##
+
+=head2 Xen and NUMA machines: the concept of I
 
 The Xen hypervisor deals with NUMA machines throughout the concept of
-_node-affinity_. The node-affinity of a domain is the set of NUMA nodes
+I. The node-affinity of a domain is the set of NUMA nodes
 of the host where the memory for the domain is being allocated (mostly,
 at domain creation time). This is, at least in principle, different and
 unrelated with the vCPU (hard and soft, see below) scheduling affinity,
@@ -42,15 +49,16 @@ it is very important to "place" the domain correctly when 
it is fist
 created, as the most of its memory is allocated at that time and can
 not (for now) be moved easily.
 
-### Placing via pinning and cpupools ###
+
+=head2 Placing via pinning and cpupools
 
 The simplest way of placing a domain on a NUMA node is setting the hard
 scheduling affinity of the domain's vCPUs to the pCPUs of the node. This
 also goes under the name of vCPU pinning, and can be done through the
 "cpus=" option in the config file (more about this below). Another option
 is to pool together the pCPUs spanning the node and put the domain in
-such a _cpupool_ with the "pool=" config option (as documented in our
-[Wiki][cpupools_howto]).
+such a I with the "pool=" config option (as documented in our
+Lhttp://wiki.xen.org/wiki/Cpupools_Howto>).
 
 In both the above cases, the domain will not be able to execute outside
 the specified set of pCPUs for any reasons, even if all those pCPUs are
@@ -59,7 +67,8 @@ busy doing something else while there are others, idle, pCPUs.
 So, when doing this, local memory accesses are 100% guaranteed, but that
 may come at he cost of some load imbalances.
 
-### NUMA aware scheduling ###
+
+=head2 NUMA aware scheduling
 
 If using the credit1 scheduler, and starting from Xen 4.3, the scheduler
 itself always tries to run the domain's vCPUs on one of the nodes in
@@ -87,21 +96,37 @@ workload.
 
 Notice that, for each vCPU, the following three scenarios are possbile:
 
-  * a vCPU *is pinned* to some pCPUs and *does not have* any soft affinity
-In this case, the vCPU is always scheduled on one of the pCPUs to which
-it is pinned, without any specific peference among them.
-  * a vCPU *has* its own soft affinity and *is not* pinned to any particular
-pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the
-scheduler will try to have it running on one of the pCPUs in its soft
-affinity;
-  * a vCPU *has* its own vCPU soft affinity and *is also* pinned to some
-pCPUs. In this case, the vCPU is always scheduled on one of the pCPUs
-onto which it is pinned, with, among them, a preference for the ones
-that also forms its soft affinity. In case pinning and soft affinity
-form two disjoint sets of pCPUs, pinning "wins", and the soft affinity
-is just ignored.
-
-## Guest placement in xl ##
+=over
+
+=item *
+
+a vCPU I to some pCPUs and I any soft affinity
+In this case, the vCPU is always scheduled on one of the pCPUs to which
+it is pinned, without any specific peference among them.
+
+
+=item *
+
+a vCPU I its own soft affinity and I pinned to any particular
+pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the
+scheduler will try to have it running on one of the pCPUs in its soft
+affinity;
+
+
+=item *
+
+a vCPU I its own vCPU sof

[Xen-devel] [PATCH v2 3/3] docs: add pod variant of xl-numa-placement

2017-07-24 Thread Olaf Hering
Convert source for xl-numa-placement.7 from markdown to pod.
This removes the buildtime requirement for pandoc, and subsequently the
need for ghc, in the chain for BuildRequires of xen.rpm.

Signed-off-by: Olaf Hering 
---
 ...lacement.markdown.7 => xl-numa-placement.pod.7} | 164 ++---
 1 file changed, 108 insertions(+), 56 deletions(-)
 rename docs/man/{xl-numa-placement.markdown.7 => xl-numa-placement.pod.7} (74%)

diff --git a/docs/man/xl-numa-placement.markdown.7 
b/docs/man/xl-numa-placement.pod.7
similarity index 74%
rename from docs/man/xl-numa-placement.markdown.7
rename to docs/man/xl-numa-placement.pod.7
index f863492093..5cad33be48 100644
--- a/docs/man/xl-numa-placement.markdown.7
+++ b/docs/man/xl-numa-placement.pod.7
@@ -1,6 +1,10 @@
-# Guest Automatic NUMA Placement in libxl and xl #
+=encoding utf8
 
-## Rationale ##
+
+=head1 Guest Automatic NUMA Placement in libxl and xl
+
+
+=head2 Rationale
 
 NUMA (which stands for Non-Uniform Memory Access) means that the memory
 accessing times of a program running on a CPU depends on the relative
@@ -17,13 +21,14 @@ running memory-intensive workloads on a shared host. In 
fact, the cost
 of accessing non node-local memory locations is very high, and the
 performance degradation is likely to be noticeable.
 
-For more information, have a look at the [Xen NUMA Introduction][numa_intro]
+For more information, have a look at the Lhttp://wiki.xen.org/wiki/Xen_NUMA_Introduction>
 page on the Wiki.
 
-## Xen and NUMA machines: the concept of _node-affinity_ ##
+
+=head2 Xen and NUMA machines: the concept of I
 
 The Xen hypervisor deals with NUMA machines throughout the concept of
-_node-affinity_. The node-affinity of a domain is the set of NUMA nodes
+I. The node-affinity of a domain is the set of NUMA nodes
 of the host where the memory for the domain is being allocated (mostly,
 at domain creation time). This is, at least in principle, different and
 unrelated with the vCPU (hard and soft, see below) scheduling affinity,
@@ -42,15 +47,16 @@ it is very important to "place" the domain correctly when 
it is fist
 created, as the most of its memory is allocated at that time and can
 not (for now) be moved easily.
 
-### Placing via pinning and cpupools ###
+
+=head2 Placing via pinning and cpupools
 
 The simplest way of placing a domain on a NUMA node is setting the hard
 scheduling affinity of the domain's vCPUs to the pCPUs of the node. This
 also goes under the name of vCPU pinning, and can be done through the
 "cpus=" option in the config file (more about this below). Another option
 is to pool together the pCPUs spanning the node and put the domain in
-such a _cpupool_ with the "pool=" config option (as documented in our
-[Wiki][cpupools_howto]).
+such a I with the "pool=" config option (as documented in our
+Lhttp://wiki.xen.org/wiki/Cpupools_Howto>).
 
 In both the above cases, the domain will not be able to execute outside
 the specified set of pCPUs for any reasons, even if all those pCPUs are
@@ -59,7 +65,8 @@ busy doing something else while there are others, idle, pCPUs.
 So, when doing this, local memory accesses are 100% guaranteed, but that
 may come at the cost of some load imbalances.
 
-### NUMA aware scheduling ###
+
+=head2 NUMA aware scheduling
 
 If using the credit1 scheduler, and starting from Xen 4.3, the scheduler
 itself always tries to run the domain's vCPUs on one of the nodes in
@@ -87,21 +94,37 @@ workload.
 
 Notice that, for each vCPU, the following three scenarios are possible:
 
-  * a vCPU *is pinned* to some pCPUs and *does not have* any soft affinity.
-In this case, the vCPU is always scheduled on one of the pCPUs to which
-it is pinned, without any specific preference among them.
-  * a vCPU *has* its own soft affinity and *is not* pinned to any particular
-pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the
-scheduler will try to have it running on one of the pCPUs in its soft
-affinity;
-  * a vCPU *has* its own vCPU soft affinity and *is also* pinned to some
-pCPUs. In this case, the vCPU is always scheduled on one of the pCPUs
-onto which it is pinned, with, among them, a preference for the ones
-that also form its soft affinity. In case pinning and soft affinity
-form two disjoint sets of pCPUs, pinning "wins", and the soft affinity
-is just ignored.
-
-## Guest placement in xl ##
+=over
+
+=item *
+
+a vCPU I<is pinned> to some pCPUs and I<does not have> any soft affinity.
+In this case, the vCPU is always scheduled on one of the pCPUs to which
+it is pinned, without any specific preference among them.
+
+
+=item *
+
+a vCPU I<has> its own soft affinity and I<is not> pinned to any particular
+pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the
+scheduler will try to have it running on one of the pCPUs in its soft
+affinity;
+
+
+=item *
+
+a vCPU I<has> its own vCPU soft affinity and I<is also> pinned to some
+pCPUs. In this case, the vCPU is always scheduled on one of the pCPUs
+onto which it is pinned, with, among them, a preference for the ones
+that also form its soft affinity. In case pinning and soft affinity
+form two disjoint sets of pCPUs, pinning "wins", and the soft affinity
+is just ignored.

[Xen-devel] [PATCH v2 1/3] docs: add pod variant of xen-pv-channel.7

2017-07-24 Thread Olaf Hering
Convert source for xen-pv-channel.7 from markdown to pod.
This removes the buildtime requirement for pandoc, and subsequently the
need for ghc, in the chain for BuildRequires of xen.rpm.

Signed-off-by: Olaf Hering 
---
 docs/man/xen-pv-channel.markdown.7 | 106 -
 docs/man/xen-pv-channel.pod.7  | 189 +
 2 files changed, 189 insertions(+), 106 deletions(-)
 delete mode 100644 docs/man/xen-pv-channel.markdown.7
 create mode 100644 docs/man/xen-pv-channel.pod.7

diff --git a/docs/man/xen-pv-channel.markdown.7 
b/docs/man/xen-pv-channel.markdown.7
deleted file mode 100644
index 1c6149dae0..00
--- a/docs/man/xen-pv-channel.markdown.7
+++ /dev/null
@@ -1,106 +0,0 @@
-Xen PV Channels
-===
-
-A channel is a low-bandwidth private byte stream similar to a serial
-link. Typical uses of channels are
-
-  1. to provide initial configuration information to a VM on boot
- (example use: CloudStack's cloud-early-config service)
-  2. to signal/query an in-guest agent
- (example use: oVirt's guest agent)
-
-Channels are similar to virtio-serial devices and emulated serial links.
-Channels are intended to be used in the implementation of libvirt <channel>s
-when running on Xen.
-
-Note: if an application requires a high-bandwidth link then it should use
-vchan instead.
-
-How to use channels: an example

-
-Consider a cloud deployment where VMs are cloned from pre-made templates,
-and customised on first boot by an in-guest agent which sets the IP address,
-hostname, ssh keys etc. To install the system the cloud administrator would
-first:
-
-  1. Install a guest as normal (no channel configuration necessary)
-  2. Install the in-guest agent specific to the cloud software. This will
- prepare the guest to communicate over the channel, and also prepare
- the guest to be cloned safely (sometimes known as "sysprepping")
-  3. Shutdown the guest
-  4. Register the guest as a template with the cloud orchestration software
-  5. Install the cloud orchestration agent in dom0
-
-At runtime, when a cloud tenant requests that a VM is created from the 
template,
-the sequence of events would be: (assuming a Linux domU)
-
-  1. A VM is "cloned" from the template
-  2. A unique Unix domain socket path in dom0 is allocated
- (e.g. /my/cloud/software/talk/to/domain/)
-  3. Domain configuration is created for the VM, listing the channel
- name expected by the in-guest agent. In xl syntax this would be:
-
- channel = [ "connection=socket, name=org.my.cloud.software.agent.version1,
-  path = /my/cloud/software/talk/to/domain/" ]
-
-  4. The VM is started
-  5. In dom0 the cloud orchestration agent connects to the Unix domain
- socket, writes a handshake message and waits for a reply
-  6. Assuming the guest kernel has CONFIG_HVC_XEN_FRONTEND set then the console
- driver will generate a hotplug event
-  7. A udev rule is activated by the hotplug event.
-
- The udev rule would look something like:
-
- SUBSYSTEM=="xen", DEVPATH=="/devices/console-[0-9]", 
RUN+="xen-console-setup"
-
- where the "xen-console-setup" script would read the channel name and
- make a symlink in /dev/xen-channel/org.my.cloud.software.agent.version1
-
-  8. The in-guest agent uses inotify to see the creation of the 
/dev/xen-channel
- symlink and opens the device.
-  9. The in-guest agent completes the handshake with the dom0 agent
- 10. The dom0 agent transmits the unique VM configuration: hostname, IP
- address, ssh keys etc etc
- 11. The in-guest agent receives the configuration and applies it.
-
-Using channels avoids having to use a temporary disk device or network
-connection.
-
-Design recommendations and pitfalls

-
-It's necessary to install channel-specific software (an "agent") into the guest
-before you can use a channel. By default a channel will appear as a device
-which could be mistaken for a serial port or regular console. It is known
-that some software will proactively seek out serial ports and issue AT commands
-at them; make sure such software is disabled!
-
-Since channels are identified by names, application authors must ensure their
-channel names are unique to avoid clashes. We recommend that channel names
-include parts unique to the application such as a domain names. To assist
-prevent clashes we recommend authors add their names to our global channel
-registry at the end of this document.
-
-Limitations

-
-Hotplug and unplug of channels is not currently implemented.
-
-Channel name registry
--
-
-It is important that channel names are globally unique. To help ensure
-that no-one's name clashes with yours, please add yours to this list.
-
-Key:
-N: Name
-C: Contact
-  

[Xen-devel] [PATCH v2 0/6] docs: convert manpages to pod

2017-07-24 Thread Olaf Hering
To remove the buildtime dependency to pandoc/ghc some manpages are
converted from markdown to pod format. This will provide more manpages
which are referenced in xl(1) and xl.cfg(5).

This series does not cover xen-vbd-interface.7 because converting the
lists used in this manpage was not straightforward.

Olaf

v2:
 fold each add/remove into a single commit


Cc: Ian Jackson 
Cc: Wei Liu 
To: xen-devel@lists.xen.org

Olaf Hering (6):
  docs: add pod variant of xen-pv-channel.7
  docs: add pod variant of xl-network-configuration.5
  docs: add pod variant of xl-numa-placement
  docs: remove markdown variant of xen-pv-channel.7
  docs: remove markdown variant of xl-network-configuration.5
  docs: remove markdown variant of xl-numa-placement.7

 docs/man/xen-pv-channel.markdown.7 | 106 ---
 docs/man/xen-pv-channel.pod.7  | 189 
 ...n.markdown.5 => xl-network-configuration.pod.5} | 195 ++---
 ...lacement.markdown.7 => xl-numa-placement.pod.7} | 164 +++--
 4 files changed, 433 insertions(+), 221 deletions(-)
 delete mode 100644 docs/man/xen-pv-channel.markdown.7
 create mode 100644 docs/man/xen-pv-channel.pod.7
 rename docs/man/{xl-network-configuration.markdown.5 => 
xl-network-configuration.pod.5} (55%)
 rename docs/man/{xl-numa-placement.markdown.7 => xl-numa-placement.pod.7} (74%)




[Xen-devel] [PATCH v2 2/3] docs: add pod variant of xl-network-configuration.5

2017-07-24 Thread Olaf Hering
Convert source for xl-network-configuration.5 from markdown to pod.
This removes the buildtime requirement for pandoc, and subsequently the
need for ghc, in the chain for BuildRequires of xen.rpm.

Signed-off-by: Olaf Hering 
---
 ...n.markdown.5 => xl-network-configuration.pod.5} | 195 ++---
 1 file changed, 136 insertions(+), 59 deletions(-)
 rename docs/man/{xl-network-configuration.markdown.5 => 
xl-network-configuration.pod.5} (55%)

diff --git a/docs/man/xl-network-configuration.markdown.5 
b/docs/man/xl-network-configuration.pod.5
similarity index 55%
rename from docs/man/xl-network-configuration.markdown.5
rename to docs/man/xl-network-configuration.pod.5
index 84c2645ad8..9fa373e20d 100644
--- a/docs/man/xl-network-configuration.markdown.5
+++ b/docs/man/xl-network-configuration.pod.5
@@ -1,6 +1,10 @@
-# XL Network Configuration
+=encoding utf8
 
-## Syntax Overview
+
+=head1 XL Network Configuration
+
+
+=head2 Syntax Overview
 
 This document specifies the xl config file format vif configuration
 option.  It has the following form:
@@ -8,7 +12,7 @@ option.  It has the following form:
 vif = [ '<vifspec>', '<vifspec>', ... ]
 
 where each vifspec is in this form:
-
+
[<keyword>=<value>|<flag>,]
 
 For example:
@@ -24,11 +28,13 @@ These might be specified in the domain config file like 
this:
 More formally, the string is a series of comma-separated keyword/value
 pairs. All keywords are optional.
 
-Each device has a `DEVID` which is its index within the vif list, starting 
from 0.
+Each device has a C<DEVID> which is its index within the vif list, starting 
from 0.
 
-## Keywords
 
-### mac
+=head2 Keywords
+
+
+=head2 mac
 
 If specified then this option specifies the MAC address inside the
 guest of this VIF device. The value is a 48-bit number represented as
@@ -36,89 +42,137 @@ six groups of two hexadecimal digits, separated by colons 
(:).
 
 The default if this keyword is not specified is to automatically
 generate a MAC address inside the space assigned to Xen's
-[Organizationally Unique Identifier][oui] (00:16:3e).
+L<Organizationally Unique Identifier|http://en.wikipedia.org/wiki/Organizationally_Unique_Identifier> (00:16:3e).
 
 If you are choosing a MAC address then it is strongly recommended to
 follow one of the following strategies:
 
-  * Generate a random sequence of 6 bytes, set the locally administered
-bit (bit 2 of the first byte) and clear the multicast bit (bit 1
-of the first byte). In other words the first byte should have the
-bit pattern xx10 (where x is a randomly generated bit) and the
-remaining 5 bytes are randomly generated. See
-[http://en.wikipedia.org/wiki/MAC_address] for more details on the
-structure of a MAC address.
-  * Allocate an address from within the space defined by your
-organization's OUI (if you have one) following your organization's
-procedures for doing so.
-  * Allocate an address from within the space defined by Xen's OUI
-(00:16:3e). Taking care not to clash with other users of the
-physical network segment where this VIF will reside.
+=over
+
+=item *
+
+Generate a random sequence of 6 bytes, set the locally administered
+bit (bit 2 of the first byte) and clear the multicast bit (bit 1
+of the first byte). In other words the first byte should have the
+bit pattern xx10 (where x is a randomly generated bit) and the
+remaining 5 bytes are randomly generated. See
+[http://en.wikipedia.org/wiki/MAC_address] for more details on the
+structure of a MAC address.
+
+
+=item *
+
+Allocate an address from within the space defined by your
+organization's OUI (if you have one) following your organization's
+procedures for doing so.
+
+
+=item *
+
+Allocate an address from within the space defined by Xen's OUI
+(00:16:3e). Taking care not to clash with other users of the
+physical network segment where this VIF will reside.
+
+
+=back
 
 If you have an OUI for your own use then that is the preferred
 strategy. Otherwise in general you should prefer to generate a random
 MAC and set the locally administered bit since this allows for more
 bits of randomness than using the Xen OUI.
 
-### bridge
+
+=head2 bridge
 
 Specifies the name of the network bridge which this VIF should be
-added to. The default is `xenbr0`. The bridge must be configured using
-your distribution's network configuration tools. See the [wiki][net]
+added to. The default is C<xenbr0>. The bridge must be configured using
+your distribution's network configuration tools. See the L<wiki|http://wiki.xen.org/wiki/HostConfiguration/Networking>
 for guidance and examples.
 
-### gatewaydev
+
+=head2 gatewaydev
 
 Specifies the name of the network interface which has an IP and which
 is in the network the VIF should communicate with. This is used in the host
-by the vif-route hotplug script. See [wiki][vifroute] for guidance and
+by the vif-route hotplug script. See L<wiki|http://wiki.xen.org/wiki/Vif-route> for guidance and
 examples.

Re: [Xen-devel] [PATCH 0/6] docs: convert manpages to pod

2017-07-24 Thread Olaf Hering
On Mon, Jul 24, Ian Jackson wrote:

> * There are a lot of other documents in docs/misc/ which are in
> markdown format.  Some of them are internal.  I'm pretty sure we don't
> want them _all_ converted.  So even if you convert the manpages, these
> documents will remain.

I did not intend to change other files outside of docs/man/.
Just the references to non-existent manpages triggered this series.

Sometimes I wish that xen-command-line.5 exists, but google always
helped in such occasions.

> * It may be that there are other markdown processors which could be
> substituted for pandoc - either at runtime or by changing the Xen
> Project's default, upstream.

After a quick research there is a ruby "ronn" and go/ruby "md2man". Both
would have the same dependency issue. Perhaps ruby is less troublesome
because YaST is written in ruby.

> * Our markdown documents are, I think, intended to be plain text which
> can be simply shipped as-is.  So for things other than manpages you
> can probably just ship them as if they were text files.  If the end
> user wants to read them in a fancy format (eg HTML) they could install
> the relevant processor.

Yes. I have to see what HTML we ship. So far it did not cause trouble.

> * I don't understand why promoting GHC would be a problem.  But, in
> the worst case, rather than demoting Xen, you could simply not ship
> certain docs (although - see above about plain text).

The package ghc has been in the tree for nearly 5 years, pandoc for 3
years. The hurdle is likely that a 4GB DVD is filled quickly. It is
always a fight to get everyone happy, and ghc is seen as a leaf package.


Olaf




Re: [Xen-devel] [PATCH 0/6] docs: convert manpages to pod

2017-07-24 Thread Olaf Hering
On Mon, Jul 24, Ian Jackson wrote:

> Olaf Hering writes ("[PATCH 0/6] docs: convert manpages to pod"):
> > To remove the buildtime dependency to pandoc/ghc some manpages are
> > converted from markdown to pod format. This will provide more manpages
> > which are referenced in xl(1) and xl.cfg(5).
> 
> Sorry to ask this at this stage, but: did I miss some discussion of
> why this was desirable ?

Likely yes: https://build.opensuse.org/request/show/511948
The point is: if all manpages need to be build then Xen needs to depend
on pandoc, which in turn depends on ghc. Neither of them is seen as a
"core" package, while "Xen" is a core package. Either ghc becomes a core
package, or Xen is moved out of core. In this context "core" means it is
part of an install DVD, if I understand the concept of "rings" correctly.


Do you see any downside of this series? There is currently a mix of pod
and markdown format for the manpages. This change gets it closer to have
them all as pod.


Olaf




[Xen-devel] [PATCH 6/6] docs: remove markdown variant of xl-numa-placement.7

2017-07-24 Thread Olaf Hering
A variant in pod format exists now.

Signed-off-by: Olaf Hering 
---
 docs/man/xl-numa-placement.markdown.7 | 239 --
 1 file changed, 239 deletions(-)
 delete mode 100644 docs/man/xl-numa-placement.markdown.7

diff --git a/docs/man/xl-numa-placement.markdown.7 
b/docs/man/xl-numa-placement.markdown.7
deleted file mode 100644
index f863492093..00
--- a/docs/man/xl-numa-placement.markdown.7
+++ /dev/null
@@ -1,239 +0,0 @@
-# Guest Automatic NUMA Placement in libxl and xl #
-
-## Rationale ##
-
-NUMA (which stands for Non-Uniform Memory Access) means that the memory
-accessing times of a program running on a CPU depends on the relative
-distance between that CPU and that memory. In fact, most of the NUMA
-systems are built in such a way that each processor has its local memory,
-on which it can operate very fast. On the other hand, getting and storing
-data from and on remote memory (that is, memory local to some other processor)
-is quite more complex and slow. On these machines, a NUMA node is usually
-defined as a set of processor cores (typically a physical CPU package) and
-the memory directly attached to the set of cores.
-
-NUMA awareness becomes very important as soon as many domains start
-running memory-intensive workloads on a shared host. In fact, the cost
-of accessing non node-local memory locations is very high, and the
-performance degradation is likely to be noticeable.
-
-For more information, have a look at the [Xen NUMA Introduction][numa_intro]
-page on the Wiki.
-
-## Xen and NUMA machines: the concept of _node-affinity_ ##
-
-The Xen hypervisor deals with NUMA machines through the concept of
-_node-affinity_. The node-affinity of a domain is the set of NUMA nodes
-of the host where the memory for the domain is being allocated (mostly,
-at domain creation time). This is, at least in principle, different and
-unrelated to the vCPU (hard and soft, see below) scheduling affinity,
-which instead is the set of pCPUs where the vCPU is allowed (or prefers)
-to run.
-
-Of course, despite the fact that they belong to and affect different
-subsystems, the domain node-affinity and the vCPUs affinity are not
-completely independent.
-In fact, if the domain node-affinity is not explicitly specified by the
-user, via the proper libxl calls or xl config item, it will be computed
-based on the vCPUs' scheduling affinity.
-
-Notice that, even if the node affinity of a domain may change on-line,
-it is very important to "place" the domain correctly when it is first
-created, as most of its memory is allocated at that time and can
-not (for now) be moved easily.
-
-### Placing via pinning and cpupools ###
-
-The simplest way of placing a domain on a NUMA node is setting the hard
-scheduling affinity of the domain's vCPUs to the pCPUs of the node. This
-also goes under the name of vCPU pinning, and can be done through the
-"cpus=" option in the config file (more about this below). Another option
-is to pool together the pCPUs spanning the node and put the domain in
-such a _cpupool_ with the "pool=" config option (as documented in our
-[Wiki][cpupools_howto]).
-
-In both the above cases, the domain will not be able to execute outside
-the specified set of pCPUs for any reasons, even if all those pCPUs are
-busy doing something else while there are others, idle, pCPUs.
-
-So, when doing this, local memory accesses are 100% guaranteed, but that
-may come at the cost of some load imbalances.
-
-### NUMA aware scheduling ###
-
-If using the credit1 scheduler, and starting from Xen 4.3, the scheduler
-itself always tries to run the domain's vCPUs on one of the nodes in
-its node-affinity. Only if that turns out to be impossible, it will just
-pick any free pCPU. Locality of access is less guaranteed than in the
-pinning case, but that comes along with better chances to exploit all
-the host resources (e.g., the pCPUs).
-
-Starting from Xen 4.5, credit1 supports two forms of affinity: hard and
-soft, both on a per-vCPU basis. This means each vCPU can have its own
-soft affinity, stating where such vCPU prefers to execute on. This is
-less strict than what (also starting from 4.5) is called hard affinity,
-as the vCPU can potentially run everywhere, it just prefers some pCPUs
-rather than others.
-In Xen 4.5, therefore, NUMA-aware scheduling is achieved by matching the
-soft affinity of the vCPUs of a domain with its node-affinity.
-
-In fact, as it was for 4.3, if all the pCPUs in a vCPU's soft affinity
-are busy, it is possible for the domain to run outside from there. The
-idea is that slower execution (due to remote memory accesses) is still
-better than no execution at all (as it would happen with pinning). For
-this reason, NUMA aware scheduling has the potential of bringing
-substantial performance benefits, although this will depend on the
-workload.
-
-Notice that, for each vCPU, the following three scenarios are possible:

[Xen-devel] [PATCH 3/6] docs: add pod variant of xl-numa-placement

2017-07-24 Thread Olaf Hering
Add source in pod format for xl-numa-placement.7
This removes the buildtime requirement for pandoc, and subsequently the
need for ghc, in the chain for BuildRequires of xen.rpm.

Signed-off-by: Olaf Hering 
---
 docs/man/xl-numa-placement.pod.7 | 291 +++
 1 file changed, 291 insertions(+)
 create mode 100644 docs/man/xl-numa-placement.pod.7

diff --git a/docs/man/xl-numa-placement.pod.7 b/docs/man/xl-numa-placement.pod.7
new file mode 100644
index 00..5cad33be48
--- /dev/null
+++ b/docs/man/xl-numa-placement.pod.7
@@ -0,0 +1,291 @@
+=encoding utf8
+
+
+=head1 Guest Automatic NUMA Placement in libxl and xl
+
+
+=head2 Rationale
+
+NUMA (which stands for Non-Uniform Memory Access) means that the memory
+accessing times of a program running on a CPU depends on the relative
+distance between that CPU and that memory. In fact, most of the NUMA
+systems are built in such a way that each processor has its local memory,
+on which it can operate very fast. On the other hand, getting and storing
+data from and on remote memory (that is, memory local to some other processor)
+is quite more complex and slow. On these machines, a NUMA node is usually
+defined as a set of processor cores (typically a physical CPU package) and
+the memory directly attached to the set of cores.
+
+NUMA awareness becomes very important as soon as many domains start
+running memory-intensive workloads on a shared host. In fact, the cost
+of accessing non node-local memory locations is very high, and the
+performance degradation is likely to be noticeable.
+
+For more information, have a look at the L<Xen NUMA Introduction|http://wiki.xen.org/wiki/Xen_NUMA_Introduction>
+page on the Wiki.
+
+
+=head2 Xen and NUMA machines: the concept of I<node-affinity>
+
+The Xen hypervisor deals with NUMA machines through the concept of
+I<node-affinity>. The node-affinity of a domain is the set of NUMA nodes
+of the host where the memory for the domain is being allocated (mostly,
+at domain creation time). This is, at least in principle, different and
+unrelated to the vCPU (hard and soft, see below) scheduling affinity,
+which instead is the set of pCPUs where the vCPU is allowed (or prefers)
+to run.
+
+Of course, despite the fact that they belong to and affect different
+subsystems, the domain node-affinity and the vCPUs affinity are not
+completely independent.
+In fact, if the domain node-affinity is not explicitly specified by the
+user, via the proper libxl calls or xl config item, it will be computed
+based on the vCPUs' scheduling affinity.
+
+Notice that, even if the node affinity of a domain may change on-line,
+it is very important to "place" the domain correctly when it is first
+created, as most of its memory is allocated at that time and can
+not (for now) be moved easily.
+
+
+=head2 Placing via pinning and cpupools
+
+The simplest way of placing a domain on a NUMA node is setting the hard
+scheduling affinity of the domain's vCPUs to the pCPUs of the node. This
+also goes under the name of vCPU pinning, and can be done through the
+"cpus=" option in the config file (more about this below). Another option
+is to pool together the pCPUs spanning the node and put the domain in
+such a I<cpupool> with the "pool=" config option (as documented in our
+L<Wiki|http://wiki.xen.org/wiki/Cpupools_Howto>).
+
+In both the above cases, the domain will not be able to execute outside
+the specified set of pCPUs for any reasons, even if all those pCPUs are
+busy doing something else while there are others, idle, pCPUs.
+
+So, when doing this, local memory accesses are 100% guaranteed, but that
+may come at the cost of some load imbalances.
+
+
+=head2 NUMA aware scheduling
+
+If using the credit1 scheduler, and starting from Xen 4.3, the scheduler
+itself always tries to run the domain's vCPUs on one of the nodes in
+its node-affinity. Only if that turns out to be impossible, it will just
+pick any free pCPU. Locality of access is less guaranteed than in the
+pinning case, but that comes along with better chances to exploit all
+the host resources (e.g., the pCPUs).
+
+Starting from Xen 4.5, credit1 supports two forms of affinity: hard and
+soft, both on a per-vCPU basis. This means each vCPU can have its own
+soft affinity, stating where such vCPU prefers to execute on. This is
+less strict than what (also starting from 4.5) is called hard affinity,
+as the vCPU can potentially run everywhere, it just prefers some pCPUs
+rather than others.
+In Xen 4.5, therefore, NUMA-aware scheduling is achieved by matching the
+soft affinity of the vCPUs of a domain with its node-affinity.
+
+In fact, as it was for 4.3, if all the pCPUs in a vCPU's soft affinity
+are busy, it is possible for the domain to run outside from there. The
+idea is that slower execution (due to remote memory accesses) is still
+better than no execution at all (as it would happen with pinning). For
+this reason, NUMA aware scheduling has the potential of bringing
+substantial performance benefits, although this will depend on the
+workload.

[Xen-devel] [PATCH 4/6] docs: remove markdown variant of xen-pv-channel.7

2017-07-24 Thread Olaf Hering
A variant in pod format exists now.

Signed-off-by: Olaf Hering 
---
 docs/man/xen-pv-channel.markdown.7 | 106 -
 1 file changed, 106 deletions(-)
 delete mode 100644 docs/man/xen-pv-channel.markdown.7

diff --git a/docs/man/xen-pv-channel.markdown.7 
b/docs/man/xen-pv-channel.markdown.7
deleted file mode 100644
index 1c6149dae0..00
--- a/docs/man/xen-pv-channel.markdown.7
+++ /dev/null
@@ -1,106 +0,0 @@
-Xen PV Channels
-===
-
-A channel is a low-bandwidth private byte stream similar to a serial
-link. Typical uses of channels are
-
-  1. to provide initial configuration information to a VM on boot
- (example use: CloudStack's cloud-early-config service)
-  2. to signal/query an in-guest agent
- (example use: oVirt's guest agent)
-
-Channels are similar to virtio-serial devices and emulated serial links.
-Channels are intended to be used in the implementation of libvirt <channel>s
-when running on Xen.
-
-Note: if an application requires a high-bandwidth link then it should use
-vchan instead.
-
-How to use channels: an example

-
-Consider a cloud deployment where VMs are cloned from pre-made templates,
-and customised on first boot by an in-guest agent which sets the IP address,
-hostname, ssh keys etc. To install the system the cloud administrator would
-first:
-
-  1. Install a guest as normal (no channel configuration necessary)
-  2. Install the in-guest agent specific to the cloud software. This will
- prepare the guest to communicate over the channel, and also prepare
- the guest to be cloned safely (sometimes known as "sysprepping")
-  3. Shutdown the guest
-  4. Register the guest as a template with the cloud orchestration software
-  5. Install the cloud orchestration agent in dom0
-
-At runtime, when a cloud tenant requests that a VM is created from the 
template,
-the sequence of events would be: (assuming a Linux domU)
-
-  1. A VM is "cloned" from the template
-  2. A unique Unix domain socket path in dom0 is allocated
- (e.g. /my/cloud/software/talk/to/domain/)
-  3. Domain configuration is created for the VM, listing the channel
- name expected by the in-guest agent. In xl syntax this would be:
-
- channel = [ "connection=socket, name=org.my.cloud.software.agent.version1,
-  path = /my/cloud/software/talk/to/domain/" ]
-
-  4. The VM is started
-  5. In dom0 the cloud orchestration agent connects to the Unix domain
- socket, writes a handshake message and waits for a reply
-  6. Assuming the guest kernel has CONFIG_HVC_XEN_FRONTEND set then the console
- driver will generate a hotplug event
-  7. A udev rule is activated by the hotplug event.
-
- The udev rule would look something like:
-
- SUBSYSTEM=="xen", DEVPATH=="/devices/console-[0-9]", 
RUN+="xen-console-setup"
-
- where the "xen-console-setup" script would read the channel name and
- make a symlink in /dev/xen-channel/org.my.cloud.software.agent.version1
-
-  8. The in-guest agent uses inotify to see the creation of the 
/dev/xen-channel
- symlink and opens the device.
-  9. The in-guest agent completes the handshake with the dom0 agent
- 10. The dom0 agent transmits the unique VM configuration: hostname, IP
- address, ssh keys etc etc
- 11. The in-guest agent receives the configuration and applies it.
-
-Using channels avoids having to use a temporary disk device or network
-connection.
-
-Design recommendations and pitfalls

-
-It's necessary to install channel-specific software (an "agent") into the guest
-before you can use a channel. By default a channel will appear as a device
-which could be mistaken for a serial port or regular console. It is known
-that some software will proactively seek out serial ports and issue AT commands
-at them; make sure such software is disabled!
-
-Since channels are identified by names, application authors must ensure their
-channel names are unique to avoid clashes. We recommend that channel names
-include parts unique to the application such as a domain names. To assist
-prevent clashes we recommend authors add their names to our global channel
-registry at the end of this document.
-
-Limitations

-
-Hotplug and unplug of channels is not currently implemented.
-
-Channel name registry
--
-
-It is important that channel names are globally unique. To help ensure
-that no-one's name clashes with yours, please add yours to this list.
-
-Key:
-N: Name
-C: Contact
-D: Short description of use, possibly including a URL to your software
-   or API
-
-N: org.xenproject.guest.clipboard.0.1
-C: David Scott 
-D: Share clipboard data via an in-guest agent. See:
-   http://wiki.xenproject.org/wiki/Clipboard_sharing_protocol


[Xen-devel] [PATCH 5/6] docs: remove markdown variant of xl-network-configuration.5

2017-07-24 Thread Olaf Hering
A variant in pod format exists now.

Signed-off-by: Olaf Hering 
---
 docs/man/xl-network-configuration.markdown.5 | 173 ---
 1 file changed, 173 deletions(-)
 delete mode 100644 docs/man/xl-network-configuration.markdown.5

diff --git a/docs/man/xl-network-configuration.markdown.5 
b/docs/man/xl-network-configuration.markdown.5
deleted file mode 100644
index 84c2645ad8..00
--- a/docs/man/xl-network-configuration.markdown.5
+++ /dev/null
@@ -1,173 +0,0 @@
-# XL Network Configuration
-
-## Syntax Overview
-
-This document specifies the xl config file format vif configuration
-option.  It has the following form:
-
-vif = [ '<vifspec>', '<vifspec>', ... ]
-
-where each vifspec is in this form:
-
-[<keyword>=<value>|<flag>,]
-
-For example:
-
-'mac=00:16:3E:74:3d:76,model=rtl8139,bridge=xenbr0'
-'mac=00:16:3E:74:34:32'
-'' # The empty string
-
-These might be specified in the domain config file like this:
-
-vif = [ 'mac=00:16:3E:74:34:32', 'mac=00:16:3e:5f:48:e4,bridge=xenbr1' 
]
-
-More formally, the string is a series of comma-separated keyword/value
-pairs. All keywords are optional.
-
-Each device has a `DEVID` which is its index within the vif list, starting 
from 0.
-
-## Keywords
-
-### mac
-
-If specified then this option specifies the MAC address inside the
-guest of this VIF device. The value is a 48-bit number represented as
-six groups of two hexadecimal digits, separated by colons (:).
-
-The default if this keyword is not specified is to automatically
-generate a MAC address inside the space assigned to Xen's
-[Organizationally Unique Identifier][oui] (00:16:3e).
-
-If you are choosing a MAC address then it is strongly recommended to
-follow one of the following strategies:
-
-  * Generate a random sequence of 6 bytes, set the locally administered
-bit (bit 2 of the first byte) and clear the multicast bit (bit 1
-of the first byte). In other words the first byte should have the
-bit pattern xx10 (where x is a randomly generated bit) and the
-remaining 5 bytes are randomly generated. See
-[http://en.wikipedia.org/wiki/MAC_address] for more details on the
-structure of a MAC address.
-  * Allocate an address from within the space defined by your
-organization's OUI (if you have one) following your organization's
-procedures for doing so.
-  * Allocate an address from within the space defined by Xen's OUI
-(00:16:3e). Taking care not to clash with other users of the
-physical network segment where this VIF will reside.
-
-If you have an OUI for your own use then that is the preferred
-strategy. Otherwise in general you should prefer to generate a random
-MAC and set the locally administered bit since this allows for more
-bits of randomness than using the Xen OUI.
-
-### bridge
-
-Specifies the name of the network bridge which this VIF should be
-added to. The default is `xenbr0`. The bridge must be configured using
-your distribution's network configuration tools. See the [wiki][net]
-for guidance and examples.
-
-### gatewaydev
-
-Specifies the name of the network interface which has an IP and which
-is in the network the VIF should communicate with. This is used in the host
-by the vif-route hotplug script. See [wiki][vifroute] for guidance and
-examples.
-
-NOTE: netdev is a deprecated alias of this option.
-
-### type
-
-This keyword is valid for HVM guests only.
-
-Specifies the type of device to use; valid values are:
-
-  * `ioemu` (default) -- this device will be provided as an emulated
-device to the guest and also as a paravirtualised device which the
-guest may choose to use instead if it has suitable drivers
-available.
-  * `vif` -- this device will be provided as a paravirtualised device
-only.
-
-### model
-
-This keyword is valid for HVM guest devices with `type=ioemu` only.
-
-Specifies the type of device to emulate for this guest. Valid values
-are:
-
-  * `rtl8139` (default) -- Realtek RTL8139
-  * `e1000` -- Intel E1000 
-  * in principle any device supported by your device model
-
-### vifname
-
-Specifies the backend device name for the virtual device.
-
-If the domain is an HVM domain then the associated emulated (tap)
-device will have a "-emu" suffix added.
-
-The default name for the virtual device is `vifDOMID.DEVID` where
-`DOMID` is the guest domain ID and `DEVID` is the device
-number. Likewise the default tap name is `vifDOMID.DEVID-emu`.
-
-### script
-
-Specifies the hotplug script to run to configure this device (e.g. to
-add it to the relevant bridge). Defaults to
-`XEN_SCRIPT_DIR/vif-bridge` but can be set to any script. Some example
-scripts are installed in `XEN_SCRIPT_DIR`.
-
-### ip
-
-Specifies the IP address for the device, the default is not to
-specify an IP address.
-
-What, if any, effect this has depends on the hotplug script which is
-configured. A typic

[Xen-devel] [PATCH 1/6] docs: add pod variant of xen-pv-channel.7

2017-07-24 Thread Olaf Hering
Add source in pod format for xen-pv-channel.7
This removes the buildtime requirement for pandoc, and subsequently the
need for ghc, in the chain for BuildRequires of xen.rpm.

Signed-off-by: Olaf Hering 
---
 docs/man/xen-pv-channel.pod.7 | 189 ++
 1 file changed, 189 insertions(+)
 create mode 100644 docs/man/xen-pv-channel.pod.7

diff --git a/docs/man/xen-pv-channel.pod.7 b/docs/man/xen-pv-channel.pod.7
new file mode 100644
index 00..8b0b74aa27
--- /dev/null
+++ b/docs/man/xen-pv-channel.pod.7
@@ -0,0 +1,189 @@
+=encoding utf8
+
+
+=head1 Xen PV Channels
+
+A channel is a low-bandwidth private byte stream similar to a serial
+link. Typical uses of channels are
+
+=over
+
+=item 1.
+
+to provide initial configuration information to a VM on boot
+ (example use: CloudStack's cloud-early-config service)
+
+
+=item 2.
+
+to signal/query an in-guest agent
+ (example use: oVirt's guest agent)
+
+
+=back
+
+Channels are similar to virtio-serial devices and emulated serial links.
+Channels are intended to be used in the implementation of libvirt <channel>s
+when running on Xen.
+
+Note: if an application requires a high-bandwidth link then it should use
+vchan instead.
+
+
+=head2 How to use channels: an example
+
+Consider a cloud deployment where VMs are cloned from pre-made templates,
+and customised on first boot by an in-guest agent which sets the IP address,
+hostname, ssh keys etc. To install the system the cloud administrator would
+first:
+
+=over
+
+=item 1.
+
+Install a guest as normal (no channel configuration necessary)
+
+
+=item 2.
+
+Install the in-guest agent specific to the cloud software. This will
+ prepare the guest to communicate over the channel, and also prepare
+ the guest to be cloned safely (sometimes known as "sysprepping")
+
+
+=item 3.
+
+Shutdown the guest
+
+
+=item 4.
+
+Register the guest as a template with the cloud orchestration software
+
+
+=item 5.
+
+Install the cloud orchestration agent in dom0
+
+
+=back
+
+At runtime, when a cloud tenant requests that a VM is created from the 
template,
+the sequence of events would be: (assuming a Linux domU)
+
+=over
+
+=item 1.
+
+A VM is "cloned" from the template
+
+
+=item 2.
+
+A unique Unix domain socket path in dom0 is allocated
+ (e.g. /my/cloud/software/talk/to/domain/)
+
+
+=item 3.
+
+Domain configuration is created for the VM, listing the channel
+ name expected by the in-guest agent. In xl syntax this would be:
+
+ channel = [ "connection=socket, name=org.my.cloud.software.agent.version1,
+  path = /my/cloud/software/talk/to/domain/" ]
+
+
+
+=item 4.
+
+The VM is started
+
+
+=item 5.
+
+In dom0 the cloud orchestration agent connects to the Unix domain
+ socket, writes a handshake message and waits for a reply
+
+
+=item 6.
+
+Assuming the guest kernel has CONFIG_HVC_XEN_FRONTEND set then the console
+ driver will generate a hotplug event
+
+
+=item 7.
+
+A udev rule is activated by the hotplug event.
+
+ The udev rule would look something like:
+
+ SUBSYSTEM=="xen", DEVPATH=="/devices/console-[0-9]", RUN+="xen-console-setup"
+
+ where the "xen-console-setup" script would read the channel name and
+ make a symlink in /dev/xen-channel/org.my.cloud.software.agent.version1
+
+
+
+=item 8.
+
+The in-guest agent uses inotify to see the creation of the /dev/xen-channel
+ symlink and opens the device.
+
+
+=item 9.
+
+The in-guest agent completes the handshake with the dom0 agent
+
+
+=item 10.
+
+The dom0 agent transmits the unique VM configuration: hostname, IP
+ address, ssh keys etc etc
+
+
+=item 11.
+
+The in-guest agent receives the configuration and applies it.
+
+
+=back
+
+Using channels avoids having to use a temporary disk device or network
+connection.
+
+
+=head2 Design recommendations and pitfalls
+
+It's necessary to install channel-specific software (an "agent") into the guest
+before you can use a channel. By default a channel will appear as a device
+which could be mistaken for a serial port or regular console. It is known
+that some software will proactively seek out serial ports and issue AT commands
+at them; make sure such software is disabled!
+
+Since channels are identified by names, application authors must ensure their
+channel names are unique to avoid clashes. We recommend that channel names
+include parts unique to the application such as a domain names. To assist
+prevent clashes we recommend authors add their names to our global channel
+registry at the end of this document.
+
+
+=head2 Limitations
+
+Hotplug and unplug of channels is not currently implemented.
+
+
+=head2 Channel name registry
+
+It is important that channel names are globally unique. To help ensure
+that no-one's name clashes with yours, please add yours to this list.
+
+Key:
+N: Name
+C: Contact
+D: Short description of use, possibly including a URL to your software
+   or API

[Xen-devel] [PATCH 2/6] docs: add pod variant of xl-network-configuration.5

2017-07-24 Thread Olaf Hering
Add source in pod format for xl-network-configuration.5
This removes the buildtime requirement for pandoc, and subsequently the
need for ghc, in the chain for BuildRequires of xen.rpm.

Signed-off-by: Olaf Hering 
---
 docs/man/xl-network-configuration.pod.5 | 250 
 1 file changed, 250 insertions(+)
 create mode 100644 docs/man/xl-network-configuration.pod.5

diff --git a/docs/man/xl-network-configuration.pod.5 
b/docs/man/xl-network-configuration.pod.5
new file mode 100644
index 00..9fa373e20d
--- /dev/null
+++ b/docs/man/xl-network-configuration.pod.5
@@ -0,0 +1,250 @@
+=encoding utf8
+
+
+=head1 XL Network Configuration
+
+
+=head2 Syntax Overview
+
+This document specifies the xl config file format vif configuration
+option.  It has the following form:
+
+vif = [ '<vifspec>', '<vifspec>', ... ]
+
+where each vifspec is in this form:
+
+[<keyword>=<value>|<flag>,]
+
+For example:
+
+'mac=00:16:3E:74:3d:76,model=rtl8139,bridge=xenbr0'
+'mac=00:16:3E:74:34:32'
+'' # The empty string
+
+These might be specified in the domain config file like this:
+
+vif = [ 'mac=00:16:3E:74:34:32', 'mac=00:16:3e:5f:48:e4,bridge=xenbr1' 
]
+
+More formally, the string is a series of comma-separated keyword/value
+pairs. All keywords are optional.
+
+Each device has a C<DEVID> which is its index within the vif list, starting 
from 0.
+
+
+=head2 Keywords
+
+
+=head2 mac
+
+If specified then this option specifies the MAC address inside the
+guest of this VIF device. The value is a 48-bit number represented as
+six groups of two hexadecimal digits, separated by colons (:).
+
+The default if this keyword is not specified is to automatically
+generate a MAC address inside the space assigned to Xen's
+L<Organizationally Unique Identifier|http://en.wikipedia.org/wiki/Organizationally_Unique_Identifier> (00:16:3e).
+
+If you are choosing a MAC address then it is strongly recommended to
+follow one of the following strategies:
+
+=over
+
+=item *
+
+Generate a random sequence of 6 bytes, set the locally administered
+bit (bit 2 of the first byte) and clear the multicast bit (bit 1
+of the first byte). In other words the first byte should have the
+bit pattern xx10 (where x is a randomly generated bit) and the
+remaining 5 bytes are randomly generated. See
+[http://en.wikipedia.org/wiki/MAC_address] for more details on the
+structure of a MAC address.
+
+
+=item *
+
+Allocate an address from within the space defined by your
+organization's OUI (if you have one) following your organization's
+procedures for doing so.
+
+
+=item *
+
+Allocate an address from within the space defined by Xen's OUI
+(00:16:3e). Taking care not to clash with other users of the
+physical network segment where this VIF will reside.
+
+
+=back
+
+If you have an OUI for your own use then that is the preferred
+strategy. Otherwise in general you should prefer to generate a random
+MAC and set the locally administered bit since this allows for more
+bits of randomness than using the Xen OUI.
+
+
+=head2 bridge
+
+Specifies the name of the network bridge which this VIF should be
+added to. The default is C<xenbr0>. The bridge must be configured using
+your distribution's network configuration tools. See the L<wiki|http://wiki.xen.org/wiki/HostConfiguration/Networking>
+for guidance and examples.
+
+
+=head2 gatewaydev
+
+Specifies the name of the network interface which has an IP and which
+is in the network the VIF should communicate with. This is used in the host
+by the vif-route hotplug script. See L<wiki|http://wiki.xen.org/wiki/Vif-route> for guidance and
+examples.
+
+NOTE: netdev is a deprecated alias of this option.
+
+
+=head2 type
+
+This keyword is valid for HVM guests only.
+
+Specifies the type of device to use; valid values are:
+
+=over
+
+=item *
+
+C<ioemu> (default) -- this device will be provided as an emulated
+device to the guest and also as a paravirtualised device which the
+guest may choose to use instead if it has suitable drivers
+available.
+
+
+=item *
+
+C<vif> -- this device will be provided as a paravirtualised device
+only.
+
+
+=back
+
+
+=head2 model
+
+This keyword is valid for HVM guest devices with C<type=ioemu> only.
+
+Specifies the type of device to emulate for this guest. Valid values
+are:
+
+=over
+
+=item *
+
+C<rtl8139> (default) -- Realtek RTL8139
+
+
+=item *
+
+C<e1000> -- Intel E1000
+
+
+=item *
+
+in principle any device supported by your device model
+
+
+=back
+
+
+=head2 vifname
+
+Specifies the backend device name for the virtual device.
+
+If the domain is an HVM domain then the associated emulated (tap)
+device will have a "-emu" suffix added.
+
+The default name for the virtual device is C<vifDOMID.DEVID> where
+C<DOMID> is the guest domain ID and C<DEVID> is the device
+number. Likewise the default tap name is C<vifDOMID.DEVID-emu>.
+
+
+=head2 script
+
+Specifies the hotplug script to run to configure this device (e.g. to
+add it to the relevant bridge). Defaults to
+C<XEN_SCRIPT_DIR/vif-bridge> but can be set to any script. Some example
+scripts are installed in C<XEN_SCRIPT_DIR>.

[Xen-devel] [PATCH 0/6] docs: convert manpages to pod

2017-07-24 Thread Olaf Hering
To remove the buildtime dependency to pandoc/ghc some manpages are
converted from markdown to pod format. This will provide more manpages
which are referenced in xl(1) and xl.cfg(5).

This series does not cover xen-vbd-interface.7 because converting the
lists used in this manpage was not straightforward.

Olaf

Cc: Ian Jackson 
Cc: Wei Liu 
To: xen-devel@lists.xen.org

Olaf Hering (6):
  docs: add pod variant of xen-pv-channel.7
  docs: add pod variant of xl-network-configuration.5
  docs: add pod variant of xl-numa-placement
  docs: remove markdown variant of xen-pv-channel.7
  docs: remove markdown variant of xl-network-configuration.5
  docs: remove markdown variant of xl-numa-placement.7

 docs/man/xen-pv-channel.markdown.7 | 106 ---
 docs/man/xen-pv-channel.pod.7  | 189 
 ...n.markdown.5 => xl-network-configuration.pod.5} | 195 ++---
 ...lacement.markdown.7 => xl-numa-placement.pod.7} | 164 +++--
 4 files changed, 433 insertions(+), 221 deletions(-)
 delete mode 100644 docs/man/xen-pv-channel.markdown.7
 create mode 100644 docs/man/xen-pv-channel.pod.7
 rename docs/man/{xl-network-configuration.markdown.5 => 
xl-network-configuration.pod.5} (55%)
 rename docs/man/{xl-numa-placement.markdown.7 => xl-numa-placement.pod.7} (74%)




Re: [Xen-devel] [PATCH for-4.9] docs: replace xm with xl in xen-tscmode [and 1 more messages]

2017-07-24 Thread Olaf Hering
On Thu, May 25, Julien Grall wrote:

> Hi Ian,
> 
> On 24/05/2017 12:07, Ian Jackson wrote:
> > Olaf Hering writes ("[PATCH] docs: replace xm with xl in xen-tscmode"):
> > > Signed-off-by: Olaf Hering 
> > Olaf Hering writes ("[PATCH] docs: correct paragraph indention in 
> > xen-tscmode"):
> > > Signed-off-by: Olaf Hering 
> > Both:
> > Acked-by: Ian Jackson 
> > 
> > I think these good for 4.9 and are covered by Julien's exception for
> > docs.  So Wei or I will commit them soon.
> Yes that's correct.

Both missed the 4.9 release. Please apply now.


Olaf




Re: [Xen-devel] API to query NUMA node of mfn

2017-07-10 Thread Olaf Hering
On Mon, Jul 10, Konrad Rzeszutek Wilk wrote:

> Soo I wrote some code for exactly this for Xen 4.4.4 , along with
> creation of a PGM map to see the NUMA nodes locality.

Are you planning to prepare that for staging at some point? I have not
checked whether this series is already merged.

Olaf




[Xen-devel] API to query NUMA node of mfn

2017-07-10 Thread Olaf Hering
I would like to verify on which NUMA node the PFNs used by a HVM guest
are located. Is there an API for that? Something like:

  foreach (pfn, domid)
mfns_per_node[pfn_to_node(pfn)]++
  foreach (node)
printk("%x %x\n", node, mfns_per_node[node])

Olaf




Re: [Xen-devel] time does not move forward in HVM guests

2017-07-05 Thread Olaf Hering
Am Wed, 05 Jul 2017 02:14:23 -0600
schrieb "Jan Beulich" :

> Oh, even for HVM. Doesn't that go back to the missing vDSO
> support then again, which we had discussed just last week?

Yes. This is part of it. With clocksource=tsc there is a performance boost 
because vdso is used. With clocksource=hpet there is a performance drop to 20%, 
depending on the workload, due to the emulation of it (I guess).

Olaf




Re: [Xen-devel] time does not move forward in HVM guests

2017-07-05 Thread Olaf Hering
On Wed, Jul 05, Jan Beulich wrote:

> > clock_getres(CLOCK_MONOTONIC) indicates a resolution of 1ns.
> But what's the implied meaning of resolution here? See below.

I have no idea what the returned value is supposed to promise.
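(For reference, the number in question comes from a minimal query like
this; clock_getres() only reports the clock's advertised granularity:)

  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      struct timespec res;

      if (clock_getres(CLOCK_MONOTONIC, &res) == 0)
          printf("CLOCK_MONOTONIC resolution: %ld ns\n", (long)res.tv_nsec);
      return 0;
  }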

> Or did you perhaps test with an older version, where the time
> handling backports from master hadn't been there yet?

It was weeks ago, and I have not seen it since then. I think it is fixed
in one way or another.

> > A workaround is booting the domU kernel with 'clocksource=tsc nohz=off 
> > highres=off'.
> What clocksource does the system use by default? HPET?

HPET would be really really slow. The default clocksource is "xen".

> According to what the hypervisor tells the guest, vHPET
> resolution is 16ns. That still wouldn't explain a steady value
> over a period of 100ns, but it's at least a hint that what the
> kernel tells you may not be what underlying (virtual)
> hardware reports.

If clocksource=xen relies on the hypervisor, perhaps the kernel should
be aware of it in some way. So far I have not checked where clock_getres
gets its data.


> Additionally - are all three options indeed required to work
> around this, i.e. no pair out of the three is enough?

Yes, otherwise the kernel would complain; I forgot the exact error
message.

Olaf




Re: [Xen-devel] valgrind support for xen4.7+

2017-07-05 Thread Olaf Hering
On Wed, Apr 12, Glenn Enright wrote:

> Has anyone seen or been working on patches for valgrind for recent versions
> of xen?

Upstream requires paperwork, via kde.org bugzilla. This is my variant,
which is enough to run 'xl create' with valgrind.

Olaf

--- coregrind/m_syswrap/syswrap-xen.c.orig
+++ coregrind/m_syswrap/syswrap-xen.c
@@ -584,6 +584,8 @@ PRE(sysctl) {
case 0x0009:
case 0x000a:
case 0x000b:
+   case 0x000c:
+   case 0x000d:
   break;
default:
   bad_intf_version(tid, layout, arrghs, status, flags,
@@ -626,6 +628,8 @@ PRE(sysctl) {
 break;
   case 0x000a:
   case 0x000b:
+  case 0x000c:
+  case 0x000d:
 PRE_XEN_SYSCTL_READ(getdomaininfolist_000a, first_domain);
 PRE_XEN_SYSCTL_READ(getdomaininfolist_000a, max_domains);
 PRE_XEN_SYSCTL_READ(getdomaininfolist_000a, buffer);
@@ -728,6 +732,9 @@ PRE(domctl)
case 0x0008:
case 0x0009:
case 0x000a:
+   case 0x000b:
+   case 0x000c:
+   case 0x000d:
   break;
default:
   bad_intf_version(tid, layout, arrghs, status, flags,
@@ -1534,6 +1541,8 @@ POST(sysctl)
case 0x0009:
case 0x000a:
case 0x000b:
+   case 0x000c:
+   case 0x000d:
   break;
default:
   return;
@@ -1568,6 +1577,8 @@ POST(sysctl)
 break;
   case 0x000a:
   case 0x000b:
+  case 0x000c:
+  case 0x000d:
 POST_XEN_SYSCTL_WRITE(getdomaininfolist_000a, num_domains);
 POST_MEM_WRITE((Addr)sysctl->u.getdomaininfolist_000a.buffer.p,
sizeof(*sysctl->u.getdomaininfolist_000a.buffer.p)

Olaf




[Xen-devel] time does not move forward in HVM guests

2017-07-04 Thread Olaf Hering
In my testing with sysbench in a HVM domU running a linux-4.4 based
pvops kernel on a xen-4.7 based dom0, the time does not move forward
properly:

There (URL below) is basically code like this:
  clock_gettime(CLOCK_MONOTONIC, a)
  do_work
  clock_gettime(CLOCK_MONOTONIC, b)
  diff_time(a,b)

All 'do_work' does is writing zeros to a block of memory.
clock_getres(CLOCK_MONOTONIC) indicates a resolution of 1ns.
If 'do_work' takes like 100ns or less: a==b. I think this is something
that should not happen. In case of vcpu overcommit this happens also
when 'do_work' takes around 800ns. At some point I have also seen cases
of time going backward. I can not reproduce this anymore, might have
been bugs in my code or the domU.cfg changed.
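A self-contained version of that loop (a sketch, not the actual
sysbench code; a memset() of a 1K buffer stands in for 'do_work'):

  #include <stdio.h>
  #include <string.h>
  #include <time.h>

  static char buf[1024];

  int main(void)
  {
      struct timespec a, b;

      clock_gettime(CLOCK_MONOTONIC, &a);
      memset(buf, 0, sizeof(buf)); /* the 'do_work' step */
      clock_gettime(CLOCK_MONOTONIC, &b);

      /* With a reported 1ns resolution, a and b should never be equal. */
      printf("elapsed: %ld ns\n",
             (b.tv_sec - a.tv_sec) * 1000000000L + (b.tv_nsec - a.tv_nsec));
      return 0;
  }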

A workaround is booting the domU kernel with 'clocksource=tsc nohz=off 
highres=off'.

Why does this happen? Are the expectations too high?


Olaf


 https://github.com/olafhering/sysbench/compare/master...pv
 bash autogen.sh
 make -j
 bash mem.1K.on.sh



