Re: [Xen-devel] [PATCH v1] tools/hotplug: convert proc-xen.mount to proc-xen.service
On Wed, Nov 08, Wei Liu wrote:

> But is there really no way to ask nicely to see if systemd would accept
> a change in behaviour? That is, to make proc-xen.mount (or any attempt
> to mount API fs) a nop when xenfs is added to API file system.

I have considered that as well. If the failing unit is "proc-xen.mount"
and /proc/xen exists, just ignore the error. I will check if and how
that can be done.

Olaf
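[For illustration only: one admin-side way to express "skip instead of fail"
today would be a drop-in with a condition, since an unmet condition makes
systemd skip a unit rather than fail it. The file name is hypothetical; the
condition mirrors the one used in the patch later in this thread.]

  # /etc/systemd/system/proc-xen.mount.d/skip-if-mounted.conf (hypothetical)
  [Unit]
  # /proc/xen/capabilities only exists once xenfs is mounted, so skip
  # the mount unit instead of failing it in that case.
  ConditionPathExists=!/proc/xen/capabilities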
Re: [Xen-devel] [PATCH v1] tools/hotplug: convert proc-xen.mount to proc-xen.service
On Thu, Oct 26, Olaf Hering wrote:

> > If not, then out-of-tree packages are going to have compatibility
> > problems with this change.

> Only if they use Requires=proc-xen.mount.

Any other objections to this change? How to proceed with this?

Olaf
Re: [Xen-devel] [PATCH v1] tools/hotplug: convert proc-xen.mount to proc-xen.service
On Thu, Oct 26, Andrew Cooper wrote:

> I've never really understood why xenfs exists in the first place
> (although I expect the answer is "Because that is how someone did it in
> the past"), and I'm not aware of any other project which needs its own
> custom filesystem driver for device nodes.

Perhaps in the early days, before udev, new nodes would not magically
appear in /dev. It was likely easy to be compatible that way, just like
claiming /dev/hda to please existing installation programs.

> Is it possible to express a dependency on proc-xen.mount ||
> proc-xen.service?

As ordering, yes: an additional After=proc-xen.service line is needed.
An existing Requires=proc-xen.mount probably cannot be used anymore,
but I have not verified that.

> If not, then out-of-tree packages are going to have compatibility
> problems with this change.

Only if they use Requires=proc-xen.mount.

> Right, but ISTR that Systemd deals with /etc/fstab by auto-generating
> *.mount targets, and from what is said here, it is the proc-xen.mount
> target which is now broken by the change in systemd behaviour.

No, existing fstab entries will continue to work. /dev/shm is
automounted, and my own fstab entry for /dev/shm always worked.

Olaf
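[Sketch of the ordering variant mentioned above, for an out-of-tree unit
that must work with both old and new Xen tools. After= references to a
unit that does not exist are ignored by systemd, so listing both names
is safe:]

  [Unit]
  After=proc-xen.mount proc-xen.service
  ConditionPathExists=/proc/xen/capabilities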
Re: [Xen-devel] [PATCH v1] tools/hotplug: convert proc-xen.mount to proc-xen.service
On Thu, Oct 26, Andrew Cooper wrote:

> Can't all information be obtained from /sys/hypervisor? If not, how
> hard would it be to make happen?

Likely not that hard. Not sure why that was not added in the first place.

> What happens to all the software which currently has a dependency on
> proc-xen.mount ?

All software gets converted by this change.

> Independently, how does this interact with having xenfs entries in
> /etc/fstab, which might plausibly still exist for compatibility with
> other init systems?

mount(1) will continue to consider them.

Olaf
[Xen-devel] [PATCH v1] tools/hotplug: convert proc-xen.mount to proc-xen.service
An upcoming change in systemd will mount xenfs right away, along with
all other system mounts. This improves the detection of the
virtualization environment, which is currently racy. Some parts of
systemd rely on the presence of /proc/xen/capabilities, which will only
exist if xenfs is mounted. Since xenfs is mounted by the proc-xen.mount
unit, it will be processed very late. Other units may be processed
earlier, and if they make use of ConditionVirtualization*= failures may
occur.

Unfortunately, mounting xenfs by systemd as an API filesystem will lead
to errors when proc-xen.mount is processed. Since that mount point
already exists, the unit is considered failed, and other units that
depend on proc-xen.mount will not start. To avoid this, the existing
proc-xen.mount is converted into proc-xen.service, which just mounts
xenfs manually. All dependencies are updated by this change.

The existing conditionals in proc-xen.mount will prevent failures with
existing systemd based installations:
ConditionPathExists=!/proc/xen/capabilities will prevent execution with
a new systemd that mounts xenfs. And this conditional, in combination
with ConditionPathExists=/proc/xen, will trigger execution with an old
systemd.

An absolute path to the mount binary has to be used. /bin/mount is
expected to be universally available; nowadays it is a symlink to
/usr/bin/mount.

Signed-off-by: Olaf Hering
---
based on 4.10.0-rc2

Please run autogen.sh:

 tools/configure.ac                                                | 2 +-
 tools/hotplug/Linux/systemd/Makefile                              | 6 +++---
 .../Linux/systemd/{proc-xen.mount.in => proc-xen.service.in}      | 8
 tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in            | 4 ++--
 tools/hotplug/Linux/systemd/xen-init-dom0.service.in              | 4 ++--
 tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in | 4 ++--
 tools/hotplug/Linux/systemd/xen-watchdog.service.in               | 4 ++--
 tools/hotplug/Linux/systemd/xenconsoled.service.in                | 4 ++--
 tools/hotplug/Linux/systemd/xendomains.service.in                 | 4 ++--
 tools/hotplug/Linux/systemd/xendriverdomain.service.in            | 4 ++--
 tools/hotplug/Linux/systemd/xenstored.service.in                  | 6 +++---
 11 files changed, 25 insertions(+), 25 deletions(-)
 rename tools/hotplug/Linux/systemd/{proc-xen.mount.in => proc-xen.service.in} (60%)

diff --git a/tools/configure.ac b/tools/configure.ac
index d1a3a78d87..7b18421fa0 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -441,7 +441,7 @@ AX_AVAILABLE_SYSTEMD()
 
 AS_IF([test "x$systemd" = "xy"], [
     AC_CONFIG_FILES([
-    hotplug/Linux/systemd/proc-xen.mount
+    hotplug/Linux/systemd/proc-xen.service
     hotplug/Linux/systemd/var-lib-xenstored.mount
     hotplug/Linux/systemd/xen-init-dom0.service
     hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service
diff --git a/tools/hotplug/Linux/systemd/Makefile b/tools/hotplug/Linux/systemd/Makefile
index a5d41d86ef..855ff3747f 100644
--- a/tools/hotplug/Linux/systemd/Makefile
+++ b/tools/hotplug/Linux/systemd/Makefile
@@ -3,10 +3,10 @@ include $(XEN_ROOT)/tools/Rules.mk
 
 XEN_SYSTEMD_MODULES = xen.conf
 
-XEN_SYSTEMD_MOUNT = proc-xen.mount
-XEN_SYSTEMD_MOUNT += var-lib-xenstored.mount
+XEN_SYSTEMD_MOUNT = var-lib-xenstored.mount
 
-XEN_SYSTEMD_SERVICE = xenstored.service
+XEN_SYSTEMD_SERVICE = proc-xen.service
+XEN_SYSTEMD_SERVICE += xenstored.service
 XEN_SYSTEMD_SERVICE += xenconsoled.service
 XEN_SYSTEMD_SERVICE += xen-qemu-dom0-disk-backend.service
 XEN_SYSTEMD_SERVICE += xendomains.service
diff --git a/tools/hotplug/Linux/systemd/proc-xen.mount.in b/tools/hotplug/Linux/systemd/proc-xen.service.in
similarity index 60%
rename from tools/hotplug/Linux/systemd/proc-xen.mount.in
rename to tools/hotplug/Linux/systemd/proc-xen.service.in
index 64ebe7f9b1..76f0097b75 100644
--- a/tools/hotplug/Linux/systemd/proc-xen.mount.in
+++ b/tools/hotplug/Linux/systemd/proc-xen.service.in
@@ -4,7 +4,7 @@ ConditionPathExists=/proc/xen
 ConditionPathExists=!/proc/xen/capabilities
 RefuseManualStop=true
 
-[Mount]
-What=xenfs
-Where=/proc/xen
-Type=xenfs
+[Service]
+Type=oneshot
+RemainAfterExit=true
+ExecStart=/bin/mount -t xenfs xenfs /proc/xen
diff --git a/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in b/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in
index 11a7d50edc..5d171f82e8 100644
--- a/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in
+++ b/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in
@@ -1,7 +1,7 @@
 [Unit]
 Description=mount xenstore file system
-Requires=proc-xen.mount
-After=proc-xen.mount
+Requires=proc-xen.service
+After=proc-xen.service
 ConditionPathExists=/proc/xen/capabilities
 RefuseManualStop=true
 
diff --git a/tools/hotplug/Linux/systemd/xen-init-dom0.service.in b/tools/hotplug/Linux/systemd/xen-init-dom0.service.in
index 3befadcea3..c560fbe1b7 100644
--- a/tools
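[For reference, this is the shape every dependent unit takes after the
conversion, mirroring the var-lib-xenstored.mount.in hunk above; sketch
only, the remaining hunks in the series follow the same pattern:]

  [Unit]
  Requires=proc-xen.service
  After=proc-xen.service
  ConditionPathExists=/proc/xen/capabilities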
Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Oct 11, Olaf Hering wrote:

> -#define MAX_BATCH_SIZE 1024 /* up to 1024 pages (4MB) at a time */
> +#define MAX_BATCH_SIZE SUPERPAGE_1GB_NR_PFNS /* up to 1GB at a time */

Actually the error is something else, I missed this in the debug output:

 xc: error: Failed to get types for pfn batch (7 = Argument list too long): Internal error

write_batch() should probably split the requests when filling types[]
because Xen has "1024" hardcoded in XEN_DOMCTL_getpageframeinfo3...

Olaf
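[A minimal sketch of the kind of split meant here. The loop and the
helper name usage are assumptions; xc_get_pfn_type_batch() is the
existing libxc wrapper around this domctl and replaces each pfn in the
array with its type in place:]

  /* Limit hardcoded in XEN_DOMCTL_getpageframeinfo3. */
  #define GETPAGEFRAMEINFO_MAX 1024

  static int get_types_in_chunks(xc_interface *xch, uint32_t domid,
                                 unsigned count, xen_pfn_t *arr)
  {
      unsigned done = 0, chunk;

      while ( done < count )
      {
          chunk = count - done;
          if ( chunk > GETPAGEFRAMEINFO_MAX )
              chunk = GETPAGEFRAMEINFO_MAX;
          /* Query one hypercall-sized chunk at a time. */
          if ( xc_get_pfn_type_batch(xch, domid, chunk, &arr[done]) )
              return -1;
          done += chunk;
      }
      return 0;
  }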
[Xen-devel] [PATCH v10 0/3] tools/libxc: use superpages
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v10:
 coding style in xc_sr_bitmap API
 reset bitmap size on free
 check for empty bitmap in xc_sr_bitmap API
 add comment to struct x86_hvm_sp, keep the short name
 style and type changes in x86_hvm_punch_hole
 do not mark VGA hole as busy in x86_hvm_setup
 call decrease_reservation once for all pfns
 rename variable in x86_hvm_populate_pfns
 call decrease_reservation in 2MB chunks if possible
v9:
 update hole checking in x86_hvm_populate_pfns
 add out of bounds check to xc_sr_test_and_set/clear_bit
v8:
 remove double check of 1G/2M idx in x86_hvm_populate_pfns
v7:
 cover holes that span multiple superpages
v6:
 handle freeing of partly populated superpages correctly
 more DPRINTFs
v5:
 send correct version, rebase was not fully finished
v4:
 restore trailing "_bit" in bitmap function names
 keep track of gaps between previous and current batch
 split alloc functionality in x86_hvm_allocate_pfn
v3:
 clear pointer in xc_sr_bitmap_free
 some coding style changes
 use getdomaininfo.max_pages to avoid Over-allocation check
 trim bitmap function names, drop trailing "_bit"
 add some comments
v2:
 split into individual commits

based on staging c39cf093fc ("x86/asm: add .file directives")

Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c            |   5 -
 tools/libxc/xc_private.h            |   5 +
 tools/libxc/xc_sr_common.c          |  41 +++
 tools/libxc/xc_sr_common.h          | 103 ++-
 tools/libxc/xc_sr_restore.c         | 141 +-
 tools/libxc/xc_sr_restore_x86_hvm.c | 536
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 -
 7 files changed, 755 insertions(+), 148 deletions(-)
[Xen-devel] [PATCH v10 3/3] tools/libxc: use superpages during restore of HVM guest
During creation of an HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost, everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side,
it must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering
---
 tools/libxc/xc_sr_common.h          |  30 +-
 tools/libxc/xc_sr_restore.c         |  75 +
 tools/libxc/xc_sr_restore_x86_hvm.c | 536
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 -
 4 files changed, 635 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a728c93e53..0477c20617 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,15 @@ struct xc_sr_restore_ops
      */
     int (*setup)(struct xc_sr_context *ctx);
 
+    /**
+     * Populate PFNs
+     *
+     * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+     * unpopulated subset.
+     */
+    int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+                         const xen_pfn_t *original_pfns, const uint32_t *types);
+
     /**
      * Process an individual record from the stream. The caller shall take
      * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +233,8 @@ struct xc_sr_context
         int send_back_fd;
         unsigned long p2m_size;
+        unsigned long max_pages;
+        unsigned long tot_pages;
         xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
         /* From Image Header. */
@@ -336,6 +347,17 @@ struct xc_sr_context
             /* HVM context blob. */
             void *context;
             size_t contextsz;
+
+            /* Bitmap of currently allocated PFNs during restore. */
+            struct xc_sr_bitmap attempted_1g;
+            struct xc_sr_bitmap attempted_2m;
+            struct xc_sr_bitmap allocated_pfns;
+            xen_pfn_t idx1G_prev, idx2M_prev;
+
+            /* List of PFNs for decrease_reservation */
+            xen_pfn_t *extents;
+            unsigned long max_extents;
+            unsigned long nr_extents;
         } restore;
     };
 } x86_hvm;
@@ -460,14 +482,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-                  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
     return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset. If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-                  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-    xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-        *pfns = malloc(count * sizeof(*pfns));
-    unsigned i, nr_pfns = 0;
-    int rc = -1;
-
-    if ( !mfns || !pfns )
-    {
-        ERROR("Failed to allocate %zu bytes for populating the physmap",
-              2 * count * sizeof(*mfns));
-        goto err;
-    }
-
-    for ( i = 0; i < count; ++i )
-    {
-        if ( (!types || (types &&
-                         (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-                          types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
-             !pfn_is_populated(ctx, original_pfns[i]) )
-        {
-            rc = pfn_set_populated(ctx, original_pfns[i]);
-            if ( rc )
-                goto err;
-            pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-            ++nr_pfns;
-        }
-    }
-
-    if ( nr_pfns )
-    {
-        rc = xc_domain_populate_physmap_exact(
-            xch, ctx->domid, nr_pfns, 0, 0, mfns);
-        if ( rc )
-        {
-            PERROR("Failed to populate physmap");
-            goto err;
-        }
-
-        for ( i = 0; i < nr_pfns; ++i )
-        {
-            if ( mfns[i] == INVALID_MFN )
-            {
-                ERROR("Popu
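[With this refactor, callers reach the populate logic through the new
per-guest-type hook instead of the removed common function; a minimal
sketch of the resulting call shape on the restore.c side:]

  /* guest-type neutral caller: */
  rc = ctx->restore.ops.populate_pfns(ctx, count, pfns, types);
  if ( rc )
      goto err;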
[Xen-devel] [PATCH v10 2/3] tools/libxc: add API for bitmap access for restore
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap. Switch the existing populated_pfns to this
API.

Signed-off-by: Olaf Hering
Acked-by: Wei Liu
---
 tools/libxc/xc_sr_common.c  | 41 +
 tools/libxc/xc_sr_common.h  | 73 +++--
 tools/libxc/xc_sr_restore.c | 66 ++--
 3 files changed, 115 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..28c7be2b15 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
     BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params) != 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the
+ * nearest power of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+    if ( bits > bm->bits )
+    {
+        size_t new_max;
+        size_t old_sz, new_sz;
+        void *p;
+
+        /* Round up to the nearest power of two larger than bit, less 1. */
+        new_max = bits;
+        new_max |= new_max >> 1;
+        new_max |= new_max >> 2;
+        new_max |= new_max >> 4;
+        new_max |= new_max >> 8;
+        new_max |= new_max >> 16;
+#ifdef __x86_64__
+        new_max |= new_max >> 32;
+#endif
+
+        old_sz = bitmap_size(bm->bits + 1);
+        new_sz = bitmap_size(new_max + 1);
+        p = realloc(bm->p, new_sz);
+        if ( !p )
+            return false;
+
+        if (bm->p)
+            memset(p + old_sz, 0, new_sz - old_sz);
+        else
+            memset(p, 0, new_sz);
+
+        bm->p = p;
+        bm->bits = new_max;
+    }
+    return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..a728c93e53 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
     size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+    void *p;
+    unsigned long bits;
+};
+
 struct xc_sr_context
 {
     xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
         domid_t xenstore_domid, console_domid;
 
         /* Bitmap of currently populated PFNs during restore. */
-        unsigned long *populated_pfns;
-        xen_pfn_t max_populated_pfn;
+        struct xc_sr_bitmap populated_pfns;
 
         /* Sender has invoked verify mode on the stream. */
         bool verify;
@@ -343,6 +348,70 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+    if ( bits > bm->bits )
+        return _xc_sr_bitmap_resize(bm, bits);
+    return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+    free( bm->p );
+    bm->bits = 0;
+    bm->p = NULL;
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if ( !xc_sr_bitmap_resize(bm, bit) )
+        return false;
+
+    set_bit(bit, bm->p);
+    return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if ( bit > bm->bits || !bm->bits )
+        return false;
+    return !!test_bit(bit, bm->p);
+}
+
+static inline bool xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if ( bit > bm->bits || !bm->bits )
+        return false;
+    return !!test_and_clear_bit(bit, bm->p);
+}
+
+static inline bool xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if ( bit > bm->bits || !bm->bits )
+        return false;
+    return !!test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    xc_interface *xch = ctx->xch;
+
+    if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) )
+    {
+        ERROR("Failed to realloc populated_pfns bitmap");
+        errno = ENOMEM;
+        return -1;
+    }
+    return 0;
+}
+
 struct xc_sr_record
 {
     uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
     return 0;
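[Usage of the new API is then uniform for all trackers; a minimal sketch
based only on the helpers added above:]

  struct xc_sr_bitmap bm = { 0 };

  if ( !xc_sr_set_bit(42, &bm) )   /* grows the backing store on demand */
      return -1;                   /* fails only on allocation failure */

  /* Bits beyond the current size simply read as clear. */
  assert( xc_sr_test_bit(42, &bm) && !xc_sr_test_bit(123456, &bm) );

  xc_sr_bitmap_free(&bm);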
[Xen-devel] [PATCH v10 1/3] tools/libxc: move SUPERPAGE macros to common header
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering
Acked-by: Wei Liu
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT 9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT 18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT 9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT 18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK
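[A quick illustration of what the macros express; pfn stands for any
guest frame number and is an assumption of the snippet:]

  /* 1UL << 9 == 512 4k pages per 2MB page; 1UL << 18 == 262144 per 1GB page */
  xen_pfn_t base_2mb = pfn & ~(SUPERPAGE_2MB_NR_PFNS - 1);
  xen_pfn_t base_1gb = pfn & ~(SUPERPAGE_1GB_NR_PFNS - 1);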
Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest
On Fri, Sep 08, Olaf Hering wrote:

> A related question: is it safe to increase MAX_BATCH_SIZE from 1024 to
> (256*1024) to transfer a whole gigabyte at a time? That way it will be
> easier to handle holes within a 1GB superpage.

To answer my own question: This change leads to this error:

 -#define MAX_BATCH_SIZE 1024 /* up to 1024 pages (4MB) at a time */
 +#define MAX_BATCH_SIZE SUPERPAGE_1GB_NR_PFNS /* up to 1GB at a time */
 ...
 xc: info: Found x86 HVM domain from Xen 4.10
 xc: detail: dom 9 p2m_size fee01 max_pages 100100
 xc: info: Restoring domain
 xc: error: Failed to read Record Header from stream (0 = Success): Internal error
 xc: error: Restore failed (0 = Success): Internal error
 ...

Olaf
Re: [Xen-devel] [PULL 1/2] xen-disk: use g_new0 to fix build
On Wed, Sep 20, Stefano Stabellini wrote:

> From: Olaf Hering
>
> g_malloc0_n is available since glib-2.24. To allow build with older glib
> versions use the generic g_new0, which is already used in many other
> places in the code.
>
> Fixes commit 3284fad728 ("xen-disk: add support for multi-page shared rings")

In case this missed the release, please backport to the relevant stable
branches as well. Many thanks.

Olaf
Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Sep 06, Andrew Cooper wrote:

> The stream has always been in-order for the first pass (even in the
> legacy days), and I don't foresee that changing. Reliance on the order
> was suggested by both myself and Jan during the early design.

A related question: is it safe to increase MAX_BATCH_SIZE from 1024 to
(256*1024) to transfer a whole gigabyte at a time? That way it will be
easier to handle holes within a 1GB superpage.

Olaf
Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Sep 06, Andrew Cooper wrote:

> If a PVH guest has got MTRRs disabled, then it genuinely can run on an
> unshattered 1G superpage at 0.

Ok, the code will detect the holes and will release memory as needed. I
will drop these two lines.

Olaf
Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Sep 06, Andrew Cooper wrote:

> I still fail to understand why you need the bitmaps at all? You can
> calculate everything you need from the pfn list alone, which will also
> let you spot the presence or absence of the VGA hole.

These bitmaps track whether a range has already been allocated as a
superpage. If a given pfn falls within an already handled 1G or 2M
range, there might otherwise be a double allocation of a 1G or 2M page.
This is not related to the VGA hole. These two lines are just hints
that in this range no superpage can be allocated.

> You need to track which pfns you've seen so far in the stream, and which
> pfns have been populated. When you find holes in the pfns in the
> stream, you need to undo the prospective superpage allocation. Unless
> I've missed something?

This is what's happening, holes will be created as soon as they are
seen in the stream.

> Also, please take care to use 2M decrease reservations wherever
> possible, or you will end up shattering the host superpage as part of
> trying to remove the memory.

This is what Wei suggested, build a list of pfns instead of releasing
each pfn individually. I think with this new code it should be possible
to decrease in 2M steps as needed.

Olaf
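[A sketch of the 2M-sized release being discussed, using the existing
xc_domain_decrease_reservation_exact() call. The surrounding bookkeeping
is assumed; whole_2mb_is_unpopulated() is a hypothetical predicate:]

  xen_pfn_t base = pfn & ~(SUPERPAGE_2MB_NR_PFNS - 1);

  if ( whole_2mb_is_unpopulated(ctx, base) )
      /* one 2MB extent: order 9 avoids shattering the host superpage */
      rc = xc_domain_decrease_reservation_exact(xch, ctx->domid, 1,
                                                SUPERPAGE_2MB_SHIFT, &base);
  else
      /* fall back to releasing the single 4k page */
      rc = xc_domain_decrease_reservation_exact(xch, ctx->domid, 1, 0, &pfn);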
Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Sep 06, Andrew Cooper wrote:

> On 01/09/17 17:08, Olaf Hering wrote:
> > +    /* No superpage in 1st 2MB due to VGA hole */
> > +    xc_sr_set_bit(0, &ctx->x86_hvm.restore.attempted_1g);
> > +    xc_sr_set_bit(0, &ctx->x86_hvm.restore.attempted_2m);
> This is false for PVH guests.

How can I detect a PVH guest?

Olaf
Re: [Xen-devel] [PATCH v9 2/3] tools/libxc: add API for bitmap access for restore
On Wed, Sep 06, Andrew Cooper wrote:

> On 01/09/17 17:08, Olaf Hering wrote:
> > +static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
> > +static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
> Why are these moved? They are still restore specific.

There is no tools/libxc/xc_sr_restore.h, should I create one?

Olaf
Re: [Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, 6 Sep 2017 12:34:10 +0100, Wei Liu wrote:

> > +struct x86_hvm_sp {
> Forgot to ask: what does sp stand for?

superpage. I will check if there is room to expand this string.

> > + * Try to allocate superpages.
> > + * This works without memory map only if the pfns arrive in incremental
> > + * order.
> > + */
> I have said several times, one way or another, I don't want to make
> assumption on the stream of pfns. So I'm afraid I can't ack a patch like
> this.

It will work with any order, I think. Just with incremental order the
superpages will not be split once they are allocated.

Thanks for the review. I will send another series shortly.

Olaf
[Xen-devel] [PATCH] libxc/bitops: correct comment for bitmap_size
The returned value now represents units of bytes instead of longs.

Fixes commit 11d0044a16 ("tools/libxc: Modify bitmap operations to take void pointers")

Signed-off-by: Olaf Hering
---
 tools/libxc/xc_bitops.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxc/xc_bitops.h b/tools/libxc/xc_bitops.h
index 3e7a544c9d..0951e8267d 100644
--- a/tools/libxc/xc_bitops.h
+++ b/tools/libxc/xc_bitops.h
@@ -13,7 +13,7 @@
 #define BITMAP_ENTRY(_nr,_bmap) ((_bmap))[(_nr) / 8]
 #define BITMAP_SHIFT(_nr) ((_nr) % 8)
 
-/* calculate required space for number of longs needed to hold nr_bits */
+/* calculate required space for number of bytes needed to hold nr_bits */
 static inline int bitmap_size(int nr_bits)
 {
     return (nr_bits + 7) / 8;
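[With the corrected comment, the behaviour of the shown implementation
is easiest to see with a few values:]

  /* bitmap_size() rounds up to whole bytes: */
  bitmap_size(1)  == 1   /*  1 bit  -> 1 byte  */
  bitmap_size(9)  == 2   /*  9 bits -> 2 bytes */
  bitmap_size(64) == 8   /* 64 bits -> 8 bytes, not 8 longs */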
[Xen-devel] [PATCH v9 3/3] tools/libxc: use superpages during restore of HVM guest
During creation of an HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost, everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side,
it must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering
---
 tools/libxc/xc_sr_common.h          |  26 ++-
 tools/libxc/xc_sr_restore.c         |  75 +---
 tools/libxc/xc_sr_restore_x86_hvm.c | 341
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +++-
 4 files changed, 436 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index 734320947a..93141a6e25 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
      */
     int (*setup)(struct xc_sr_context *ctx);
 
+    /**
+     * Populate PFNs
+     *
+     * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+     * unpopulated subset.
+     */
+    int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+                         const xen_pfn_t *original_pfns, const uint32_t *types);
+
+
     /**
      * Process an individual record from the stream. The caller shall take
      * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +234,8 @@ struct xc_sr_context
         int send_back_fd;
         unsigned long p2m_size;
+        unsigned long max_pages;
+        unsigned long tot_pages;
         xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
         /* From Image Header. */
@@ -336,6 +348,12 @@ struct xc_sr_context
             /* HVM context blob. */
             void *context;
             size_t contextsz;
+
+            /* Bitmap of currently allocated PFNs during restore. */
+            struct xc_sr_bitmap attempted_1g;
+            struct xc_sr_bitmap attempted_2m;
+            struct xc_sr_bitmap allocated_pfns;
+            xen_pfn_t idx1G_prev, idx2M_prev;
         } restore;
     };
 } x86_hvm;
@@ -459,14 +477,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-                  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
     return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset. If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-                  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-    xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-        *pfns = malloc(count * sizeof(*pfns));
-    unsigned i, nr_pfns = 0;
-    int rc = -1;
-
-    if ( !mfns || !pfns )
-    {
-        ERROR("Failed to allocate %zu bytes for populating the physmap",
-              2 * count * sizeof(*mfns));
-        goto err;
-    }
-
-    for ( i = 0; i < count; ++i )
-    {
-        if ( (!types || (types &&
-                         (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-                          types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
-             !pfn_is_populated(ctx, original_pfns[i]) )
-        {
-            rc = pfn_set_populated(ctx, original_pfns[i]);
-            if ( rc )
-                goto err;
-            pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-            ++nr_pfns;
-        }
-    }
-
-    if ( nr_pfns )
-    {
-        rc = xc_domain_populate_physmap_exact(
-            xch, ctx->domid, nr_pfns, 0, 0, mfns);
-        if ( rc )
-        {
-            PERROR("Failed to populate physmap");
-            goto err;
-        }
-
-        for ( i = 0; i < nr_pfns; ++i )
-        {
-            if ( mfns[i] == INVALID_MFN )
-            {
-                ERROR("Populate physmap failed for pfn %u", i);
-                rc = -1;
-                goto err;
-            }
-
-            ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-        }
-    }
-
-    rc = 0;
-
- err:
-    free(pfns);
-    free(mfns);
[Xen-devel] [PATCH v9 1/3] tools/libxc: move SUPERPAGE macros to common header
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering
Acked-by: Wei Liu
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT 9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT 18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT 9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT 18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK
[Xen-devel] [PATCH v9 0/3] tools/libxc: use superpages
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v9:
 update hole checking in x86_hvm_populate_pfns
 add out of bounds check to xc_sr_test_and_set/clear_bit
v8:
 remove double check of 1G/2M idx in x86_hvm_populate_pfns
v7:
 cover holes that span multiple superpages
v6:
 handle freeing of partly populated superpages correctly
 more DPRINTFs
v5:
 send correct version, rebase was not fully finished
v4:
 restore trailing "_bit" in bitmap function names
 keep track of gaps between previous and current batch
 split alloc functionality in x86_hvm_allocate_pfn
v3:
 clear pointer in xc_sr_bitmap_free
 some coding style changes
 use getdomaininfo.max_pages to avoid Over-allocation check
 trim bitmap function names, drop trailing "_bit"
 add some comments
v2:
 split into individual commits

based on staging c39cf093fc ("x86/asm: add .file directives")

Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c            |   5 -
 tools/libxc/xc_private.h            |   5 +
 tools/libxc/xc_sr_common.c          |  41 +
 tools/libxc/xc_sr_common.h          |  98 +--
 tools/libxc/xc_sr_restore.c         | 141 +--
 tools/libxc/xc_sr_restore_x86_hvm.c | 341
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +++-
 7 files changed, 555 insertions(+), 148 deletions(-)
[Xen-devel] [PATCH v9 2/3] tools/libxc: add API for bitmap access for restore
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap. Switch the existing populated_pfns to this
API.

Signed-off-by: Olaf Hering
Acked-by: Wei Liu
---
 tools/libxc/xc_sr_common.c  | 41 ++
 tools/libxc/xc_sr_common.h  | 72 +++--
 tools/libxc/xc_sr_restore.c | 66 ++---
 3 files changed, 114 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..4d221ca90c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
     BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params) != 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the
+ * nearest power of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+    if (bits > bm->bits)
+    {
+        size_t new_max;
+        size_t old_sz, new_sz;
+        void *p;
+
+        /* Round up to the nearest power of two larger than bit, less 1. */
+        new_max = bits;
+        new_max |= new_max >> 1;
+        new_max |= new_max >> 2;
+        new_max |= new_max >> 4;
+        new_max |= new_max >> 8;
+        new_max |= new_max >> 16;
+#ifdef __x86_64__
+        new_max |= new_max >> 32;
+#endif
+
+        old_sz = bitmap_size(bm->bits + 1);
+        new_sz = bitmap_size(new_max + 1);
+        p = realloc(bm->p, new_sz);
+        if (!p)
+            return false;
+
+        if (bm->p)
+            memset(p + old_sz, 0, new_sz - old_sz);
+        else
+            memset(p, 0, new_sz);
+
+        bm->p = p;
+        bm->bits = new_max;
+    }
+    return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..734320947a 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
     size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+    void *p;
+    unsigned long bits;
+};
+
 struct xc_sr_context
 {
     xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
         domid_t xenstore_domid, console_domid;
 
         /* Bitmap of currently populated PFNs during restore. */
-        unsigned long *populated_pfns;
-        xen_pfn_t max_populated_pfn;
+        struct xc_sr_bitmap populated_pfns;
 
         /* Sender has invoked verify mode on the stream. */
         bool verify;
@@ -343,6 +348,69 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+    if (bits > bm->bits)
+        return _xc_sr_bitmap_resize(bm, bits);
+    return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+    free(bm->p);
+    bm->p = NULL;
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if (!xc_sr_bitmap_resize(bm, bit))
+        return false;
+
+    set_bit(bit, bm->p);
+    return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if (bit > bm->bits)
+        return false;
+    return !!test_bit(bit, bm->p);
+}
+
+static inline bool xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if (bit > bm->bits)
+        return false;
+    return !!test_and_clear_bit(bit, bm->p);
+}
+
+static inline bool xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if (bit > bm->bits)
+        return false;
+    return !!test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    xc_interface *xch = ctx->xch;
+
+    if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) )
+    {
+        ERROR("Failed to realloc populated_pfns bitmap");
+        errno = ENOMEM;
+        return -1;
+    }
+    return 0;
+}
+
 struct xc_sr_record
 {
     uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
     return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const st
Re: [Xen-devel] [PATCH v8 3/3] tools/libxc: use superpages during restore of HVM guest
On Fri, Sep 01, Olaf Hering wrote:

> +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count,
> +        /*
> +         * If this next pfn is within another 1GB superpage it is required
> +         * to scan the entire previous superpage because there might be
> +         * holes between max_pfn and the end of the superpage.
> +         */
> +        if ( idx1G_prev != idx1G )
> +        {
> +            order = SUPERPAGE_1GB_SHIFT;
> +            max_pfn = (((max_pfn >> order) + 1) << order) - 1;
> +        }
> +        if ( x86_hvm_punch_hole(ctx, max_pfn) == false )

And thinking about this part: with this variant it is still possible
that Over-allocation happens. If the previous pfn was within a 2MB
range, and this pfn is in another 2MB range, then the hole after
max_pfn would not be covered. This part needs an 'else' with
SUPERPAGE_2MB_SHIFT.

This "reset to max" may trigger a bug in xc_sr_test_and_clear_bit(). It
has to check the size of the bitmap, just as xc_sr_test_bit() does.

Olaf
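[For reference, the bounds-checked variant has the shape the v9 series
later adopts:]

  static inline bool xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm)
  {
      /* out-of-range bits were never set, so report them as clear */
      if ( bit > bm->bits || !bm->bits )
          return false;
      return !!test_and_clear_bit(bit, bm->p);
  }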
[Xen-devel] [PATCH v8 2/3] tools/libxc: add API for bitmap access for restore
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap. Switch the existing populated_pfns to this
API.

Signed-off-by: Olaf Hering
Acked-by: Wei Liu
---
 tools/libxc/xc_sr_common.c  | 41 +++
 tools/libxc/xc_sr_common.h  | 68 +++--
 tools/libxc/xc_sr_restore.c | 66 ++-
 3 files changed, 110 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..4d221ca90c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
     BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params) != 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the
+ * nearest power of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+    if (bits > bm->bits)
+    {
+        size_t new_max;
+        size_t old_sz, new_sz;
+        void *p;
+
+        /* Round up to the nearest power of two larger than bit, less 1. */
+        new_max = bits;
+        new_max |= new_max >> 1;
+        new_max |= new_max >> 2;
+        new_max |= new_max >> 4;
+        new_max |= new_max >> 8;
+        new_max |= new_max >> 16;
+#ifdef __x86_64__
+        new_max |= new_max >> 32;
+#endif
+
+        old_sz = bitmap_size(bm->bits + 1);
+        new_sz = bitmap_size(new_max + 1);
+        p = realloc(bm->p, new_sz);
+        if (!p)
+            return false;
+
+        if (bm->p)
+            memset(p + old_sz, 0, new_sz - old_sz);
+        else
+            memset(p, 0, new_sz);
+
+        bm->p = p;
+        bm->bits = new_max;
+    }
+    return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..da2691ba79 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
     size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+    void *p;
+    unsigned long bits;
+};
+
 struct xc_sr_context
 {
     xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
         domid_t xenstore_domid, console_domid;
 
         /* Bitmap of currently populated PFNs during restore. */
-        unsigned long *populated_pfns;
-        xen_pfn_t max_populated_pfn;
+        struct xc_sr_bitmap populated_pfns;
 
         /* Sender has invoked verify mode on the stream. */
         bool verify;
@@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+    if (bits > bm->bits)
+        return _xc_sr_bitmap_resize(bm, bits);
+    return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+    free(bm->p);
+    bm->p = NULL;
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if (!xc_sr_bitmap_resize(bm, bit))
+        return false;
+
+    set_bit(bit, bm->p);
+    return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if (bit > bm->bits)
+        return false;
+    return !!test_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    return test_and_clear_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    return test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    xc_interface *xch = ctx->xch;
+
+    if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) )
+    {
+        ERROR("Failed to realloc populated_pfns bitmap");
+        errno = ENOMEM;
+        return -1;
+    }
+    return 0;
+}
+
 struct xc_sr_record
 {
     uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
     return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
-{
-    if ( pfn > ctx->restore.max_populated_pfn )
-
[Xen-devel] [PATCH v8 3/3] tools/libxc: use superpages during restore of HVM guest
During creation of an HVM domU, meminit_hvm() tries to map superpages.
After save/restore or migration this mapping is lost, everything is
allocated in single pages. This causes a performance degradation after
migration.

Add the necessary code to preallocate a superpage for the chunk of pfns
that is received. In case a pfn was not populated on the sending side,
it must be freed on the receiving side to avoid over-allocation.

The existing code for x86_pv is moved unmodified into its own file.

Signed-off-by: Olaf Hering
---
 tools/libxc/xc_sr_common.h          |  25 ++-
 tools/libxc/xc_sr_restore.c         |  75 +---
 tools/libxc/xc_sr_restore_x86_hvm.c | 337
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +++-
 4 files changed, 431 insertions(+), 78 deletions(-)

diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index da2691ba79..0fa0fbea4d 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -139,6 +139,16 @@ struct xc_sr_restore_ops
      */
     int (*setup)(struct xc_sr_context *ctx);
 
+    /**
+     * Populate PFNs
+     *
+     * Given a set of pfns, obtain memory from Xen to fill the physmap for the
+     * unpopulated subset.
+     */
+    int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count,
+                         const xen_pfn_t *original_pfns, const uint32_t *types);
+
+
     /**
      * Process an individual record from the stream. The caller shall take
      * care of processing common records (e.g. END, PAGE_DATA).
@@ -224,6 +234,8 @@ struct xc_sr_context
         int send_back_fd;
         unsigned long p2m_size;
+        unsigned long max_pages;
+        unsigned long tot_pages;
         xc_hypercall_buffer_t dirty_bitmap_hbuf;
 
         /* From Image Header. */
@@ -336,6 +348,11 @@ struct xc_sr_context
             /* HVM context blob. */
             void *context;
             size_t contextsz;
+
+            /* Bitmap of currently allocated PFNs during restore. */
+            struct xc_sr_bitmap attempted_1g;
+            struct xc_sr_bitmap attempted_2m;
+            struct xc_sr_bitmap allocated_pfns;
         } restore;
     };
 } x86_hvm;
@@ -455,14 +472,6 @@ static inline int write_record(struct xc_sr_context *ctx,
  */
 int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec);
 
-/*
- * This would ideally be private in restore.c, but is needed by
- * x86_pv_localise_page() if we receive pagetables frames ahead of the
- * contents of the frames they point at.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-                  const xen_pfn_t *original_pfns, const uint32_t *types);
-
 #endif
 /*
  * Local variables:
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index d53948e1a6..8cd9289d1a 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
     return 0;
 }
 
-/*
- * Given a set of pfns, obtain memory from Xen to fill the physmap for the
- * unpopulated subset. If types is NULL, no page type checking is performed
- * and all unpopulated pfns are populated.
- */
-int populate_pfns(struct xc_sr_context *ctx, unsigned count,
-                  const xen_pfn_t *original_pfns, const uint32_t *types)
-{
-    xc_interface *xch = ctx->xch;
-    xen_pfn_t *mfns = malloc(count * sizeof(*mfns)),
-        *pfns = malloc(count * sizeof(*pfns));
-    unsigned i, nr_pfns = 0;
-    int rc = -1;
-
-    if ( !mfns || !pfns )
-    {
-        ERROR("Failed to allocate %zu bytes for populating the physmap",
-              2 * count * sizeof(*mfns));
-        goto err;
-    }
-
-    for ( i = 0; i < count; ++i )
-    {
-        if ( (!types || (types &&
-                         (types[i] != XEN_DOMCTL_PFINFO_XTAB &&
-                          types[i] != XEN_DOMCTL_PFINFO_BROKEN))) &&
-             !pfn_is_populated(ctx, original_pfns[i]) )
-        {
-            rc = pfn_set_populated(ctx, original_pfns[i]);
-            if ( rc )
-                goto err;
-            pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i];
-            ++nr_pfns;
-        }
-    }
-
-    if ( nr_pfns )
-    {
-        rc = xc_domain_populate_physmap_exact(
-            xch, ctx->domid, nr_pfns, 0, 0, mfns);
-        if ( rc )
-        {
-            PERROR("Failed to populate physmap");
-            goto err;
-        }
-
-        for ( i = 0; i < nr_pfns; ++i )
-        {
-            if ( mfns[i] == INVALID_MFN )
-            {
-                ERROR("Populate physmap failed for pfn %u", i);
-                rc = -1;
-                goto err;
-            }
-
-            ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]);
-        }
-    }
-
-    rc = 0;
-
- err:
-    free(pfns);
-    free(mfns);
[Xen-devel] [PATCH v8 0/3] tools/libxc: use superpages
Using superpages on the receiving dom0 will avoid performance regressions.

Olaf

v8:
 remove double check of 1G/2M idx in x86_hvm_populate_pfns
v7:
 cover holes that span multiple superpages
v6:
 handle freeing of partly populated superpages correctly
 more DPRINTFs
v5:
 send correct version, rebase was not fully finished
v4:
 restore trailing "_bit" in bitmap function names
 keep track of gaps between previous and current batch
 split alloc functionality in x86_hvm_allocate_pfn
v3:
 clear pointer in xc_sr_bitmap_free
 some coding style changes
 use getdomaininfo.max_pages to avoid Over-allocation check
 trim bitmap function names, drop trailing "_bit"
 add some comments
v2:
 split into individual commits

based on staging c39cf093fc ("x86/asm: add .file directives")

Olaf Hering (3):
  tools/libxc: move SUPERPAGE macros to common header
  tools/libxc: add API for bitmap access for restore
  tools/libxc: use superpages during restore of HVM guest

 tools/libxc/xc_dom_x86.c            |   5 -
 tools/libxc/xc_private.h            |   5 +
 tools/libxc/xc_sr_common.c          |  41 +
 tools/libxc/xc_sr_common.h          |  93 --
 tools/libxc/xc_sr_restore.c         | 141 +--
 tools/libxc/xc_sr_restore_x86_hvm.c | 337
 tools/libxc/xc_sr_restore_x86_pv.c  |  72 +++-
 7 files changed, 546 insertions(+), 148 deletions(-)
[Xen-devel] [PATCH v8 1/3] tools/libxc: move SUPERPAGE macros to common header
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering
Acked-by: Wei Liu
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT 9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT 18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT 9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT 18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK
[Xen-devel] [PATCH v7 2/3] tools/libxc: add API for bitmap access for restore
Extend API for managing bitmaps. Each bitmap is now represented by a
generic struct xc_sr_bitmap. Switch the existing populated_pfns to this
API.

Signed-off-by: Olaf Hering
Acked-by: Wei Liu
---
 tools/libxc/xc_sr_common.c  | 41 +++
 tools/libxc/xc_sr_common.h  | 68 +++--
 tools/libxc/xc_sr_restore.c | 66 ++-
 3 files changed, 110 insertions(+), 65 deletions(-)

diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c
index 79b9c3e940..4d221ca90c 100644
--- a/tools/libxc/xc_sr_common.c
+++ b/tools/libxc/xc_sr_common.c
@@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void)
     BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params) != 8);
 }
 
+/*
+ * Expand the tracking structures as needed.
+ * To avoid realloc()ing too excessively, the size is increased to the
+ * nearest power of two large enough to contain the required number of bits.
+ */
+bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+    if (bits > bm->bits)
+    {
+        size_t new_max;
+        size_t old_sz, new_sz;
+        void *p;
+
+        /* Round up to the nearest power of two larger than bit, less 1. */
+        new_max = bits;
+        new_max |= new_max >> 1;
+        new_max |= new_max >> 2;
+        new_max |= new_max >> 4;
+        new_max |= new_max >> 8;
+        new_max |= new_max >> 16;
+#ifdef __x86_64__
+        new_max |= new_max >> 32;
+#endif
+
+        old_sz = bitmap_size(bm->bits + 1);
+        new_sz = bitmap_size(new_max + 1);
+        p = realloc(bm->p, new_sz);
+        if (!p)
+            return false;
+
+        if (bm->p)
+            memset(p + old_sz, 0, new_sz - old_sz);
+        else
+            memset(p, 0, new_sz);
+
+        bm->p = p;
+        bm->bits = new_max;
+    }
+    return true;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22af4e..da2691ba79 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu
     size_t basicsz, extdsz, xsavesz, msrsz;
 };
 
+struct xc_sr_bitmap
+{
+    void *p;
+    unsigned long bits;
+};
+
 struct xc_sr_context
 {
     xc_interface *xch;
@@ -255,8 +261,7 @@ struct xc_sr_context
         domid_t xenstore_domid, console_domid;
 
         /* Bitmap of currently populated PFNs during restore. */
-        unsigned long *populated_pfns;
-        xen_pfn_t max_populated_pfn;
+        struct xc_sr_bitmap populated_pfns;
 
         /* Sender has invoked verify mode on the stream. */
         bool verify;
@@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm;
 extern struct xc_sr_restore_ops restore_ops_x86_pv;
 extern struct xc_sr_restore_ops restore_ops_x86_hvm;
 
+extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits);
+
+static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits)
+{
+    if (bits > bm->bits)
+        return _xc_sr_bitmap_resize(bm, bits);
+    return true;
+}
+
+static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm)
+{
+    free(bm->p);
+    bm->p = NULL;
+}
+
+static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if (!xc_sr_bitmap_resize(bm, bit))
+        return false;
+
+    set_bit(bit, bm->p);
+    return true;
+}
+
+static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    if (bit > bm->bits)
+        return false;
+    return !!test_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    return test_and_clear_bit(bit, bm->p);
+}
+
+static inline int xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm)
+{
+    return test_and_set_bit(bit, bm->p);
+}
+
+static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns);
+}
+
+static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn)
+{
+    xc_interface *xch = ctx->xch;
+
+    if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) )
+    {
+        ERROR("Failed to realloc populated_pfns bitmap");
+        errno = ENOMEM;
+        return -1;
+    }
+    return 0;
+}
+
 struct xc_sr_record
 {
     uint32_t type;
diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c
index a016678332..d53948e1a6 100644
--- a/tools/libxc/xc_sr_restore.c
+++ b/tools/libxc/xc_sr_restore.c
@@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx)
     return 0;
 }
 
-/*
- * Is a pfn populated?
- */
-static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn)
-{
-    if ( pfn > ctx->restore.max_populated_pfn )
-
[Xen-devel] [PATCH v7 1/3] tools/libxc: move SUPERPAGE macros to common header
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by
other code in libxc. Move the macros to a header file.

Signed-off-by: Olaf Hering
Acked-by: Wei Liu
---
 tools/libxc/xc_dom_x86.c | 5 -
 tools/libxc/xc_private.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index cb68efcbd3..5aff5cad58 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -43,11 +43,6 @@
 
 #define SUPERPAGE_BATCH_SIZE 512
 
-#define SUPERPAGE_2MB_SHIFT 9
-#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
-#define SUPERPAGE_1GB_SHIFT 18
-#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
-
 #define X86_CR0_PE 0x01
 #define X86_CR0_ET 0x10
 
diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h
index 1c27b0fded..d581f850b0 100644
--- a/tools/libxc/xc_private.h
+++ b/tools/libxc/xc_private.h
@@ -66,6 +66,11 @@ struct iovec {
 #define DECLARE_FLASK_OP struct xen_flask_op op
 #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op
 
+#define SUPERPAGE_2MB_SHIFT 9
+#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT)
+#define SUPERPAGE_1GB_SHIFT 18
+#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT)
+
 #undef PAGE_SHIFT
 #undef PAGE_SIZE
 #undef PAGE_MASK
[Xen-devel] [PATCH v7 0/3] tools/libxc: use superpages
Using superpages on the receiving dom0 will avoid performance regressions. Olaf v7: cover holes that span multiple superpages v6: handle freeing of partly populated superpages correctly more DPRINTFs v5: send correct version, rebase was not fully finished v4: restore trailing "_bit" in bitmap function names keep track of gaps between previous and current batch split alloc functionality in x86_hvm_allocate_pfn v3: clear pointer in xc_sr_bitmap_free some coding style changes use getdomaininfo.max_pages to avoid Over-allocation check trim bitmap function names, drop trailing "_bit" add some comments v2: split into individual commits Olaf Hering (3): tools/libxc: move SUPERPAGE macros to common header tools/libxc: add API for bitmap access for restore tools/libxc: use superpages during restore of HVM guest tools/libxc/xc_dom_x86.c| 5 - tools/libxc/xc_private.h| 5 + tools/libxc/xc_sr_common.c | 41 + tools/libxc/xc_sr_common.h | 93 -- tools/libxc/xc_sr_restore.c | 141 +-- tools/libxc/xc_sr_restore_x86_hvm.c | 340 tools/libxc/xc_sr_restore_x86_pv.c | 72 +++- 7 files changed, 549 insertions(+), 148 deletions(-) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v7 3/3] tools/libxc: use superpages during restore of HVM guest
During creation of an HVM domU, meminit_hvm() tries to map superpages. After save/restore or migration this mapping is lost and everything is allocated in single pages. This causes a performance degradation after migration. Add the necessary code to preallocate a superpage for the chunk of pfns that is received. In case a pfn was not populated on the sending side, it must be freed on the receiving side to avoid over-allocation. The existing code for x86_pv is moved unmodified into its own file. Signed-off-by: Olaf Hering --- tools/libxc/xc_sr_common.h | 25 ++- tools/libxc/xc_sr_restore.c | 75 +--- tools/libxc/xc_sr_restore_x86_hvm.c | 340 tools/libxc/xc_sr_restore_x86_pv.c | 72 +++- 4 files changed, 434 insertions(+), 78 deletions(-) diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index da2691ba79..0fa0fbea4d 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -139,6 +139,16 @@ struct xc_sr_restore_ops */ int (*setup)(struct xc_sr_context *ctx); +/** + * Populate PFNs + * + * Given a set of pfns, obtain memory from Xen to fill the physmap for the + * unpopulated subset. + */ +int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count, + const xen_pfn_t *original_pfns, const uint32_t *types); + + /** * Process an individual record from the stream. The caller shall take * care of processing common records (e.g. END, PAGE_DATA). @@ -224,6 +234,8 @@ struct xc_sr_context int send_back_fd; unsigned long p2m_size; +unsigned long max_pages; +unsigned long tot_pages; xc_hypercall_buffer_t dirty_bitmap_hbuf; /* From Image Header. */ @@ -336,6 +348,11 @@ struct xc_sr_context /* HVM context blob. */ void *context; size_t contextsz; + +/* Bitmap of currently allocated PFNs during restore. */ +struct xc_sr_bitmap attempted_1g; +struct xc_sr_bitmap attempted_2m; +struct xc_sr_bitmap allocated_pfns; } restore; }; } x86_hvm; @@ -455,14 +472,6 @@ static inline int write_record(struct xc_sr_context *ctx, */ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec); -/* - * This would ideally be private in restore.c, but is needed by - * x86_pv_localise_page() if we receive pagetables frames ahead of the - * contents of the frames they point at. - */ -int populate_pfns(struct xc_sr_context *ctx, unsigned count, - const xen_pfn_t *original_pfns, const uint32_t *types); - #endif /* * Local variables: diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index d53948e1a6..8cd9289d1a 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx) return 0; } -/* - * Given a set of pfns, obtain memory from Xen to fill the physmap for the - * unpopulated subset. If types is NULL, no page type checking is performed - * and all unpopulated pfns are populated. 
- */ -int populate_pfns(struct xc_sr_context *ctx, unsigned count, - const xen_pfn_t *original_pfns, const uint32_t *types) -{ -xc_interface *xch = ctx->xch; -xen_pfn_t *mfns = malloc(count * sizeof(*mfns)), -*pfns = malloc(count * sizeof(*pfns)); -unsigned i, nr_pfns = 0; -int rc = -1; - -if ( !mfns || !pfns ) -{ -ERROR("Failed to allocate %zu bytes for populating the physmap", - 2 * count * sizeof(*mfns)); -goto err; -} - -for ( i = 0; i < count; ++i ) -{ -if ( (!types || (types && - (types[i] != XEN_DOMCTL_PFINFO_XTAB && - types[i] != XEN_DOMCTL_PFINFO_BROKEN))) && - !pfn_is_populated(ctx, original_pfns[i]) ) -{ -rc = pfn_set_populated(ctx, original_pfns[i]); -if ( rc ) -goto err; -pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i]; -++nr_pfns; -} -} - -if ( nr_pfns ) -{ -rc = xc_domain_populate_physmap_exact( -xch, ctx->domid, nr_pfns, 0, 0, mfns); -if ( rc ) -{ -PERROR("Failed to populate physmap"); -goto err; -} - -for ( i = 0; i < nr_pfns; ++i ) -{ -if ( mfns[i] == INVALID_MFN ) -{ -ERROR("Populate physmap failed for pfn %u", i); -rc = -1; -goto err; -} - -ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]); -} -} - -rc = 0; - - err: -free(pfns); -free(mfns);
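Condensed, the allocation strategy of the new x86_hvm populate_pfns hook: each superpage frame is attempted at most once (tracked in the attempted_* bitmaps), subject to the domain's max_pages limit, with a fallback to single 4k pages; pfns the stream never populates are released again later. An illustrative helper in that spirit, not the literal patch code:

    static bool try_superpage(struct xc_sr_context *ctx, xen_pfn_t pfn,
                              unsigned int order, struct xc_sr_bitmap *attempted)
    {
        xen_pfn_t extent = (pfn >> order) << order;   /* aligned base pfn */

        /* Skip if this superpage would exceed the domain's page limit. */
        if ( ctx->restore.tot_pages + (1UL << order) > ctx->restore.max_pages )
            return false;

        /* Attempt each frame only once. */
        if ( xc_sr_test_and_set_bit(pfn >> order, attempted) )
            return false;

        if ( xc_domain_populate_physmap(ctx->xch, ctx->domid, 1, order,
                                        0, &extent) != 1 )
            return false;

        ctx->restore.tot_pages += 1UL << order;       /* account the new pages */
        return true;
    }

The real x86_hvm_allocate_pfn() tries SUPERPAGE_1GB_SHIFT first, then SUPERPAGE_2MB_SHIFT, and lets the caller fall back to order-0 allocations.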
[Xen-devel] ballooning specific PFNs in a HVM domU
Does the Linux kernel provide an API to claim specific pages? Right now it just does alloc_page(), which I think returns any random page that happens to be unused. I want to create a specific memory layout with holes to verify my migration patches. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
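One workaround for this kind of test, assuming the layout may be shaped from the tools side rather than from within the guest: release chosen gfns of the running domU directly, which is the same hypercall the balloon driver issues. A hedged sketch (domid, first and last are supplied by the test):

    /* Punch a hole at every 512th gfn to get a predictable layout. */
    xen_pfn_t gfn;

    for ( gfn = first; gfn < last; gfn += 512 )
        if ( xc_domain_decrease_reservation_exact(xch, domid, 1, 0, &gfn) )
            break;    /* gfn already unpopulated, or the call failed */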
Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Aug 30, Wei Liu wrote: > > Can this actually happen with the available senders? If not, this is > > again the missing memory map. > Probably not now, but as said, you shouldn't rely on the structure of > the stream unless it is stated in the spec. Well, what can happen with today's implementation on the sender side is the case of a ballooned guest with enough holes within a batch. These will trigger 1G allocations before the releasing of memory happens. To solve this, the releasing of memory has to happen more often, probably after crossing each 2M boundary. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
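A sketch of that proposal, with release_unpopulated() as a hypothetical helper around the existing decrease-reservation path:

    /* Release candidates whenever the scan crosses a 2MB boundary,
     * instead of once per batch, so holes can no longer accumulate
     * underneath a freshly allocated 1GB page. */
    xen_pfn_t pfn, start = min_pfn;

    for ( pfn = min_pfn; pfn <= max_pfn; ++pfn )
    {
        if ( pfn != start && (pfn & (SUPERPAGE_2MB_NR_PFNS - 1)) == 0 )
        {
            release_unpopulated(ctx, start, pfn - 1);
            start = pfn;
        }
    }
    release_unpopulated(ctx, start, max_pfn);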
Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Aug 30, Wei Liu wrote: > As far as I can tell the algorithm in the patch can't handle: > > 1. First pfn in a batch points to start of second 1G address space > 2. Second pfn in a batch points to a page in the middle of first 1G > 3. Guest can only use 1G ram In which way does it not handle it? Over-allocation is supposed to be handled by the "ctx->restore.tot_pages + sp->count > ctx->restore.max_pages" checks. Do you mean the second 1G is allocated, then max_pages is reached, and allocation in other areas is not possible anymore? Can this actually happen with the available senders? If not, this is again the missing memory map. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
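The guard referred to above, essentially as quoted from the series (sp->count being the size of the candidate superpage in pfns):

    /* Refuse a superpage that would push the domain past its limit;
     * the code then falls back to a smaller order or single pages. */
    if ( ctx->restore.tot_pages + sp->count > ctx->restore.max_pages )
        return false;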
Re: [Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest
On Sat, Aug 26, Olaf Hering wrote: > +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count, > +/* > + * Scan the entire superpage because several batches will fit into > + * a superpage, and it is unknown which pfn triggered the allocation. > + */ > +order = SUPERPAGE_1GB_SHIFT; > +pfn = min_pfn = (min_pfn >> order) << order; Scanning an entire superpage again and again looked expensive, but with the debug change below it turned out that the loop which peeks at each single bit in populated_pfns is likely not a bottleneck. Migrating a domU with a simple workload that touches pages to mark them dirty will set the min_pfn/max_pfn to a large range anyway after the first iteration. This large range may also happen with an idle domU. A small domU takes 78 seconds to migrate, and just the freeing part takes 1.4 seconds. Similarly, for a large domain the loop takes 1% of the time. 78 seconds, 1.4 seconds, 2119 calls (8GB, 12*512M memdirty) 695 seconds, 7.6 seconds, 18076 calls (72GB, 12*5G memdirty) Olaf track time spent if decrease_reservation is needed diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index 0fa0fbea4d..5ec8b6fee6 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -353,6 +353,9 @@ struct xc_sr_context struct xc_sr_bitmap attempted_1g; struct xc_sr_bitmap attempted_2m; struct xc_sr_bitmap allocated_pfns; + +unsigned long tv_nsec; +unsigned long iterations; } restore; }; } x86_hvm; diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index 8cd9289d1a..f6aad329e2 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -769,6 +769,7 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, { ctx.restore.ops = restore_ops_x86_hvm; if ( restore(&ctx) ) +; return -1; } else diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c index 2b0eca0c7c..11758b3f7d 100644 --- a/tools/libxc/xc_sr_restore_x86_hvm.c +++ b/tools/libxc/xc_sr_restore_x86_hvm.c @@ -1,5 +1,6 @@ #include <assert.h> #include <arpa/inet.h> +#include <time.h> #include "xc_sr_common_x86.h" @@ -248,6 +249,12 @@ static int x86_hvm_stream_complete(struct xc_sr_context *ctx) static int x86_hvm_cleanup(struct xc_sr_context *ctx) { +xc_interface *xch = ctx->xch; +errno = 0; +PERROR("tv_nsec %lu.%lu iterations %lu", +ctx->x86_hvm.restore.tv_nsec / 1000000000UL, +ctx->x86_hvm.restore.tv_nsec % 1000000000UL, +ctx->x86_hvm.restore.iterations); free(ctx->x86_hvm.restore.context); xc_sr_bitmap_free(&ctx->x86_hvm.restore.attempted_1g); xc_sr_bitmap_free(&ctx->x86_hvm.restore.attempted_2m); @@ -440,6 +447,28 @@ static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn) return rc; } +static void diff_timespec(struct xc_sr_context *ctx, const struct timespec *old, const struct timespec *new, struct timespec *diff) +{ +xc_interface *xch = ctx->xch; +if (new->tv_sec == old->tv_sec && new->tv_nsec == old->tv_nsec) +PERROR("%s: time did not move: %ld/%ld == %ld/%ld", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec); +if ( (new->tv_sec < old->tv_sec) || (new->tv_sec == old->tv_sec && new->tv_nsec < old->tv_nsec) ) +{ +PERROR("%s: time went backwards: %ld/%ld -> %ld/%ld", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec); +diff->tv_sec = diff->tv_nsec = 0; +return; +} +if ((new->tv_nsec - old->tv_nsec) < 0) { +diff->tv_sec = new->tv_sec - old->tv_sec - 1; +diff->tv_nsec = new->tv_nsec - old->tv_nsec + 1000000000UL; +} else { +diff->tv_sec = new->tv_sec - old->tv_sec; +diff->tv_nsec = new->tv_nsec - old->tv_nsec; +}
+if (diff->tv_sec < 0) +PERROR("%s: time diff broken. old: %ld/%ld new: %ld/%ld diff: %ld/%ld ", __func__, old->tv_sec, old->tv_nsec, new->tv_sec, new->tv_nsec, diff->tv_sec, diff->tv_nsec); +} + static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count, const xen_pfn_t *original_pfns, const uint32_t *types) @@ -448,6 +477,7 @@ static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count, xen_pfn_t pfn, min_pfn = original_pfns[0], max_pfn = original_pfns[0]; unsigned i, freed = 0, order; int rc = -1; +struct timespec a, b, d; for ( i = 0; i < count; ++i )
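The hunk ends here in the archive; presumably the instrumentation brackets the freeing scan roughly as follows (a reconstruction, not the posted code):

    clock_gettime(CLOCK_MONOTONIC, &a);
    /* ... scan the bitmaps and release unpopulated pfns ... */
    clock_gettime(CLOCK_MONOTONIC, &b);
    diff_timespec(ctx, &a, &b, &d);
    ctx->x86_hvm.restore.tv_nsec += d.tv_sec * 1000000000UL + d.tv_nsec;
    ctx->x86_hvm.restore.iterations++;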
[Xen-devel] [PATCH v6 0/3] tools/libxc: use superpages
Using superpages on the receiving dom0 will avoid performance regressions. Olaf v6: handle freeing of partly populated superpages correctly more DPRINTFs v5: send correct version, rebase was not fully finished v4: restore trailing "_bit" in bitmap function names keep track of gaps between previous and current batch split alloc functionality in x86_hvm_allocate_pfn v3: clear pointer in xc_sr_bitmap_free some coding style changes use getdomaininfo.max_pages to avoid Over-allocation check trim bitmap function names, drop trailing "_bit" add some comments v2: split into individual commits Olaf Hering (3): tools/libxc: move SUPERPAGE macros to common header tools/libxc: add API for bitmap access for restore tools/libxc: use superpages during restore of HVM guest tools/libxc/xc_dom_x86.c| 5 - tools/libxc/xc_private.h| 5 + tools/libxc/xc_sr_common.c | 41 + tools/libxc/xc_sr_common.h | 93 ++-- tools/libxc/xc_sr_restore.c | 141 ++ tools/libxc/xc_sr_restore_x86_hvm.c | 288 tools/libxc/xc_sr_restore_x86_pv.c | 72 - 7 files changed, 497 insertions(+), 148 deletions(-) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v6 1/3] tools/libxc: move SUPERPAGE macros to common header
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by other code in libxc. Move the macros to a header file. Signed-off-by: Olaf Hering Acked-by: Wei Liu --- tools/libxc/xc_dom_x86.c | 5 - tools/libxc/xc_private.h | 5 + 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index cb68efcbd3..5aff5cad58 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -43,11 +43,6 @@ #define SUPERPAGE_BATCH_SIZE 512 -#define SUPERPAGE_2MB_SHIFT 9 -#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT) -#define SUPERPAGE_1GB_SHIFT 18 -#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT) - #define X86_CR0_PE 0x01 #define X86_CR0_ET 0x10 diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h index 1c27b0fded..d581f850b0 100644 --- a/tools/libxc/xc_private.h +++ b/tools/libxc/xc_private.h @@ -66,6 +66,11 @@ struct iovec { #define DECLARE_FLASK_OP struct xen_flask_op op #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op +#define SUPERPAGE_2MB_SHIFT 9 +#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT) +#define SUPERPAGE_1GB_SHIFT 18 +#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT) + #undef PAGE_SHIFT #undef PAGE_SIZE #undef PAGE_MASK ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v6 2/3] tools/libxc: add API for bitmap access for restore
Extend API for managing bitmaps. Each bitmap is now represented by a generic struct xc_sr_bitmap. Switch the existing populated_pfns to this API. Signed-off-by: Olaf Hering Acked-by: Wei Liu --- tools/libxc/xc_sr_common.c | 41 +++ tools/libxc/xc_sr_common.h | 68 +++-- tools/libxc/xc_sr_restore.c | 66 ++- 3 files changed, 110 insertions(+), 65 deletions(-) diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c index 79b9c3e940..4d221ca90c 100644 --- a/tools/libxc/xc_sr_common.c +++ b/tools/libxc/xc_sr_common.c @@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void) BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8); } +/* + * Expand the tracking structures as needed. + * To avoid realloc()ing too excessively, the size increased to the nearest power + * of two large enough to contain the required number of bits. + */ +bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +{ +size_t new_max; +size_t old_sz, new_sz; +void *p; + +/* Round up to the nearest power of two larger than bit, less 1. */ +new_max = bits; +new_max |= new_max >> 1; +new_max |= new_max >> 2; +new_max |= new_max >> 4; +new_max |= new_max >> 8; +new_max |= new_max >> 16; +#ifdef __x86_64__ +new_max |= new_max >> 32; +#endif + +old_sz = bitmap_size(bm->bits + 1); +new_sz = bitmap_size(new_max + 1); +p = realloc(bm->p, new_sz); +if (!p) +return false; + +if (bm->p) +memset(p + old_sz, 0, new_sz - old_sz); +else +memset(p, 0, new_sz); + +bm->p = p; +bm->bits = new_max; +} +return true; +} + /* * Local variables: * mode: C diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index a83f22af4e..da2691ba79 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu size_t basicsz, extdsz, xsavesz, msrsz; }; +struct xc_sr_bitmap +{ +void *p; +unsigned long bits; +}; + struct xc_sr_context { xc_interface *xch; @@ -255,8 +261,7 @@ struct xc_sr_context domid_t xenstore_domid, console_domid; /* Bitmap of currently populated PFNs during restore. */ -unsigned long *populated_pfns; -xen_pfn_t max_populated_pfn; +struct xc_sr_bitmap populated_pfns; /* Sender has invoked verify mode on the stream. 
*/ bool verify; @@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm; extern struct xc_sr_restore_ops restore_ops_x86_pv; extern struct xc_sr_restore_ops restore_ops_x86_hvm; +extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits); + +static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +return _xc_sr_bitmap_resize(bm, bits); +return true; +} + +static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm) +{ +free(bm->p); +bm->p = NULL; +} + +static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (!xc_sr_bitmap_resize(bm, bit)) +return false; + +set_bit(bit, bm->p); +return true; +} + +static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (bit > bm->bits) +return false; +return !!test_bit(bit, bm->p); +} + +static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +return test_and_clear_bit(bit, bm->p); +} + +static inline int xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +return test_and_set_bit(bit, bm->p); +} + +static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn) +{ +return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns); +} + +static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn) +{ +xc_interface *xch = ctx->xch; + +if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) ) +{ +ERROR("Failed to realloc populated_pfns bitmap"); +errno = ENOMEM; +return -1; +} +return 0; +} + struct xc_sr_record { uint32_t type; diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index a016678332..d53948e1a6 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx) return 0; } -/* - * Is a pfn populated? - */ -static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn) -{ -if ( pfn > ctx->restore.max_populated_pfn ) -
[Xen-devel] [PATCH v6 3/3] tools/libxc: use superpages during restore of HVM guest
During creation of an HVM domU, meminit_hvm() tries to map superpages. After save/restore or migration this mapping is lost and everything is allocated in single pages. This causes a performance degradation after migration. Add the necessary code to preallocate a superpage for the chunk of pfns that is received. In case a pfn was not populated on the sending side, it must be freed on the receiving side to avoid over-allocation. The existing code for x86_pv is moved unmodified into its own file. Signed-off-by: Olaf Hering --- tools/libxc/xc_sr_common.h | 25 +++- tools/libxc/xc_sr_restore.c | 75 +- tools/libxc/xc_sr_restore_x86_hvm.c | 288 tools/libxc/xc_sr_restore_x86_pv.c | 72 - 4 files changed, 382 insertions(+), 78 deletions(-) diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index da2691ba79..0fa0fbea4d 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -139,6 +139,16 @@ struct xc_sr_restore_ops */ int (*setup)(struct xc_sr_context *ctx); +/** + * Populate PFNs + * + * Given a set of pfns, obtain memory from Xen to fill the physmap for the + * unpopulated subset. + */ +int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count, + const xen_pfn_t *original_pfns, const uint32_t *types); + + /** * Process an individual record from the stream. The caller shall take * care of processing common records (e.g. END, PAGE_DATA). @@ -224,6 +234,8 @@ struct xc_sr_context int send_back_fd; unsigned long p2m_size; +unsigned long max_pages; +unsigned long tot_pages; xc_hypercall_buffer_t dirty_bitmap_hbuf; /* From Image Header. */ @@ -336,6 +348,11 @@ struct xc_sr_context /* HVM context blob. */ void *context; size_t contextsz; + +/* Bitmap of currently allocated PFNs during restore. */ +struct xc_sr_bitmap attempted_1g; +struct xc_sr_bitmap attempted_2m; +struct xc_sr_bitmap allocated_pfns; } restore; }; } x86_hvm; @@ -455,14 +472,6 @@ static inline int write_record(struct xc_sr_context *ctx, */ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec); -/* - * This would ideally be private in restore.c, but is needed by - * x86_pv_localise_page() if we receive pagetables frames ahead of the - * contents of the frames they point at. - */ -int populate_pfns(struct xc_sr_context *ctx, unsigned count, - const xen_pfn_t *original_pfns, const uint32_t *types); - #endif /* * Local variables: diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index d53948e1a6..8cd9289d1a 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx) return 0; } -/* - * Given a set of pfns, obtain memory from Xen to fill the physmap for the - * unpopulated subset. If types is NULL, no page type checking is performed - * and all unpopulated pfns are populated. 
- */ -int populate_pfns(struct xc_sr_context *ctx, unsigned count, - const xen_pfn_t *original_pfns, const uint32_t *types) -{ -xc_interface *xch = ctx->xch; -xen_pfn_t *mfns = malloc(count * sizeof(*mfns)), -*pfns = malloc(count * sizeof(*pfns)); -unsigned i, nr_pfns = 0; -int rc = -1; - -if ( !mfns || !pfns ) -{ -ERROR("Failed to allocate %zu bytes for populating the physmap", - 2 * count * sizeof(*mfns)); -goto err; -} - -for ( i = 0; i < count; ++i ) -{ -if ( (!types || (types && - (types[i] != XEN_DOMCTL_PFINFO_XTAB && - types[i] != XEN_DOMCTL_PFINFO_BROKEN))) && - !pfn_is_populated(ctx, original_pfns[i]) ) -{ -rc = pfn_set_populated(ctx, original_pfns[i]); -if ( rc ) -goto err; -pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i]; -++nr_pfns; -} -} - -if ( nr_pfns ) -{ -rc = xc_domain_populate_physmap_exact( -xch, ctx->domid, nr_pfns, 0, 0, mfns); -if ( rc ) -{ -PERROR("Failed to populate physmap"); -goto err; -} - -for ( i = 0; i < nr_pfns; ++i ) -{ -if ( mfns[i] == INVALID_MFN ) -{ -ERROR("Populate physmap failed for pfn %u", i); -rc = -1; -goto err; -} - -ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]); -} -} - -rc = 0; - - err: -free(pfns); -free(mf
Re: [Xen-devel] [PATCH v5 3/3] tools/libxc: use superpages during restore of HVM guest
On Fri, Aug 25, Olaf Hering wrote: > +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count, > + const xen_pfn_t *original_pfns, > + const uint32_t *types) > +{ > +while ( min_pfn < max_pfn ) Besides this off-by-one error, there is still a bug in the accounting somewhere. Ballooned guests sometimes fail due to allocation errors. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
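The off-by-one in the quoted hunk: max_pfn holds the last pfn of the scanned range, so the loop stops one page short. The fix is presumably just an inclusive bound:

    while ( min_pfn <= max_pfn )
    {
        /* ... test and possibly free min_pfn, then ++min_pfn ... */
    }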
[Xen-devel] [PATCH v5 0/3] tools/libxc: use superpages
Using superpages on the receiving dom0 will avoid performance regressions. Olaf v5: send correct version, rebase was not fully finished v4: restore trailing "_bit" in bitmap function names keep track of gaps between previous and current batch split alloc functionality in x86_hvm_allocate_pfn v3: clear pointer in xc_sr_bitmap_free some coding style changes use getdomaininfo.max_pages to avoid Over-allocation check trim bitmap function names, drop trailing "_bit" add some comments v2: split into individual commits Olaf Hering (3): tools/libxc: move SUPERPAGE macros to common header tools/libxc: add API for bitmap access for restore tools/libxc: use superpages during restore of HVM guest tools/libxc/xc_dom_x86.c| 5 - tools/libxc/xc_private.h| 5 + tools/libxc/xc_sr_common.c | 41 ++ tools/libxc/xc_sr_common.h | 94 ++-- tools/libxc/xc_sr_restore.c | 141 ++ tools/libxc/xc_sr_restore_x86_hvm.c | 276 tools/libxc/xc_sr_restore_x86_pv.c | 72 +- 7 files changed, 486 insertions(+), 148 deletions(-) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v5 3/3] tools/libxc: use superpages during restore of HVM guest
During creation of an HVM domU, meminit_hvm() tries to map superpages. After save/restore or migration this mapping is lost and everything is allocated in single pages. This causes a performance degradation after migration. Add the necessary code to preallocate a superpage for the chunk of pfns that is received. In case a pfn was not populated on the sending side, it must be freed on the receiving side to avoid over-allocation. The existing code for x86_pv is moved unmodified into its own file. Signed-off-by: Olaf Hering --- tools/libxc/xc_sr_common.h | 26 ++-- tools/libxc/xc_sr_restore.c | 75 +- tools/libxc/xc_sr_restore_x86_hvm.c | 276 tools/libxc/xc_sr_restore_x86_pv.c | 72 +- 4 files changed, 371 insertions(+), 78 deletions(-) diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index da2691ba79..26526d8896 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -139,6 +139,16 @@ struct xc_sr_restore_ops */ int (*setup)(struct xc_sr_context *ctx); +/** + * Populate PFNs + * + * Given a set of pfns, obtain memory from Xen to fill the physmap for the + * unpopulated subset. + */ +int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count, + const xen_pfn_t *original_pfns, const uint32_t *types); + + /** * Process an individual record from the stream. The caller shall take * care of processing common records (e.g. END, PAGE_DATA). @@ -224,6 +234,8 @@ struct xc_sr_context int send_back_fd; unsigned long p2m_size; +unsigned long max_pages; +unsigned long tot_pages; xc_hypercall_buffer_t dirty_bitmap_hbuf; /* From Image Header. */ @@ -336,6 +348,12 @@ struct xc_sr_context /* HVM context blob. */ void *context; size_t contextsz; + +/* Bitmap of currently allocated PFNs during restore. */ +struct xc_sr_bitmap attempted_1g; +struct xc_sr_bitmap attempted_2m; +struct xc_sr_bitmap allocated_pfns; +xen_pfn_t min_pfn; } restore; }; } x86_hvm; @@ -455,14 +473,6 @@ static inline int write_record(struct xc_sr_context *ctx, */ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec); -/* - * This would ideally be private in restore.c, but is needed by - * x86_pv_localise_page() if we receive pagetables frames ahead of the - * contents of the frames they point at. - */ -int populate_pfns(struct xc_sr_context *ctx, unsigned count, - const xen_pfn_t *original_pfns, const uint32_t *types); - #endif /* * Local variables: diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index d53948e1a6..8cd9289d1a 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx) return 0; } -/* - * Given a set of pfns, obtain memory from Xen to fill the physmap for the - * unpopulated subset. If types is NULL, no page type checking is performed - * and all unpopulated pfns are populated. 
- */ -int populate_pfns(struct xc_sr_context *ctx, unsigned count, - const xen_pfn_t *original_pfns, const uint32_t *types) -{ -xc_interface *xch = ctx->xch; -xen_pfn_t *mfns = malloc(count * sizeof(*mfns)), -*pfns = malloc(count * sizeof(*pfns)); -unsigned i, nr_pfns = 0; -int rc = -1; - -if ( !mfns || !pfns ) -{ -ERROR("Failed to allocate %zu bytes for populating the physmap", - 2 * count * sizeof(*mfns)); -goto err; -} - -for ( i = 0; i < count; ++i ) -{ -if ( (!types || (types && - (types[i] != XEN_DOMCTL_PFINFO_XTAB && - types[i] != XEN_DOMCTL_PFINFO_BROKEN))) && - !pfn_is_populated(ctx, original_pfns[i]) ) -{ -rc = pfn_set_populated(ctx, original_pfns[i]); -if ( rc ) -goto err; -pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i]; -++nr_pfns; -} -} - -if ( nr_pfns ) -{ -rc = xc_domain_populate_physmap_exact( -xch, ctx->domid, nr_pfns, 0, 0, mfns); -if ( rc ) -{ -PERROR("Failed to populate physmap"); -goto err; -} - -for ( i = 0; i < nr_pfns; ++i ) -{ -if ( mfns[i] == INVALID_MFN ) -{ -ERROR("Populate physmap failed for pfn %u", i); -rc = -1; -goto err; -} - -ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]); -} -} - -
[Xen-devel] [PATCH v5 2/3] tools/libxc: add API for bitmap access for restore
Extend API for managing bitmaps. Each bitmap is now represented by a generic struct xc_sr_bitmap. Switch the existing populated_pfns to this API. Signed-off-by: Olaf Hering Acked-by: Wei Liu --- tools/libxc/xc_sr_common.c | 41 +++ tools/libxc/xc_sr_common.h | 68 +++-- tools/libxc/xc_sr_restore.c | 66 ++- 3 files changed, 110 insertions(+), 65 deletions(-) diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c index 79b9c3e940..4d221ca90c 100644 --- a/tools/libxc/xc_sr_common.c +++ b/tools/libxc/xc_sr_common.c @@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void) BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8); } +/* + * Expand the tracking structures as needed. + * To avoid realloc()ing too excessively, the size increased to the nearest power + * of two large enough to contain the required number of bits. + */ +bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +{ +size_t new_max; +size_t old_sz, new_sz; +void *p; + +/* Round up to the nearest power of two larger than bit, less 1. */ +new_max = bits; +new_max |= new_max >> 1; +new_max |= new_max >> 2; +new_max |= new_max >> 4; +new_max |= new_max >> 8; +new_max |= new_max >> 16; +#ifdef __x86_64__ +new_max |= new_max >> 32; +#endif + +old_sz = bitmap_size(bm->bits + 1); +new_sz = bitmap_size(new_max + 1); +p = realloc(bm->p, new_sz); +if (!p) +return false; + +if (bm->p) +memset(p + old_sz, 0, new_sz - old_sz); +else +memset(p, 0, new_sz); + +bm->p = p; +bm->bits = new_max; +} +return true; +} + /* * Local variables: * mode: C diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index a83f22af4e..da2691ba79 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu size_t basicsz, extdsz, xsavesz, msrsz; }; +struct xc_sr_bitmap +{ +void *p; +unsigned long bits; +}; + struct xc_sr_context { xc_interface *xch; @@ -255,8 +261,7 @@ struct xc_sr_context domid_t xenstore_domid, console_domid; /* Bitmap of currently populated PFNs during restore. */ -unsigned long *populated_pfns; -xen_pfn_t max_populated_pfn; +struct xc_sr_bitmap populated_pfns; /* Sender has invoked verify mode on the stream. 
*/ bool verify; @@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm; extern struct xc_sr_restore_ops restore_ops_x86_pv; extern struct xc_sr_restore_ops restore_ops_x86_hvm; +extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits); + +static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +return _xc_sr_bitmap_resize(bm, bits); +return true; +} + +static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm) +{ +free(bm->p); +bm->p = NULL; +} + +static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (!xc_sr_bitmap_resize(bm, bit)) +return false; + +set_bit(bit, bm->p); +return true; +} + +static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (bit > bm->bits) +return false; +return !!test_bit(bit, bm->p); +} + +static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +return test_and_clear_bit(bit, bm->p); +} + +static inline int xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +return test_and_set_bit(bit, bm->p); +} + +static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn) +{ +return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns); +} + +static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn) +{ +xc_interface *xch = ctx->xch; + +if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) ) +{ +ERROR("Failed to realloc populated_pfns bitmap"); +errno = ENOMEM; +return -1; +} +return 0; +} + struct xc_sr_record { uint32_t type; diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index a016678332..d53948e1a6 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx) return 0; } -/* - * Is a pfn populated? - */ -static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn) -{ -if ( pfn > ctx->restore.max_populated_pfn ) -
[Xen-devel] [PATCH v5 1/3] tools/libxc: move SUPERPAGE macros to common header
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by other code in libxc. Move the macros to a header file. Signed-off-by: Olaf Hering Acked-by: Wei Liu --- tools/libxc/xc_dom_x86.c | 5 - tools/libxc/xc_private.h | 5 + 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index cb68efcbd3..5aff5cad58 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -43,11 +43,6 @@ #define SUPERPAGE_BATCH_SIZE 512 -#define SUPERPAGE_2MB_SHIFT 9 -#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT) -#define SUPERPAGE_1GB_SHIFT 18 -#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT) - #define X86_CR0_PE 0x01 #define X86_CR0_ET 0x10 diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h index 1c27b0fded..d581f850b0 100644 --- a/tools/libxc/xc_private.h +++ b/tools/libxc/xc_private.h @@ -66,6 +66,11 @@ struct iovec { #define DECLARE_FLASK_OP struct xen_flask_op op #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op +#define SUPERPAGE_2MB_SHIFT 9 +#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT) +#define SUPERPAGE_1GB_SHIFT 18 +#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT) + #undef PAGE_SHIFT #undef PAGE_SIZE #undef PAGE_MASK ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 2/3] tools/libxc: add API for bitmap access for restore
Extend API for managing bitmaps. Each bitmap is now represented by a generic struct xc_sr_bitmap. Switch the existing populated_pfns to this API. Signed-off-by: Olaf Hering Acked-by: Wei Liu --- tools/libxc/xc_sr_common.c | 41 +++ tools/libxc/xc_sr_common.h | 68 +++-- tools/libxc/xc_sr_restore.c | 66 ++- 3 files changed, 110 insertions(+), 65 deletions(-) diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c index 79b9c3e940..4d221ca90c 100644 --- a/tools/libxc/xc_sr_common.c +++ b/tools/libxc/xc_sr_common.c @@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void) BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8); } +/* + * Expand the tracking structures as needed. + * To avoid realloc()ing too excessively, the size increased to the nearest power + * of two large enough to contain the required number of bits. + */ +bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +{ +size_t new_max; +size_t old_sz, new_sz; +void *p; + +/* Round up to the nearest power of two larger than bit, less 1. */ +new_max = bits; +new_max |= new_max >> 1; +new_max |= new_max >> 2; +new_max |= new_max >> 4; +new_max |= new_max >> 8; +new_max |= new_max >> 16; +#ifdef __x86_64__ +new_max |= new_max >> 32; +#endif + +old_sz = bitmap_size(bm->bits + 1); +new_sz = bitmap_size(new_max + 1); +p = realloc(bm->p, new_sz); +if (!p) +return false; + +if (bm->p) +memset(p + old_sz, 0, new_sz - old_sz); +else +memset(p, 0, new_sz); + +bm->p = p; +bm->bits = new_max; +} +return true; +} + /* * Local variables: * mode: C diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index a83f22af4e..8901af112a 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu size_t basicsz, extdsz, xsavesz, msrsz; }; +struct xc_sr_bitmap +{ +void *p; +unsigned long bits; +}; + struct xc_sr_context { xc_interface *xch; @@ -255,8 +261,7 @@ struct xc_sr_context domid_t xenstore_domid, console_domid; /* Bitmap of currently populated PFNs during restore. */ -unsigned long *populated_pfns; -xen_pfn_t max_populated_pfn; +struct xc_sr_bitmap populated_pfns; /* Sender has invoked verify mode on the stream. 
*/ bool verify; @@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm; extern struct xc_sr_restore_ops restore_ops_x86_pv; extern struct xc_sr_restore_ops restore_ops_x86_hvm; +extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits); + +static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +return _xc_sr_bitmap_resize(bm, bits); +return true; +} + +static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm) +{ +free(bm->p); +bm->p = NULL; +} + +static inline bool xc_sr_set(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (!xc_sr_bitmap_resize(bm, bit)) +return false; + +set_bit(bit, bm->p); +return true; +} + +static inline bool xc_sr_test(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (bit > bm->bits) +return false; +return !!test_bit(bit, bm->p); +} + +static inline int xc_sr_test_and_clear(unsigned long bit, struct xc_sr_bitmap *bm) +{ +return test_and_clear_bit(bit, bm->p); +} + +static inline int xc_sr_test_and_set(unsigned long bit, struct xc_sr_bitmap *bm) +{ +return test_and_set_bit(bit, bm->p); +} + +static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn) +{ +return xc_sr_test(pfn, &ctx->restore.populated_pfns); +} + +static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn) +{ +xc_interface *xch = ctx->xch; + +if ( !xc_sr_set(pfn, &ctx->restore.populated_pfns) ) +{ +ERROR("Failed to realloc populated_pfns bitmap"); +errno = ENOMEM; +return -1; +} +return 0; +} + struct xc_sr_record { uint32_t type; diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index a016678332..d53948e1a6 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx) return 0; } -/* - * Is a pfn populated? - */ -static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn) -{ -if ( pfn > ctx->restore.max_populated_pfn ) -return false; -retu
[Xen-devel] [PATCH v4 3/3] tools/libxc: use superpages during restore of HVM guest
During creation of an HVM domU, meminit_hvm() tries to map superpages. After save/restore or migration this mapping is lost and everything is allocated in single pages. This causes a performance degradation after migration. Add the necessary code to preallocate a superpage for the chunk of pfns that is received. In case a pfn was not populated on the sending side, it must be freed on the receiving side to avoid over-allocation. The existing code for x86_pv is moved unmodified into its own file. Signed-off-by: Olaf Hering --- tools/libxc/xc_sr_common.h | 26 ++-- tools/libxc/xc_sr_restore.c | 75 +- tools/libxc/xc_sr_restore_x86_hvm.c | 274 tools/libxc/xc_sr_restore_x86_pv.c | 72 +- 4 files changed, 369 insertions(+), 78 deletions(-) diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index 8901af112a..4c99f3653e 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -139,6 +139,16 @@ struct xc_sr_restore_ops */ int (*setup)(struct xc_sr_context *ctx); +/** + * Populate PFNs + * + * Given a set of pfns, obtain memory from Xen to fill the physmap for the + * unpopulated subset. + */ +int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count, + const xen_pfn_t *original_pfns, const uint32_t *types); + + /** * Process an individual record from the stream. The caller shall take * care of processing common records (e.g. END, PAGE_DATA). @@ -224,6 +234,8 @@ struct xc_sr_context int send_back_fd; unsigned long p2m_size; +unsigned long max_pages; +unsigned long tot_pages; xc_hypercall_buffer_t dirty_bitmap_hbuf; /* From Image Header. */ @@ -336,6 +348,12 @@ struct xc_sr_context /* HVM context blob. */ void *context; size_t contextsz; + +/* Bitmap of currently allocated PFNs during restore. */ +struct xc_sr_bitmap attempted_1g; +struct xc_sr_bitmap attempted_2m; +struct xc_sr_bitmap allocated_pfns; +xen_pfn_t min_pfn; } restore; }; } x86_hvm; @@ -455,14 +473,6 @@ static inline int write_record(struct xc_sr_context *ctx, */ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec); -/* - * This would ideally be private in restore.c, but is needed by - * x86_pv_localise_page() if we receive pagetables frames ahead of the - * contents of the frames they point at. - */ -int populate_pfns(struct xc_sr_context *ctx, unsigned count, - const xen_pfn_t *original_pfns, const uint32_t *types); - #endif /* * Local variables: diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index d53948e1a6..8cd9289d1a 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx) return 0; } -/* - * Given a set of pfns, obtain memory from Xen to fill the physmap for the - * unpopulated subset. If types is NULL, no page type checking is performed - * and all unpopulated pfns are populated. 
- */ -int populate_pfns(struct xc_sr_context *ctx, unsigned count, - const xen_pfn_t *original_pfns, const uint32_t *types) -{ -xc_interface *xch = ctx->xch; -xen_pfn_t *mfns = malloc(count * sizeof(*mfns)), -*pfns = malloc(count * sizeof(*pfns)); -unsigned i, nr_pfns = 0; -int rc = -1; - -if ( !mfns || !pfns ) -{ -ERROR("Failed to allocate %zu bytes for populating the physmap", - 2 * count * sizeof(*mfns)); -goto err; -} - -for ( i = 0; i < count; ++i ) -{ -if ( (!types || (types && - (types[i] != XEN_DOMCTL_PFINFO_XTAB && - types[i] != XEN_DOMCTL_PFINFO_BROKEN))) && - !pfn_is_populated(ctx, original_pfns[i]) ) -{ -rc = pfn_set_populated(ctx, original_pfns[i]); -if ( rc ) -goto err; -pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i]; -++nr_pfns; -} -} - -if ( nr_pfns ) -{ -rc = xc_domain_populate_physmap_exact( -xch, ctx->domid, nr_pfns, 0, 0, mfns); -if ( rc ) -{ -PERROR("Failed to populate physmap"); -goto err; -} - -for ( i = 0; i < nr_pfns; ++i ) -{ -if ( mfns[i] == INVALID_MFN ) -{ -ERROR("Populate physmap failed for pfn %u", i); -rc = -1; -goto err; -} - -ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]); -} -} - -
[Xen-devel] [PATCH v4 0/3] tools/libxc: use superpages
Using superpages on the receiving dom0 will avoid performance regressions. Olaf v4: restore trailing "_bit" in bitmap function names keep track of gaps between previous and current batch split alloc functionality in x86_hvm_allocate_pfn v3: clear pointer in xc_sr_bitmap_free some coding style changes use getdomaininfo.max_pages to avoid Over-allocation check trim bitmap function names, drop trailing "_bit" add some comments v2: split into individual commits Olaf Hering (3): tools/libxc: move SUPERPAGE macros to common header tools/libxc: add API for bitmap access for restore tools/libxc: use superpages during restore of HVM guest tools/libxc/xc_dom_x86.c| 5 - tools/libxc/xc_private.h| 5 + tools/libxc/xc_sr_common.c | 41 ++ tools/libxc/xc_sr_common.h | 94 +++-- tools/libxc/xc_sr_restore.c | 141 ++- tools/libxc/xc_sr_restore_x86_hvm.c | 274 tools/libxc/xc_sr_restore_x86_pv.c | 72 +- 7 files changed, 484 insertions(+), 148 deletions(-) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 1/3] tools/libxc: move SUPERPAGE macros to common header
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by other code in libxc. Move the macros to a header file. Signed-off-by: Olaf Hering Acked-by: Wei Liu --- tools/libxc/xc_dom_x86.c | 5 - tools/libxc/xc_private.h | 5 + 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index cb68efcbd3..5aff5cad58 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -43,11 +43,6 @@ #define SUPERPAGE_BATCH_SIZE 512 -#define SUPERPAGE_2MB_SHIFT 9 -#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT) -#define SUPERPAGE_1GB_SHIFT 18 -#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT) - #define X86_CR0_PE 0x01 #define X86_CR0_ET 0x10 diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h index 1c27b0fded..d581f850b0 100644 --- a/tools/libxc/xc_private.h +++ b/tools/libxc/xc_private.h @@ -66,6 +66,11 @@ struct iovec { #define DECLARE_FLASK_OP struct xen_flask_op op #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op +#define SUPERPAGE_2MB_SHIFT 9 +#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT) +#define SUPERPAGE_1GB_SHIFT 18 +#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT) + #undef PAGE_SHIFT #undef PAGE_SIZE #undef PAGE_MASK ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest
On Fri, Aug 25, Olaf Hering wrote: > I think with the new check of max_pages an over-allocation cannot happen > anymore. If at some point the domU still has room for a superpage, it > will be allocated. In case the batch does not fully fill the superpage, > the holes will be freed. In the next batch no superpage can be allocated > anymore, but single pages will be used. There is one case where over-allocation will still happen: assume x86_hvm_populate_pfns() gets a batch of pfns that triggers the allocation of a 1G page, and all pfns fit into that partly populated superpage. Then the guest has a hole right after the max_pfn of that batch, and the next batch starts in a new superpage. As a result, the freeing part of x86_hvm_populate_pfns() will not consider the previous superpage anymore. Now 512MB are allocated but unpopulated. To handle this case, min_pfn/max_pfn have to be global so that the current batch can free allocated pfns from previous batches. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
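A sketch of the fix described here, matching the xen_pfn_t min_pfn field added to the context struct in v4 (batch_min_pfn, batch_max_pfn and free_scan() are illustrative names):

    /* Keep the low-water mark across batches, so a later batch can
     * still free the unpopulated tail of an earlier superpage. */
    if ( ctx->x86_hvm.restore.min_pfn > batch_min_pfn )
        ctx->x86_hvm.restore.min_pfn = batch_min_pfn;

    /* The freeing scan then starts from the global minimum. */
    free_scan(ctx, ctx->x86_hvm.restore.min_pfn, batch_max_pfn);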
Re: [Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest
On Fri, Aug 25, Wei Liu wrote: > Maybe a middle ground is to scan the batch to see if pfns can be fit > into a whole super page? I don't think you can get a batch as big as 1G > but there should be a lot of 2M batches? I think with the new check of max_pages an over-allocation cannot happen anymore. If at some point the domU still has room for a superpage, it will be allocated. In case the batch does not fully fill the superpage, the holes will be freed. In the next batch no superpage can be allocated anymore, but single pages will be used. This punching of holes might be inefficient; the win is the use of superpages for contiguous pfns. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
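A worked example of that interaction, with assumed numbers:

    /* Illustration only. Assume max_pages = 1050624 pfns (4GB + 8MB)
     * and tot_pages = 1048576 when a batch first touches pfn 0x100000:
     *   1GB attempt: 1048576 + 262144 > 1050624  -> skipped
     *   2MB attempt: 1048576 +    512 <= 1050624 -> allocated
     * If the batch populates only 100 of those 512 pfns, the other 412
     * are freed again; later batches in that region then fall through
     * to single 4k pages. */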
Re: [Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest
On Fri, Aug 25, Wei Liu wrote: > I'm still unconvinced this works all the time because it still needs the > assumption that the stream contains contiguous pfns. This is how it is done today. If the pfns start to arrive in another order the format has to be changed to send a memory layout in advance. I will check if some sort of retry logic can be added. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3 2/3] tools/libxc: add API for bitmap access for restore
Extend API for managing bitmaps. Each bitmap is now represented by a generic struct xc_sr_bitmap. Switch the existing populated_pfns to this API. Signed-off-by: Olaf Hering --- tools/libxc/xc_sr_common.c | 41 +++ tools/libxc/xc_sr_common.h | 68 +++-- tools/libxc/xc_sr_restore.c | 66 ++- 3 files changed, 110 insertions(+), 65 deletions(-) diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c index 79b9c3e940..4d221ca90c 100644 --- a/tools/libxc/xc_sr_common.c +++ b/tools/libxc/xc_sr_common.c @@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void) BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8); } +/* + * Expand the tracking structures as needed. + * To avoid realloc()ing too excessively, the size increased to the nearest power + * of two large enough to contain the required number of bits. + */ +bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +{ +size_t new_max; +size_t old_sz, new_sz; +void *p; + +/* Round up to the nearest power of two larger than bit, less 1. */ +new_max = bits; +new_max |= new_max >> 1; +new_max |= new_max >> 2; +new_max |= new_max >> 4; +new_max |= new_max >> 8; +new_max |= new_max >> 16; +#ifdef __x86_64__ +new_max |= new_max >> 32; +#endif + +old_sz = bitmap_size(bm->bits + 1); +new_sz = bitmap_size(new_max + 1); +p = realloc(bm->p, new_sz); +if (!p) +return false; + +if (bm->p) +memset(p + old_sz, 0, new_sz - old_sz); +else +memset(p, 0, new_sz); + +bm->p = p; +bm->bits = new_max; +} +return true; +} + /* * Local variables: * mode: C diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index a83f22af4e..8901af112a 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu size_t basicsz, extdsz, xsavesz, msrsz; }; +struct xc_sr_bitmap +{ +void *p; +unsigned long bits; +}; + struct xc_sr_context { xc_interface *xch; @@ -255,8 +261,7 @@ struct xc_sr_context domid_t xenstore_domid, console_domid; /* Bitmap of currently populated PFNs during restore. */ -unsigned long *populated_pfns; -xen_pfn_t max_populated_pfn; +struct xc_sr_bitmap populated_pfns; /* Sender has invoked verify mode on the stream. 
*/ bool verify; @@ -343,6 +348,65 @@ extern struct xc_sr_save_ops save_ops_x86_hvm; extern struct xc_sr_restore_ops restore_ops_x86_pv; extern struct xc_sr_restore_ops restore_ops_x86_hvm; +extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits); + +static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +return _xc_sr_bitmap_resize(bm, bits); +return true; +} + +static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm) +{ +free(bm->p); +bm->p = NULL; +} + +static inline bool xc_sr_set(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (!xc_sr_bitmap_resize(bm, bit)) +return false; + +set_bit(bit, bm->p); +return true; +} + +static inline bool xc_sr_test(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (bit > bm->bits) +return false; +return !!test_bit(bit, bm->p); +} + +static inline int xc_sr_test_and_clear(unsigned long bit, struct xc_sr_bitmap *bm) +{ +return test_and_clear_bit(bit, bm->p); +} + +static inline int xc_sr_test_and_set(unsigned long bit, struct xc_sr_bitmap *bm) +{ +return test_and_set_bit(bit, bm->p); +} + +static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn) +{ +return xc_sr_test(pfn, &ctx->restore.populated_pfns); +} + +static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn) +{ +xc_interface *xch = ctx->xch; + +if ( !xc_sr_set(pfn, &ctx->restore.populated_pfns) ) +{ +ERROR("Failed to realloc populated_pfns bitmap"); +errno = ENOMEM; +return -1; +} +return 0; +} + struct xc_sr_record { uint32_t type; diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index a016678332..d53948e1a6 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx) return 0; } -/* - * Is a pfn populated? - */ -static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn) -{ -if ( pfn > ctx->restore.max_populated_pfn ) -return false; -return test_bit(pfn, ctx-
[Xen-devel] [PATCH v3 3/3] tools/libxc: use superpages during restore of HVM guest
During creation of an HVM domU, meminit_hvm() tries to map superpages. After save/restore or migration this mapping is lost and everything is allocated in single pages. This causes a performance degradation after migration. Add the necessary code to preallocate a superpage for the chunk of pfns that is received. In case a pfn was not populated on the sending side, it must be freed on the receiving side to avoid over-allocation. The existing code for x86_pv is moved unmodified into its own file. Signed-off-by: Olaf Hering --- tools/libxc/xc_sr_common.h | 25 +++-- tools/libxc/xc_sr_restore.c | 75 ++--- tools/libxc/xc_sr_restore_x86_hvm.c | 202 tools/libxc/xc_sr_restore_x86_pv.c | 72 - 4 files changed, 296 insertions(+), 78 deletions(-) diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index 8901af112a..bf2758e91a 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -139,6 +139,16 @@ struct xc_sr_restore_ops */ int (*setup)(struct xc_sr_context *ctx); +/** + * Populate PFNs + * + * Given a set of pfns, obtain memory from Xen to fill the physmap for the + * unpopulated subset. + */ +int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count, + const xen_pfn_t *original_pfns, const uint32_t *types); + + /** * Process an individual record from the stream. The caller shall take * care of processing common records (e.g. END, PAGE_DATA). @@ -224,6 +234,8 @@ struct xc_sr_context int send_back_fd; unsigned long p2m_size; +unsigned long max_pages; +unsigned long tot_pages; xc_hypercall_buffer_t dirty_bitmap_hbuf; /* From Image Header. */ @@ -336,6 +348,11 @@ struct xc_sr_context /* HVM context blob. */ void *context; size_t contextsz; + +/* Bitmap of currently allocated PFNs during restore. */ +struct xc_sr_bitmap attempted_1g; +struct xc_sr_bitmap attempted_2m; +struct xc_sr_bitmap allocated_pfns; } restore; }; } x86_hvm; @@ -455,14 +472,6 @@ static inline int write_record(struct xc_sr_context *ctx, */ int read_record(struct xc_sr_context *ctx, int fd, struct xc_sr_record *rec); -/* - * This would ideally be private in restore.c, but is needed by - * x86_pv_localise_page() if we receive pagetables frames ahead of the - * contents of the frames they point at. - */ -int populate_pfns(struct xc_sr_context *ctx, unsigned count, - const xen_pfn_t *original_pfns, const uint32_t *types); - #endif /* * Local variables: diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index d53948e1a6..8cd9289d1a 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx) return 0; } -/* - * Given a set of pfns, obtain memory from Xen to fill the physmap for the - * unpopulated subset. If types is NULL, no page type checking is performed - * and all unpopulated pfns are populated. 
- */ -int populate_pfns(struct xc_sr_context *ctx, unsigned count, - const xen_pfn_t *original_pfns, const uint32_t *types) -{ -xc_interface *xch = ctx->xch; -xen_pfn_t *mfns = malloc(count * sizeof(*mfns)), -*pfns = malloc(count * sizeof(*pfns)); -unsigned i, nr_pfns = 0; -int rc = -1; - -if ( !mfns || !pfns ) -{ -ERROR("Failed to allocate %zu bytes for populating the physmap", - 2 * count * sizeof(*mfns)); -goto err; -} - -for ( i = 0; i < count; ++i ) -{ -if ( (!types || (types && - (types[i] != XEN_DOMCTL_PFINFO_XTAB && - types[i] != XEN_DOMCTL_PFINFO_BROKEN))) && - !pfn_is_populated(ctx, original_pfns[i]) ) -{ -rc = pfn_set_populated(ctx, original_pfns[i]); -if ( rc ) -goto err; -pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i]; -++nr_pfns; -} -} - -if ( nr_pfns ) -{ -rc = xc_domain_populate_physmap_exact( -xch, ctx->domid, nr_pfns, 0, 0, mfns); -if ( rc ) -{ -PERROR("Failed to populate physmap"); -goto err; -} - -for ( i = 0; i < nr_pfns; ++i ) -{ -if ( mfns[i] == INVALID_MFN ) -{ -ERROR("Populate physmap failed for pfn %u", i); -rc = -1; -goto err; -} - -ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]); -} -} - -rc = 0; - - err: -free(pfns); -f
[Xen-devel] [PATCH v3 0/3] tools/libxc: use superpages
Using superpages on the receiving dom0 will avoid performance regressions. Olaf v3: clear pointer in xc_sr_bitmap_free some coding style changes use getdomaininfo.max_pages to avoid Over-allocation check trim bitmap function names, drop trailing "_bit" add some comments v2: split into individual commits Olaf Hering (3): tools/libxc: move SUPERPAGE macros to common header tools/libxc: add API for bitmap access for restore tools/libxc: use superpages during restore of HVM guest tools/libxc/xc_dom_x86.c| 5 - tools/libxc/xc_private.h| 5 + tools/libxc/xc_sr_common.c | 41 tools/libxc/xc_sr_common.h | 93 +++-- tools/libxc/xc_sr_restore.c | 141 ++--- tools/libxc/xc_sr_restore_x86_hvm.c | 202 tools/libxc/xc_sr_restore_x86_pv.c | 72 - 7 files changed, 411 insertions(+), 148 deletions(-) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3 1/3] tools/libxc: move SUPERPAGE macros to common header
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by other code in libxc. Move the macros to a header file. Signed-off-by: Olaf Hering --- tools/libxc/xc_dom_x86.c | 5 - tools/libxc/xc_private.h | 5 + 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index cb68efcbd3..5aff5cad58 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -43,11 +43,6 @@ #define SUPERPAGE_BATCH_SIZE 512 -#define SUPERPAGE_2MB_SHIFT 9 -#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT) -#define SUPERPAGE_1GB_SHIFT 18 -#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT) - #define X86_CR0_PE 0x01 #define X86_CR0_ET 0x10 diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h index 1c27b0fded..d581f850b0 100644 --- a/tools/libxc/xc_private.h +++ b/tools/libxc/xc_private.h @@ -66,6 +66,11 @@ struct iovec { #define DECLARE_FLASK_OP struct xen_flask_op op #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op +#define SUPERPAGE_2MB_SHIFT 9 +#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT) +#define SUPERPAGE_1GB_SHIFT 18 +#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT) + #undef PAGE_SHIFT #undef PAGE_SIZE #undef PAGE_MASK ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/3] tools/libxc: add API for bitmap access for restore
On Thu, Aug 17, Olaf Hering wrote: > Extend API for managing bitmaps. Each bitmap is now represented by a > generic struct xc_sr_bitmap. > +static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned > long bits) > +static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm) > +static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm) > +static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm) > +static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct > xc_sr_bitmap *bm) > +static inline int xc_sr_test_and_set_bit(unsigned long bit, struct > xc_sr_bitmap *bm) Any objection to removing the trailing '_bit' from the last four function names? Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
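For comparison, a sketch of how the last four accessors would read with the suffix trimmed (proposed names only, not committed code):

    static inline bool xc_sr_set(unsigned long bit, struct xc_sr_bitmap *bm);
    static inline bool xc_sr_test(unsigned long bit, struct xc_sr_bitmap *bm);
    static inline int xc_sr_test_and_clear(unsigned long bit, struct xc_sr_bitmap *bm);
    static inline int xc_sr_test_and_set(unsigned long bit, struct xc_sr_bitmap *bm);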
Re: [Xen-devel] [PATCH v2 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Aug 23, Olaf Hering wrote: > The value of p2m_size does not represent the actual number of pages > assigned to a domU. This info is stored in getdomaininfo.max_pages, > which is currently not used by restore. I will see if using this value > will avoid triggering the Over-allocation check. This untested change on top of this series (done with git diff -w -b base..HEAD) does some accounting to avoid Over-allocation: diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index 26c45fdd6d..e0321ea224 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -234,6 +234,8 @@ struct xc_sr_context int send_back_fd; unsigned long p2m_size; +unsigned long max_pages; +unsigned long tot_pages; xc_hypercall_buffer_t dirty_bitmap_hbuf; /* From Image Header. */ @@ -375,6 +377,7 @@ static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bi static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm) { free(bm->p); +bm->p = NULL; } static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm) diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index 1f9fe25b8f..eff24d3805 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -758,6 +758,9 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, return -1; } +/* See xc_domain_getinfo */ +ctx.restore.max_pages = ctx.dominfo.max_memkb >> (PAGE_SHIFT-10); +ctx.restore.tot_pages = ctx.dominfo.nr_pages; ctx.restore.p2m_size = nr_pfns; if ( ctx.dominfo.hvm ) diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c index 60454148db..f2932dafb7 100644 --- a/tools/libxc/xc_sr_restore_x86_hvm.c +++ b/tools/libxc/xc_sr_restore_x86_hvm.c @@ -278,7 +278,8 @@ static int pfn_set_allocated(struct xc_sr_context *ctx, xen_pfn_t pfn) static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn) { xc_interface *xch = ctx->xch; -bool success = false; +struct xc_sr_bitmap *bm; +bool success = false, do_sp; int rc = -1, done; unsigned int order; unsigned long i; @@ -303,15 +304,18 @@ static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn) return -1; } DPRINTF("idx_1g %lu idx_2m %lu\n", idx_1g, idx_2m); -if (!xc_sr_test_and_set_bit(idx_1g, &ctx->x86_hvm.restore.attempted_1g)) { + +bm = &ctx->x86_hvm.restore.attempted_1g; order = SUPERPAGE_1GB_SHIFT; count = 1UL << order; +do_sp = ctx->restore.tot_pages + count <= ctx->restore.max_pages; +if ( do_sp && !xc_sr_test_and_set_bit(idx_1g, bm) ) { base_pfn = (pfn >> order) << order; extnt = base_pfn; done = xc_domain_populate_physmap(xch, ctx->domid, 1, order, 0, &extnt); DPRINTF("1G base_pfn %" PRI_xen_pfn " done %d\n", base_pfn, done); if ( done > 0 ) { -struct xc_sr_bitmap *bm = &ctx->x86_hvm.restore.attempted_2m; +bm = &ctx->x86_hvm.restore.attempted_2m; success = true; stat_1g = done; for ( i = 0; i < (count >> SUPERPAGE_2MB_SHIFT); i++ ) @@ -319,9 +323,11 @@ static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn) } } -if (!xc_sr_test_and_set_bit(idx_2m, &ctx->x86_hvm.restore.attempted_2m)) { +bm = &ctx->x86_hvm.restore.attempted_2m; order = SUPERPAGE_2MB_SHIFT; count = 1UL << order; +do_sp = ctx->restore.tot_pages + count <= ctx->restore.max_pages; +if ( do_sp && !xc_sr_test_and_set_bit(idx_2m, bm) ) { base_pfn = (pfn >> order) << order; extnt = base_pfn; done = xc_domain_populate_physmap(xch, ctx->domid, 1, order, 0, &extnt); @@ -344,6 +350,7 @@ static int x86_hvm_allocate_pfn(struct xc_sr_context *ctx, xen_pfn_t pfn) if ( 
success == true ) { do { count--; +ctx->restore.tot_pages++; rc = pfn_set_allocated(ctx, base_pfn + count); if ( rc ) break; @@ -396,6 +403,7 @@ static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count, PERROR("Failed to release pfn %" PRI_xen_pfn, min_pfn); goto err; } +ctx->restore.tot_pages--; } min_pfn++; } Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
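As an aside, the kB-to-pages conversion in the xc_domain_restore() hunk above is simply a divide by four with 4k pages (PAGE_SHIFT is 12). A worked example:

    /* max_memkb >> (PAGE_SHIFT - 10) == max_memkb / 4 with 4k pages.
     * Example: a domU with max_memkb = 4194304 (4 GiB) yields
     * 4194304 >> 2 == 1048576 pages. */
    static unsigned long pages_from_kb(unsigned long max_memkb)
    {
        return max_memkb >> (12 /* PAGE_SHIFT */ - 10);
    }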
Re: [Xen-devel] [PATCH v2 3/3] tools/libxc: use superpages during restore of HVM guest
On Wed, Aug 23, Wei Liu wrote: > On Tue, Aug 22, 2017 at 05:53:25PM +0200, Olaf Hering wrote: > > In my testing I have seen the case of over-allocation. That's why I > > implemented the freeing of unpopulated parts. It would be nice to know > > how many pages are actually coming. I think this info is not available. > Not sure I follow. What do you mean by "how many pages are actually > coming"? This meant the expected number of pages to populate. The value of p2m_size does not represent the actual number of pages assigned to a domU. This info is stored in getdomaininfo.max_pages, which is currently not used by restore. I will see if using this value will avoid triggering the Over-allocation check. > > On the other side, the first iteration sends the pfns linearly. This is > > when the allocation actually happens. So the over-allocation will only > > trigger near the end, if a 1G range is allocated but only a few pages > > will be stored into this range. > This could be making too many assumptions on the data stream. With the use of max_pages some assumptions can be avoided. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 3/3] tools/libxc: use superpages during restore of HVM guest
On Tue, Aug 22, Olaf Hering wrote: > In my testing I have seen the case of over-allocation. That's why I > implemented the freeing of unpopulated parts. It would be nice to know > how many pages are actually coming. I think this info is not available. If the receiving dom0 recognizes an over-allocation it must know how much memory a domU is supposed to have. Perhaps there is a way to retrieve this info. An interesting case is ballooning during migration. Is the new number of pages per domU actually transferred to the receiving domU? If the domU is ballooned up the other side may see the incoming domU as over-allocated. If it is ballooned down pages may be missing. Was this ever considered? Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 3/3] tools/libxc: use superpages during restore of HVM guest
On Tue, Aug 22, Wei Liu wrote: > On Thu, Aug 17, 2017 at 07:01:33PM +0200, Olaf Hering wrote: > > +/* No superpage in 1st 2MB due to VGA hole */ > > +xc_sr_set_bit(0, &ctx->x86_hvm.restore.attempted_1g); > > +xc_sr_set_bit(0, &ctx->x86_hvm.restore.attempted_2m); > I don't quite get this. What about other holes such as MMIO? This just copies what meminit_hvm does. Is there a way to know where the MMIO hole is? Maybe I just missed the MMIO part. In the worst case I think a super page is allocated, which is later split into single pages. > One potential issue I can see with your algorithm is, if the stream of > page info contains pages from different super pages, the risk of going > over memory limit is high (hence failing the migration). > > Is my concern unfounded? In my testing I have seen the case of over-allocation. That's why I implemented the freeing of unpopulated parts. It would be nice to know how many pages are actually coming. I think this info is not available. On the other side, the first iteration sends the pfns linearly. This is when the allocation actually happens. So the over-allocation will only trigger near the end, if a 1G range is allocated but only a few pages will be stored into this range. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
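The freeing of unpopulated parts mentioned above boils down to returning the holes of a speculatively allocated range to Xen. A simplified sketch, with a hypothetical was_received() predicate standing in for the patch's allocated_pfns bookkeeping:

    #include <stdbool.h>
    #include <xenctrl.h>

    /* Release every pfn in [start, end) that the stream did not populate. */
    static int sketch_free_unpopulated(xc_interface *xch, uint32_t domid,
                                       xen_pfn_t start, xen_pfn_t end,
                                       bool (*was_received)(xen_pfn_t))
    {
        xen_pfn_t pfn;

        for ( pfn = start; pfn < end; pfn++ )
        {
            if ( was_received(pfn) )
                continue;
            /* Hand the single 4k page back to Xen (extent order 0). */
            if ( xc_domain_decrease_reservation_exact(xch, domid, 1, 0, &pfn) )
                return -1;
        }
        return 0;
    }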
[Xen-devel] [PATCH v2 0/3] tools/libxc: use superpages
Using superpages on the receiving dom0 will avoid performance regressions. Olaf v2: split into individual commits Olaf Hering (3): tools/libxc: move SUPERPAGE macros to common header tools/libxc: add API for bitmap access for restore tools/libxc: use superpages during restore of HVM guest tools/libxc/xc_dom_x86.c| 5 - tools/libxc/xc_private.h| 5 + tools/libxc/xc_sr_common.c | 41 tools/libxc/xc_sr_common.h | 82 +++- tools/libxc/xc_sr_restore.c | 136 +-- tools/libxc/xc_sr_restore_x86_hvm.c | 180 tools/libxc/xc_sr_restore_x86_pv.c | 72 ++- 7 files changed, 381 insertions(+), 140 deletions(-) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 3/3] tools/libxc: use superpages during restore of HVM guest
During creation of an HVM domU meminit_hvm() tries to map superpages. After save/restore or migration this mapping is lost, everything is allocated in single pages. This causes a performance degradation after migration. Add the necessary code to preallocate a superpage for the chunk of pfns that is received. In case a pfn was not populated on the sending side it must be freed on the receiving side to avoid over-allocation. The existing code for x86_pv is moved unmodified into its own file. Signed-off-by: Olaf Hering --- tools/libxc/xc_sr_common.h | 15 +++ tools/libxc/xc_sr_restore.c | 70 +- tools/libxc/xc_sr_restore_x86_hvm.c | 180 tools/libxc/xc_sr_restore_x86_pv.c | 72 ++- 4 files changed, 267 insertions(+), 70 deletions(-) diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index 5d78f461af..26c45fdd6d 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -139,6 +139,16 @@ struct xc_sr_restore_ops */ int (*setup)(struct xc_sr_context *ctx); +/** + * Populate PFNs + * + * Given a set of pfns, obtain memory from Xen to fill the physmap for the + * unpopulated subset. + */ +int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count, + const xen_pfn_t *original_pfns, const uint32_t *types); + + /** * Process an individual record from the stream. The caller shall take * care of processing common records (e.g. END, PAGE_DATA). @@ -336,6 +346,11 @@ struct xc_sr_context /* HVM context blob. */ void *context; size_t contextsz; + +/* Bitmap of currently allocated PFNs during restore. */ +struct xc_sr_bitmap attempted_1g; +struct xc_sr_bitmap attempted_2m; +struct xc_sr_bitmap allocated_pfns; } restore; }; } x86_hvm; diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index d53948e1a6..1f9fe25b8f 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -68,74 +68,6 @@ static int read_headers(struct xc_sr_context *ctx) return 0; } -/* - * Given a set of pfns, obtain memory from Xen to fill the physmap for the - * unpopulated subset. If types is NULL, no page type checking is performed - * and all unpopulated pfns are populated.
- */ -int populate_pfns(struct xc_sr_context *ctx, unsigned count, - const xen_pfn_t *original_pfns, const uint32_t *types) -{ -xc_interface *xch = ctx->xch; -xen_pfn_t *mfns = malloc(count * sizeof(*mfns)), -*pfns = malloc(count * sizeof(*pfns)); -unsigned i, nr_pfns = 0; -int rc = -1; - -if ( !mfns || !pfns ) -{ -ERROR("Failed to allocate %zu bytes for populating the physmap", - 2 * count * sizeof(*mfns)); -goto err; -} - -for ( i = 0; i < count; ++i ) -{ -if ( (!types || (types && - (types[i] != XEN_DOMCTL_PFINFO_XTAB && - types[i] != XEN_DOMCTL_PFINFO_BROKEN))) && - !pfn_is_populated(ctx, original_pfns[i]) ) -{ -rc = pfn_set_populated(ctx, original_pfns[i]); -if ( rc ) -goto err; -pfns[nr_pfns] = mfns[nr_pfns] = original_pfns[i]; -++nr_pfns; -} -} - -if ( nr_pfns ) -{ -rc = xc_domain_populate_physmap_exact( -xch, ctx->domid, nr_pfns, 0, 0, mfns); -if ( rc ) -{ -PERROR("Failed to populate physmap"); -goto err; -} - -for ( i = 0; i < nr_pfns; ++i ) -{ -if ( mfns[i] == INVALID_MFN ) -{ -ERROR("Populate physmap failed for pfn %u", i); -rc = -1; -goto err; -} - -ctx->restore.ops.set_gfn(ctx, pfns[i], mfns[i]); -} -} - -rc = 0; - - err: -free(pfns); -free(mfns); - -return rc; -} - /* * Given a list of pfns, their types, and a block of page data from the * stream, populate and record their types, map the relevant subset and copy @@ -161,7 +93,7 @@ static int process_page_data(struct xc_sr_context *ctx, unsigned count, goto err; } -rc = populate_pfns(ctx, count, pfns, types); +rc = ctx->restore.ops.populate_pfns(ctx, count, pfns, types); if ( rc ) { ERROR("Failed to populate pfns for batch of %u pages", count); diff --git a/tools/libxc/xc_sr_restore_x86_hvm.c b/tools/libxc/xc_sr_restore_x86_hvm.c index 1dca85354a..60454148db 100644 --- a/tools/libxc/xc_sr_restore_x86_hvm.c +++ b/tools/libxc/xc_sr_restore_x86_hvm.c @@ -135,6 +135,8 @@ static int x86_hvm_localise_p
[Xen-devel] [PATCH v2 1/3] tools/libxc: move SUPERPAGE macros to common header
The macros SUPERPAGE_2MB_SHIFT and SUPERPAGE_1GB_SHIFT will be used by other code in libxc. Move the macros to a header file. Signed-off-by: Olaf Hering --- tools/libxc/xc_dom_x86.c | 5 - tools/libxc/xc_private.h | 5 + 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index cb68efcbd3..5aff5cad58 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -43,11 +43,6 @@ #define SUPERPAGE_BATCH_SIZE 512 -#define SUPERPAGE_2MB_SHIFT 9 -#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT) -#define SUPERPAGE_1GB_SHIFT 18 -#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT) - #define X86_CR0_PE 0x01 #define X86_CR0_ET 0x10 diff --git a/tools/libxc/xc_private.h b/tools/libxc/xc_private.h index 1c27b0fded..d581f850b0 100644 --- a/tools/libxc/xc_private.h +++ b/tools/libxc/xc_private.h @@ -66,6 +66,11 @@ struct iovec { #define DECLARE_FLASK_OP struct xen_flask_op op #define DECLARE_PLATFORM_OP struct xen_platform_op platform_op +#define SUPERPAGE_2MB_SHIFT 9 +#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT) +#define SUPERPAGE_1GB_SHIFT 18 +#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT) + #undef PAGE_SHIFT #undef PAGE_SIZE #undef PAGE_MASK ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 2/3] tools/libxc: add API for bitmap access for restore
Extend API for managing bitmaps. Each bitmap is now represented by a generic struct xc_sr_bitmap. Switch the existing populated_pfns to this API. Signed-off-by: Olaf Hering --- tools/libxc/xc_sr_common.c | 41 +++ tools/libxc/xc_sr_common.h | 67 +++-- tools/libxc/xc_sr_restore.c | 66 ++-- 3 files changed, 109 insertions(+), 65 deletions(-) diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c index 79b9c3e940..4d221ca90c 100644 --- a/tools/libxc/xc_sr_common.c +++ b/tools/libxc/xc_sr_common.c @@ -155,6 +155,47 @@ static void __attribute__((unused)) build_assertions(void) BUILD_BUG_ON(sizeof(struct xc_sr_rec_hvm_params)!= 8); } +/* + * Expand the tracking structures as needed. + * To avoid realloc()ing too excessively, the size increased to the nearest power + * of two large enough to contain the required number of bits. + */ +bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +{ +size_t new_max; +size_t old_sz, new_sz; +void *p; + +/* Round up to the nearest power of two larger than bit, less 1. */ +new_max = bits; +new_max |= new_max >> 1; +new_max |= new_max >> 2; +new_max |= new_max >> 4; +new_max |= new_max >> 8; +new_max |= new_max >> 16; +#ifdef __x86_64__ +new_max |= new_max >> 32; +#endif + +old_sz = bitmap_size(bm->bits + 1); +new_sz = bitmap_size(new_max + 1); +p = realloc(bm->p, new_sz); +if (!p) +return false; + +if (bm->p) +memset(p + old_sz, 0, new_sz - old_sz); +else +memset(p, 0, new_sz); + +bm->p = p; +bm->bits = new_max; +} +return true; +} + /* * Local variables: * mode: C diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index a83f22af4e..5d78f461af 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -172,6 +172,12 @@ struct xc_sr_x86_pv_restore_vcpu size_t basicsz, extdsz, xsavesz, msrsz; }; +struct xc_sr_bitmap +{ +void *p; +unsigned long bits; +}; + struct xc_sr_context { xc_interface *xch; @@ -255,8 +261,7 @@ struct xc_sr_context domid_t xenstore_domid, console_domid; /* Bitmap of currently populated PFNs during restore. */ -unsigned long *populated_pfns; -xen_pfn_t max_populated_pfn; +struct xc_sr_bitmap populated_pfns; /* Sender has invoked verify mode on the stream. 
*/ bool verify; @@ -343,6 +348,64 @@ extern struct xc_sr_save_ops save_ops_x86_hvm; extern struct xc_sr_restore_ops restore_ops_x86_pv; extern struct xc_sr_restore_ops restore_ops_x86_hvm; +extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits); + +static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +return _xc_sr_bitmap_resize(bm, bits); +return true; +} + +static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm) +{ +free(bm->p); +} + +static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (!xc_sr_bitmap_resize(bm, bit)) +return false; + +set_bit(bit, bm->p); +return true; +} + +static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (bit > bm->bits) +return false; +return !!test_bit(bit, bm->p); +} + +static inline int xc_sr_test_and_clear_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +return test_and_clear_bit(bit, bm->p); +} + +static inline int xc_sr_test_and_set_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +return test_and_set_bit(bit, bm->p); +} + +static inline bool pfn_is_populated(struct xc_sr_context *ctx, xen_pfn_t pfn) +{ +return xc_sr_test_bit(pfn, &ctx->restore.populated_pfns); +} + +static inline int pfn_set_populated(struct xc_sr_context *ctx, xen_pfn_t pfn) +{ +xc_interface *xch = ctx->xch; + +if ( !xc_sr_set_bit(pfn, &ctx->restore.populated_pfns) ) +{ +ERROR("Failed to realloc populated_pfns bitmap"); +errno = ENOMEM; +return -1; +} +return 0; +} + struct xc_sr_record { uint32_t type; diff --git a/tools/libxc/xc_sr_restore.c b/tools/libxc/xc_sr_restore.c index a016678332..d53948e1a6 100644 --- a/tools/libxc/xc_sr_restore.c +++ b/tools/libxc/xc_sr_restore.c @@ -68,64 +68,6 @@ static int read_headers(struct xc_sr_context *ctx) return 0; } -/* - * Is a pfn populated? - */ -static bool pfn_is_populated(const struct xc_sr_context *ctx, xen_pfn_t pfn) -{ -if ( pfn > ctx->restore.max_populated_pfn ) -return false; -return test_bit(pf
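A minimal usage sketch of the bitmap API added by this patch, assuming the internal xc_sr_common.h is available to the caller (illustration only). The bitmap grows on demand, and out-of-range queries simply return false:

    #include <stdio.h>
    #include "xc_sr_common.h"   /* internal libxc header, for illustration */

    int main(void)
    {
        struct xc_sr_bitmap bm = { .p = NULL, .bits = 0 };

        if ( !xc_sr_set_bit(0x12345, &bm) )    /* allocates and resizes bm.p */
            return 1;

        printf("%d\n", xc_sr_test_bit(0x12345, &bm));  /* 1 */
        printf("%d\n", xc_sr_test_bit(0xfffff, &bm));  /* 0: beyond bm.bits */

        xc_sr_bitmap_free(&bm);
        return 0;
    }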
Re: [Xen-devel] [PATCH v1] tools/libxc: use superpages during restore of HVM guest
On Fri, Aug 04, Wei Liu wrote: > Can you split this patch into several: > 1. code movement > 2. refactoring / introduction of new hooks > 3. implementing the new algorithm I tried that; it did not work well. But I can try again if required. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1] tools/libxc: use superpages during restore of HVM guest
On Wed, Aug 02, Olaf Hering wrote: > +++ b/tools/libxc/xc_sr_restore_x86_hvm.c > +#define SUPERPAGE_2MB_SHIFT 9 > +#define SUPERPAGE_2MB_NR_PFNS (1UL << SUPERPAGE_2MB_SHIFT) > +#define SUPERPAGE_1GB_SHIFT 18 > +#define SUPERPAGE_1GB_NR_PFNS (1UL << SUPERPAGE_1GB_SHIFT) I think these can be moved to a header file. xc_dom_x86.c and xc_sr_restore_x86_hvm.c use xc_dom.h. > +static int x86_hvm_populate_pfns(struct xc_sr_context *ctx, unsigned count, > + const xen_pfn_t *original_pfns, > + const uint32_t *types) > +{ > +xc_interface *xch = ctx->xch; > +xen_pfn_t min_pfn = original_pfns[0], max_pfn = original_pfns[0]; > +unsigned i; > +int rc = -1; > + > +for ( i = 0; i < count; ++i ) > +{ > +if (original_pfns[i] < min_pfn) > +min_pfn = original_pfns[i]; > +if (original_pfns[i] > max_pfn) > +max_pfn = original_pfns[i]; > +if ( (types[i] != XEN_DOMCTL_PFINFO_XTAB && > + types[i] != XEN_DOMCTL_PFINFO_BROKEN) && > + !pfn_is_populated(ctx, original_pfns[i]) ) Are these types used at all for a HVM domU? Otherwise this condition can be simplified to just check the populated state. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
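If the type checks indeed never fire for an HVM stream, the filter could collapse to the populated-state test alone. An untested sketch of the suggested simplification, reusing the variables from the quoted hunk:

    for ( i = 0; i < count; ++i )
    {
        if ( original_pfns[i] < min_pfn )
            min_pfn = original_pfns[i];
        if ( original_pfns[i] > max_pfn )
            max_pfn = original_pfns[i];

        /* No XTAB/BROKEN type check: only the populated state decides. */
        if ( !pfn_is_populated(ctx, original_pfns[i]) )
        {
            /* populate / mark as allocated, as before */
        }
    }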
Re: [Xen-devel] backport docs changes for Xen 4.9.1
On Thu, Aug 03, Jan Beulich wrote: > >>> On 01.08.17 at 11:43, wrote: > > Please backport the following changes for docs/ for the Xen 4.9.1 > > release: > > > > aa4eb460bc docs: add pod variant of xl-numa-placement > > 458df9f374 docs: add pod variant of xl-network-configuration.5 > > 4359b86f31 docs: add pod variant of xen-pv-channel.7 > I'm not convinced these qualify for backporting. What's the > justification? Less paperwork for me, and it avoids maintaining three patches. It also fixes the references within man pages for those who have no pandoc while building Xen. Not sure if that is just SUSE. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v1] tools/libxc: use superpages during restore of HVM guest
During creation of an HVM domU meminit_hvm() tries to map superpages. After save/restore or migration this mapping is lost, everything is allocated in single pages. This causes a performance degradation after migration. Add the necessary code to preallocate a superpage for the chunk of pfns that is received. In case a pfn was not populated on the sending side it must be freed on the receiving side to avoid over-allocation. The existing code for x86_pv is moved unmodified into its own file. Signed-off-by: Olaf Hering --- based on RELEASE-4.9.0 tools/libxc/xc_sr_common.c | 41 tools/libxc/xc_sr_common.h | 79 +++- tools/libxc/xc_sr_restore.c | 135 +- tools/libxc/xc_sr_restore_x86_hvm.c | 183 tools/libxc/xc_sr_restore_x86_pv.c | 72 +- 5 files changed, 376 insertions(+), 134 deletions(-) diff --git a/tools/libxc/xc_sr_common.c b/tools/libxc/xc_sr_common.c index 48fa676f4e..9b68a064eb 100644 --- a/tools/libxc/xc_sr_common.c +++ b/tools/libxc/xc_sr_common.c @@ -156,6 +156,47 @@ static void __attribute__((unused)) build_assertions(void) } /* + * Expand the tracking structures as needed. + * To avoid realloc()ing too excessively, the size increased to the nearest power + * of two large enough to contain the required number of bits. + */ +bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +{ +size_t new_max; +size_t old_sz, new_sz; +void *p; + +/* Round up to the nearest power of two larger than bit, less 1. */ +new_max = bits; +new_max |= new_max >> 1; +new_max |= new_max >> 2; +new_max |= new_max >> 4; +new_max |= new_max >> 8; +new_max |= new_max >> 16; +#ifdef __x86_64__ +new_max |= new_max >> 32; +#endif + +old_sz = bitmap_size(bm->bits + 1); +new_sz = bitmap_size(new_max + 1); +p = realloc(bm->p, new_sz); +if (!p) +return false; + +if (bm->p) +memset(p + old_sz, 0, new_sz - old_sz); +else +memset(p, 0, new_sz); + +bm->p = p; +bm->bits = new_max; +} +return true; +} + +/* * Local variables: * mode: C * c-file-style: "BSD" diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index a83f22af4e..ad1a2e6e02 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -140,6 +140,13 @@ struct xc_sr_restore_ops int (*setup)(struct xc_sr_context *ctx); /** + * Populate PFNs + * + */ +int (*populate_pfns)(struct xc_sr_context *ctx, unsigned count, + const xen_pfn_t *original_pfns, const uint32_t *types); + +/** * Process an individual record from the stream. The caller shall take * care of processing common records (e.g. END, PAGE_DATA). * @@ -172,6 +179,12 @@ struct xc_sr_x86_pv_restore_vcpu size_t basicsz, extdsz, xsavesz, msrsz; }; +struct xc_sr_bitmap +{ +void *p; +unsigned long bits; +}; + struct xc_sr_context { xc_interface *xch; @@ -255,8 +268,7 @@ struct xc_sr_context domid_t xenstore_domid, console_domid; /* Bitmap of currently populated PFNs during restore. */ -unsigned long *populated_pfns; -xen_pfn_t max_populated_pfn; +struct xc_sr_bitmap populated_pfns; /* Sender has invoked verify mode on the stream. */ bool verify; @@ -331,6 +343,11 @@ struct xc_sr_context /* HVM context blob. */ void *context; size_t contextsz; + +/* Bitmap of currently allocated PFNs during restore. 
*/ +struct xc_sr_bitmap attempted_1g; +struct xc_sr_bitmap attempted_2m; +struct xc_sr_bitmap allocated_pfns; } restore; }; } x86_hvm; @@ -343,6 +360,64 @@ extern struct xc_sr_save_ops save_ops_x86_hvm; extern struct xc_sr_restore_ops restore_ops_x86_pv; extern struct xc_sr_restore_ops restore_ops_x86_hvm; +extern bool _xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits); + +static inline bool xc_sr_bitmap_resize(struct xc_sr_bitmap *bm, unsigned long bits) +{ +if (bits > bm->bits) +return _xc_sr_bitmap_resize(bm, bits); +return true; +} + +static inline void xc_sr_bitmap_free(struct xc_sr_bitmap *bm) +{ +free(bm->p); +} + +static inline bool xc_sr_set_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (!xc_sr_bitmap_resize(bm, bit)) +return false; + +set_bit(bit, bm->p); +return true; +} + +static inline bool xc_sr_test_bit(unsigned long bit, struct xc_sr_bitmap *bm) +{ +if (bit > bm->bit
Re: [Xen-devel] [PATCH] vtpmmgr: make inline functions static
Ping On Fri, Jun 23, Olaf Hering wrote: > gcc7 is more strict with functions marked as inline. They are not > automatically inlined. Instead a function call is generated, but the > actual code is not visible by the linker. > > Do a mechanical change and mark every 'inline' as 'static inline'. For > simpler review the static goes into an extra line. > > Signed-off-by: Olaf Hering > --- > stubdom/vtpmmgr/marshal.h | 76 > ++ > stubdom/vtpmmgr/tcg.h | 14 > stubdom/vtpmmgr/tpm2_marshal.h | 58 > stubdom/vtpmmgr/tpmrsa.h | 1 + > 4 files changed, 149 insertions(+) > > diff --git a/stubdom/vtpmmgr/marshal.h b/stubdom/vtpmmgr/marshal.h > index d826f19d89..dce19c6439 100644 > --- a/stubdom/vtpmmgr/marshal.h > +++ b/stubdom/vtpmmgr/marshal.h > @@ -47,16 +47,19 @@ typedef enum UnpackPtr { > UNPACK_ALLOC > } UnpackPtr; > > +static > inline BYTE* pack_BYTE(BYTE* ptr, BYTE t) { ... signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
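The linker failure behind this patch comes from the C99/C11 inline semantics that gcc 7 applies by default (-std=gnu11). A two-file illustration, not taken from vtpmmgr:

    /* demo.h */
    inline int twice(int x) { return 2 * x; }   /* inline definition only:
                                                   no external symbol emitted */

    /* demo.c */
    #include "demo.h"
    int main(void) { return twice(21); }

    /* Built with gcc-7 at -O0 the call is not inlined and linking fails:
     *   undefined reference to `twice'
     * Declaring the function 'static inline' instead gives each
     * translation unit its own internal copy, so the call always
     * resolves, which is exactly the mechanical change the patch makes. */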
[Xen-devel] Stable Branch Maintainer in Wiki and MAINTAINERS
According to https://wiki.xenproject.org/wiki/Xen_Project_Maintenance_Releases in the "Stable Branch Maintainer" section someone is supposed to be added to the MAINTAINERS file. Where in the staging-4.9 branch was this change done? I guess an equivalent of 1f4ea16035 ("update Xen version to 4.8.1-pre") is missing. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] backport docs changes for Xen 4.9.1
Please backport the following changes for docs/ for the Xen 4.9.1 release: aa4eb460bc docs: add pod variant of xl-numa-placement 458df9f374 docs: add pod variant of xl-network-configuration.5 4359b86f31 docs: add pod variant of xen-pv-channel.7 55924baf22 docs: correct paragraph indention in xen-tscmode 763267e315 docs: replace xm with xl in xen-tscmode Thanks. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2] xen-disk: use g_new0 to fix build
g_malloc0_n is available since glib-2.24. To allow builds with older glib versions, use the generic g_new0, which is already used in many other places in the code. Fixes commit 3284fad728 ("xen-disk: add support for multi-page shared rings") Signed-off-by: Olaf Hering --- hw/block/xen_disk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c index d42ed7070d..536e2ee735 100644 --- a/hw/block/xen_disk.c +++ b/hw/block/xen_disk.c @@ -1232,7 +1232,7 @@ static int blk_connect(struct XenDevice *xendev) return -1; } -domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t)); +domids = g_new0(uint32_t, blkdev->nr_ring_ref); for (i = 0; i < blkdev->nr_ring_ref; i++) { domids[i] = blkdev->xendev.dom; } ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [Qemu-devel] [PATCH] xen-disk: use g_malloc0 to fix build
On Fri, Jul 28, Eric Blake wrote: > This version is prone to multiplication overflow (well, maybe not, but > you have to audit for that). Wouldn't it be better to use: What could go wrong? qemu will die either way, I think. Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
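For the record, the difference the review points at, sketched with glib >= 2.24 semantics (simplified, illustration only):

    #include <glib.h>

    void demo(gsize nr)
    {
        /* The plain multiplication can wrap: if nr * sizeof(guint32)
         * overflows gsize, a short buffer is returned and later stores
         * run off the end of it. */
        guint32 *a = g_malloc0(nr * sizeof(guint32));

        /* g_new0() routes through g_malloc0_n(), which validates the
         * multiplication and aborts the process cleanly on overflow. */
        guint32 *b = g_new0(guint32, nr);

        g_free(a);
        g_free(b);
    }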
[Xen-devel] [PATCH] xen-disk: use g_malloc0 to fix build
g_malloc0_n is available since glib-2.24. To allow builds with older glib versions, use the generic g_malloc0, which is already used in many other places in the code. Fixes commit 3284fad728 ("xen-disk: add support for multi-page shared rings") Signed-off-by: Olaf Hering --- hw/block/xen_disk.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c index d42ed7070d..71deec17b0 100644 --- a/hw/block/xen_disk.c +++ b/hw/block/xen_disk.c @@ -1232,7 +1232,7 @@ static int blk_connect(struct XenDevice *xendev) return -1; } -domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t)); +domids = g_malloc0(blkdev->nr_ring_ref * sizeof(uint32_t)); for (i = 0; i < blkdev->nr_ring_ref; i++) { domids[i] = blkdev->xendev.dom; } ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PULL 3/3] xen-disk: add support for multi-page shared rings
On Tue, Jun 27, Stefano Stabellini wrote: > From: Paul Durrant > The blkif protocol has had provision for negotiation of multi-page shared > rings for some time now and many guest OS have support in their frontend > drivers. > +++ b/hw/block/xen_disk.c > +domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t)); According to [1] g_malloc0_n requires at least glib-2.24. As a result compilation of qemu-2.10 fails in SLE11, which has just glib-2.22. Olaf [1] https://developer.gnome.org/glib/stable/glib-Memory-Allocation.html signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3 0/3] docs: convert manpages to pod
To remove the buildtime dependency on pandoc/ghc some manpages are converted from markdown to pod format. This will provide more manpages which are referenced in xl(1) and xl.cfg(5). This series does not cover xen-vbd-interface.7 because converting the lists used in this manpage was not straightforward. Olaf v3: - add NAME/DESCRIPTION, minor formatting tweaks, whitespace v2: fold each add/remove into a single commit Cc: Ian Jackson Cc: Wei Liu To: xen-devel@lists.xen.org Olaf Hering (3): docs: add pod variant of xen-pv-channel.7 docs: add pod variant of xl-network-configuration.5 docs: add pod variant of xl-numa-placement docs/man/xen-pv-channel.markdown.7 | 106 --- docs/man/xen-pv-channel.pod.7 | 188 ...n.markdown.5 => xl-network-configuration.pod.5} | 196 ++--- ...lacement.markdown.7 => xl-numa-placement.pod.7} | 166 +++-- 4 files changed, 435 insertions(+), 221 deletions(-) delete mode 100644 docs/man/xen-pv-channel.markdown.7 create mode 100644 docs/man/xen-pv-channel.pod.7 rename docs/man/{xl-network-configuration.markdown.5 => xl-network-configuration.pod.5} (55%) rename docs/man/{xl-numa-placement.markdown.7 => xl-numa-placement.pod.7} (74%) ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3 1/3] docs: add pod variant of xen-pv-channel.7
Convert source for xen-pv-channel.7 from markdown to pod. This removes the buildtime requirement for pandoc, and subsequently the need for ghc, in the chain for BuildRequires of xen.rpm. Signed-off-by: Olaf Hering --- docs/man/xen-pv-channel.markdown.7 | 106 - docs/man/xen-pv-channel.pod.7 | 188 + 2 files changed, 188 insertions(+), 106 deletions(-) delete mode 100644 docs/man/xen-pv-channel.markdown.7 create mode 100644 docs/man/xen-pv-channel.pod.7 diff --git a/docs/man/xen-pv-channel.markdown.7 b/docs/man/xen-pv-channel.markdown.7 deleted file mode 100644 index 1c6149dae0..00 --- a/docs/man/xen-pv-channel.markdown.7 +++ /dev/null @@ -1,106 +0,0 @@ -Xen PV Channels -=== - -A channel is a low-bandwidth private byte stream similar to a serial -link. Typical uses of channels are - - 1. to provide initial configuration information to a VM on boot - (example use: CloudStack's cloud-early-config service) - 2. to signal/query an in-guest agent - (example use: oVirt's guest agent) - -Channels are similar to virtio-serial devices and emulated serial links. -Channels are intended to be used in the implementation of libvirt s -when running on Xen. - -Note: if an application requires a high-bandwidth link then it should use -vchan instead. - -How to use channels: an example - -Consider a cloud deployment where VMs are cloned from pre-made templates, -and customised on first boot by an in-guest agent which sets the IP address, -hostname, ssh keys etc. To install the system the cloud administrator would -first: - - 1. Install a guest as normal (no channel configuration necessary) - 2. Install the in-guest agent specific to the cloud software. This will - prepare the guest to communicate over the channel, and also prepare - the guest to be cloned safely (sometimes known as "sysprepping") - 3. Shutdown the guest - 4. Register the guest as a template with the cloud orchestration software - 5. Install the cloud orchestration agent in dom0 - -At runtime, when a cloud tenant requests that a VM is created from the template, -the sequence of events would be: (assuming a Linux domU) - - 1. A VM is "cloned" from the template - 2. A unique Unix domain socket path in dom0 is allocated - (e.g. /my/cloud/software/talk/to/domain/) - 3. Domain configuration is created for the VM, listing the channel - name expected by the in-guest agent. In xl syntax this would be: - - channel = [ "connection=socket, name=org.my.cloud.software.agent.version1, - path = /my/cloud/software/talk/to/domain/" ] - - 4. The VM is started - 5. In dom0 the cloud orchestration agent connects to the Unix domain - socket, writes a handshake message and waits for a reply - 6. Assuming the guest kernel has CONFIG_HVC_XEN_FRONTEND set then the console - driver will generate a hotplug event - 7. A udev rule is activated by the hotplug event. - - The udev rule would look something like: - - SUBSYSTEM=="xen", DEVPATH=="/devices/console-[0-9]", RUN+="xen-console-setup" - - where the "xen-console-setup" script would read the channel name and - make a symlink in /dev/xen-channel/org.my.cloud.software.agent.version1 - - 8. The in-guest agent uses inotify to see the creation of the /dev/xen-channel - symlink and opens the device. - 9. The in-guest agent completes the handshake with the dom0 agent - 10. The dom0 agent transmits the unique VM configuration: hostname, IP - address, ssh keys etc etc - 11. The in-guest agent receives the configuration and applies it. - -Using channels avoids having to use a temporary disk device or network -connection. 
- -Design recommendations and pitfalls - -It's necessary to install channel-specific software (an "agent") into the guest -before you can use a channel. By default a channel will appear as a device -which could be mistaken for a serial port or regular console. It is known -that some software will proactively seek out serial ports and issue AT commands -at them; make sure such software is disabled! - -Since channels are identified by names, application authors must ensure their -channel names are unique to avoid clashes. We recommend that channel names -include parts unique to the application such as a domain names. To assist -prevent clashes we recommend authors add their names to our global channel -registry at the end of this document. - -Limitations - -Hotplug and unplug of channels is not currently implemented. - -Channel name registry -- - -It is important that channel names are globally unique. To help ensure -that no-one's name clashes with yours, please add yours to this list. - -Key: -N: Name -C: Contact -
[Xen-devel] [PATCH v3 2/3] docs: add pod variant of xl-network-configuration.5
Convert source for xl-network-configuration.5 from markdown to pod. This removes the buildtime requirement for pandoc, and subsequently the need for ghc, in the chain for BuildRequires of xen.rpm. Signed-off-by: Olaf Hering --- ...n.markdown.5 => xl-network-configuration.pod.5} | 196 ++--- 1 file changed, 137 insertions(+), 59 deletions(-) rename docs/man/{xl-network-configuration.markdown.5 => xl-network-configuration.pod.5} (55%) diff --git a/docs/man/xl-network-configuration.markdown.5 b/docs/man/xl-network-configuration.pod.5 similarity index 55% rename from docs/man/xl-network-configuration.markdown.5 rename to docs/man/xl-network-configuration.pod.5 index 84c2645ad8..e9ac3c5b9e 100644 --- a/docs/man/xl-network-configuration.markdown.5 +++ b/docs/man/xl-network-configuration.pod.5 @@ -1,6 +1,11 @@ -# XL Network Configuration +=encoding utf8 -## Syntax Overview +=head1 NAME + +xl-network-configuration - XL Network Configuration Syntax + + +=head1 SYNTAX This document specifies the xl config file format vif configuration option. It has the following form: @@ -8,7 +13,7 @@ option. It has the following form: vif = [ '', '', ... ] where each vifspec is in this form: - + [=|,] For example: @@ -24,11 +29,13 @@ These might be specified in the domain config file like this: More formally, the string is a series of comma-separated keyword/value pairs. All keywords are optional. -Each device has a `DEVID` which is its index within the vif list, starting from 0. +Each device has a C which is its index within the vif list, starting from 0. -## Keywords -### mac +=head1 Keywords + + +=head2 mac If specified then this option specifies the MAC address inside the guest of this VIF device. The value is a 48-bit number represented as @@ -36,89 +43,137 @@ six groups of two hexadecimal digits, separated by colons (:). The default if this keyword is not specified is to be automatically generate a MAC address inside the space assigned to Xen's -[Organizationally Unique Identifier][oui] (00:16:3e). +Lhttp://en.wikipedia.org/wiki/Organizationally_Unique_Identifier> (00:16:3e). If you are choosing a MAC address then it is strongly recommend to follow one of the following strategies: - * Generate a random sequence of 6 byte, set the locally administered -bit (bit 2 of the first byte) and clear the multicast bit (bit 1 -of the first byte). In other words the first byte should have the -bit pattern xx10 (where x is a randomly generated bit) and the -remaining 5 bytes are randomly generated See -[http://en.wikipedia.org/wiki/MAC_address] for more details the -structure of a MAC address. - * Allocate an address from within the space defined by your -organization's OUI (if you have one) following your organization's -procedures for doing so. - * Allocate an address from within the space defined by Xen's OUI -(00:16:3e). Taking care not to clash with other users of the -physical network segment where this VIF will reside. +=over + +=item * + +Generate a random sequence of 6 byte, set the locally administered +bit (bit 2 of the first byte) and clear the multicast bit (bit 1 +of the first byte). In other words the first byte should have the +bit pattern xx10 (where x is a randomly generated bit) and the +remaining 5 bytes are randomly generated See +[http://en.wikipedia.org/wiki/MAC_address] for more details the +structure of a MAC address. + + +=item * + +Allocate an address from within the space defined by your +organization's OUI (if you have one) following your organization's +procedures for doing so. 
+ + +=item * + +Allocate an address from within the space defined by Xen's OUI +(00:16:3e). Taking care not to clash with other users of the +physical network segment where this VIF will reside. + + +=back If you have an OUI for your own use then that is the preferred strategy. Otherwise in general you should prefer to generate a random MAC and set the locally administered bit since this allows for more bits of randomness than using the Xen OUI. -### bridge + +=head2 bridge Specifies the name of the network bridge which this VIF should be -added to. The default is `xenbr0`. The bridge must be configured using -your distribution's network configuration tools. See the [wiki][net] +added to. The default is C. The bridge must be configured using +your distribution's network configuration tools. See the Lhttp://wiki.xen.org/wiki/HostConfiguration/Networking> for guidance and examples. -### gatewaydev + +=head2 gatewaydev Specifies the name of the network interface which has an IP and which is in the network the VIF should communicate with. This is used in the host -by the vif-route hotplug script. See [wiki][vifroute] for guidance and +by the vif-route hotplug script. See Lhttp://wiki.xen.org/wiki/Vif-rou
[Xen-devel] [PATCH v3 3/3] docs: add pod variant of xl-numa-placement
Convert source for xl-numa-placement.7 from markdown to pod. This removes the buildtime requirement for pandoc, and subsequently the need for ghc, in the chain for BuildRequires of xen.rpm. Signed-off-by: Olaf Hering --- ...lacement.markdown.7 => xl-numa-placement.pod.7} | 166 ++--- 1 file changed, 110 insertions(+), 56 deletions(-) rename docs/man/{xl-numa-placement.markdown.7 => xl-numa-placement.pod.7} (74%) diff --git a/docs/man/xl-numa-placement.markdown.7 b/docs/man/xl-numa-placement.pod.7 similarity index 74% rename from docs/man/xl-numa-placement.markdown.7 rename to docs/man/xl-numa-placement.pod.7 index f863492093..54a444172e 100644 --- a/docs/man/xl-numa-placement.markdown.7 +++ b/docs/man/xl-numa-placement.pod.7 @@ -1,6 +1,12 @@ -# Guest Automatic NUMA Placement in libxl and xl # +=encoding utf8 -## Rationale ## +=head1 NAME + +Guest Automatic NUMA Placement in libxl and xl + +=head1 DESCRIPTION + +=head2 Rationale NUMA (which stands for Non-Uniform Memory Access) means that the memory accessing times of a program running on a CPU depends on the relative @@ -17,13 +23,14 @@ running memory-intensive workloads on a shared host. In fact, the cost of accessing non node-local memory locations is very high, and the performance degradation is likely to be noticeable. -For more information, have a look at the [Xen NUMA Introduction][numa_intro] +For more information, have a look at the Lhttp://wiki.xen.org/wiki/Xen_NUMA_Introduction> page on the Wiki. -## Xen and NUMA machines: the concept of _node-affinity_ ## + +=head2 Xen and NUMA machines: the concept of I The Xen hypervisor deals with NUMA machines throughout the concept of -_node-affinity_. The node-affinity of a domain is the set of NUMA nodes +I. The node-affinity of a domain is the set of NUMA nodes of the host where the memory for the domain is being allocated (mostly, at domain creation time). This is, at least in principle, different and unrelated with the vCPU (hard and soft, see below) scheduling affinity, @@ -42,15 +49,16 @@ it is very important to "place" the domain correctly when it is fist created, as the most of its memory is allocated at that time and can not (for now) be moved easily. -### Placing via pinning and cpupools ### + +=head2 Placing via pinning and cpupools The simplest way of placing a domain on a NUMA node is setting the hard scheduling affinity of the domain's vCPUs to the pCPUs of the node. This also goes under the name of vCPU pinning, and can be done through the "cpus=" option in the config file (more about this below). Another option is to pool together the pCPUs spanning the node and put the domain in -such a _cpupool_ with the "pool=" config option (as documented in our -[Wiki][cpupools_howto]). +such a I with the "pool=" config option (as documented in our +Lhttp://wiki.xen.org/wiki/Cpupools_Howto>). In both the above cases, the domain will not be able to execute outside the specified set of pCPUs for any reasons, even if all those pCPUs are @@ -59,7 +67,8 @@ busy doing something else while there are others, idle, pCPUs. So, when doing this, local memory accesses are 100% guaranteed, but that may come at he cost of some load imbalances. -### NUMA aware scheduling ### + +=head2 NUMA aware scheduling If using the credit1 scheduler, and starting from Xen 4.3, the scheduler itself always tries to run the domain's vCPUs on one of the nodes in @@ -87,21 +96,37 @@ workload. 
Notice that, for each vCPU, the following three scenarios are possbile: - * a vCPU *is pinned* to some pCPUs and *does not have* any soft affinity -In this case, the vCPU is always scheduled on one of the pCPUs to which -it is pinned, without any specific peference among them. - * a vCPU *has* its own soft affinity and *is not* pinned to any particular -pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the -scheduler will try to have it running on one of the pCPUs in its soft -affinity; - * a vCPU *has* its own vCPU soft affinity and *is also* pinned to some -pCPUs. In this case, the vCPU is always scheduled on one of the pCPUs -onto which it is pinned, with, among them, a preference for the ones -that also forms its soft affinity. In case pinning and soft affinity -form two disjoint sets of pCPUs, pinning "wins", and the soft affinity -is just ignored. - -## Guest placement in xl ## +=over + +=item * + +a vCPU I to some pCPUs and I any soft affinity +In this case, the vCPU is always scheduled on one of the pCPUs to which +it is pinned, without any specific peference among them. + + +=item * + +a vCPU I its own soft affinity and I pinned to any particular +pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the +scheduler will try to have it running on one of the pCPUs in its soft +affinity; + + +=item * + +a vCPU I its own vCPU sof
[Xen-devel] [PATCH v2 3/3] docs: add pod variant of xl-numa-placement
Convert source for xl-numa-placement.7 from markdown to pod. This removes the buildtime requirement for pandoc, and subsequently the need for ghc, in the chain for BuildRequires of xen.rpm. Signed-off-by: Olaf Hering --- ...lacement.markdown.7 => xl-numa-placement.pod.7} | 164 ++--- 1 file changed, 108 insertions(+), 56 deletions(-) rename docs/man/{xl-numa-placement.markdown.7 => xl-numa-placement.pod.7} (74%) diff --git a/docs/man/xl-numa-placement.markdown.7 b/docs/man/xl-numa-placement.pod.7 similarity index 74% rename from docs/man/xl-numa-placement.markdown.7 rename to docs/man/xl-numa-placement.pod.7 index f863492093..5cad33be48 100644 --- a/docs/man/xl-numa-placement.markdown.7 +++ b/docs/man/xl-numa-placement.pod.7 @@ -1,6 +1,10 @@ -# Guest Automatic NUMA Placement in libxl and xl # +=encoding utf8 -## Rationale ## + +=head1 Guest Automatic NUMA Placement in libxl and xl + + +=head2 Rationale NUMA (which stands for Non-Uniform Memory Access) means that the memory accessing times of a program running on a CPU depends on the relative @@ -17,13 +21,14 @@ running memory-intensive workloads on a shared host. In fact, the cost of accessing non node-local memory locations is very high, and the performance degradation is likely to be noticeable. -For more information, have a look at the [Xen NUMA Introduction][numa_intro] +For more information, have a look at the Lhttp://wiki.xen.org/wiki/Xen_NUMA_Introduction> page on the Wiki. -## Xen and NUMA machines: the concept of _node-affinity_ ## + +=head2 Xen and NUMA machines: the concept of I The Xen hypervisor deals with NUMA machines throughout the concept of -_node-affinity_. The node-affinity of a domain is the set of NUMA nodes +I. The node-affinity of a domain is the set of NUMA nodes of the host where the memory for the domain is being allocated (mostly, at domain creation time). This is, at least in principle, different and unrelated with the vCPU (hard and soft, see below) scheduling affinity, @@ -42,15 +47,16 @@ it is very important to "place" the domain correctly when it is fist created, as the most of its memory is allocated at that time and can not (for now) be moved easily. -### Placing via pinning and cpupools ### + +=head2 Placing via pinning and cpupools The simplest way of placing a domain on a NUMA node is setting the hard scheduling affinity of the domain's vCPUs to the pCPUs of the node. This also goes under the name of vCPU pinning, and can be done through the "cpus=" option in the config file (more about this below). Another option is to pool together the pCPUs spanning the node and put the domain in -such a _cpupool_ with the "pool=" config option (as documented in our -[Wiki][cpupools_howto]). +such a I with the "pool=" config option (as documented in our +Lhttp://wiki.xen.org/wiki/Cpupools_Howto>). In both the above cases, the domain will not be able to execute outside the specified set of pCPUs for any reasons, even if all those pCPUs are @@ -59,7 +65,8 @@ busy doing something else while there are others, idle, pCPUs. So, when doing this, local memory accesses are 100% guaranteed, but that may come at he cost of some load imbalances. -### NUMA aware scheduling ### + +=head2 NUMA aware scheduling If using the credit1 scheduler, and starting from Xen 4.3, the scheduler itself always tries to run the domain's vCPUs on one of the nodes in @@ -87,21 +94,37 @@ workload. 
Notice that, for each vCPU, the following three scenarios are possbile: - * a vCPU *is pinned* to some pCPUs and *does not have* any soft affinity -In this case, the vCPU is always scheduled on one of the pCPUs to which -it is pinned, without any specific peference among them. - * a vCPU *has* its own soft affinity and *is not* pinned to any particular -pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the -scheduler will try to have it running on one of the pCPUs in its soft -affinity; - * a vCPU *has* its own vCPU soft affinity and *is also* pinned to some -pCPUs. In this case, the vCPU is always scheduled on one of the pCPUs -onto which it is pinned, with, among them, a preference for the ones -that also forms its soft affinity. In case pinning and soft affinity -form two disjoint sets of pCPUs, pinning "wins", and the soft affinity -is just ignored. - -## Guest placement in xl ## +=over + +=item * + +a vCPU I to some pCPUs and I any soft affinity +In this case, the vCPU is always scheduled on one of the pCPUs to which +it is pinned, without any specific peference among them. + + +=item * + +a vCPU I its own soft affinity and I pinned to any particular +pCPU. In this case, the vCPU can run on every pCPU. Nevertheless, the +scheduler will try to have it running on one of the pCPUs in its soft +affinity; + + +=item * + +a vCPU I its own vCPU soft affinity and I pinned
[Xen-devel] [PATCH v2 1/3] docs: add pod variant of xen-pv-channel.7
Convert source for xen-pv-channel.7 from markdown to pod. This removes the buildtime requirement for pandoc, and subsequently the need for ghc, in the chain for BuildRequires of xen.rpm. Signed-off-by: Olaf Hering --- docs/man/xen-pv-channel.markdown.7 | 106 - docs/man/xen-pv-channel.pod.7 | 189 + 2 files changed, 189 insertions(+), 106 deletions(-) delete mode 100644 docs/man/xen-pv-channel.markdown.7 create mode 100644 docs/man/xen-pv-channel.pod.7 diff --git a/docs/man/xen-pv-channel.markdown.7 b/docs/man/xen-pv-channel.markdown.7 deleted file mode 100644 index 1c6149dae0..00 --- a/docs/man/xen-pv-channel.markdown.7 +++ /dev/null @@ -1,106 +0,0 @@ -Xen PV Channels -=== - -A channel is a low-bandwidth private byte stream similar to a serial -link. Typical uses of channels are - - 1. to provide initial configuration information to a VM on boot - (example use: CloudStack's cloud-early-config service) - 2. to signal/query an in-guest agent - (example use: oVirt's guest agent) - -Channels are similar to virtio-serial devices and emulated serial links. -Channels are intended to be used in the implementation of libvirt s -when running on Xen. - -Note: if an application requires a high-bandwidth link then it should use -vchan instead. - -How to use channels: an example - -Consider a cloud deployment where VMs are cloned from pre-made templates, -and customised on first boot by an in-guest agent which sets the IP address, -hostname, ssh keys etc. To install the system the cloud administrator would -first: - - 1. Install a guest as normal (no channel configuration necessary) - 2. Install the in-guest agent specific to the cloud software. This will - prepare the guest to communicate over the channel, and also prepare - the guest to be cloned safely (sometimes known as "sysprepping") - 3. Shutdown the guest - 4. Register the guest as a template with the cloud orchestration software - 5. Install the cloud orchestration agent in dom0 - -At runtime, when a cloud tenant requests that a VM is created from the template, -the sequence of events would be: (assuming a Linux domU) - - 1. A VM is "cloned" from the template - 2. A unique Unix domain socket path in dom0 is allocated - (e.g. /my/cloud/software/talk/to/domain/) - 3. Domain configuration is created for the VM, listing the channel - name expected by the in-guest agent. In xl syntax this would be: - - channel = [ "connection=socket, name=org.my.cloud.software.agent.version1, - path = /my/cloud/software/talk/to/domain/" ] - - 4. The VM is started - 5. In dom0 the cloud orchestration agent connects to the Unix domain - socket, writes a handshake message and waits for a reply - 6. Assuming the guest kernel has CONFIG_HVC_XEN_FRONTEND set then the console - driver will generate a hotplug event - 7. A udev rule is activated by the hotplug event. - - The udev rule would look something like: - - SUBSYSTEM=="xen", DEVPATH=="/devices/console-[0-9]", RUN+="xen-console-setup" - - where the "xen-console-setup" script would read the channel name and - make a symlink in /dev/xen-channel/org.my.cloud.software.agent.version1 - - 8. The in-guest agent uses inotify to see the creation of the /dev/xen-channel - symlink and opens the device. - 9. The in-guest agent completes the handshake with the dom0 agent - 10. The dom0 agent transmits the unique VM configuration: hostname, IP - address, ssh keys etc etc - 11. The in-guest agent receives the configuration and applies it. - -Using channels avoids having to use a temporary disk device or network -connection. 
- -Design recommendations and pitfalls - -It's necessary to install channel-specific software (an "agent") into the guest -before you can use a channel. By default a channel will appear as a device -which could be mistaken for a serial port or regular console. It is known -that some software will proactively seek out serial ports and issue AT commands -at them; make sure such software is disabled! - -Since channels are identified by names, application authors must ensure their -channel names are unique to avoid clashes. We recommend that channel names -include parts unique to the application such as a domain names. To assist -prevent clashes we recommend authors add their names to our global channel -registry at the end of this document. - -Limitations - -Hotplug and unplug of channels is not currently implemented. - -Channel name registry -- - -It is important that channel names are globally unique. To help ensure -that no-one's name clashes with yours, please add yours to this list. - -Key: -N: Name -C: Contact -
[Xen-devel] [PATCH v2 0/6] docs: convert manpages to pod
To remove the buildtime dependency on pandoc/ghc some manpages are
converted from markdown to pod format. This will provide more manpages
which are referenced in xl(1) and xl.cfg(5).

This series does not cover xen-vbd-interface.7 because converting the
lists used in this manpage was not straightforward.

Olaf

v2: fold each add/remove into a single commit

Cc: Ian Jackson
Cc: Wei Liu
To: xen-devel@lists.xen.org

Olaf Hering (6):
  docs: add pod variant of xen-pv-channel.7
  docs: add pod variant of xl-network-configuration.5
  docs: add pod variant of xl-numa-placement
  docs: remove markdown variant of xen-pv-channel.7
  docs: remove markdown variant of xl-network-configuration.5
  docs: remove markdown variant of xl-numa-placement.7

 docs/man/xen-pv-channel.markdown.7 | 106 ---
 docs/man/xen-pv-channel.pod.7 | 189 
 ...n.markdown.5 => xl-network-configuration.pod.5} | 195 ++---
 ...lacement.markdown.7 => xl-numa-placement.pod.7} | 164 +++--
 4 files changed, 433 insertions(+), 221 deletions(-)
 delete mode 100644 docs/man/xen-pv-channel.markdown.7
 create mode 100644 docs/man/xen-pv-channel.pod.7
 rename docs/man/{xl-network-configuration.markdown.5 => xl-network-configuration.pod.5} (55%)
 rename docs/man/{xl-numa-placement.markdown.7 => xl-numa-placement.pod.7} (74%)
[Xen-devel] [PATCH v2 2/3] docs: add pod variant of xl-network-configuration.5
Convert source for xl-network-configuration.5 from markdown to pod. This removes the buildtime requirement for pandoc, and subsequently the need for ghc, in the chain for BuildRequires of xen.rpm. Signed-off-by: Olaf Hering --- ...n.markdown.5 => xl-network-configuration.pod.5} | 195 ++--- 1 file changed, 136 insertions(+), 59 deletions(-) rename docs/man/{xl-network-configuration.markdown.5 => xl-network-configuration.pod.5} (55%) diff --git a/docs/man/xl-network-configuration.markdown.5 b/docs/man/xl-network-configuration.pod.5 similarity index 55% rename from docs/man/xl-network-configuration.markdown.5 rename to docs/man/xl-network-configuration.pod.5 index 84c2645ad8..9fa373e20d 100644 --- a/docs/man/xl-network-configuration.markdown.5 +++ b/docs/man/xl-network-configuration.pod.5 @@ -1,6 +1,10 @@ -# XL Network Configuration +=encoding utf8 -## Syntax Overview + +=head1 XL Network Configuration + + +=head2 Syntax Overview This document specifies the xl config file format vif configuration option. It has the following form: @@ -8,7 +12,7 @@ option. It has the following form: vif = [ '', '', ... ] where each vifspec is in this form: - + [=|,] For example: @@ -24,11 +28,13 @@ These might be specified in the domain config file like this: More formally, the string is a series of comma-separated keyword/value pairs. All keywords are optional. -Each device has a `DEVID` which is its index within the vif list, starting from 0. +Each device has a C which is its index within the vif list, starting from 0. -## Keywords -### mac +=head2 Keywords + + +=head2 mac If specified then this option specifies the MAC address inside the guest of this VIF device. The value is a 48-bit number represented as @@ -36,89 +42,137 @@ six groups of two hexadecimal digits, separated by colons (:). The default if this keyword is not specified is to be automatically generate a MAC address inside the space assigned to Xen's -[Organizationally Unique Identifier][oui] (00:16:3e). +Lhttp://en.wikipedia.org/wiki/Organizationally_Unique_Identifier> (00:16:3e). If you are choosing a MAC address then it is strongly recommend to follow one of the following strategies: - * Generate a random sequence of 6 byte, set the locally administered -bit (bit 2 of the first byte) and clear the multicast bit (bit 1 -of the first byte). In other words the first byte should have the -bit pattern xx10 (where x is a randomly generated bit) and the -remaining 5 bytes are randomly generated See -[http://en.wikipedia.org/wiki/MAC_address] for more details the -structure of a MAC address. - * Allocate an address from within the space defined by your -organization's OUI (if you have one) following your organization's -procedures for doing so. - * Allocate an address from within the space defined by Xen's OUI -(00:16:3e). Taking care not to clash with other users of the -physical network segment where this VIF will reside. +=over + +=item * + +Generate a random sequence of 6 byte, set the locally administered +bit (bit 2 of the first byte) and clear the multicast bit (bit 1 +of the first byte). In other words the first byte should have the +bit pattern xx10 (where x is a randomly generated bit) and the +remaining 5 bytes are randomly generated See +[http://en.wikipedia.org/wiki/MAC_address] for more details the +structure of a MAC address. + + +=item * + +Allocate an address from within the space defined by your +organization's OUI (if you have one) following your organization's +procedures for doing so. 
+ + +=item * + +Allocate an address from within the space defined by Xen's OUI +(00:16:3e). Taking care not to clash with other users of the +physical network segment where this VIF will reside. + + +=back If you have an OUI for your own use then that is the preferred strategy. Otherwise in general you should prefer to generate a random MAC and set the locally administered bit since this allows for more bits of randomness than using the Xen OUI. -### bridge + +=head2 bridge Specifies the name of the network bridge which this VIF should be -added to. The default is `xenbr0`. The bridge must be configured using -your distribution's network configuration tools. See the [wiki][net] +added to. The default is C. The bridge must be configured using +your distribution's network configuration tools. See the Lhttp://wiki.xen.org/wiki/HostConfiguration/Networking> for guidance and examples. -### gatewaydev + +=head2 gatewaydev Specifies the name of the network interface which has an IP and which is in the network the VIF should communicate with. This is used in the host -by the vif-route hotplug script. See [wiki][vifroute] for guidance and +by the vif-route hotplug script. See Lhttp://wiki.xen.org/wiki/Vif-route> for guidance and examples.
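As a concrete illustration of the first MAC strategy described above (random
sequence, locally administered bit set, multicast bit cleared), a minimal
sketch in C; rand() is only a stand-in for a proper entropy source:

    /* Sketch of the first strategy above: random MAC with the
     * locally administered bit (0x02) set and the multicast bit
     * (0x01) cleared in the first byte. rand() is a stand-in for
     * a real entropy source. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        unsigned char mac[6];
        int i;

        srand((unsigned)time(NULL)); /* toy seed, illustration only */
        for (i = 0; i < 6; i++)
            mac[i] = rand() & 0xff;
        mac[0] = (mac[0] | 0x02) & ~0x01; /* locally administered, unicast */
        printf("mac=%02x:%02x:%02x:%02x:%02x:%02x\n",
               mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
        return 0;
    }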
Re: [Xen-devel] [PATCH 0/6] docs: convert manpages to pod
On Mon, Jul 24, Ian Jackson wrote:
> * There are a lot of other documents in docs/misc/ which are in
> markdown format. Some of them are internal. I'm pretty sure we don't
> want them _all_ converted. So even if you convert the manpages, these
> documents will remain.

I did not intend to change other files outside of docs/man/. Just the
references to non-existent manpages triggered this series. Sometimes I
wish that xen-command-line.5 existed, but google always helped on such
occasions.

> * It may be that there are other markdown processors which could be
> substituted for pandoc - either at runtime or by changing the Xen
> Project's default, upstream.

After some quick research there are the ruby "ronn" and the go/ruby
"md2man". Both would have the same dependency issue. Perhaps ruby is
less troublesome because YaST is written in ruby.

> * Our markdown documents are, I think, intended to be plain text which
> can be simply shipped as-is. So for things other than manpages you
> can probably just ship them as if they were text files. If the end
> user wants to read them in a fancy format (eg HTML) they could install
> the relevant processor.

Yes. I have to see what HTML we ship. So far it has not caused trouble.

> * I don't understand why promoting GHC would be a problem. But, in
> the worst case, rather than demoting Xen, you could simply not ship
> certain docs (although - see above about plain text).

The package ghc has been in the tree for nearly 5 years, pandoc for 3
years. The hurdle is likely that a 4GB DVD is filled quickly. It is
always a fight to get everyone happy, and ghc is seen as a leaf
package.

Olaf
Re: [Xen-devel] [PATCH 0/6] docs: convert manpages to pod
On Mon, Jul 24, Ian Jackson wrote:
> Olaf Hering writes ("[PATCH 0/6] docs: convert manpages to pod"):
> > To remove the buildtime dependency on pandoc/ghc some manpages are
> > converted from markdown to pod format. This will provide more manpages
> > which are referenced in xl(1) and xl.cfg(5).
>
> Sorry to ask this at this stage, but: did I miss some discussion of
> why this was desirable ?

Likely yes: https://build.opensuse.org/request/show/511948

The point is: if all manpages need to be built then Xen needs to depend
on pandoc, which in turn depends on ghc. Neither of them is seen as a
"core" package, while "Xen" is a core package. Either ghc becomes a
core package, or Xen is moved out of core. In this context "core" means
it is part of an install DVD, if I understand the concept of "rings"
correctly.

Do you see any downside to this series? There is currently a mix of pod
and markdown formats for the manpages. This change gets us closer to
having them all in pod.

Olaf
[Xen-devel] [PATCH 6/6] docs: remove markdown variant of xl-numa-placement.7
A variant in pod format exists now. Signed-off-by: Olaf Hering --- docs/man/xl-numa-placement.markdown.7 | 239 -- 1 file changed, 239 deletions(-) delete mode 100644 docs/man/xl-numa-placement.markdown.7 diff --git a/docs/man/xl-numa-placement.markdown.7 b/docs/man/xl-numa-placement.markdown.7 deleted file mode 100644 index f863492093..00 --- a/docs/man/xl-numa-placement.markdown.7 +++ /dev/null @@ -1,239 +0,0 @@ -# Guest Automatic NUMA Placement in libxl and xl # - -## Rationale ## - -NUMA (which stands for Non-Uniform Memory Access) means that the memory -accessing times of a program running on a CPU depends on the relative -distance between that CPU and that memory. In fact, most of the NUMA -systems are built in such a way that each processor has its local memory, -on which it can operate very fast. On the other hand, getting and storing -data from and on remote memory (that is, memory local to some other processor) -is quite more complex and slow. On these machines, a NUMA node is usually -defined as a set of processor cores (typically a physical CPU package) and -the memory directly attached to the set of cores. - -NUMA awareness becomes very important as soon as many domains start -running memory-intensive workloads on a shared host. In fact, the cost -of accessing non node-local memory locations is very high, and the -performance degradation is likely to be noticeable. - -For more information, have a look at the [Xen NUMA Introduction][numa_intro] -page on the Wiki. - -## Xen and NUMA machines: the concept of _node-affinity_ ## - -The Xen hypervisor deals with NUMA machines throughout the concept of -_node-affinity_. The node-affinity of a domain is the set of NUMA nodes -of the host where the memory for the domain is being allocated (mostly, -at domain creation time). This is, at least in principle, different and -unrelated with the vCPU (hard and soft, see below) scheduling affinity, -which instead is the set of pCPUs where the vCPU is allowed (or prefers) -to run. - -Of course, despite the fact that they belong to and affect different -subsystems, the domain node-affinity and the vCPUs affinity are not -completely independent. -In fact, if the domain node-affinity is not explicitly specified by the -user, via the proper libxl calls or xl config item, it will be computed -basing on the vCPUs' scheduling affinity. - -Notice that, even if the node affinity of a domain may change on-line, -it is very important to "place" the domain correctly when it is fist -created, as the most of its memory is allocated at that time and can -not (for now) be moved easily. - -### Placing via pinning and cpupools ### - -The simplest way of placing a domain on a NUMA node is setting the hard -scheduling affinity of the domain's vCPUs to the pCPUs of the node. This -also goes under the name of vCPU pinning, and can be done through the -"cpus=" option in the config file (more about this below). Another option -is to pool together the pCPUs spanning the node and put the domain in -such a _cpupool_ with the "pool=" config option (as documented in our -[Wiki][cpupools_howto]). - -In both the above cases, the domain will not be able to execute outside -the specified set of pCPUs for any reasons, even if all those pCPUs are -busy doing something else while there are others, idle, pCPUs. - -So, when doing this, local memory accesses are 100% guaranteed, but that -may come at he cost of some load imbalances. 
- -### NUMA aware scheduling ### - -If using the credit1 scheduler, and starting from Xen 4.3, the scheduler -itself always tries to run the domain's vCPUs on one of the nodes in -its node-affinity. Only if that turns out to be impossible, it will just -pick any free pCPU. Locality of access is less guaranteed than in the -pinning case, but that comes along with better chances to exploit all -the host resources (e.g., the pCPUs). - -Starting from Xen 4.5, credit1 supports two forms of affinity: hard and -soft, both on a per-vCPU basis. This means each vCPU can have its own -soft affinity, stating where such vCPU prefers to execute on. This is -less strict than what it (also starting from 4.5) is called hard affinity, -as the vCPU can potentially run everywhere, it just prefers some pCPUs -rather than others. -In Xen 4.5, therefore, NUMA-aware scheduling is achieved by matching the -soft affinity of the vCPUs of a domain with its node-affinity. - -In fact, as it was for 4.3, if all the pCPUs in a vCPU's soft affinity -are busy, it is possible for the domain to run outside from there. The -idea is that slower execution (due to remote memory accesses) is still -better than no execution at all (as it would happen with pinning). For -this reason, NUMA aware scheduling has the potential of bringing -substantial performances benefits, although this will depend on the -workload. - -Notice that, for each vCPU, the following three sc
[Xen-devel] [PATCH 3/6] docs: add pod variant of xl-numa-placement
Add source in pod format for xl-numa-placement.7 This removes the buildtime requirement for pandoc, and subsequently the need for ghc, in the chain for BuildRequires of xen.rpm. Signed-off-by: Olaf Hering --- docs/man/xl-numa-placement.pod.7 | 291 +++ 1 file changed, 291 insertions(+) create mode 100644 docs/man/xl-numa-placement.pod.7 diff --git a/docs/man/xl-numa-placement.pod.7 b/docs/man/xl-numa-placement.pod.7 new file mode 100644 index 00..5cad33be48 --- /dev/null +++ b/docs/man/xl-numa-placement.pod.7 @@ -0,0 +1,291 @@ +=encoding utf8 + + +=head1 Guest Automatic NUMA Placement in libxl and xl + + +=head2 Rationale + +NUMA (which stands for Non-Uniform Memory Access) means that the memory +accessing times of a program running on a CPU depends on the relative +distance between that CPU and that memory. In fact, most of the NUMA +systems are built in such a way that each processor has its local memory, +on which it can operate very fast. On the other hand, getting and storing +data from and on remote memory (that is, memory local to some other processor) +is quite more complex and slow. On these machines, a NUMA node is usually +defined as a set of processor cores (typically a physical CPU package) and +the memory directly attached to the set of cores. + +NUMA awareness becomes very important as soon as many domains start +running memory-intensive workloads on a shared host. In fact, the cost +of accessing non node-local memory locations is very high, and the +performance degradation is likely to be noticeable. + +For more information, have a look at the Lhttp://wiki.xen.org/wiki/Xen_NUMA_Introduction> +page on the Wiki. + + +=head2 Xen and NUMA machines: the concept of I + +The Xen hypervisor deals with NUMA machines throughout the concept of +I. The node-affinity of a domain is the set of NUMA nodes +of the host where the memory for the domain is being allocated (mostly, +at domain creation time). This is, at least in principle, different and +unrelated with the vCPU (hard and soft, see below) scheduling affinity, +which instead is the set of pCPUs where the vCPU is allowed (or prefers) +to run. + +Of course, despite the fact that they belong to and affect different +subsystems, the domain node-affinity and the vCPUs affinity are not +completely independent. +In fact, if the domain node-affinity is not explicitly specified by the +user, via the proper libxl calls or xl config item, it will be computed +basing on the vCPUs' scheduling affinity. + +Notice that, even if the node affinity of a domain may change on-line, +it is very important to "place" the domain correctly when it is fist +created, as the most of its memory is allocated at that time and can +not (for now) be moved easily. + + +=head2 Placing via pinning and cpupools + +The simplest way of placing a domain on a NUMA node is setting the hard +scheduling affinity of the domain's vCPUs to the pCPUs of the node. This +also goes under the name of vCPU pinning, and can be done through the +"cpus=" option in the config file (more about this below). Another option +is to pool together the pCPUs spanning the node and put the domain in +such a I with the "pool=" config option (as documented in our +Lhttp://wiki.xen.org/wiki/Cpupools_Howto>). + +In both the above cases, the domain will not be able to execute outside +the specified set of pCPUs for any reasons, even if all those pCPUs are +busy doing something else while there are others, idle, pCPUs. 
+ +So, when doing this, local memory accesses are 100% guaranteed, but that +may come at he cost of some load imbalances. + + +=head2 NUMA aware scheduling + +If using the credit1 scheduler, and starting from Xen 4.3, the scheduler +itself always tries to run the domain's vCPUs on one of the nodes in +its node-affinity. Only if that turns out to be impossible, it will just +pick any free pCPU. Locality of access is less guaranteed than in the +pinning case, but that comes along with better chances to exploit all +the host resources (e.g., the pCPUs). + +Starting from Xen 4.5, credit1 supports two forms of affinity: hard and +soft, both on a per-vCPU basis. This means each vCPU can have its own +soft affinity, stating where such vCPU prefers to execute on. This is +less strict than what it (also starting from 4.5) is called hard affinity, +as the vCPU can potentially run everywhere, it just prefers some pCPUs +rather than others. +In Xen 4.5, therefore, NUMA-aware scheduling is achieved by matching the +soft affinity of the vCPUs of a domain with its node-affinity. + +In fact, as it was for 4.3, if all the pCPUs in a vCPU's soft affinity +are busy, it is possible for the domain to run outside from there. The +idea is that slower execution (due to remote memory accesses) is still +better than no execution at all (as it would happen with pinning). For +this reason, NUMA aware scheduling has the
[Xen-devel] [PATCH 4/6] docs: remove markdown variant of xen-pv-channel.7
A variant in pod format exists now. Signed-off-by: Olaf Hering --- docs/man/xen-pv-channel.markdown.7 | 106 - 1 file changed, 106 deletions(-) delete mode 100644 docs/man/xen-pv-channel.markdown.7 diff --git a/docs/man/xen-pv-channel.markdown.7 b/docs/man/xen-pv-channel.markdown.7 deleted file mode 100644 index 1c6149dae0..00 --- a/docs/man/xen-pv-channel.markdown.7 +++ /dev/null @@ -1,106 +0,0 @@ -Xen PV Channels -=== - -A channel is a low-bandwidth private byte stream similar to a serial -link. Typical uses of channels are - - 1. to provide initial configuration information to a VM on boot - (example use: CloudStack's cloud-early-config service) - 2. to signal/query an in-guest agent - (example use: oVirt's guest agent) - -Channels are similar to virtio-serial devices and emulated serial links. -Channels are intended to be used in the implementation of libvirt s -when running on Xen. - -Note: if an application requires a high-bandwidth link then it should use -vchan instead. - -How to use channels: an example - -Consider a cloud deployment where VMs are cloned from pre-made templates, -and customised on first boot by an in-guest agent which sets the IP address, -hostname, ssh keys etc. To install the system the cloud administrator would -first: - - 1. Install a guest as normal (no channel configuration necessary) - 2. Install the in-guest agent specific to the cloud software. This will - prepare the guest to communicate over the channel, and also prepare - the guest to be cloned safely (sometimes known as "sysprepping") - 3. Shutdown the guest - 4. Register the guest as a template with the cloud orchestration software - 5. Install the cloud orchestration agent in dom0 - -At runtime, when a cloud tenant requests that a VM is created from the template, -the sequence of events would be: (assuming a Linux domU) - - 1. A VM is "cloned" from the template - 2. A unique Unix domain socket path in dom0 is allocated - (e.g. /my/cloud/software/talk/to/domain/) - 3. Domain configuration is created for the VM, listing the channel - name expected by the in-guest agent. In xl syntax this would be: - - channel = [ "connection=socket, name=org.my.cloud.software.agent.version1, - path = /my/cloud/software/talk/to/domain/" ] - - 4. The VM is started - 5. In dom0 the cloud orchestration agent connects to the Unix domain - socket, writes a handshake message and waits for a reply - 6. Assuming the guest kernel has CONFIG_HVC_XEN_FRONTEND set then the console - driver will generate a hotplug event - 7. A udev rule is activated by the hotplug event. - - The udev rule would look something like: - - SUBSYSTEM=="xen", DEVPATH=="/devices/console-[0-9]", RUN+="xen-console-setup" - - where the "xen-console-setup" script would read the channel name and - make a symlink in /dev/xen-channel/org.my.cloud.software.agent.version1 - - 8. The in-guest agent uses inotify to see the creation of the /dev/xen-channel - symlink and opens the device. - 9. The in-guest agent completes the handshake with the dom0 agent - 10. The dom0 agent transmits the unique VM configuration: hostname, IP - address, ssh keys etc etc - 11. The in-guest agent receives the configuration and applies it. - -Using channels avoids having to use a temporary disk device or network -connection. - -Design recommendations and pitfalls - -It's necessary to install channel-specific software (an "agent") into the guest -before you can use a channel. By default a channel will appear as a device -which could be mistaken for a serial port or regular console. 
It is known -that some software will proactively seek out serial ports and issue AT commands -at them; make sure such software is disabled! - -Since channels are identified by names, application authors must ensure their -channel names are unique to avoid clashes. We recommend that channel names -include parts unique to the application such as a domain names. To assist -prevent clashes we recommend authors add their names to our global channel -registry at the end of this document. - -Limitations - -Hotplug and unplug of channels is not currently implemented. - -Channel name registry -- - -It is important that channel names are globally unique. To help ensure -that no-one's name clashes with yours, please add yours to this list. - -Key: -N: Name -C: Contact -D: Short description of use, possibly including a URL to your software - or API - -N: org.xenproject.guest.clipboard.0.1 -C: David Scott -D: Share clipboard data via an in-guest agent. See: - http://wiki.xenproject.org/wiki/Clipboard_sharing_protocol
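The udev rule above relies on a "xen-console-setup" helper that each
installation provides itself. A rough sketch in C of the shape such a helper
could take; how the channel name and device node reach the helper is
installation specific, so the CHANNEL_NAME variable used here is hypothetical
(DEVNAME is a common udev-provided one):

    /* Sketch of a "xen-console-setup" style helper run by the udev
     * rule above: create /dev/xen-channel/<name> pointing at the
     * console device. The CHANNEL_NAME environment variable is
     * hypothetical; a real helper would look the name up itself. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/stat.h>

    int main(void)
    {
        const char *name = getenv("CHANNEL_NAME"); /* hypothetical */
        const char *dev = getenv("DEVNAME");       /* e.g. /dev/hvc1 */
        char link[256];

        if (!name || !dev)
            return 1;
        mkdir("/dev/xen-channel", 0755); /* ignore EEXIST */
        snprintf(link, sizeof(link), "/dev/xen-channel/%s", name);
        unlink(link); /* drop a stale link from a previous instance */
        return symlink(dev, link) ? 1 : 0;
    }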
[Xen-devel] [PATCH 5/6] docs: remove markdown variant of xl-network-configuration.5
A variant in pod format exists now. Signed-off-by: Olaf Hering --- docs/man/xl-network-configuration.markdown.5 | 173 --- 1 file changed, 173 deletions(-) delete mode 100644 docs/man/xl-network-configuration.markdown.5 diff --git a/docs/man/xl-network-configuration.markdown.5 b/docs/man/xl-network-configuration.markdown.5 deleted file mode 100644 index 84c2645ad8..00 --- a/docs/man/xl-network-configuration.markdown.5 +++ /dev/null @@ -1,173 +0,0 @@ -# XL Network Configuration - -## Syntax Overview - -This document specifies the xl config file format vif configuration -option. It has the following form: - -vif = [ '', '', ... ] - -where each vifspec is in this form: - -[=|,] - -For example: - -'mac=00:16:3E:74:3d:76,model=rtl8139,bridge=xenbr0' -'mac=00:16:3E:74:34:32' -'' # The empty string - -These might be specified in the domain config file like this: - -vif = [ 'mac=00:16:3E:74:34:32', 'mac=00:16:3e:5f:48:e4,bridge=xenbr1' ] - -More formally, the string is a series of comma-separated keyword/value -pairs. All keywords are optional. - -Each device has a `DEVID` which is its index within the vif list, starting from 0. - -## Keywords - -### mac - -If specified then this option specifies the MAC address inside the -guest of this VIF device. The value is a 48-bit number represented as -six groups of two hexadecimal digits, separated by colons (:). - -The default if this keyword is not specified is to be automatically -generate a MAC address inside the space assigned to Xen's -[Organizationally Unique Identifier][oui] (00:16:3e). - -If you are choosing a MAC address then it is strongly recommend to -follow one of the following strategies: - - * Generate a random sequence of 6 byte, set the locally administered -bit (bit 2 of the first byte) and clear the multicast bit (bit 1 -of the first byte). In other words the first byte should have the -bit pattern xx10 (where x is a randomly generated bit) and the -remaining 5 bytes are randomly generated See -[http://en.wikipedia.org/wiki/MAC_address] for more details the -structure of a MAC address. - * Allocate an address from within the space defined by your -organization's OUI (if you have one) following your organization's -procedures for doing so. - * Allocate an address from within the space defined by Xen's OUI -(00:16:3e). Taking care not to clash with other users of the -physical network segment where this VIF will reside. - -If you have an OUI for your own use then that is the preferred -strategy. Otherwise in general you should prefer to generate a random -MAC and set the locally administered bit since this allows for more -bits of randomness than using the Xen OUI. - -### bridge - -Specifies the name of the network bridge which this VIF should be -added to. The default is `xenbr0`. The bridge must be configured using -your distribution's network configuration tools. See the [wiki][net] -for guidance and examples. - -### gatewaydev - -Specifies the name of the network interface which has an IP and which -is in the network the VIF should communicate with. This is used in the host -by the vif-route hotplug script. See [wiki][vifroute] for guidance and -examples. - -NOTE: netdev is a deprecated alias of this option. - -### type - -This keyword is valid for HVM guests only. - -Specifies the type of device to valid values are: - - * `ioemu` (default) -- this device will be provided as an emulate -device to the guest and also as a paravirtualised device which the -guest may choose to use instead if it has suitable drivers -available. 
- * `vif` -- this device will be provided as a paravirtualised device -only. - -### model - -This keyword is valid for HVM guest devices with `type=ioemu` only. - -Specifies the type device to emulated for this guest. Valid values -are: - - * `rtl8139` (default) -- Realtek RTL8139 - * `e1000` -- Intel E1000 - * in principle any device supported by your device model - -### vifname - -Specifies the backend device name for the virtual device. - -If the domain is an HVM domain then the associated emulated (tap) -device will have a "-emu" suffice added. - -The default name for the virtual device is `vifDOMID.DEVID` where -`DOMID` is the guest domain ID and `DEVID` is the device -number. Likewise the default tap name is `vifDOMID.DEVID-emu`. - -### script - -Specifies the hotplug script to run to configure this device (e.g. to -add it to the relevant bridge). Defaults to -`XEN_SCRIPT_DIR/vif-bridge` but can be set to any script. Some example -scripts are installed in `XEN_SCRIPT_DIR`. - -### ip - -Specifies the IP address for the device, the default is not to -specify an IP address. - -What, if any, effect this has depends on the hotplug script which is -configured. A typic
[Xen-devel] [PATCH 1/6] docs: add pod variant of xen-pv-channel.7
Add source in pod format for xen-pv-channel.7 This removes the buildtime requirement for pandoc, and subsequently the need for ghc, in the chain for BuildRequires of xen.rpm. Signed-off-by: Olaf Hering --- docs/man/xen-pv-channel.pod.7 | 189 ++ 1 file changed, 189 insertions(+) create mode 100644 docs/man/xen-pv-channel.pod.7 diff --git a/docs/man/xen-pv-channel.pod.7 b/docs/man/xen-pv-channel.pod.7 new file mode 100644 index 00..8b0b74aa27 --- /dev/null +++ b/docs/man/xen-pv-channel.pod.7 @@ -0,0 +1,189 @@ +=encoding utf8 + + +=head1 Xen PV Channels + +A channel is a low-bandwidth private byte stream similar to a serial +link. Typical uses of channels are + +=over + +=item 1. + +to provide initial configuration information to a VM on boot + (example use: CloudStack's cloud-early-config service) + + +=item 2. + +to signal/query an in-guest agent + (example use: oVirt's guest agent) + + +=back + +Channels are similar to virtio-serial devices and emulated serial links. +Channels are intended to be used in the implementation of libvirt s +when running on Xen. + +Note: if an application requires a high-bandwidth link then it should use +vchan instead. + + +=head2 How to use channels: an example + +Consider a cloud deployment where VMs are cloned from pre-made templates, +and customised on first boot by an in-guest agent which sets the IP address, +hostname, ssh keys etc. To install the system the cloud administrator would +first: + +=over + +=item 1. + +Install a guest as normal (no channel configuration necessary) + + +=item 2. + +Install the in-guest agent specific to the cloud software. This will + prepare the guest to communicate over the channel, and also prepare + the guest to be cloned safely (sometimes known as "sysprepping") + + +=item 3. + +Shutdown the guest + + +=item 4. + +Register the guest as a template with the cloud orchestration software + + +=item 5. + +Install the cloud orchestration agent in dom0 + + +=back + +At runtime, when a cloud tenant requests that a VM is created from the template, +the sequence of events would be: (assuming a Linux domU) + +=over + +=item 1. + +A VM is "cloned" from the template + + +=item 2. + +A unique Unix domain socket path in dom0 is allocated + (e.g. /my/cloud/software/talk/to/domain/) + + +=item 3. + +Domain configuration is created for the VM, listing the channel + name expected by the in-guest agent. In xl syntax this would be: + + channel = [ "connection=socket, name=org.my.cloud.software.agent.version1, + path = /my/cloud/software/talk/to/domain/" ] + + + +=item 4. + +The VM is started + + +=item 5. + +In dom0 the cloud orchestration agent connects to the Unix domain + socket, writes a handshake message and waits for a reply + + +=item 6. + +Assuming the guest kernel has CONFIGIXEN_FRONTEND set then the console + driver will generate a hotplug event + + +=item 7. + +A udev rule is activated by the hotplug event. + + The udev rule would look something like: + + SUBSYSTEM=="xen", DEVPATH=="/devices/console-[0-9]", RUN+="xen-console-setup" + + where the "xen-console-setup" script would read the channel name and + make a symlink in /dev/xen-channel/org.my.cloud.software.agent.version1 + + + +=item 8. + +The in-guest agent uses inotify to see the creation of the /dev/xen-channel + symlink and opens the device. + + +=item 9. + +The in-guest agent completes the handshake with the dom0 agent + + +=item 10. + +The dom0 agent transmits the unique VM configuration: hostname, IP + address, ssh keys etc etc + + +=item 11. 
+ +The in-guest agent receives the configuration and applies it. + + +=back + +Using channels avoids having to use a temporary disk device or network +connection. + + +=head2 Design recommendations and pitfalls + +It's necessary to install channel-specific software (an "agent") into the guest +before you can use a channel. By default a channel will appear as a device +which could be mistaken for a serial port or regular console. It is known +that some software will proactively seek out serial ports and issue AT commands +at them; make sure such software is disabled! + +Since channels are identified by names, application authors must ensure their +channel names are unique to avoid clashes. We recommend that channel names +include parts unique to the application such as a domain names. To assist +prevent clashes we recommend authors add their names to our global channel +registry at the end of this document. + + +=head2 Limitations + +Hotplug and unplug of channels is not currently implemented. + + +=head2 Channel name registry + +It is important that channel names are globally unique. To help ensure +that no-one's name clashes with yours, please add yours to this list. + +Key: +N: Name +C: Contact +D: Short description of use, possibly including a URL to your softwar
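Step 8 of the runtime sequence above is a plain inotify watch. A minimal
sketch in C, assuming the channel name from the example and only rudimentary
error handling:

    /* Sketch of the in-guest side of step 8: wait for the
     * /dev/xen-channel/<name> symlink to appear, then open it.
     * Assumes /dev/xen-channel already exists; a real agent would
     * also watch for the directory itself being created. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/inotify.h>

    int main(void)
    {
        const char *name = "org.my.cloud.software.agent.version1";
        char buf[4096], path[256];
        int ifd = inotify_init();

        if (ifd < 0)
            return 1;
        if (inotify_add_watch(ifd, "/dev/xen-channel", IN_CREATE) < 0)
            return 1;
        snprintf(path, sizeof(path), "/dev/xen-channel/%s", name);
        for (;;) {
            int fd = open(path, O_RDWR); /* may exist already */
            if (fd >= 0) {
                /* step 9 would follow: handshake with the dom0 agent */
                close(fd);
                return 0;
            }
            if (read(ifd, buf, sizeof(buf)) <= 0) /* block for an event */
                return 1;
        }
    }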
[Xen-devel] [PATCH 2/6] docs: add pod variant of xl-network-configuration.5
Add source in pod format for xl-network-configuration.5 This removes the buildtime requirement for pandoc, and subsequently the need for ghc, in the chain for BuildRequires of xen.rpm. Signed-off-by: Olaf Hering --- docs/man/xl-network-configuration.pod.5 | 250 1 file changed, 250 insertions(+) create mode 100644 docs/man/xl-network-configuration.pod.5 diff --git a/docs/man/xl-network-configuration.pod.5 b/docs/man/xl-network-configuration.pod.5 new file mode 100644 index 00..9fa373e20d --- /dev/null +++ b/docs/man/xl-network-configuration.pod.5 @@ -0,0 +1,250 @@ +=encoding utf8 + + +=head1 XL Network Configuration + + +=head2 Syntax Overview + +This document specifies the xl config file format vif configuration +option. It has the following form: + +vif = [ '', '', ... ] + +where each vifspec is in this form: + +[=|,] + +For example: + +'mac=00:16:3E:74:3d:76,model=rtl8139,bridge=xenbr0' +'mac=00:16:3E:74:34:32' +'' # The empty string + +These might be specified in the domain config file like this: + +vif = [ 'mac=00:16:3E:74:34:32', 'mac=00:16:3e:5f:48:e4,bridge=xenbr1' ] + +More formally, the string is a series of comma-separated keyword/value +pairs. All keywords are optional. + +Each device has a C which is its index within the vif list, starting from 0. + + +=head2 Keywords + + +=head2 mac + +If specified then this option specifies the MAC address inside the +guest of this VIF device. The value is a 48-bit number represented as +six groups of two hexadecimal digits, separated by colons (:). + +The default if this keyword is not specified is to be automatically +generate a MAC address inside the space assigned to Xen's +Lhttp://en.wikipedia.org/wiki/Organizationally_Unique_Identifier> (00:16:3e). + +If you are choosing a MAC address then it is strongly recommend to +follow one of the following strategies: + +=over + +=item * + +Generate a random sequence of 6 byte, set the locally administered +bit (bit 2 of the first byte) and clear the multicast bit (bit 1 +of the first byte). In other words the first byte should have the +bit pattern xx10 (where x is a randomly generated bit) and the +remaining 5 bytes are randomly generated See +[http://en.wikipedia.org/wiki/MAC_address] for more details the +structure of a MAC address. + + +=item * + +Allocate an address from within the space defined by your +organization's OUI (if you have one) following your organization's +procedures for doing so. + + +=item * + +Allocate an address from within the space defined by Xen's OUI +(00:16:3e). Taking care not to clash with other users of the +physical network segment where this VIF will reside. + + +=back + +If you have an OUI for your own use then that is the preferred +strategy. Otherwise in general you should prefer to generate a random +MAC and set the locally administered bit since this allows for more +bits of randomness than using the Xen OUI. + + +=head2 bridge + +Specifies the name of the network bridge which this VIF should be +added to. The default is C. The bridge must be configured using +your distribution's network configuration tools. See the Lhttp://wiki.xen.org/wiki/HostConfiguration/Networking> +for guidance and examples. + + +=head2 gatewaydev + +Specifies the name of the network interface which has an IP and which +is in the network the VIF should communicate with. This is used in the host +by the vif-route hotplug script. See Lhttp://wiki.xen.org/wiki/Vif-route> for guidance and +examples. + +NOTE: netdev is a deprecated alias of this option. 
+ + +=head2 type + +This keyword is valid for HVM guests only. + +Specifies the type of device to valid values are: + +=over + +=item * + +C (default) -- this device will be provided as an emulate +device to the guest and also as a paravirtualised device which the +guest may choose to use instead if it has suitable drivers +available. + + +=item * + +C -- this device will be provided as a paravirtualised device +only. + + +=back + + +=head2 model + +This keyword is valid for HVM guest devices with C only. + +Specifies the type device to emulated for this guest. Valid values +are: + +=over + +=item * + +C (default) -- Realtek RTL8139 + + +=item * + +C -- Intel E1000 + + +=item * + +in principle any device supported by your device model + + +=back + + +=head2 vifname + +Specifies the backend device name for the virtual device. + +If the domain is an HVM domain then the associated emulated (tap) +device will have a "-emu" suffice added. + +The default name for the virtual device is C where +C is the guest domain ID and C is the device +number. Likewise the default tap name is C. + + +=head2 script + +Specifies the hotplug script to run to configure this device (e.g. to +add it to the relevant bridge). Defaults to +C but can be set to an
[Xen-devel] [PATCH 0/6] docs: convert manpages to pod
To remove the buildtime dependency on pandoc/ghc some manpages are
converted from markdown to pod format. This will provide more manpages
which are referenced in xl(1) and xl.cfg(5).

This series does not cover xen-vbd-interface.7 because converting the
lists used in this manpage was not straightforward.

Olaf

Cc: Ian Jackson
Cc: Wei Liu
To: xen-devel@lists.xen.org

Olaf Hering (6):
  docs: add pod variant of xen-pv-channel.7
  docs: add pod variant of xl-network-configuration.5
  docs: add pod variant of xl-numa-placement
  docs: remove markdown variant of xen-pv-channel.7
  docs: remove markdown variant of xl-network-configuration.5
  docs: remove markdown variant of xl-numa-placement.7

 docs/man/xen-pv-channel.markdown.7 | 106 ---
 docs/man/xen-pv-channel.pod.7 | 189 
 ...n.markdown.5 => xl-network-configuration.pod.5} | 195 ++---
 ...lacement.markdown.7 => xl-numa-placement.pod.7} | 164 +++--
 4 files changed, 433 insertions(+), 221 deletions(-)
 delete mode 100644 docs/man/xen-pv-channel.markdown.7
 create mode 100644 docs/man/xen-pv-channel.pod.7
 rename docs/man/{xl-network-configuration.markdown.5 => xl-network-configuration.pod.5} (55%)
 rename docs/man/{xl-numa-placement.markdown.7 => xl-numa-placement.pod.7} (74%)
Re: [Xen-devel] [PATCH for-4.9] docs: replace xm with xl in xen-tscmode [and 1 more messages]
On Thu, May 25, Julien Grall wrote:
> Hi Ian,
>
> On 24/05/2017 12:07, Ian Jackson wrote:
> > Olaf Hering writes ("[PATCH] docs: replace xm with xl in xen-tscmode"):
> > > Signed-off-by: Olaf Hering
> > Olaf Hering writes ("[PATCH] docs: correct paragraph indention in
> > xen-tscmode"):
> > > Signed-off-by: Olaf Hering
> > Both:
> > Acked-by: Ian Jackson
> >
> > I think these good for 4.9 and are covered by Julien's exception for
> > docs. So Wei or I will commit them soon.
> Yes that's correct.

Both missed the 4.9 release. Please apply now.

Olaf
Re: [Xen-devel] API to query NUMA node of mfn
On Mon, Jul 10, Konrad Rzeszutek Wilk wrote:
> Soo I wrote some code for exactly this for Xen 4.4.4 , along with
> creation of a PGM map to see the NUMA nodes locality.

Are you planning to prepare that for staging at some point? I have not
checked whether this series is already merged.

Olaf
[Xen-devel] API to query NUMA node of mfn
I would like to verify on which NUMA node the PFNs used by an HVM guest
are located. Is there an API for that? Something like:

    foreach (pfn, domid)
        mfns_per_node[pfn_to_node(pfn)]++
    foreach (node)
        printk("%x %x\n", node, mfns_per_node[node])

Olaf
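There appears to be no public interface for this lookup today, which is what
prompted the question. Purely as an illustration, the tally could be
structured like this in C, with pfn_to_node() and max_pfn as hypothetical
stand-ins for whatever hypercall or libxc wrapper would provide the data:

    /* Illustration only: tally a guest's pages per NUMA node.
     * pfn_to_node() and max_pfn stand in for a hypothetical
     * hypercall or libxc wrapper; no such public API exists. */
    #include <stdio.h>

    #define MAX_NODES 8

    extern unsigned int pfn_to_node(unsigned long pfn); /* hypothetical */

    void count_nodes(unsigned long max_pfn)
    {
        unsigned long mfns_per_node[MAX_NODES] = { 0 };
        unsigned long pfn;
        unsigned int node;

        for (pfn = 0; pfn < max_pfn; pfn++)
            mfns_per_node[pfn_to_node(pfn)]++;
        for (node = 0; node < MAX_NODES; node++)
            printf("%x %lx\n", node, mfns_per_node[node]);
    }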
Re: [Xen-devel] time does not move forward in HVM guests
On Wed, 05 Jul 2017 02:14:23 -0600, "Jan Beulich" wrote:
> Oh, even for HVM. Doesn't that go back to the missing vDSO
> support then again, which we had discussed just last week?

Yes. This is part of it. With clocksource=tsc there is a performance
boost because the vDSO is used. With clocksource=hpet there is a
performance drop to 20%, depending on the workload, presumably due to
its emulation.

Olaf
Re: [Xen-devel] time does not move forward in HVM guests
On Wed, Jul 05, Jan Beulich wrote:
> > clock_getres(CLOCK_MONOTONIC) indicates a resolution of 1ns.
> But what's the implied meaning of resolution here?

See below. I have no idea what the returned value is supposed to
promise.

> Or did you perhaps test with an older version, where the time
> handling backports from master hadn't been there yet?

It was weeks ago, and I have not seen it since then. I think it is
fixed in one way or another.

> > A workaround is booting the domU kernel with 'clocksource=tsc nohz=off
> > highres=off'.
> What clocksource does the system use by default? HPET?

HPET would be really slow. The default clocksource is "xen".

> According to what the hypervisor tells the guest, vHPET
> resolution is 16ns. That still wouldn't explain a steady value
> over a period of 100ns, but it's at least a hint that what the
> kernel tells you may not be what underlying (virtual)
> hardware reports.

If clocksource=xen relies on the hypervisor, perhaps the kernel should
be aware of it in some way. So far I have not checked where
clock_getres gets its data.

> Additionally - are all three options indeed required to work
> around this, i.e. no pair out of the three is enough?

Yes, otherwise the kernel would complain; I forgot the exact error
message.

Olaf
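For reference, the 1ns figure discussed above comes from clock_getres(2); a
minimal, self-contained check (plain POSIX, nothing Xen specific) looks like
this:

    /* Print what clock_getres() claims for CLOCK_MONOTONIC; on the
     * guests discussed here it reports 1ns even though consecutive
     * readings may not advance. Link with -lrt on older glibc. */
    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec res;

        if (clock_getres(CLOCK_MONOTONIC, &res))
            return 1;
        printf("CLOCK_MONOTONIC resolution: %ld.%09ld s\n",
               (long)res.tv_sec, res.tv_nsec);
        return 0;
    }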
Re: [Xen-devel] valgrind support for xen4.7+
On Wed, Apr 12, Glenn Enright wrote: > Has anyone seen or been working on patches for valgrind for recent versions > of xen? Upstream requires paperwork, via kde.org bugzilla. This is my variant, which is enough to run 'xl create' with valgrind. Olaf --- coregrind/m_syswrap/syswrap-xen.c.orig +++ coregrind/m_syswrap/syswrap-xen.c @@ -584,6 +584,8 @@ PRE(sysctl) { case 0x0009: case 0x000a: case 0x000b: + case 0x000c: + case 0x000d: break; default: bad_intf_version(tid, layout, arrghs, status, flags, @@ -626,6 +628,8 @@ PRE(sysctl) { break; case 0x000a: case 0x000b: + case 0x000c: + case 0x000d: PRE_XEN_SYSCTL_READ(getdomaininfolist_000a, first_domain); PRE_XEN_SYSCTL_READ(getdomaininfolist_000a, max_domains); PRE_XEN_SYSCTL_READ(getdomaininfolist_000a, buffer); @@ -728,6 +732,9 @@ PRE(domctl) case 0x0008: case 0x0009: case 0x000a: + case 0x000b: + case 0x000c: + case 0x000d: break; default: bad_intf_version(tid, layout, arrghs, status, flags, @@ -1534,6 +1541,8 @@ POST(sysctl) case 0x0009: case 0x000a: case 0x000b: + case 0x000c: + case 0x000d: break; default: return; @@ -1568,6 +1577,8 @@ POST(sysctl) break; case 0x000a: case 0x000b: + case 0x000c: + case 0x000d: POST_XEN_SYSCTL_WRITE(getdomaininfolist_000a, num_domains); POST_MEM_WRITE((Addr)sysctl->u.getdomaininfolist_000a.buffer.p, sizeof(*sysctl->u.getdomaininfolist_000a.buffer.p) Olaf signature.asc Description: PGP signature ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] time does not move forward in HVM guests
In my testing with sysbench in an HVM domU running a linux-4.4 based
pvops kernel on a xen-4.7 based dom0, time does not move forward
properly. The code there (URL below) is basically:

    clock_gettime(CLOCK_MONOTONIC, a)
    do_work
    clock_gettime(CLOCK_MONOTONIC, b)
    diff_time(a, b)

All 'do_work' does is write zeros to a block of memory.
clock_getres(CLOCK_MONOTONIC) indicates a resolution of 1ns. If
'do_work' takes around 100ns or less: a==b. I think this is something
that should not happen. In case of vCPU overcommit this happens even
when 'do_work' takes around 800ns.

At some point I have also seen cases of time going backward. I cannot
reproduce this anymore; it might have been bugs in my code, or the
domU.cfg changed.

A workaround is booting the domU kernel with 'clocksource=tsc nohz=off
highres=off'.

Why does this happen? Are the expectations too high?

Olaf

https://github.com/olafhering/sysbench/compare/master...pv

    bash autogen.sh
    make -j
    bash mem.1K.on.sh
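A self-contained approximation of the measurement described above, as a
sketch; this is not the sysbench code from the branch, just the same idea
reduced to a few lines:

    /* Sketch of the measurement: time a 1K memset with
     * CLOCK_MONOTONIC and count how often the clock did not move
     * at all between the two readings. */
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define BLOCK 1024
    #define ITERS 100000L

    int main(void)
    {
        static char buf[BLOCK];
        struct timespec a, b;
        long stuck = 0, i;

        for (i = 0; i < ITERS; i++) {
            clock_gettime(CLOCK_MONOTONIC, &a);
            memset(buf, 0, sizeof(buf)); /* the "do_work" step */
            clock_gettime(CLOCK_MONOTONIC, &b);
            if (a.tv_sec == b.tv_sec && a.tv_nsec == b.tv_nsec)
                stuck++; /* time did not move forward */
        }
        printf("%ld of %ld samples had a == b\n", stuck, ITERS);
        return 0;
    }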