Re: [Xen-devel] [PATCH v3 2/3] Introduce migration precopy policy
On 26/09/17 12:02, Andrew Cooper wrote: On 25/09/17 19:55, Jennifer Herbert wrote: @@ -46,7 +60,22 @@ struct save_callbacks { */ int (*suspend)(void* data); -/* Called after the guest's dirty pages have been +/* + * Called before and after every batch of page data sent during + * the precopy phase of a live migration to ask the caller what + * to do next based on the current state of the precopy migration. How is the callback supposed to determine whether it is before or ahead of the data batch? As far as I can tell, its not possible. ~Andrew You look at the dirty_count. It works well for me. I can add that info to the comment. -jenny + * + * Should return one of the values listed below: + */ +#define XGS_POLICY_ABORT (-1) /* Abandon the migration entirely +* and tidy up. */ +#define XGS_POLICY_CONTINUE_PRECOPY 0 /* Remain in the precopy phase. */ +#define XGS_POLICY_STOP_AND_COPY1 /* Immediately suspend and transmit the +* remaining dirty pages. */ +precopy_policy_t precopy_policy; + +/* + * Called after the guest's dirty pages have been * copied into an output buffer. * Callback function resumes the guest & the device model, * returns to xc_domain_save. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 2/3] Introduce migration precopy policy
On 26/09/17 11:58, Andrew Cooper wrote: On 25/09/17 19:55, Jennifer Herbert wrote: +/* + * A precopy_policy callback may not be running in the same address + * space as libxc an so precopy_stats is passed by value. + */ Please take a step back and thing about what is written here... As I've said repeatedly, the structure vs pointer argument here is entirely unrelated to the IPC boundary. I am also unhappy that, after multiple review requests saying "turn this into a pointer", its remained being passed by value. It is not ok to hack things like this up simply because its the easier cause of action. Passing tiny structures by value is not a hack - it is perfectly valid C. Bare in mind that a pointer is not a zero sized entity - it itself needs to be copied. Passing by value here is quite possibly faster, and use less memory. Furthermore, the multiple reviews concluded that this was related to IPC boundary, and that the tiny struct size was acceptable. Maybe you could elaborate on why you do not think it is related to the IPC boundary. Instead, we could pass each element of the structure, as an individual parameters. This would seem a retrograde step to me, but would address your sensitivities. -jenny ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 3/3] RFC: migration: defer precopy policy to libxl
On 26/09/17 09:48, Wei Liu wrote: On Mon, Sep 25, 2017 at 07:55:35PM +0100, Jennifer Herbert wrote: Provide an implementation of the old policy as a callback in libxl and plumb it through the IPC machinery to libxc. This serves as an example for defining a libxl policy, and provides no advantage over the default policy in libxc. Signed-off-by: Joshua Otto <jto...@uwaterloo.ca> Reviewed-by: Roger Pau Monné <roger@citrix.com> The code looks plausible but it is not really usable by libxl users as-is, because a public API is still missing. This is why I excluded it to start with, and now labelled it RFC. Its only the first two patches I would like to see pulled atm. Cheers ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3 2/3] Introduce migration precopy policy
This Patch allows a migration precopy policy to be specified. The precopy phase of the xc_domain_save() live migration algorithm has historically been implemented to run until either a) (almost) no pages are dirty or b) some fixed, hard-coded maximum number of precopy iterations has been exceeded. This policy and its implementation are less than ideal for a few reasons: - the logic of the policy is intertwined with the control flow of the mechanism of the precopy stage - it can't take into account facts external to the immediate migration context, such external state transfer state, interactive user input, or the passage of wall-clock time. - it does not permit the user to change their mind, over time, about what to do at the end of the precopy (they get an unconditional transition into the stop-and-copy phase of the migration) To permit callers to implement arbitrary higher-level policies governing when the live migration precopy phase should end, and what should be done next: - add a precopy_policy() callback to the xc_domain_save() user-supplied callbacks - during the precopy phase of live migrations, consult this policy after each batch of pages transmitted and take the dictated action, which may be to a) abort the migration entirely, b) continue with the precopy, or c) proceed to the stop-and-copy phase. - provide an implementation of the old policy, used when precopy_policy callback is not provided. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> Signed-off-by: Joshua Otto <jto...@uwaterloo.ca> Reviewed-by: Paul Durrant <paul.durr...@citrix.com> --- v3: Have changed/added comments, removed iteration parmeter from update_progress_string and made a few formatting corrections. v2: Have made a few formatting corrections, added typedef as suggested. This is updated/modified subset of patch 7/20, part of Joshua Otto's "Add postcopy live migration support." patch, dated 27th March 2017. As indicated on the original thread, I wish to make use of this this within the XenServer product. I hope this will aid Josh in pushing the remainder of his series. --- tools/libxc/include/xenguest.h | 37 +-- tools/libxc/xc_sr_common.h | 6 +-- tools/libxc/xc_sr_save.c | 105 - 3 files changed, 109 insertions(+), 39 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 6626f0c..de97f1b 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -39,6 +39,20 @@ */ struct xenevtchn_handle; +/* For save's precopy_policy(). */ +struct precopy_stats +{ +unsigned int iteration; +unsigned int total_written; +long dirty_count; /* -1 if unknown */ +}; + +/* + * A precopy_policy callback may not be running in the same address + * space as libxc an so precopy_stats is passed by value. + */ +typedef int (*precopy_policy_t)(struct precopy_stats, void *); + /* callbacks provided by xc_domain_save */ struct save_callbacks { /* Called after expiration of checkpoint interval, @@ -46,7 +60,22 @@ struct save_callbacks { */ int (*suspend)(void* data); -/* Called after the guest's dirty pages have been +/* + * Called before and after every batch of page data sent during + * the precopy phase of a live migration to ask the caller what + * to do next based on the current state of the precopy migration. + * + * Should return one of the values listed below: + */ +#define XGS_POLICY_ABORT (-1) /* Abandon the migration entirely +* and tidy up. */ +#define XGS_POLICY_CONTINUE_PRECOPY 0 /* Remain in the precopy phase. */ +#define XGS_POLICY_STOP_AND_COPY1 /* Immediately suspend and transmit the +* remaining dirty pages. */ +precopy_policy_t precopy_policy; + +/* + * Called after the guest's dirty pages have been * copied into an output buffer. * Callback function resumes the guest & the device model, * returns to xc_domain_save. @@ -55,7 +84,8 @@ struct save_callbacks { */ int (*postcopy)(void* data); -/* Called after the memory checkpoint has been flushed +/* + * Called after the memory checkpoint has been flushed * out into the network. Typical actions performed in this * callback include: * (a) send the saved device model state (for HVM guests), @@ -65,7 +95,8 @@ struct save_callbacks { * * returns: * 0: terminate checkpointing gracefully - * 1: take another checkpoint */ + * 1: take another checkpoint + */ int (*checkpoint)(void* data); /* diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index a83f22a..3635704 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -198,12 +198,10 @@ struct xc_sr_context
[Xen-devel] [PATCH v3 3/3] RFC: migration: defer precopy policy to libxl
Provide an implementation of the old policy as a callback in libxl and plumb it through the IPC machinery to libxc. This serves as an example for defining a libxl policy, and provides no advantage over the default policy in libxc. Signed-off-by: Joshua OttoReviewed-by: Roger Pau Monné --- tools/libxl/libxl_dom_save.c | 20 tools/libxl/libxl_save_msgs_gen.pl | 4 +++- 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c index 77fe30e..6d28cce 100644 --- a/tools/libxl/libxl_dom_save.c +++ b/tools/libxl/libxl_dom_save.c @@ -328,6 +328,25 @@ int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss, return rc; } +/* + * This is the live migration precopy policy - it's called periodically during + * the precopy phase of live migrations, and is responsible for deciding when + * the precopy phase should terminate and what should be done next. + * + * The policy implemented here behaves identically to the policy previously + * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of + * the live migration when there are either fewer than 50 dirty pages, or more + * than 5 precopy rounds have completed. + */ +static int libxl__save_live_migration_simple_precopy_policy( +struct precopy_stats stats, void *user) +{ +return ((stats.dirty_count >= 0 && stats.dirty_count < 50) || +stats.iteration >= 5) +? XGS_POLICY_STOP_AND_COPY +: XGS_POLICY_CONTINUE_PRECOPY; +} + /*- main code for saving, in order of execution -*/ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss) @@ -401,6 +420,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss) if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE) callbacks->suspend = libxl__domain_suspend_callback; +callbacks->precopy_policy = libxl__save_live_migration_simple_precopy_policy; callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty; dss->sws.ao = dss->ao; diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl index 3ae7373..bb1d4e9 100755 --- a/tools/libxl/libxl_save_msgs_gen.pl +++ b/tools/libxl/libxl_save_msgs_gen.pl @@ -33,6 +33,7 @@ our @msgs = ( 'xen_pfn_t', 'console_gfn'] ], [ 9, 'srW',"complete", [qw(int retval int errnoval)] ], +[ 10, 'scxW', "precopy_policy", ['struct precopy_stats', 'stats'] ] ); # @@ -141,7 +142,8 @@ static void bytes_put(unsigned char *const buf, int *len, END -foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long', 'xen_pfn_t') { +foreach my $simpletype (qw(int uint16_t uint32_t unsigned), +'unsigned long', 'xen_pfn_t', 'struct precopy_stats') { my $typeid = typeid($simpletype); $out_body{'callout'} .= <
[Xen-devel] [PATCH v3 0/3] Introduce migration precopy policy
Hi all, v3: In patch 2/3, have changed/added comments, removed iteration parmeter from update_progress_string and made a few formatting corrections. v2: Have tidied some formatting, s.o.b Joshua, and added a RFC patch showing how to use this in libxl. v1: Here I present a updated/modified subset of patch 7/20, part of Joshua Otto's "Add postcopy live migration support." patch, dated 27th March 2017. As indicated on the original thread, I wish to make use of this this within the XenServer product, and hence am trying to get this subset pulled in now. I also hope this will aid Josh in pushing the remainder of his series. Here I present two patches, the first which does some tidy up, removing unused and unhelpful paramaters to xc_domain_save(), and the second which allows a precopy callback to be specified, providing the test for when to end the live phase of migration should end. If none is provided, a default policy of the current behaviour is used. Jennifer Herbert (3): Tidy libxc xc_domain_save Introduce migration precopy policy RFC: migration: defer precopy policy to libxl tools/libxc/include/xenguest.h | 41 -- tools/libxc/xc_nomigrate.c | 3 +- tools/libxc/xc_sr_common.h | 6 +- tools/libxc/xc_sr_save.c | 113 + tools/libxl/libxl_dom_save.c | 20 +++ tools/libxl/libxl_save_callout.c | 4 +- tools/libxl/libxl_save_helper.c| 7 +-- tools/libxl/libxl_save_msgs_gen.pl | 4 +- 8 files changed, 142 insertions(+), 56 deletions(-) -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3 1/3] Tidy libxc xc_domain_save
Tidy up libxc's xc_domain_save, removing unused paramaters max_iters and max_factor, making matching changes to libxl. Signed-off-by: Joshua Otto <jto...@uwaterloo.ca> Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> Reviewed-by: Paul Durrant <paul.durr...@citrix.com> Acked-by: Wei Liu <wei.l...@citrix.com> --- tools/libxc/include/xenguest.h | 4 ++-- tools/libxc/xc_nomigrate.c | 3 +-- tools/libxc/xc_sr_save.c | 8 +++- tools/libxl/libxl_save_callout.c | 4 ++-- tools/libxl/libxl_save_helper.c | 7 ++- 5 files changed, 10 insertions(+), 16 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 5cd8111..6626f0c 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -100,8 +100,8 @@ typedef enum { *doesn't use checkpointing * @return 0 on success, -1 on failure */ -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, - uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */, +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, + uint32_t flags /* XCFLAGS_xxx */, struct save_callbacks* callbacks, int hvm, xc_migration_stream_t stream_type, int recv_fd); diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c index 317c8ce..fe8f68c 100644 --- a/tools/libxc/xc_nomigrate.c +++ b/tools/libxc/xc_nomigrate.c @@ -20,8 +20,7 @@ #include #include -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, - uint32_t max_factor, uint32_t flags, +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags, struct save_callbacks* callbacks, int hvm, xc_migration_stream_t stream_type, int recv_fd) { diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c index ca6913b..1e7502d 100644 --- a/tools/libxc/xc_sr_save.c +++ b/tools/libxc/xc_sr_save.c @@ -916,9 +916,8 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) }; int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, - uint32_t max_iters, uint32_t max_factor, uint32_t flags, - struct save_callbacks* callbacks, int hvm, - xc_migration_stream_t stream_type, int recv_fd) + uint32_t flags, struct save_callbacks* callbacks, + int hvm, xc_migration_stream_t stream_type, int recv_fd) { struct xc_sr_context ctx = { @@ -955,8 +954,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO ) assert(callbacks->wait_checkpoint); -DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d", -io_fd, dom, max_iters, max_factor, flags, hvm); +DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm); if ( xc_domain_getinfo(xch, dom, 1, ) != 1 ) { diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c index 891c669..6452d70 100644 --- a/tools/libxl/libxl_save_callout.c +++ b/tools/libxl/libxl_save_callout.c @@ -89,8 +89,8 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss, libxl__srm_callout_enumcallbacks_save(>callbacks.save.a); const unsigned long argnums[] = { -dss->domid, 0, 0, dss->xcflags, dss->hvm, -cbflags, dss->checkpointed_stream, +dss->domid, dss->xcflags, dss->hvm, cbflags, +dss->checkpointed_stream, }; shs->ao = ao; diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c index 1dece23..38089a0 100644 --- a/tools/libxl/libxl_save_helper.c +++ b/tools/libxl/libxl_save_helper.c @@ -251,8 +251,6 @@ int main(int argc, char **argv) io_fd = atoi(NEXTARG); recv_fd = atoi(NEXTARG); uint32_t dom = strtoul(NEXTARG,0,10); -uint32_t max_iters =strtoul(NEXTARG,0,10); -uint32_t max_factor = strtoul(NEXTARG,0,10); uint32_t flags =strtoul(NEXTARG,0,10); int hvm = atoi(NEXTARG); unsigned cbflags = strtoul(NEXTARG,0,10); @@ -264,9 +262,8 @@ int main(int argc, char **argv) startup("save"); setup_signals(save_signal_handler); -r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags, - _save_callbacks, hvm, stream_type, - recv_fd); +r = xc_domain_save(xch, io_fd, dom, flags, _save_callbacks, + hvm, stream_type, recv_fd); complete(r); } else if (!strcmp(mode,"--restore-domain&q
Re: [Xen-devel] [PATCH v2 2/3] Introduce migration precopy policy
On 20/09/17 11:20, Roger Pau Monné wrote: On Tue, Sep 19, 2017 at 07:06:26PM +0100, Jennifer Herbert wrote: This Patch allows a migration precopy policy to be specified. The precopy phase of the xc_domain_save() live migration algorithm has historically been implemented to run until either a) (almost) no pages are dirty or b) some fixed, hard-coded maximum number of precopy iterations has been exceeded. This policy and its implementation are less than ideal for a few reasons: - the logic of the policy is intertwined with the control flow of the mechanism of the precopy stage - it can't take into account facts external to the immediate migration context, such external state transfer state, interactive user input, or the passage of wall-clock time. - it does not permit the user to change their mind, over time, about what to do at the end of the precopy (they get an unconditional transition into the stop-and-copy phase of the migration) To permit callers to implement arbitrary higher-level policies governing when the live migration precopy phase should end, and what should be done next: - add a precopy_policy() callback to the xc_domain_save() user-supplied callbacks - during the precopy phase of live migrations, consult this policy after each batch of pages transmitted and take the dictated action, which may be to a) abort the migration entirely, b) continue with the precopy, or c) proceed to the stop-and-copy phase. - provide an implementation of the old policy, used when precopy_policy callback is not provided. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> Signed-off-by: Joshua Otto <jto...@uwaterloo.ca> --- v2: Have made a few formatting corrections, added typedef as suggested. v1: This is updated/modified subset of patch 7/20, part of Joshua Otto's "Add postcopy live migration support." patch, dated 27th March 2017. As indicated on the original thread, I wish to make use of this this within the XenServer product. I hope this will aid Josh in pushing the remainder of his series. --- tools/libxc/include/xenguest.h | 31 -- tools/libxc/xc_sr_common.h | 6 +-- tools/libxc/xc_sr_save.c | 97 +- 3 files changed, 97 insertions(+), 37 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 6626f0c..a2a654c 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -39,6 +39,16 @@ */ struct xenevtchn_handle; +/* For save's precopy_policy(). */ +struct precopy_stats +{ +unsigned int iteration; +unsigned int total_written; +long dirty_count; /* -1 if unknown */ +}; + +typedef int (*precopy_policy_t)(struct precopy_stats, void *); Shouldn't precopy_stats be a pointer (const pointer probably seeing it's usage)? In April Joshua described how he did it like this to help with IPC plumbing into libxl. Ian Jackson explained that you can't pass a pointer across the IPC boundary. The two bits of code run in different processes, with different address spaces. Since the precopy_stats structure is tiny, it was concluded it would have very small impact on performance. I'd also agree since a pointer to this structure would almost half as big at the structure itself. I think I'll add a comment above the line to explain the decision. + /* callbacks provided by xc_domain_save */ struct save_callbacks { /* Called after expiration of checkpoint interval, @@ -46,7 +56,20 @@ struct save_callbacks { */ int (*suspend)(void* data); -/* Called after the guest's dirty pages have been +/* + * Called after every batch of page data sent during the precopy + * phase of a live migration to ask the caller what to do next + * based on the current state of the precopy migration. I would add: "Should return one of the values listed below:" + */ +#define XGS_POLICY_ABORT (-1) /* Abandon the migration entirely +* and tidy up. */ +#define XGS_POLICY_CONTINUE_PRECOPY 0 /* Remain in the precopy phase. */ +#define XGS_POLICY_STOP_AND_COPY1 /* Immediately suspend and transmit the +* remaining dirty pages. */ +precopy_policy_t precopy_policy; + +/* + * Called after the guest's dirty pages have been * copied into an output buffer. * Callback function resumes the guest & the device model, * returns to xc_domain_save. @@ -55,7 +78,8 @@ struct save_callbacks { */ int (*postcopy)(void* data); -/* Called after the memory checkpoint has been flushed +/* + * Called after the memory checkpoint has been flushed * out into the network. Typical actions performed in this * callback include: * (a) send the saved device model state (for HVM guests), @@
[Xen-devel] [PATCH v2 3/3] RFC: migration: defer precopy policy to libxl
Provide an implementation of the old policy as a callback in libxl and plumb it through the IPC machinery to libxc. This serves as an example for defining a libxl policy, and provides no advantage over the default policy in libxc. Signed-off-by: Joshua Otto--- I have included this patch, as rfc, as requested by Ian, to show how libxl can provide a migration precopy policy. This was part of the same larger patch from Joshua - I have not changed or tested it. tools/libxl/libxl_dom_save.c | 20 tools/libxl/libxl_save_msgs_gen.pl | 4 +++- 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c index 77fe30e..6d28cce 100644 --- a/tools/libxl/libxl_dom_save.c +++ b/tools/libxl/libxl_dom_save.c @@ -328,6 +328,25 @@ int libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss, return rc; } +/* + * This is the live migration precopy policy - it's called periodically during + * the precopy phase of live migrations, and is responsible for deciding when + * the precopy phase should terminate and what should be done next. + * + * The policy implemented here behaves identically to the policy previously + * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of + * the live migration when there are either fewer than 50 dirty pages, or more + * than 5 precopy rounds have completed. + */ +static int libxl__save_live_migration_simple_precopy_policy( +struct precopy_stats stats, void *user) +{ +return ((stats.dirty_count >= 0 && stats.dirty_count < 50) || +stats.iteration >= 5) +? XGS_POLICY_STOP_AND_COPY +: XGS_POLICY_CONTINUE_PRECOPY; +} + /*- main code for saving, in order of execution -*/ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss) @@ -401,6 +420,7 @@ void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss) if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE) callbacks->suspend = libxl__domain_suspend_callback; +callbacks->precopy_policy = libxl__save_live_migration_simple_precopy_policy; callbacks->switch_qemu_logdirty = libxl__domain_suspend_common_switch_qemu_logdirty; dss->sws.ao = dss->ao; diff --git a/tools/libxl/libxl_save_msgs_gen.pl b/tools/libxl/libxl_save_msgs_gen.pl index 3ae7373..bb1d4e9 100755 --- a/tools/libxl/libxl_save_msgs_gen.pl +++ b/tools/libxl/libxl_save_msgs_gen.pl @@ -33,6 +33,7 @@ our @msgs = ( 'xen_pfn_t', 'console_gfn'] ], [ 9, 'srW',"complete", [qw(int retval int errnoval)] ], +[ 10, 'scxW', "precopy_policy", ['struct precopy_stats', 'stats'] ] ); # @@ -141,7 +142,8 @@ static void bytes_put(unsigned char *const buf, int *len, END -foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long', 'xen_pfn_t') { +foreach my $simpletype (qw(int uint16_t uint32_t unsigned), +'unsigned long', 'xen_pfn_t', 'struct precopy_stats') { my $typeid = typeid($simpletype); $out_body{'callout'} .= <
[Xen-devel] [PATCH v2 2/3] Introduce migration precopy policy
This Patch allows a migration precopy policy to be specified. The precopy phase of the xc_domain_save() live migration algorithm has historically been implemented to run until either a) (almost) no pages are dirty or b) some fixed, hard-coded maximum number of precopy iterations has been exceeded. This policy and its implementation are less than ideal for a few reasons: - the logic of the policy is intertwined with the control flow of the mechanism of the precopy stage - it can't take into account facts external to the immediate migration context, such external state transfer state, interactive user input, or the passage of wall-clock time. - it does not permit the user to change their mind, over time, about what to do at the end of the precopy (they get an unconditional transition into the stop-and-copy phase of the migration) To permit callers to implement arbitrary higher-level policies governing when the live migration precopy phase should end, and what should be done next: - add a precopy_policy() callback to the xc_domain_save() user-supplied callbacks - during the precopy phase of live migrations, consult this policy after each batch of pages transmitted and take the dictated action, which may be to a) abort the migration entirely, b) continue with the precopy, or c) proceed to the stop-and-copy phase. - provide an implementation of the old policy, used when precopy_policy callback is not provided. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> Signed-off-by: Joshua Otto <jto...@uwaterloo.ca> --- v2: Have made a few formatting corrections, added typedef as suggested. v1: This is updated/modified subset of patch 7/20, part of Joshua Otto's "Add postcopy live migration support." patch, dated 27th March 2017. As indicated on the original thread, I wish to make use of this this within the XenServer product. I hope this will aid Josh in pushing the remainder of his series. --- tools/libxc/include/xenguest.h | 31 -- tools/libxc/xc_sr_common.h | 6 +-- tools/libxc/xc_sr_save.c | 97 +- 3 files changed, 97 insertions(+), 37 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 6626f0c..a2a654c 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -39,6 +39,16 @@ */ struct xenevtchn_handle; +/* For save's precopy_policy(). */ +struct precopy_stats +{ +unsigned int iteration; +unsigned int total_written; +long dirty_count; /* -1 if unknown */ +}; + +typedef int (*precopy_policy_t)(struct precopy_stats, void *); + /* callbacks provided by xc_domain_save */ struct save_callbacks { /* Called after expiration of checkpoint interval, @@ -46,7 +56,20 @@ struct save_callbacks { */ int (*suspend)(void* data); -/* Called after the guest's dirty pages have been +/* + * Called after every batch of page data sent during the precopy + * phase of a live migration to ask the caller what to do next + * based on the current state of the precopy migration. + */ +#define XGS_POLICY_ABORT (-1) /* Abandon the migration entirely +* and tidy up. */ +#define XGS_POLICY_CONTINUE_PRECOPY 0 /* Remain in the precopy phase. */ +#define XGS_POLICY_STOP_AND_COPY1 /* Immediately suspend and transmit the +* remaining dirty pages. */ +precopy_policy_t precopy_policy; + +/* + * Called after the guest's dirty pages have been * copied into an output buffer. * Callback function resumes the guest & the device model, * returns to xc_domain_save. @@ -55,7 +78,8 @@ struct save_callbacks { */ int (*postcopy)(void* data); -/* Called after the memory checkpoint has been flushed +/* + * Called after the memory checkpoint has been flushed * out into the network. Typical actions performed in this * callback include: * (a) send the saved device model state (for HVM guests), @@ -65,7 +89,8 @@ struct save_callbacks { * * returns: * 0: terminate checkpointing gracefully - * 1: take another checkpoint */ + * 1: take another checkpoint + */ int (*checkpoint)(void* data); /* diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index a83f22a..3635704 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -198,12 +198,10 @@ struct xc_sr_context /* Further debugging information in the stream. */ bool debug; -/* Parameters for tweaking live migration. */ -unsigned max_iterations; -unsigned dirty_threshold; - unsigned long p2m_size; +struct precopy_stats stats; + xen_pfn_t *batch_pfns; unsigned nr_batch_pfns; unsigned long *
[Xen-devel] [PATCH v2 1/3] Tidy libxc xc_domain_save
Tidy up libxc's xc_domain_save, removing unused paramaters max_iters and max_factor, making matching changes to libxl. Signed-off-by: Joshua Otto <jto...@uwaterloo.ca> Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> Reviewed-by: Paul Durrant <paul.durr...@citrix.com> Acked-by: Wei Liu <wei.l...@citrix.com> --- tools/libxc/include/xenguest.h | 4 ++-- tools/libxc/xc_nomigrate.c | 3 +-- tools/libxc/xc_sr_save.c | 8 +++- tools/libxl/libxl_save_callout.c | 4 ++-- tools/libxl/libxl_save_helper.c | 7 ++- 5 files changed, 10 insertions(+), 16 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 5cd8111..6626f0c 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -100,8 +100,8 @@ typedef enum { *doesn't use checkpointing * @return 0 on success, -1 on failure */ -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, - uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */, +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, + uint32_t flags /* XCFLAGS_xxx */, struct save_callbacks* callbacks, int hvm, xc_migration_stream_t stream_type, int recv_fd); diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c index 317c8ce..fe8f68c 100644 --- a/tools/libxc/xc_nomigrate.c +++ b/tools/libxc/xc_nomigrate.c @@ -20,8 +20,7 @@ #include #include -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, - uint32_t max_factor, uint32_t flags, +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags, struct save_callbacks* callbacks, int hvm, xc_migration_stream_t stream_type, int recv_fd) { diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c index ca6913b..1e7502d 100644 --- a/tools/libxc/xc_sr_save.c +++ b/tools/libxc/xc_sr_save.c @@ -916,9 +916,8 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) }; int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, - uint32_t max_iters, uint32_t max_factor, uint32_t flags, - struct save_callbacks* callbacks, int hvm, - xc_migration_stream_t stream_type, int recv_fd) + uint32_t flags, struct save_callbacks* callbacks, + int hvm, xc_migration_stream_t stream_type, int recv_fd) { struct xc_sr_context ctx = { @@ -955,8 +954,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO ) assert(callbacks->wait_checkpoint); -DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d", -io_fd, dom, max_iters, max_factor, flags, hvm); +DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm); if ( xc_domain_getinfo(xch, dom, 1, ) != 1 ) { diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c index 891c669..6452d70 100644 --- a/tools/libxl/libxl_save_callout.c +++ b/tools/libxl/libxl_save_callout.c @@ -89,8 +89,8 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss, libxl__srm_callout_enumcallbacks_save(>callbacks.save.a); const unsigned long argnums[] = { -dss->domid, 0, 0, dss->xcflags, dss->hvm, -cbflags, dss->checkpointed_stream, +dss->domid, dss->xcflags, dss->hvm, cbflags, +dss->checkpointed_stream, }; shs->ao = ao; diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c index 1dece23..38089a0 100644 --- a/tools/libxl/libxl_save_helper.c +++ b/tools/libxl/libxl_save_helper.c @@ -251,8 +251,6 @@ int main(int argc, char **argv) io_fd = atoi(NEXTARG); recv_fd = atoi(NEXTARG); uint32_t dom = strtoul(NEXTARG,0,10); -uint32_t max_iters =strtoul(NEXTARG,0,10); -uint32_t max_factor = strtoul(NEXTARG,0,10); uint32_t flags =strtoul(NEXTARG,0,10); int hvm = atoi(NEXTARG); unsigned cbflags = strtoul(NEXTARG,0,10); @@ -264,9 +262,8 @@ int main(int argc, char **argv) startup("save"); setup_signals(save_signal_handler); -r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags, - _save_callbacks, hvm, stream_type, - recv_fd); +r = xc_domain_save(xch, io_fd, dom, flags, _save_callbacks, + hvm, stream_type, recv_fd); complete(r); } else if (!strcmp(mode,"--restore-domain&q
[Xen-devel] [PATCH v2 0/3] Introduce migration precopy policy
Hi all, v2: Have tidied some formatting, s.o.b Joshua, and added a RFC patch showing how to use this in libxl. v1: Here I present a updated/modified subset of patch 7/20, part of Joshua Otto's "Add postcopy live migration support." patch, dated 27th March 2017. As indicated on the original thread, I wish to make use of this this within the XenServer product, and hence am trying to get this subset pulled in now. I also hope this will aid Josh in pushing the remainder of his series. Here I present two patches, the first which does some tidy up, removing unused and unhelpful paramaters to xc_domain_save(), and the second which allows a precopy callback to be specified, providing the test for when to end the live phase of migration should end. If none is provided, a default policy of the current behaviour is used. Jennifer Herbert (3): Tidy libxc xc_domain_save Introduce migration precopy policy RFC: migration: defer precopy policy to libxl tools/libxc/include/xenguest.h | 35 +++-- tools/libxc/xc_nomigrate.c | 3 +- tools/libxc/xc_sr_common.h | 6 +-- tools/libxc/xc_sr_save.c | 105 - tools/libxl/libxl_dom_save.c | 20 +++ tools/libxl/libxl_save_callout.c | 4 +- tools/libxl/libxl_save_helper.c| 7 +-- tools/libxl/libxl_save_msgs_gen.pl | 4 +- 8 files changed, 130 insertions(+), 54 deletions(-) -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/2] Introduce migration precopy policy
On 18/09/17 14:30, Ian Jackson wrote: Jennifer Herbert writes ("[PATCH 2/2] Introduce migration precopy policy"): This Patch allows a migration precopy policy to be specified. But only for direct libxc users. How do you think this should be exposed via libxl ? The original patch had a small amount of code to pass this though such that libxl could provide one. I'm not entity sure how the perl code worked, but I've not made any changes that wouldn't stop this working. That section would form a follow on patch. Since I don't use or have much understanding of libxl, I didn't really feel I was the best person to do this. -jenny The general approach, so far, seems sound. Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/2] Tidy libxc xc_domain_save
Tidy up libxc's xc_domain_save, removing unused paramaters max_iters and max_factor, making matching changes to libxl. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> --- tools/libxc/include/xenguest.h | 4 ++-- tools/libxc/xc_nomigrate.c | 3 +-- tools/libxc/xc_sr_save.c | 8 +++- tools/libxl/libxl_save_callout.c | 4 ++-- tools/libxl/libxl_save_helper.c | 7 ++- 5 files changed, 10 insertions(+), 16 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 5cd8111..6626f0c 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -100,8 +100,8 @@ typedef enum { *doesn't use checkpointing * @return 0 on success, -1 on failure */ -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, - uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */, +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, + uint32_t flags /* XCFLAGS_xxx */, struct save_callbacks* callbacks, int hvm, xc_migration_stream_t stream_type, int recv_fd); diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c index 317c8ce..fe8f68c 100644 --- a/tools/libxc/xc_nomigrate.c +++ b/tools/libxc/xc_nomigrate.c @@ -20,8 +20,7 @@ #include #include -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, - uint32_t max_factor, uint32_t flags, +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags, struct save_callbacks* callbacks, int hvm, xc_migration_stream_t stream_type, int recv_fd) { diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c index ca6913b..1e7502d 100644 --- a/tools/libxc/xc_sr_save.c +++ b/tools/libxc/xc_sr_save.c @@ -916,9 +916,8 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) }; int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, - uint32_t max_iters, uint32_t max_factor, uint32_t flags, - struct save_callbacks* callbacks, int hvm, - xc_migration_stream_t stream_type, int recv_fd) + uint32_t flags, struct save_callbacks* callbacks, + int hvm, xc_migration_stream_t stream_type, int recv_fd) { struct xc_sr_context ctx = { @@ -955,8 +954,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO ) assert(callbacks->wait_checkpoint); -DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d", -io_fd, dom, max_iters, max_factor, flags, hvm); +DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm); if ( xc_domain_getinfo(xch, dom, 1, ) != 1 ) { diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c index 891c669..6452d70 100644 --- a/tools/libxl/libxl_save_callout.c +++ b/tools/libxl/libxl_save_callout.c @@ -89,8 +89,8 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss, libxl__srm_callout_enumcallbacks_save(>callbacks.save.a); const unsigned long argnums[] = { -dss->domid, 0, 0, dss->xcflags, dss->hvm, -cbflags, dss->checkpointed_stream, +dss->domid, dss->xcflags, dss->hvm, cbflags, +dss->checkpointed_stream, }; shs->ao = ao; diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c index 1dece23..38089a0 100644 --- a/tools/libxl/libxl_save_helper.c +++ b/tools/libxl/libxl_save_helper.c @@ -251,8 +251,6 @@ int main(int argc, char **argv) io_fd = atoi(NEXTARG); recv_fd = atoi(NEXTARG); uint32_t dom = strtoul(NEXTARG,0,10); -uint32_t max_iters =strtoul(NEXTARG,0,10); -uint32_t max_factor = strtoul(NEXTARG,0,10); uint32_t flags =strtoul(NEXTARG,0,10); int hvm = atoi(NEXTARG); unsigned cbflags = strtoul(NEXTARG,0,10); @@ -264,9 +262,8 @@ int main(int argc, char **argv) startup("save"); setup_signals(save_signal_handler); -r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags, - _save_callbacks, hvm, stream_type, - recv_fd); +r = xc_domain_save(xch, io_fd, dom, flags, _save_callbacks, + hvm, stream_type, recv_fd); complete(r); } else if (!strcmp(mode,"--restore-domain")) { -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/2] Tidy libxc xc_domain_save
Tidy up libxc's xc_domain_save, removing unused paramaters max_iters and max_factor, making matching changes to libxl. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> --- tools/libxc/include/xenguest.h | 4 ++-- tools/libxc/xc_nomigrate.c | 3 +-- tools/libxc/xc_sr_save.c | 8 +++- tools/libxl/libxl_save_callout.c | 4 ++-- tools/libxl/libxl_save_helper.c | 7 ++- 5 files changed, 10 insertions(+), 16 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 5cd8111..6626f0c 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -100,8 +100,8 @@ typedef enum { *doesn't use checkpointing * @return 0 on success, -1 on failure */ -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, - uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */, +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, + uint32_t flags /* XCFLAGS_xxx */, struct save_callbacks* callbacks, int hvm, xc_migration_stream_t stream_type, int recv_fd); diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c index 317c8ce..fe8f68c 100644 --- a/tools/libxc/xc_nomigrate.c +++ b/tools/libxc/xc_nomigrate.c @@ -20,8 +20,7 @@ #include #include -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, - uint32_t max_factor, uint32_t flags, +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags, struct save_callbacks* callbacks, int hvm, xc_migration_stream_t stream_type, int recv_fd) { diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c index ca6913b..1e7502d 100644 --- a/tools/libxc/xc_sr_save.c +++ b/tools/libxc/xc_sr_save.c @@ -916,9 +916,8 @@ static int save(struct xc_sr_context *ctx, uint16_t guest_type) }; int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, - uint32_t max_iters, uint32_t max_factor, uint32_t flags, - struct save_callbacks* callbacks, int hvm, - xc_migration_stream_t stream_type, int recv_fd) + uint32_t flags, struct save_callbacks* callbacks, + int hvm, xc_migration_stream_t stream_type, int recv_fd) { struct xc_sr_context ctx = { @@ -955,8 +954,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO ) assert(callbacks->wait_checkpoint); -DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d", -io_fd, dom, max_iters, max_factor, flags, hvm); +DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm); if ( xc_domain_getinfo(xch, dom, 1, ) != 1 ) { diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c index 891c669..6452d70 100644 --- a/tools/libxl/libxl_save_callout.c +++ b/tools/libxl/libxl_save_callout.c @@ -89,8 +89,8 @@ void libxl__xc_domain_save(libxl__egc *egc, libxl__domain_save_state *dss, libxl__srm_callout_enumcallbacks_save(>callbacks.save.a); const unsigned long argnums[] = { -dss->domid, 0, 0, dss->xcflags, dss->hvm, -cbflags, dss->checkpointed_stream, +dss->domid, dss->xcflags, dss->hvm, cbflags, +dss->checkpointed_stream, }; shs->ao = ao; diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c index 1dece23..38089a0 100644 --- a/tools/libxl/libxl_save_helper.c +++ b/tools/libxl/libxl_save_helper.c @@ -251,8 +251,6 @@ int main(int argc, char **argv) io_fd = atoi(NEXTARG); recv_fd = atoi(NEXTARG); uint32_t dom = strtoul(NEXTARG,0,10); -uint32_t max_iters =strtoul(NEXTARG,0,10); -uint32_t max_factor = strtoul(NEXTARG,0,10); uint32_t flags =strtoul(NEXTARG,0,10); int hvm = atoi(NEXTARG); unsigned cbflags = strtoul(NEXTARG,0,10); @@ -264,9 +262,8 @@ int main(int argc, char **argv) startup("save"); setup_signals(save_signal_handler); -r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags, - _save_callbacks, hvm, stream_type, - recv_fd); +r = xc_domain_save(xch, io_fd, dom, flags, _save_callbacks, + hvm, stream_type, recv_fd); complete(r); } else if (!strcmp(mode,"--restore-domain")) { -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/2] Introduce migration precopy policy
This Patch allows a migration precopy policy to be specified. The precopy phase of the xc_domain_save() live migration algorithm has historically been implemented to run until either a) (almost) no pages are dirty or b) some fixed, hard-coded maximum number of precopy iterations has been exceeded. This policy and its implementation are less than ideal for a few reasons: - the logic of the policy is intertwined with the control flow of the mechanism of the precopy stage - it can't take into account facts external to the immediate migration context, such external state transfer state, interactive user input, or the passage of wall-clock time. - it does not permit the user to change their mind, over time, about what to do at the end of the precopy (they get an unconditional transition into the stop-and-copy phase of the migration) To permit callers to implement arbitrary higher-level policies governing when the live migration precopy phase should end, and what should be done next: - add a precopy_policy() callback to the xc_domain_save() user-supplied callbacks - during the precopy phase of live migrations, consult this policy after each batch of pages transmitted and take the dictated action, which may be to a) abort the migration entirely, b) continue with the precopy, or c) proceed to the stop-and-copy phase. - provide an implementation of the old policy, used when precopy_policy callback is not provided. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> --- This is updated/modified subset of patch 7/20, part of Joshua Otto's "Add postcopy live migration support." patch, dated 27th March 2017. As indicated on the original thread, I wish to make use of this this within the XenServer product. I hope this will aid Josh in pushing the remainder of his series. --- tools/libxc/include/xenguest.h | 19 tools/libxc/xc_sr_common.h | 7 ++- tools/libxc/xc_sr_save.c | 102 + 3 files changed, 94 insertions(+), 34 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 6626f0c..d5908dc 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -39,6 +39,14 @@ */ struct xenevtchn_handle; +/* For save's precopy_policy(). */ +struct precopy_stats +{ +unsigned iteration; +unsigned total_written; +long dirty_count; /* -1 if unknown */ +}; + /* callbacks provided by xc_domain_save */ struct save_callbacks { /* Called after expiration of checkpoint interval, @@ -46,6 +54,17 @@ struct save_callbacks { */ int (*suspend)(void* data); +/* Called after every batch of page data sent during the precopy phase of a + * live migration to ask the caller what to do next based on the current + * state of the precopy migration. + */ +#define XGS_POLICY_ABORT (-1) /* Abandon the migration entirely and +* tidy up. */ +#define XGS_POLICY_CONTINUE_PRECOPY 0 /* Remain in the precopy phase. */ +#define XGS_POLICY_STOP_AND_COPY1 /* Immediately suspend and transmit the +* remaining dirty pages. */ +int (*precopy_policy)(struct precopy_stats stats, void *data); + /* Called after the guest's dirty pages have been * copied into an output buffer. * Callback function resumes the guest & the device model, diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index a83f22a..2bc261b 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -198,12 +198,11 @@ struct xc_sr_context /* Further debugging information in the stream. */ bool debug; -/* Parameters for tweaking live migration. */ -unsigned max_iterations; -unsigned dirty_threshold; - unsigned long p2m_size; +struct precopy_stats stats; +int policy_decision; + xen_pfn_t *batch_pfns; unsigned nr_batch_pfns; unsigned long *deferred_pages; diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c index 1e7502d..03dfa61 100644 --- a/tools/libxc/xc_sr_save.c +++ b/tools/libxc/xc_sr_save.c @@ -452,8 +452,7 @@ static int update_progress_string(struct xc_sr_context *ctx, xc_interface *xch = ctx->xch; char *new_str = NULL; -if ( asprintf(_str, "Frames iteration %u of %u", - iter, ctx->save.max_iterations) == -1 ) +if ( asprintf(_str, "Frames iteration %u", iter) == -1 ) { PERROR("Unable to allocate new progress string"); return -1; @@ -467,6 +466,25 @@ static int update_progress_string(struct xc_sr_context *ctx, } /* + * This is the live migration precopy policy - it's called periodically during + * the precopy phase of live migrations, and is responsible for deciding whe
[Xen-devel] [PATCH 0/2] Introduce migration precopy policy
Hi all, Here I present a updated/modified subset of patch 7/20, part of Joshua Otto's "Add postcopy live migration support." patch, dated 27th March 2017. As indicated on the original thread, I wish to make use of this this within the XenServer product, and hence am trying to get this subset pulled in now. I also hope this will aid Josh in pushing the remainder of his series. Here I present two patches, the first which does some tidy up, removing unused and unhelpful paramaters to xc_domain_save(), and the second which allows a precopy callback to be specified, providing the test for when to end the live phase of migration should end. If none is provided, a default policy of the current behaviour is used. Cheers, Jennifer Herbert (2): Tidy libxc xc_domain_save Introduce migration precopy policy tools/libxc/include/xenguest.h | 23 +++- tools/libxc/xc_nomigrate.c | 3 +- tools/libxc/xc_sr_common.h | 7 ++- tools/libxc/xc_sr_save.c | 110 ++- tools/libxl/libxl_save_callout.c | 4 +- tools/libxl/libxl_save_helper.c | 7 +-- 6 files changed, 104 insertions(+), 50 deletions(-) -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH 0/2] Introduce migration precopy policy
Hi all, Here I present a updated/modified subset of patch 7/20, part of Joshua Otto's "Add postcopy live migration support." patch, dated 27th March 2017. As indicated on the original thread, I wish to make use of this this within the XenServer product, and hence am trying to get this subset pulled in now. I also hope this will aid Josh in pushing the remainder of his series. Here I present two patches, the first which does some tidy up, removing unused and unhelpful paramaters to xc_domain_save(), and the second which allows a precopy callback to be specified, providing the test for when to end the live phase of migration should end. If none is provided, a default policy of the current behaviour is used. Cheers, Jennifer Herbert (2): Tidy libxc xc_domain_save Introduce migration precopy policy tools/libxc/include/xenguest.h | 23 +++- tools/libxc/xc_nomigrate.c | 3 +- tools/libxc/xc_sr_common.h | 7 ++- tools/libxc/xc_sr_save.c | 110 ++- tools/libxl/libxl_save_callout.c | 4 +- tools/libxl/libxl_save_helper.c | 7 +-- 6 files changed, 104 insertions(+), 50 deletions(-) -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/2] Introduce migration precopy policy
This Patch allows a migration precopy policy to be specified. The precopy phase of the xc_domain_save() live migration algorithm has historically been implemented to run until either a) (almost) no pages are dirty or b) some fixed, hard-coded maximum number of precopy iterations has been exceeded. This policy and its implementation are less than ideal for a few reasons: - the logic of the policy is intertwined with the control flow of the mechanism of the precopy stage - it can't take into account facts external to the immediate migration context, such external state transfer state, interactive user input, or the passage of wall-clock time. - it does not permit the user to change their mind, over time, about what to do at the end of the precopy (they get an unconditional transition into the stop-and-copy phase of the migration) To permit callers to implement arbitrary higher-level policies governing when the live migration precopy phase should end, and what should be done next: - add a precopy_policy() callback to the xc_domain_save() user-supplied callbacks - during the precopy phase of live migrations, consult this policy after each batch of pages transmitted and take the dictated action, which may be to a) abort the migration entirely, b) continue with the precopy, or c) proceed to the stop-and-copy phase. - provide an implementation of the old policy, used when precopy_policy callback is not provided. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> --- This is updated/modified subset of patch 7/20, part of Joshua Otto's "Add postcopy live migration support." patch, dated 27th March 2017. As indicated on the original thread, I wish to make use of this this within the XenServer product. I hope this will aid Josh in pushing the remainder of his series. --- tools/libxc/include/xenguest.h | 19 tools/libxc/xc_sr_common.h | 7 ++- tools/libxc/xc_sr_save.c | 102 + 3 files changed, 94 insertions(+), 34 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 6626f0c..d5908dc 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -39,6 +39,14 @@ */ struct xenevtchn_handle; +/* For save's precopy_policy(). */ +struct precopy_stats +{ +unsigned iteration; +unsigned total_written; +long dirty_count; /* -1 if unknown */ +}; + /* callbacks provided by xc_domain_save */ struct save_callbacks { /* Called after expiration of checkpoint interval, @@ -46,6 +54,17 @@ struct save_callbacks { */ int (*suspend)(void* data); +/* Called after every batch of page data sent during the precopy phase of a + * live migration to ask the caller what to do next based on the current + * state of the precopy migration. + */ +#define XGS_POLICY_ABORT (-1) /* Abandon the migration entirely and +* tidy up. */ +#define XGS_POLICY_CONTINUE_PRECOPY 0 /* Remain in the precopy phase. */ +#define XGS_POLICY_STOP_AND_COPY1 /* Immediately suspend and transmit the +* remaining dirty pages. */ +int (*precopy_policy)(struct precopy_stats stats, void *data); + /* Called after the guest's dirty pages have been * copied into an output buffer. * Callback function resumes the guest & the device model, diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h index a83f22a..2bc261b 100644 --- a/tools/libxc/xc_sr_common.h +++ b/tools/libxc/xc_sr_common.h @@ -198,12 +198,11 @@ struct xc_sr_context /* Further debugging information in the stream. */ bool debug; -/* Parameters for tweaking live migration. */ -unsigned max_iterations; -unsigned dirty_threshold; - unsigned long p2m_size; +struct precopy_stats stats; +int policy_decision; + xen_pfn_t *batch_pfns; unsigned nr_batch_pfns; unsigned long *deferred_pages; diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c index 1e7502d..03dfa61 100644 --- a/tools/libxc/xc_sr_save.c +++ b/tools/libxc/xc_sr_save.c @@ -452,8 +452,7 @@ static int update_progress_string(struct xc_sr_context *ctx, xc_interface *xch = ctx->xch; char *new_str = NULL; -if ( asprintf(_str, "Frames iteration %u of %u", - iter, ctx->save.max_iterations) == -1 ) +if ( asprintf(_str, "Frames iteration %u", iter) == -1 ) { PERROR("Unable to allocate new progress string"); return -1; @@ -467,6 +466,25 @@ static int update_progress_string(struct xc_sr_context *ctx, } /* + * This is the live migration precopy policy - it's called periodically during + * the precopy phase of live migrations, and is responsible for deciding whe
[Xen-devel] QEMU XenServer/XenProject Working group meeting 10th May 2017
QEMU XenServer/XenProject Working group meeting 5th May 2017 = Attendees: * Paul Durrant * Andrew Cooper * Ian Jackson * Jenny Herbert * Igor Druzhinin * Simon Crow * Marcus Granado Reviewed previous action points * Paul to carry on with Xen device model work. - Done. * Ian - No progress with re factoring libxl split, XenStore restrictions or using and FD for QMP cd insert and emulated hotplug, due to feature freeze on 4.8.9. * Andrew to look over Ian's patch series to see how bad extra XenStore permission would be. - No. * Jenny – XenServer is continuing to make slow progress. We have XenCentre patch that means it can talk to QEMU trad, working on some bugs. Various other small bit of work done. Andrew to double check dm-op sub ops for continuation support. XenServer uses XenStore to communicate the VNC QEMU clipboard to the guest and back. It was concluded this isnt nice, as has caused veriose security problems in the past. New plan is to implement this using a shared page. Next XenStore restriction was discussed. Including an approach involving using a shared ring buffer and removing the previous xs-restrict. Instead implementing an xs restriction prefixing command within the xenbus command. Slightly fiddly, but small number of parts. Its doable and not controversial. Andrew had a diffident plan, to remove any use of XenStore from QEMU – specifically for the hvm case. Currently little us in the hvm case. Two uses should be trivial to implement in QMP, and the other is physmap, which Igor has been working on. Its agreed the phymap key needs removing, and that removing the last few uses of XenStore would be a good idea, however, if it seemed that dealing with the physmap issue where to take a long time, the xs-ristrict could be used as an intermidiate plan, which would allow this project to move on while physmap or any other complications are being sorted out. Andrew is not convinced that allowing a guest to fiddle with the phymap keys (as wolud have to be permitted under the xs-restict scheme) would be safe. Certainly the guest could destroy itself, which in itself is permit-able, but as it changes the interval virtual memory layout for QEMU, there might be some exploit hidden there – this would need to be checked. The conversation turns to the physmap keys, being the biggest XenStore concern. Andrew suggests that the correct way to fix it in a compatible way would be if grant map foreign took in a pointer that would be nmaped. The pointer would allow the range to be mapped exactly where it pointed. A new library call to xen foreign map could be written, but the complication of needing compatibility for QEMU with older versions of Xen was raised. Ian suggested that this wasn’t really a problem, since we could do the old thing with the old versions. Libxl already tests for the version of QEMU, and so if it finds this too old, it does the phymap keys thing, otherwise, it can use QEMU-depriv including the the new physmap mechanism. The new physmap mechanism would not require a substitute for the physmap keys, as QEMU already has all the information it needs. The keys where entirely to allow two parts of startup logic in QEMU to be reversed. Andrew says that Igor has a solution but doestn think its upstreamable. Ian suggests that given the size of the xs-restrict patch queue, if can fix the physmap issue relatively easily, we should, and then he could drop most of the xs-restict patchque. Ian offers to help with the libxl part of the phymap work. Igor explains an approach he tried hit a block of QXL, which initialised certain registers and write then into various memory locations. He tried in each place to insert an 'if Xen and runstate=resuming', doent touch. But the fix is very intrusive in terms of QXL code, and so he didn’t try to upstream it. The idea of using helper functions was discussed. Other helper functions have been used before for similar problems. A function could be created to access vram, instead of just writing to a pointer. All the access parts of QXL code could eb redirected though this helper function, and that would be likely ustreamable. In particular, its been suggested before that helper function could be used for range checking. They could be added for range checking, and then also modified for with the 'if xen and resuming' clauses. Igor explains how another approach he had been looking at was to change the order of QEMU startup, and move the memory map, and actually map the memory where QEMU expects it to be mapped. Here again, there was the issue of compatibility. The compact layer is still needed to work with old versions of libxc. It was discussed how it would have QEMU would have to be able to work with both skews. It could be decided at compile time, as you at this point if you have the new function call. It the new function call is not available, it would
Re: [Xen-devel] [PATCH v8 for-4.9 1/5] hvm/dmop: Box dmop_args rather than passing multiple parameters around
Hi Julien, This is extending an existing feature. Once 4.9 is released, the existing feature will be frozen, and the only way to later get the extra functionality would be to created a completely new dm_op, which does something very similar to an existing one. Although not the end of the world, this wouldnt look so nice. The benefits of the feature are that a VM can request multiple extents to be marked as modified at once, without having to loop though them, calling the existing call many many times. This will be more efficient and faster. As an extra, additional accessors have been created for dm_op operations, which new dm_ops can take advantage of. The benefits of introducing the feature for 4.9 as opposed to later is that we wont' have to support the same feature, with multiple dm_opts with varying parameters - which as well as looking less good, also unnesseserily bloats the code. I think risks are low, with a minor, affecting dm_op operations only. The core change, in 5/5 will only affect the modified memory call, which has been tested. The remaining patches are to tidy up and fix existing behaviour. -jenny On 21/04/17 15:17, Julien Grall wrote: Hi Jennifer, I don't see any cover letter for this series, so I will answer here. Looking at the code, it looks like a new feature rather than a bug fix. Am I right? Could you explain what would be the benefits and risks to get this code in the release? I also like to hear the opinion of the x86 maintainers about getting this code in Xen 4.9. Cheers, On 21/04/17 15:05, jennifer.herb...@citrix.com wrote: From: Jennifer Herbert <jennifer.herb...@citrix.com> No functional change. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> Reviewed-by: Jan Beulich <jbeul...@suse.com> Reviewed-by: Paul Durrant <paul.durr...@citrix.com> -- CC: Paul Durrant <paul.durr...@citrix.com> CC: Andrew Cooper <andrew.coop...@citrix.com> CC: Jan Beulich <jbeul...@suse.com> CC: Julien Grall <julien.gr...@arm.com> --- No change. --- xen/arch/x86/hvm/dm.c | 49 + 1 file changed, 29 insertions(+), 20 deletions(-) diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index d72b7bd..e583e41 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -25,6 +25,13 @@ #include +struct dmop_args { +domid_t domid; +unsigned int nr_bufs; +/* Reserve enough buf elements for all current hypercalls. */ +struct xen_dm_op_buf buf[2]; +}; + static bool copy_buf_from_guest(const xen_dm_op_buf_t bufs[], unsigned int nr_bufs, void *dst, unsigned int idx, size_t dst_size) @@ -56,7 +63,7 @@ static bool copy_buf_to_guest(const xen_dm_op_buf_t bufs[], } static int track_dirty_vram(struct domain *d, xen_pfn_t first_pfn, -unsigned int nr, struct xen_dm_op_buf *buf) +unsigned int nr, const struct xen_dm_op_buf *buf) { if ( nr > (GB(1) >> PAGE_SHIFT) ) return -EINVAL; @@ -287,16 +294,14 @@ static int inject_event(struct domain *d, return 0; } -static int dm_op(domid_t domid, - unsigned int nr_bufs, - xen_dm_op_buf_t bufs[]) +static int dm_op(const struct dmop_args *op_args) { struct domain *d; struct xen_dm_op op; bool const_op = true; long rc; -rc = rcu_lock_remote_domain_by_id(domid, ); +rc = rcu_lock_remote_domain_by_id(op_args->domid, ); if ( rc ) return rc; @@ -307,7 +312,7 @@ static int dm_op(domid_t domid, if ( rc ) goto out; -if ( !copy_buf_from_guest(bufs, nr_bufs, , 0, sizeof(op)) ) +if ( !copy_buf_from_guest(_args->buf[0], op_args->nr_bufs, , 0, sizeof(op)) ) { rc = -EFAULT; goto out; @@ -466,10 +471,10 @@ static int dm_op(domid_t domid, if ( data->pad ) break; -if ( nr_bufs < 2 ) +if ( op_args->nr_bufs < 2 ) break; -rc = track_dirty_vram(d, data->first_pfn, data->nr, [1]); +rc = track_dirty_vram(d, data->first_pfn, data->nr, _args->buf[1]); break; } @@ -564,7 +569,7 @@ static int dm_op(domid_t domid, if ( (!rc || rc == -ERESTART) && !const_op && - !copy_buf_to_guest(bufs, nr_bufs, 0, , sizeof(op)) ) + !copy_buf_to_guest(_args->buf[0], op_args->nr_bufs, 0, , sizeof(op)) ) rc = -EFAULT; out: @@ -587,20 +592,21 @@ CHECK_dm_op_set_mem_type; CHECK_dm_op_inject_event; CHECK_dm_op_inject_msi; -#define MAX_NR_BUFS 2 - int compat_dm_op(domid_t domid, unsigned int nr_bufs, XEN_GUEST_HANDLE_PARAM(void) bufs) { -struct xen_dm_op_buf nat[MAX_NR_
Re: [Xen-devel] [PATCH 2/4] hvm/dmop: Implement copy_{to, from}_guest_buf() in terms of raw accessors
On 21/04/17 11:38, Jan Beulich wrote: On 21.04.17 at 12:11, <jennifer.herb...@citrix.com> wrote: On 21/04/17 10:46, Jan Beulich wrote: On 21.04.17 at 11:11, <andrew.coop...@citrix.com> wrote: On 21/04/2017 09:54, Andrew Cooper wrote: On 21/04/2017 08:27, Jan Beulich wrote: On 20.04.17 at 19:59, <jennifer.herb...@citrix.com> wrote: From: Jennifer Herbert <jennifer.herb...@citrix.com> Is this correct, considering that iirc the patch was new in v5 and ... This also allows the usual cases to be simplified, by omitting an unnecessary buf parameters, and because the macros can appropriately size the object. This makes copying to or from a buf that isn't big enough an error. If the buffer isnt big enough, trying to carry on regardless can only cause trouble later on. Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> ... this sequence of S-o-b-s? --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -32,36 +32,47 @@ struct dmop_args { struct xen_dm_op_buf buf[2]; }; -static bool copy_buf_from_guest(const xen_dm_op_buf_t bufs[], -unsigned int nr_bufs, void *dst, -unsigned int idx, size_t dst_size) +static bool _raw_copy_from_guest_buf(void *dst, + const struct dmop_args *args, + unsigned int buf_idx, + size_t dst_bytes) { -size_t size; +size_t buf_bytes; -if ( idx >= nr_bufs ) +if ( buf_idx >= args->nr_bufs ) return false; -memset(dst, 0, dst_size); +buf_bytes = args->buf[buf_idx].size; -size = min_t(size_t, dst_size, bufs[idx].size); +if ( dst_bytes > buf_bytes ) +return false; While this behavioral change is now being mentioned in the description, I'm not sure I buy the argument of basically being guaranteed to cause trouble down the road. Did you consider the forward compatibility aspect here, allowing us to extend interface structures by adding fields to their ends without breaking old callers? Paul, what are your thoughts here? DMOP is a stable ABI. There is no legal extending of any objects. The previous semantics are guaranteed to break the ABI with future subops, which is why I removed it. In the case that the guest originally passed an overly long buffer, and someone tried be "clever" here passing the same object here expecting _raw_copy_from_guest_buf() to DTRT, the function will end up copying too much data from the guest, and you will end up with something the guest wasn't intending to be part of the structure replacing the zero extension. New subops which take a longer object should use a brand new object. $MAGIC compatibility logic like you want has no business living in the copy helper. Had I spotted the intention during the original dmop series, I would have rejected it during review. Actually, the existing behaviour is already broken. If a guest passes an overly short buf[0], the dmop logic won't get a failure, and instead get a truncated structure which has been zero extended. This is very definitely the wrong thing to do, because such a truncated structure might actually contain some legitimate operations. So what, if that's what the caller meant to happen? Considering this is a controversial change, I think it is a bad idea to merge this into the otherwise uncontroversial change here. How about we move the min(..) and memset out of the helper function, and into dm_op? Wouldn't that result in different buffers being treated differently, which I'd rather not see happening? I think in the general case, its wrong to assume that its ok to truncate a buffer. In specific cases, such as buf[0], we don't really know how big the buf needs to be until we start parsing it. In this case, we want to say 'give me what you have, upto this maximum'. This is the special case, not the general one, an as such does not belong in the general case accesser helper. Certainly when we get to updating the accesser helper to deal with offsets, its get more confused. If we give it an offset greater then the size of the buffer, do we just memset? That would seem wrong. With an offset of 0, we want it to do the memset thing, to deal with buf[0]. But what about if we have an offset. I'd think the calling function would really want to know if its got nonsense. Another solution would be to have two variants of the helper function. A 'normal' one, and the best effort one. -jenny ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/4] hvm/dmop: Implement copy_{to, from}_guest_buf() in terms of raw accessors
On 21/04/17 10:46, Jan Beulich wrote: On 21.04.17 at 11:11, <andrew.coop...@citrix.com> wrote: On 21/04/2017 09:54, Andrew Cooper wrote: On 21/04/2017 08:27, Jan Beulich wrote: On 20.04.17 at 19:59, <jennifer.herb...@citrix.com> wrote: From: Jennifer Herbert <jennifer.herb...@citrix.com> Is this correct, considering that iirc the patch was new in v5 and ... This also allows the usual cases to be simplified, by omitting an unnecessary buf parameters, and because the macros can appropriately size the object. This makes copying to or from a buf that isn't big enough an error. If the buffer isnt big enough, trying to carry on regardless can only cause trouble later on. Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com> Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> ... this sequence of S-o-b-s? --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -32,36 +32,47 @@ struct dmop_args { struct xen_dm_op_buf buf[2]; }; -static bool copy_buf_from_guest(const xen_dm_op_buf_t bufs[], -unsigned int nr_bufs, void *dst, -unsigned int idx, size_t dst_size) +static bool _raw_copy_from_guest_buf(void *dst, + const struct dmop_args *args, + unsigned int buf_idx, + size_t dst_bytes) { -size_t size; +size_t buf_bytes; -if ( idx >= nr_bufs ) +if ( buf_idx >= args->nr_bufs ) return false; -memset(dst, 0, dst_size); +buf_bytes = args->buf[buf_idx].size; -size = min_t(size_t, dst_size, bufs[idx].size); +if ( dst_bytes > buf_bytes ) +return false; While this behavioral change is now being mentioned in the description, I'm not sure I buy the argument of basically being guaranteed to cause trouble down the road. Did you consider the forward compatibility aspect here, allowing us to extend interface structures by adding fields to their ends without breaking old callers? Paul, what are your thoughts here? DMOP is a stable ABI. There is no legal extending of any objects. The previous semantics are guaranteed to break the ABI with future subops, which is why I removed it. In the case that the guest originally passed an overly long buffer, and someone tried be "clever" here passing the same object here expecting _raw_copy_from_guest_buf() to DTRT, the function will end up copying too much data from the guest, and you will end up with something the guest wasn't intending to be part of the structure replacing the zero extension. New subops which take a longer object should use a brand new object. $MAGIC compatibility logic like you want has no business living in the copy helper. Had I spotted the intention during the original dmop series, I would have rejected it during review. Actually, the existing behaviour is already broken. If a guest passes an overly short buf[0], the dmop logic won't get a failure, and instead get a truncated structure which has been zero extended. This is very definitely the wrong thing to do, because such a truncated structure might actually contain some legitimate operations. So what, if that's what the caller meant to happen? Considering this is a controversial change, I think it is a bad idea to merge this into the otherwise uncontroversial change here. How about we move the min(..) and memset out of the helper function, and into dm_op? ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 for-4.9 1/4] hvm/dmop: Box dmop_bufs rather than passing two parameters around
On 10/04/17 10:40, Paul Durrant wrote: -Original Message- From: Andrew Cooper Sent: 10 April 2017 10:29 To: Paul Durrant <paul.durr...@citrix.com>; Xen-devel Cc: Jennifer Herbert <jennifer.herb...@citrix.com>; Jan Beulich <jbeul...@suse.com>; Julien Grall <julien.gr...@arm.com> Subject: Re: [PATCH v5 for-4.9 1/4] hvm/dmop: Box dmop_bufs rather than passing two parameters around On 10/04/17 10:04, Paul Durrant wrote: -Original Message- From: Andrew Cooper [mailto:andrew.coop...@citrix.com] Sent: 07 April 2017 20:36 To: Xen-devel <xen-devel@lists.xen.org> Cc: Jennifer Herbert <jennifer.herb...@citrix.com>; Andrew Cooper <andrew.coop...@citrix.com>; Jan Beulich <jbeul...@suse.com>; Paul Durrant <paul.durr...@citrix.com>; Julien Grall <julien.gr...@arm.com> Subject: [PATCH v5 for-4.9 1/4] hvm/dmop: Box dmop_bufs rather than passing two parameters around From: Jennifer Herbert <jennifer.herb...@citrix.com> No functional change. Why is this a good thing? Passing two parameters around allowed for them to be in registers. I preferred the code as it was before. a) It will always be inlined, so registers aren't relevant. Why? I see nothing forcing the compiler to make it so. Even if they were, all values are available directly with the pointer as a base, so there is no reduction in expressiveness. (i.e. the previous code only increases register pressure). b) passing multiple parameters like that is a recipe for mistakes, and in this case, mistakes mean security vulnerabilities. Given the locality of the code I don't buy that as an argument unless you're going to assert that passing more than one parameter is always wrong. To two parameters should go everywhere together - they work together, and should never be separated. But by having them separate, you end up forcing people to think about why do I need this nr_buffs, even though in many situations it looks superfluous. Its mostly there as a check, and not part of the core flow of how the code works. It just confuses and clutters things to have it separate. If its left separate, its possible someone will feed in a different, but similar variable for this parameter, and its purpose defeated. c) It is necessary to make legible code for the later changes in this series. From my reading I don't think it would have an overly negative effect on the subsequent patches if this patch were dropped. No, but since we're having this debate, we clearly care about small negative effects too. -jenny ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC 07/20] migration: defer precopy policy to libxl
Hi, I would like to encourage this patch - as I have use for it outside of your postcopy work. Some things people will comment on: You've used 'unsigned' without the int keyword, which people don't like. Also on line 324, your missing space between 'if (' and 'ctx->save.policy_decision'. Also, I'm not a fan of your CONSULT_POLICY macro, which you've defined at a odd point in your function, and I think could be done more elegantly. Worst of all ... its a macro - which I think should generally be avoided unless there is little choice. I'm sure you could write a helper function to replace this. Cheers, -jenny On 27/03/17 10:06, Joshua Otto wrote: The precopy phase of the xc_domain_save() live migration algorithm has historically been implemented to run until either a) (almost) no pages are dirty or b) some fixed, hard-coded maximum number of precopy iterations has been exceeded. This policy and its implementation are less than ideal for a few reasons: - the logic of the policy is intertwined with the control flow of the mechanism of the precopy stage - it can't take into account facts external to the immediate migration context, such as interactive user input or the passage of wall-clock time - it does not permit the user to change their mind, over time, about what to do at the end of the precopy (they get an unconditional transition into the stop-and-copy phase of the migration) To permit users to implement arbitrary higher-level policies governing when the live migration precopy phase should end, and what should be done next: - add a precopy_policy() callback to the xc_domain_save() user-supplied callbacks - during the precopy phase of live migrations, consult this policy after each batch of pages transmitted and take the dictated action, which may be to a) abort the migration entirely, b) continue with the precopy, or c) proceed to the stop-and-copy phase. - provide an implementation of the old policy as such a callback in libxl and plumb it through the IPC machinery to libxc, effectively maintaing the old policy for now Signed-off-by: Joshua Otto--- tools/libxc/include/xenguest.h | 23 - tools/libxc/xc_nomigrate.c | 3 +- tools/libxc/xc_sr_common.h | 7 +- tools/libxc/xc_sr_save.c | 194 ++--- tools/libxl/libxl_dom_save.c | 20 tools/libxl/libxl_save_callout.c | 3 +- tools/libxl/libxl_save_helper.c| 7 +- tools/libxl/libxl_save_msgs_gen.pl | 4 +- 8 files changed, 189 insertions(+), 72 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index aa8cc8b..30ffb6f 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -39,6 +39,14 @@ */ struct xenevtchn_handle; +/* For save's precopy_policy(). */ +struct precopy_stats +{ +unsigned iteration; +unsigned total_written; +long dirty_count; /* -1 if unknown */ +}; + /* callbacks provided by xc_domain_save */ struct save_callbacks { /* Called after expiration of checkpoint interval, @@ -46,6 +54,17 @@ struct save_callbacks { */ int (*suspend)(void* data); +/* Called after every batch of page data sent during the precopy phase of a + * live migration to ask the caller what to do next based on the current + * state of the precopy migration. + */ +#define XGS_POLICY_ABORT (-1) /* Abandon the migration entirely and +* tidy up. */ +#define XGS_POLICY_CONTINUE_PRECOPY 0 /* Remain in the precopy phase. */ +#define XGS_POLICY_STOP_AND_COPY1 /* Immediately suspend and transmit the +* remaining dirty pages. */ +int (*precopy_policy)(struct precopy_stats stats, void *data); + /* Called after the guest's dirty pages have been * copied into an output buffer. * Callback function resumes the guest & the device model, @@ -100,8 +119,8 @@ typedef enum { *doesn't use checkpointing * @return 0 on success, -1 on failure */ -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, - uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */, +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, + uint32_t flags /* XCFLAGS_xxx */, struct save_callbacks* callbacks, int hvm, xc_migration_stream_t stream_type, int recv_fd); diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c index 15c838f..2af64e4 100644 --- a/tools/libxc/xc_nomigrate.c +++ b/tools/libxc/xc_nomigrate.c @@ -20,8 +20,7 @@ #include #include -int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters, - uint32_t max_factor, uint32_t flags, +int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
[Xen-devel] [PATCH v4] dm_op: Add xendevicemodel_modified_memory_bulk.
From: Jennifer Herbert <jennifer.herb...@citrix.com> This new lib devicemodel call allows multiple extents of pages to be marked as modified in a single call. This is something needed for a usecase I'm working on. The xen side of the modified_memory call has been modified to accept an array of extents. The devicemodle library either provides an array of length 1, to support the original library function, or a second function allows an array to be provided. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> --- Changes as discussed on V3. Cc: Jan Beulich <jbeul...@suse.com> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Paul Durrant <paul.durr...@citrix.com> --- tools/libs/devicemodel/core.c | 30 -- tools/libs/devicemodel/include/xendevicemodel.h | 19 +++- xen/arch/x86/hvm/dm.c | 124 --- xen/include/public/hvm/dm_op.h | 19 +++- 4 files changed, 140 insertions(+), 52 deletions(-) diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c index a85cb49..f9e37a5 100644 --- a/tools/libs/devicemodel/core.c +++ b/tools/libs/devicemodel/core.c @@ -434,22 +434,36 @@ int xendevicemodel_track_dirty_vram( dirty_bitmap, (size_t)(nr + 7) / 8); } -int xendevicemodel_modified_memory( -xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn, -uint32_t nr) +int xendevicemodel_modified_memory_bulk( +xendevicemodel_handle *dmod, domid_t domid, +struct xen_dm_op_modified_memory_extent *extents, uint32_t nr) { struct xen_dm_op op; -struct xen_dm_op_modified_memory *data; +struct xen_dm_op_modified_memory *header; +size_t extents_size = nr * sizeof(struct xen_dm_op_modified_memory_extent); memset(, 0, sizeof(op)); op.op = XEN_DMOP_modified_memory; -data = _memory; +header = _memory; -data->first_pfn = first_pfn; -data->nr = nr; +header->nr_extents = nr; +header->pfns_processed = 0; -return xendevicemodel_op(dmod, domid, 1, , sizeof(op)); +return xendevicemodel_op(dmod, domid, 2, , sizeof(op), + extents, extents_size); +} + +int xendevicemodel_modified_memory( +xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn, +uint32_t nr) +{ +struct xen_dm_op_modified_memory_extent extent; + +extent.first_pfn = first_pfn; +extent.nr = nr; + +return xendevicemodel_modified_memory_bulk(dmod, domid, , 1); } int xendevicemodel_set_mem_type( diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h index b3f600e..9c62bf9 100644 --- a/tools/libs/devicemodel/include/xendevicemodel.h +++ b/tools/libs/devicemodel/include/xendevicemodel.h @@ -236,8 +236,8 @@ int xendevicemodel_track_dirty_vram( uint32_t nr, unsigned long *dirty_bitmap); /** - * This function notifies the hypervisor that a set of domain pages - * have been modified. + * This function notifies the hypervisor that a set of contiguous + * domain pages have been modified. * * @parm dmod a handle to an open devicemodel interface. * @parm domid the domain id to be serviced @@ -250,6 +250,21 @@ int xendevicemodel_modified_memory( uint32_t nr); /** + * This function notifies the hypervisor that a set of discontiguous + * domain pages have been modified. + * + * @parm dmod a handle to an open devicemodel interface. + * @parm domid the domain id to be serviced + * @parm extents an array of extent structs, which each hold + a start_pfn and nr (number of pfns). + * @parm nr the number of extents in the array + * @return 0 on success, -1 on failure. + */ +int xendevicemodel_modified_memory_bulk( +xendevicemodel_handle *dmod, domid_t domid, +struct xen_dm_op_modified_memory_extent extents[], uint32_t nr); + +/** * This function notifies the hypervisor that a set of domain pages * are to be treated in a specific way. (See the definition of * hvmmem_type_t). diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index 2122c45..c4fa704 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -118,57 +118,108 @@ static int set_isa_irq_level(struct domain *d, uint8_t isa_irq, return 0; } -static int modified_memory(struct domain *d, - struct xen_dm_op_modified_memory *data) + +int copy_extent_from_guest_array(struct xen_dm_op_modified_memory_extent* extent, + xen_dm_op_buf_t* buf, unsigned int index) + { -xen_pfn_t last_pfn = data->first_pfn + data->nr - 1; -unsigned int iter = 0; -int rc = 0; +if ( (buf->size / sizeof(struct xen_dm_op_modified_memory_extent)) <= + index ) +return -EINVAL; -if ( (data->fir
Re: [Xen-devel] [PATCH v3] dm_op: Add xendevicemodel_modified_memory_bulk.
On 29/03/17 15:54, Jan Beulich wrote: On 28.03.17 at 15:18,wrote: @@ -441,13 +481,8 @@ static int dm_op(domid_t domid, struct xen_dm_op_modified_memory *data = _memory; -const_op = false; - -rc = -EINVAL; -if ( data->pad ) -break; - -rc = modified_memory(d, data); +rc = modified_memory(d, data, [1]); +const_op = (rc != 0); Isn't this wrong now, i.e. don't you need to copy back the header now in all cases? I only define what I'll set nr_extents to in case of error, and of course opaque is opaque. Well, but you do need the opaque value for the continuation, don't you? In which case you need to also write back on -ERESTART. And as you say you need to write back in case of error. So I'd expect const_op = !rc; Quite right, see you point now - I didn't notice I'd inverted the logic. -jenny ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3] dm_op: Add xendevicemodel_modified_memory_bulk.
On 29/03/17 11:38, Jan Beulich wrote: On 28.03.17 at 15:18,wrote: Perhaps drop "already"? Personally I also wouldn't mind you dropping the variable altogether and using header->opaque directly, but I guess that's too "opaque" for your taste? It would make the code too opaque for my taste, so I'll just drop the 'already' bit. @@ -441,13 +481,8 @@ static int dm_op(domid_t domid, struct xen_dm_op_modified_memory *data = _memory; -const_op = false; - -rc = -EINVAL; -if ( data->pad ) -break; - -rc = modified_memory(d, data); +rc = modified_memory(d, data, [1]); +const_op = (rc != 0); Isn't this wrong now, i.e. don't you need to copy back the header now in all cases? I only define what I'll set nr_extents to in case of error, and of course opaque is opaque. If I where to write back, I'd be writing back 0 to nr_extents - which wouldn’t really mean anything since I’m not defining the order for which I’m processing them. In fact the only thing it tells you is that extent 0 is the last one processed, which I don't think its all that useful. Ideally I'd prefer to leave it untouched on success, but the original value is lost on continuation, this would be more involved. By only writing back on error, I hoped to improve efficiency for the common case, (especially for existing use with calls of one extent). (I know its only a small difference) If you want me to write back - what do you want me to write back for success? Cheers, -jenny ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3] dm_op: Add xendevicemodel_modified_memory_bulk.
From: Jennifer Herbert <jennifer.herb...@citrix.com> This new lib devicemodel call allows multiple extents of pages to be marked as modified in a single call. This is something needed for a usecase I'm working on. The xen side of the modified_memory call has been modified to accept an array of extents. The devicemodle library either provides an array of length 1, to support the original library function, or a second function allows an array to be provided. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Paul Durrant <paul.durr...@citrix.com> --- Changes as discussed on thread. tools/libs/devicemodel/core.c | 30 -- tools/libs/devicemodel/include/xendevicemodel.h | 19 +++- xen/arch/x86/hvm/dm.c | 115 +++ xen/include/public/hvm/dm_op.h | 22 - 4 files changed, 133 insertions(+), 53 deletions(-) diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c index a85cb49..f9e37a5 100644 --- a/tools/libs/devicemodel/core.c +++ b/tools/libs/devicemodel/core.c @@ -434,22 +434,36 @@ int xendevicemodel_track_dirty_vram( dirty_bitmap, (size_t)(nr + 7) / 8); } -int xendevicemodel_modified_memory( -xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn, -uint32_t nr) +int xendevicemodel_modified_memory_bulk( +xendevicemodel_handle *dmod, domid_t domid, +struct xen_dm_op_modified_memory_extent *extents, uint32_t nr) { struct xen_dm_op op; -struct xen_dm_op_modified_memory *data; +struct xen_dm_op_modified_memory *header; +size_t extents_size = nr * sizeof(struct xen_dm_op_modified_memory_extent); memset(, 0, sizeof(op)); op.op = XEN_DMOP_modified_memory; -data = _memory; +header = _memory; -data->first_pfn = first_pfn; -data->nr = nr; +header->nr_extents = nr; +header->pfns_processed = 0; -return xendevicemodel_op(dmod, domid, 1, , sizeof(op)); +return xendevicemodel_op(dmod, domid, 2, , sizeof(op), + extents, extents_size); +} + +int xendevicemodel_modified_memory( +xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn, +uint32_t nr) +{ +struct xen_dm_op_modified_memory_extent extent; + +extent.first_pfn = first_pfn; +extent.nr = nr; + +return xendevicemodel_modified_memory_bulk(dmod, domid, , 1); } int xendevicemodel_set_mem_type( diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h index b3f600e..9c62bf9 100644 --- a/tools/libs/devicemodel/include/xendevicemodel.h +++ b/tools/libs/devicemodel/include/xendevicemodel.h @@ -236,8 +236,8 @@ int xendevicemodel_track_dirty_vram( uint32_t nr, unsigned long *dirty_bitmap); /** - * This function notifies the hypervisor that a set of domain pages - * have been modified. + * This function notifies the hypervisor that a set of contiguous + * domain pages have been modified. * * @parm dmod a handle to an open devicemodel interface. * @parm domid the domain id to be serviced @@ -250,6 +250,21 @@ int xendevicemodel_modified_memory( uint32_t nr); /** + * This function notifies the hypervisor that a set of discontiguous + * domain pages have been modified. + * + * @parm dmod a handle to an open devicemodel interface. + * @parm domid the domain id to be serviced + * @parm extents an array of extent structs, which each hold + a start_pfn and nr (number of pfns). + * @parm nr the number of extents in the array + * @return 0 on success, -1 on failure. + */ +int xendevicemodel_modified_memory_bulk( +xendevicemodel_handle *dmod, domid_t domid, +struct xen_dm_op_modified_memory_extent extents[], uint32_t nr); + +/** * This function notifies the hypervisor that a set of domain pages * are to be treated in a specific way. (See the definition of * hvmmem_type_t). diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index 2122c45..b5031ce 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -119,56 +119,96 @@ static int set_isa_irq_level(struct domain *d, uint8_t isa_irq, } static int modified_memory(struct domain *d, - struct xen_dm_op_modified_memory *data) + struct xen_dm_op_modified_memory *header, + xen_dm_op_buf_t* buf) { -xen_pfn_t last_pfn = data->first_pfn + data->nr - 1; -unsigned int iter = 0; -int rc = 0; - -if ( (data->first_pfn > last_pfn) || - (last_pfn > domain_get_maximum_gpfn(d)) ) -return -EINVAL; +/* Process maximum of 256 pfns before checking for continuation */ +const un
Re: [Xen-devel] [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk.
On 23/03/17 15:58, Jan Beulich wrote: On 22.03.17 at 20:55,wrote: --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -119,56 +119,89 @@ static int set_isa_irq_level(struct domain *d, uint8_t --- a/xen/include/public/hvm/dm_op.h +++ b/xen/include/public/hvm/dm_op.h @@ -237,13 +237,24 @@ struct xen_dm_op_set_pci_link_route { * XEN_DMOP_modified_memory: Notify that a set of pages were modified by * an emulator. * - * NOTE: In the event of a continuation, the @first_pfn is set to the - * value of the pfn of the remaining set of pages and @nr reduced - * to the size of the remaining set. + * DMOP buf 1 contains an array of xen_dm_op_modified_memory_extent with + * @nr_extent entries. The value of @nr_extent will be reduced on + * continuation. + * + * @pfns_processed is used for continuation, and not intended to be usefull + * to the caller. It gives the number of pfns already processed (and can + * be skipped) in the last extent. This should always be set to zero. */ #define XEN_DMOP_modified_memory 11 struct xen_dm_op_modified_memory { +/* IN - number of extents. */ +uint32_t nr_extents; +/* IN - should be set to 0, and is updated on continuation. */ +uint32_t pfns_processed; I'd prefer if the field (if to be kept) wasn't named after its current purpose, nor should we state anything beyond it needing to be zero when first invoking the call. These are implementation details which callers should not rely on. Assuming we keep it, how about calling it "reserved", with a comment in dm.c explaining what its actually for? Elsewhere we use "opaque", but "reserved" is probably fine too. However, we may want to name it as an OUT value, for the error-on-extent indication mentioned above (of course another option would be to make nr_extent IN/OUT). As an OUT, we're free to use it for whatever intermediate value, just so long as upon final return to the caller it has the intended value. (It's debatable whether it should be IN/OUT, due to the need for it to be set to zero.) Having an indication of which extent failed seem a sensible idea. We'd need that parameter to be initially set to something that can represent none of the extents, such that if there is an error before we get to precessing the extents, this is clear. I don't think this is a requirement - failure outside of any extent processing can reasonably be attached to the first extent. The more that the actual processing of the extent (after reading the coordinates from guest memory) can't actually fail. With that observation I'm in fact no longer convinced we really need such an indication - I simply didn't pay attention to this aspect when first suggesting it. The more that your backwards processing would invalidate a common conclusion a caller might draw: Error on extent N means all extents lower than N were processed successfully. So if you wanted to stick to providing this information I'd say use the opaque (or whatever it's going to be named) field to provide that status. Switch back to processing extents forwards, having the opaque field starting out as zero point to the first extent as the failed one initially. Initial processing during continuation handling can't fail unless the caller has fiddled with the hypercall arguments, so I wouldn't see anything wrong with not providing a reliable error indicator in that case. I was mostly considering it, from a debugging perspective. Any failure would be due to bad arguments, which would indicate a serious bug, and trying to continue after such a bug was discovered would seem most unwise. If I copied back the buffer on event of error, then we could state that nr_extent would point to one after extent that had the error, but say it is not defined which if any extents would have succeeded. This would be a trivial change, but aid with any debugging. I can continue to count extents backwards, saving one parameter. I can then have an opaque, which i say nothing about, other then it should be zero. This i'd use for for pfns_processed. How does that sound? -jenny ___ Xen-devel mailing list Xen-devel@lists.xen.org https://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk.
On 23/03/17 08:35, Jan Beulich wrote: On 22.03.17 at 20:55,wrote: --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -119,56 +119,89 @@ static int set_isa_irq_level(struct domain *d, uint8_t isa_irq, } static int modified_memory(struct domain *d, - struct xen_dm_op_modified_memory *data) + struct xen_dm_op_modified_memory *header, + xen_dm_op_buf_t* buf) { -xen_pfn_t last_pfn = data->first_pfn + data->nr - 1; -unsigned int iter = 0; +/* Process maximum of 256 pfns before checking for continuation */ +const unsigned int cont_check_interval = 0xff; Iirc the question was asked on v1 already: 256 (as the comment says) or 255 (as the value enforces)? aw shucks! I had it as 255, and Paul said "255 or 256?" and I changed it to 256, without thinking, assuming an off by one error, and that he was right. But, with a seconds thought, it should be 255, and Paul was actually referring to a comment later in the code, which was 256. Yet batches of 256 would seem a little more "usual". Ok, I can set cont_check_interval = 0x100; and leave the comment as 256. --- a/xen/include/public/hvm/dm_op.h +++ b/xen/include/public/hvm/dm_op.h @@ -237,13 +237,24 @@ struct xen_dm_op_set_pci_link_route { * XEN_DMOP_modified_memory: Notify that a set of pages were modified by * an emulator. * - * NOTE: In the event of a continuation, the @first_pfn is set to the - * value of the pfn of the remaining set of pages and @nr reduced - * to the size of the remaining set. + * DMOP buf 1 contains an array of xen_dm_op_modified_memory_extent with + * @nr_extent entries. The value of @nr_extent will be reduced on + * continuation. + * + * @pfns_processed is used for continuation, and not intended to be usefull + * to the caller. It gives the number of pfns already processed (and can + * be skipped) in the last extent. This should always be set to zero. */ #define XEN_DMOP_modified_memory 11 struct xen_dm_op_modified_memory { +/* IN - number of extents. */ +uint32_t nr_extents; +/* IN - should be set to 0, and is updated on continuation. */ +uint32_t pfns_processed; I'd prefer if the field (if to be kept) wasn't named after its current purpose, nor should we state anything beyond it needing to be zero when first invoking the call. These are implementation details which callers should not rely on. Assuming we keep it, how about calling it "reserved", with a comment in dm.c explaining what its actually for? Elsewhere we use "opaque", but "reserved" is probably fine too. However, we may want to name it as an OUT value, for the error-on-extent indication mentioned above (of course another option would be to make nr_extent IN/OUT). As an OUT, we're free to use it for whatever intermediate value, just so long as upon final return to the caller it has the intended value. (It's debatable whether it should be IN/OUT, due to the need for it to be set to zero.) Jan Having an indication of which extent failed seem a sensible idea. We'd need that parameter to be initially set to something that can represent none of the extents, such that if there is an error before we get to precessing the extents, this is clear. If we used nr_extents - the caller can check its the same value they passed in. If the value is the same, then the error occurred before processing an extent. If we used error_on_extent (previously known as pfns_processed), we could not use zero, as it might be the first extent to fail. In this case, we'd need to specify it was defaulted to 0x. Should probably define a constant - NO_EXTENT. Internally, we could just invert the value for using as a pfn_processed intermediately. However, both of these ideas suffer a problem, that if a continuation fails before processing the remaining extents, the returned value would be incorrect. For reusing nr_extents, it would appear to be the previous extent - potentially very confusing. Using error_on_extent would give you a totally different value, although very likely, one that is greater then the number of extents passed in. Both of these schemes can be solved by only allowing 31 bit number of extents and pfns, and having the fields signed, to make it more obvious. For both schemes, you could return -extent_nr, where only a negative value indicates and extent number. This would allow you to have the initial value of 0 for error_on_extent. Alternatively, for the error_on_extent scheme, the pfns_processed could be stored as a negative, which would mean that if a negative value got returned it was was no specific extent. The value would need initialised by the caller as -1. An alternative to using signed values would be to use a combination of the two values. If nr_extents == 0, then error_on_extent hold the extent number.
Re: [Xen-devel] [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk.
On 22/03/17 10:33, Jan Beulich wrote: On 21.03.17 at 14:59,wrote: This version of this patch removes the need for the 'offset' parameter, by instead reducing nr_extents, and working backwards from the end of the array. This patch also removes the need to ever write back the passed array of extents to the guest, but instead putting its partial progress in a 'pfns_processed' parameter, which replaces the old 'offset' parameter. As with the 'offest' parameter, 'pfns_processed' should be set to 0 on calling. So how is the need for 'pfns_processed' better than the prior need for 'offset'? I continue to not see why you'd need any extra field, as you can write back to 'first_pfn' and 'nr' just like the old code did. (But note that I'm not outright objecting to this solution, I'd first like to understand why it was chosen.) With v1 of the patch, on continuation, two buffers get written back to guest, one to update 'offset', and one to update first_pfn (in the array). Although I can't think of an example where this would be a problem, I think that semi-randomly altering items in the passed array may not be expected, and if that data was re-used for something, could cause a bug. There is precedence for changing the op buffer, and using pfns_processed means we never have to change this array, which I think is much cleaner, intuitive, and halving the copy backs. I considered your suggestion of modifying the handle array, but I think this would make the code much harder to follow, and still require data written back to the guest - just in a different place. I thought just reducing the nr_extents a cleaner solution. I can't see the problem with having an argument for continuation - the xen_dm_op will remain the same size, if we use the argument space or not. If its just about how the API looks, I find this highly subjective - its no secret that continuations happen, and data gets stashed in these parameters - and so i see no reason why having a dedicated parameter for this is a problem. If we want to hide this continuation information, we could define: struct xen_dm_op_modified_memory { /* IN - number of extents. */ uint32_t nr_extents; } struct xen_dm_op_modified_memory modified_memory_expanded { struct xen_dm_op_modified_memory modified_memory; uint32_t pfns_processed } and include both structures in the xen_dm_op union. The caller would use xen_dm_op_modified_memory modified_memory, and due to the memset(, 0, sizeof(op));, the value for pfns_processed would end up as zero. Xen could use expanded structure, and make use of pfns_processed. Or, we could re-purpose the 'pad' member of struct xen_dm_op, call it continuation_data, and have this as a general purpose continuation variable, that the caller wouldn't pay any attention to. What do you think? --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -119,56 +119,89 @@ static int set_isa_irq_level(struct domain *d, uint8_t isa_irq, } static int modified_memory(struct domain *d, - struct xen_dm_op_modified_memory *data) + struct xen_dm_op_modified_memory *header, + xen_dm_op_buf_t* buf) { -xen_pfn_t last_pfn = data->first_pfn + data->nr - 1; -unsigned int iter = 0; +/* Process maximum of 256 pfns before checking for continuation */ +const unsigned int cont_check_interval = 0xff; Iirc the question was asked on v1 already: 256 (as the comment says) or 255 (as the value enforces)? aw shucks! I had it as 255, and Paul said "255 or 256?" and I changed it to 256, without thinking, assuming an off by one error, and that he was right. But, with a seconds thought, it should be 255, and Paul was actually referring to a comment later in the code, which was 256. int rc = 0; if ( copy_from_guest_offset(, buf->h, rem_extents - 1, 1) ) +return -EFAULT; + +pfn = extent.first_pfn + header->pfns_processed; +batch_nr = extent.nr - header->pfns_processed; Even if this looks to be harmless to the hypervisor, I dislike the overflows you allow for here. I'll add a check for (header->pfns_processed > extent.nr), returning -EINVAL. @@ -441,13 +474,8 @@ static int dm_op(domid_t domid, struct xen_dm_op_modified_memory *data = _memory; -const_op = false; - -rc = -EINVAL; -if ( data->pad ) -break; Where did this check go? data->pad is undefined here. But I could add a check for extent.pad in the modified_memory function. --- a/xen/include/public/hvm/dm_op.h +++ b/xen/include/public/hvm/dm_op.h @@ -237,13 +237,24 @@ struct xen_dm_op_set_pci_link_route { * XEN_DMOP_modified_memory: Notify that a set of pages were modified by * an emulator. * - * NOTE: In the event of a continuation, the @first_pfn is set to the - * value of the
[Xen-devel] [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk.
From: Jennifer Herbert <jennifer.herb...@citrix.com> This new lib devicemodel call allows multiple extents of pages to be marked as modified in a single call. This is something needed for a usecase I'm working on. The xen side of the modified_memory call has been modified to accept an array of extents. The devicemodle library either provides an array of length 1, to support the original library function, or a second function allows an array to be provided. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Paul Durrant <paul.durr...@citrix.com> --- This version of this patch removes the need for the 'offset' parameter, by instead reducing nr_extents, and working backwards from the end of the array. This patch also removes the need to ever write back the passed array of extents to the guest, but instead putting its partial progress in a 'pfns_processed' parameter, which replaces the old 'offset' parameter. As with the 'offest' parameter, 'pfns_processed' should be set to 0 on calling. tools/libs/devicemodel/core.c | 30 +-- tools/libs/devicemodel/include/xendevicemodel.h | 19 +++- xen/arch/x86/hvm/dm.c | 110 ++- xen/include/public/hvm/dm_op.h | 17 +++- 4 files changed, 122 insertions(+), 54 deletions(-) diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c index a85cb49..f9e37a5 100644 --- a/tools/libs/devicemodel/core.c +++ b/tools/libs/devicemodel/core.c @@ -434,22 +434,36 @@ int xendevicemodel_track_dirty_vram( dirty_bitmap, (size_t)(nr + 7) / 8); } -int xendevicemodel_modified_memory( -xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn, -uint32_t nr) +int xendevicemodel_modified_memory_bulk( +xendevicemodel_handle *dmod, domid_t domid, +struct xen_dm_op_modified_memory_extent *extents, uint32_t nr) { struct xen_dm_op op; -struct xen_dm_op_modified_memory *data; +struct xen_dm_op_modified_memory *header; +size_t extents_size = nr * sizeof(struct xen_dm_op_modified_memory_extent); memset(, 0, sizeof(op)); op.op = XEN_DMOP_modified_memory; -data = _memory; +header = _memory; -data->first_pfn = first_pfn; -data->nr = nr; +header->nr_extents = nr; +header->pfns_processed = 0; -return xendevicemodel_op(dmod, domid, 1, , sizeof(op)); +return xendevicemodel_op(dmod, domid, 2, , sizeof(op), + extents, extents_size); +} + +int xendevicemodel_modified_memory( +xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn, +uint32_t nr) +{ +struct xen_dm_op_modified_memory_extent extent; + +extent.first_pfn = first_pfn; +extent.nr = nr; + +return xendevicemodel_modified_memory_bulk(dmod, domid, , 1); } int xendevicemodel_set_mem_type( diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h index b3f600e..9c62bf9 100644 --- a/tools/libs/devicemodel/include/xendevicemodel.h +++ b/tools/libs/devicemodel/include/xendevicemodel.h @@ -236,8 +236,8 @@ int xendevicemodel_track_dirty_vram( uint32_t nr, unsigned long *dirty_bitmap); /** - * This function notifies the hypervisor that a set of domain pages - * have been modified. + * This function notifies the hypervisor that a set of contiguous + * domain pages have been modified. * * @parm dmod a handle to an open devicemodel interface. * @parm domid the domain id to be serviced @@ -250,6 +250,21 @@ int xendevicemodel_modified_memory( uint32_t nr); /** + * This function notifies the hypervisor that a set of discontiguous + * domain pages have been modified. + * + * @parm dmod a handle to an open devicemodel interface. + * @parm domid the domain id to be serviced + * @parm extents an array of extent structs, which each hold + a start_pfn and nr (number of pfns). + * @parm nr the number of extents in the array + * @return 0 on success, -1 on failure. + */ +int xendevicemodel_modified_memory_bulk( +xendevicemodel_handle *dmod, domid_t domid, +struct xen_dm_op_modified_memory_extent extents[], uint32_t nr); + +/** * This function notifies the hypervisor that a set of domain pages * are to be treated in a specific way. (See the definition of * hvmmem_type_t). diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index 2122c45..c90af64 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -119,56 +119,89 @@ static int set_isa_irq_level(struct domain *d, uint8_t isa_irq, } static int modified_memory(struct domain *d, - struct xen_dm_op_modified_memory *data) +
[Xen-devel] [PATCH] dm_op: Add xendevicemodel_modified_memory_bulk.
From: Jennifer Herbert <jennifer.herb...@citrix.com> This new lib devicemodel call allows multiple extents of pages to be marked as modified in a single call. This is something needed for a usecase I'm working on. The xen side of the modified_memory call has been modified to accept an array of extents. The devicemodle library either provides an array of length 1, to support the original library function, or a second function allows an array to be provided. Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com> --- Cc: Jan Beulich <jbeul...@suse.com> Cc: Ian Jackson <ian.jack...@eu.citrix.com> Cc: Wei Liu <wei.l...@citrix.com> Cc: Andrew Cooper <andrew.coop...@citrix.com> Cc: Paul Durrant <paul.durr...@citrix.com> --- tools/libs/devicemodel/core.c | 30 -- tools/libs/devicemodel/include/xendevicemodel.h | 20 +++- xen/arch/x86/hvm/dm.c | 117 +++ xen/include/public/hvm/dm_op.h | 12 ++- 4 files changed, 125 insertions(+), 54 deletions(-) diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c index a85cb49..1f7a9dc 100644 --- a/tools/libs/devicemodel/core.c +++ b/tools/libs/devicemodel/core.c @@ -434,22 +434,36 @@ int xendevicemodel_track_dirty_vram( dirty_bitmap, (size_t)(nr + 7) / 8); } -int xendevicemodel_modified_memory( -xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn, -uint32_t nr) +int xendevicemodel_modified_memory_bulk( +xendevicemodel_handle *dmod, domid_t domid, +struct xen_dm_op_modified_memory_extent *extents, uint32_t nr) { struct xen_dm_op op; -struct xen_dm_op_modified_memory *data; +struct xen_dm_op_modified_memory *header; +size_t extents_size = nr * sizeof(struct xen_dm_op_modified_memory_extent); memset(, 0, sizeof(op)); op.op = XEN_DMOP_modified_memory; -data = _memory; +header = _memory; -data->first_pfn = first_pfn; -data->nr = nr; +header->nr_extents = nr; +header->offset = 0; -return xendevicemodel_op(dmod, domid, 1, , sizeof(op)); +return xendevicemodel_op(dmod, domid, 2, , sizeof(op), + extents, extents_size); +} + +int xendevicemodel_modified_memory( +xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn, +uint32_t nr) +{ +struct xen_dm_op_modified_memory_extent extent; + +extent.first_pfn = first_pfn; +extent.nr = nr; + +return xendevicemodel_modified_memory_bulk(dmod, domid, , 1); } int xendevicemodel_set_mem_type( diff --git a/tools/libs/devicemodel/include/xendevicemodel.h b/tools/libs/devicemodel/include/xendevicemodel.h index b3f600e..b88d73c 100644 --- a/tools/libs/devicemodel/include/xendevicemodel.h +++ b/tools/libs/devicemodel/include/xendevicemodel.h @@ -236,8 +236,8 @@ int xendevicemodel_track_dirty_vram( uint32_t nr, unsigned long *dirty_bitmap); /** - * This function notifies the hypervisor that a set of domain pages - * have been modified. + * This function notifies the hypervisor that a set of contiguous + * domain pages have been modified. * * @parm dmod a handle to an open devicemodel interface. * @parm domid the domain id to be serviced @@ -249,6 +249,22 @@ int xendevicemodel_modified_memory( xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn, uint32_t nr); + +/** + * This function notifies the hypervisor that a set of discontiguous + * domain pages have been modified. + * + * @parm dmod a handle to an open devicemodel interface. + * @parm domid the domain id to be serviced + * @parm extents an array of extent structs, which each hold + a start_pfn and nr (number of pfns). + * @parm nr the number of extents in the array + * @return 0 on success, -1 on failure. + */ +int xendevicemodel_modified_memory_bulk( +xendevicemodel_handle *dmod, domid_t domid, +struct xen_dm_op_modified_memory_extent extents[], uint32_t nr); + /** * This function notifies the hypervisor that a set of domain pages * are to be treated in a specific way. (See the definition of diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c index 2122c45..0c4a820 100644 --- a/xen/arch/x86/hvm/dm.c +++ b/xen/arch/x86/hvm/dm.c @@ -119,56 +119,96 @@ static int set_isa_irq_level(struct domain *d, uint8_t isa_irq, } static int modified_memory(struct domain *d, - struct xen_dm_op_modified_memory *data) + struct xen_dm_op_modified_memory *header, + xen_dm_op_buf_t* buf) { -xen_pfn_t last_pfn = data->first_pfn + data->nr - 1; -unsigned int iter = 0; -int rc = 0; +/* Process maximum of 255 pfns before checking for continuation */ +const uint32_t max_pfns = 0xff; -if ( (data->first_pfn > last_pfn) || - (last_pfn > domain_get_maximum_gpf
Re: [Xen-devel] QEMU XenServer/XenProject Working group meeting 10th February 2017
QEMU XenServer/XenProject Working group meeting 10th February 2017 == Attendance -- Andrew Cooper Ian Jackson Paul Durrent Jennifer Herbert George Dunlap Igor Druzhinin Sergey Dyasli First topic: DMop - After an introduction, Paul started by describing his recent progress with Dmop. He has posted a privcmd series, including the dmop hypervisor parts. This adds the dmop ioctl and the restriction ioctl. This restrict ioctl, restricts access of the dmop to that of the given domain, and prevents further access via the privcmd. Paul has also written all the code for libdevicemodel - 4 patches. This code is linux specific, and will detect if its running on an OS it doest support (either old linux or windows), or where the privcmd driver doesnt support the dm ioctl. He next intends add a compatibility layer to libxenctl, which reroutes old xc calls to the new way. This would make the make calls switch transparently, but you'd know at the time you made the restriction call, if it succeeded or not, and can use local policy to determine if to continue or not. Following this, QEMU needs to be ported onto this new library. Next was a lengthy discussion QEMU's mapcahe, and on how to secure mmap calls. It was eventual concluded that Paul could just extend his current patches to also include the mmap calls. It was asked if using this library would mean having two fds, but this was said to be only true if you continued to keep xenctl open – which you shouldn’t since the new restricted ioctl fd should be able to do everything you might need the old one for. Paul detailed how the privcmd driver now has a domid associated with each fd, which defaults to 'invalid'. The restrict ioctl sets this domid, and from then on, every call from that fd is checked against that domain. The restrict ioctl won't allow the domid to be changed again later. It was suggested other calls should be checked, but Paul didn't think there where any others not already discussed. Next topic: How to prevent QEMU running as root --- There is a patch to so make QEMU change away from root user. This is expected to make a number of things stop working: * CD-ROM insert - Eject is expected to work, if you could insert one. * Emulated hotplug - unsure. * PCI-passthough – This is deemed out of scope since pci-passthough inherently unsafe anyway. It should give a useful error message suggesting turning off de-priv. This could be made to work in future. It is thought that some of these problems could be solved by passing fds over QMP. Furthermore, it was thought that DavidV. had looked, and found you could already do this. If so, only toolstack changes would be needed. Making emulated network hotplug or IDE hotplug work is probably not important. NOTE: Need to check QMP and it applicability to this problem. Next Topic: Xenstore Juergen has a proposal for xenstored, to have something similar to dm-op ioctl. Each command can have a prefix which requests it's executed with a specified domain security context. Its also necessary to run the multilpe QEMUs, such that the PV back-ends can be run in a separate instance from the emulators. In doing this, you also need to separate out the xenstore paths, such that each QEMU uses its own xenstore branch. Part of the reason is that the backend paths that you are using to control the emulated deprivileged QEMU, have to be writeable by the guest. This is because the emulated QEMU is going to have to write to them. The emulator patch ID stuff is already upstream – and can be invoked. But toostack has to be able to cope. Ian wrote a LibXL patch series (20~30 patches) 18 months ago, but that has bit rotted. This patch was said to be surprisingly long. Andrew is concerned that giving a guest permission to write to the xenstore paths used by its QEMU may cause a significant security issue, while others think it would merely offer a new way for it to crash itself. Andrew to inspect Ian's patch series to evaluate this risk. There are only a hand full of xenstore paths needed by the emulated QEMU – which will only drop as QMP use increased. Another thing to consider is you'll have to be able to talk to multiple QEMU's via QMP – knowing which is which! There was a suggestion that you could differentiate between QEMU pv and QEMU emulated by how your communicating with it – but this wouldn't really be possible, as QEMU upstream won't accept anything using xenstore, if it might be useful to more then just the xen community. XenServer is currently working towards moving to QEMU upstream, with it's core functionally currently working using temporary python shims, translating the API calls. If depriv is unstreamed in time, XenServer won't have to transition the APIs twice
[Xen-devel] QEMU XenServer/XenProject Working group meeting 29th September 2016
QEMU XenServer/XenProject Working group meeting 29th September 2016 === Attendees - David Vrabel Jennifer Herbert Ian Jackson Andrew Cooper Paul Durrant Lars Kurth QEMU depriv === DMOP There has been agreement on list on the DMOP proposal. The HVMCTL patch series, which was proposed should need only mechanical changes to use it as a basis for DMOP. Privcmd --- The privcmd changes should be fairly trivial to implement. Libxc would need changing, but this code is also in the HVMCTL patch series. This mean only thing needed for QEMU it to call the restrict ioctl, to enable it. If restrict ioctl missing, an error would be returned. QEMU would probably want an option to it, to indicate de-priv is required. Given this option, the QEMU would raise an error if the restrict ioctl was not present. In order to avoid accidents due to ABI instability, old DMOP numbers would be retired when a DMOP in changed in an ABI-incompatible way - there is no shortage of new DMOP numbers. Eventchan - Eventchan has resections in 4.7, but the libxc parts need to be done. This should not be much work. XenStore For the non-pv part of QEMU, XenStore is only used in two places. There is the DM state, and the physmap mechanism. Although there is a vague plan for replacing the physmap mechanism, it is some way off. The DM state key is used for knowing when the qemu process is running etcetera, QMP would seem to be an option to replace it - however there is no (nice) way to wait on a socket until it has been opened. One solution might be to use Xenstore to let you know the QMP sockets where available, before QEMU drops privileges, and then QMP could be used to know QEMU is in the running state. To avoid the need to use xs-restrict, you would need to both replace physmap and rework qemu startup procedure. The use of xs-restrict would be more expedient, and does not look to need that much work. Discussion was had over how secure it would be to allow a guest access to these Xenstore keys - it was concluded that a guest could mostly only mess itself up. If I guest attempted to prevent itself from being migrated, the tool stack time it out, and could kill it. There followed a discussion on the Xenbus protocol, and additions needed. The aim is to merely restrict the permission for the command, to that of the guest who's domID you provide. It was proposed that it uses the header as is, with its 16 bytes, with the command 'one-time-restrict' , and then the payload would have two additional field at the start. These two field would correspond to the domid to restrict as, and the real command. Transaction ID and tags would be taken from the real header. Although inter domain xs-restrict is not specifically needed for this project, it is thought it might be a blocking items for upstream acceptance. It it thoughts these changes would not require that much work to implement, and may be useful in use use cases. Only a few changes to QEMU would be needed, and libxl should be able to track QEMU versions. Ian Jackson volunteered to look at this, with David helping with the kernel bits. Ian won't have time to look at this until after Xen 4.8 is released. There discussion about what may fail once privileges are taken away, which would include CDs and PCI pass though. It is thought the full list can only be known by trying. Not everything needs to work for acceptance upstream, such as PCI pass though. If such an incompatible feature is needed, restrictions can be turned off. These problems can be fixed in a later phase, with CDs likely being at teh top of the list. Action items Hypervisor bits really needed first, but can't be done until 4.8 has been. Ian to look at the Xenstore items David is to look at the kernel items. Paul is to audit the HVMops, checking parameters etc; It is too late to get this in 4.8, but it is desired to get this in early into 4.9 so that there can be a period of stabilisation. With the release of 4.8 imminent, little work will happen until after that. However Paul, David and Ian are asked to have a think about their respective areas, and have a plan for when they can be done. They are welcome to start tackling them if they have time. disaggregation = A disaggregation proposal which had previously been posted to a QEMU forum was discussed. It was not previously accepted by all. The big question was how to separate the device models from the machine, with a particular point of contention being around PIIX and the idea of starting a QEMU instance without one. The general desire from us is we want to have a specific device emulated and nothing else. It is suggested you would have a software interface between each device that looked a software version of PCI. The PIIX device could be attached to CPU this pseudo PCI interface. This would fit in well
[Xen-devel] XenProject/XenServer QEMU working group minutes, 30th August 2016
QEMU XenServer/XenProject Working group meeting 30th August 2016 Attendance -- Andrew Cooper Ian Jackson Paul Durrant David Vrabel Jennifer Herbert Introduction Introduced Paul Durrant to the working group. Started by recapping our purpose: A way to make it possible for qemu to be able to make hypercalls without too much privilege, in a way which is up streamable.Dom0 guest must not be able abuse interface to compromise dom0 kernel. QEMU Hypercalls – DM op --- Has been much discussion on XenDevel - a problem identified is when you have operations with references to other user memory objects, such as with Track dirty VRAM (As used with the VGA buffer) At the moment, apparently there is only that one, but others may emerge. Most obvious solution would involve the guest kernel validating the virtual address passed, however that would would rely on the guest kernel knowing where to objects where. This is to be avoided. Ian recounts how there where variose proposals, on XenDevel involving, essentially, informing the hypervisor, or some way providing the information about which virtual addresses where being talked about by the hypercall to the hypervisor. Many of these involved this information being transmitted via a different channel. Ian suggest the idea provide a way for the kernel to tell the hypervisor is user virtual rages, dm op allowed memory. And there would be a flag, in the dm op, in a fixed location, that would tell the hypervisor, this only talks about special dm pre-approved memory. A scheme of pre-nominating an area in QEMU, maybe using hypercall buffers is briefly discussed,as well as a few other ideas, but concludes that doesn’t really address the problem of future DM ops – of which there could easily be. Even if we can avoid the problem by special cases for our current set-up, we still need a story for how to add future interfaces with handles, without the need to change the kernel interface. Once we come up with story, we wouldn't necessarily have to implement it. The concept of using physical addressed hypercall buffers was discussed. Privcmd could allocate you a place, and mmap it into user ram, and this is the only memory that would be used with the hypercalls. A hypercall would tell you the buffer range. Each qemu would need to be associated with the correct set of physical buffers. A recent AMD proposal was discussed, which would use only physical addresses, no virtual address. The upshot being we should come up with a solution that is not incompatible this. Ideas further discussed: User code could just put stuff in mmaped memory, and only refer to offset within that buffer. The privcmd driver would fill in physical details. All dm ops would have 3 arguments: dm op, pointer to to struct, and optional pointer to restriction array – the last of which is filled in my privcmd driver. It is discussed how privcmd driver must not look at the dm op number, in particular, to know how to validate addresses, as it must be independent from the API. A scheme where qemu calls an ioctl before it drops privileges, to set up restrictions ahead of time, is discussed. One scheme may work by setting up a rang for a given domain or VCPU. The assumption is that all device model, running in the same domain, have the same virtual address layout. Then there would be a flag, in the stable bit of the api, if to apply that restriction - any kernel dm op would not apply that restriction. The idea can be extended – to have more one address range, or can have range explicitly provided in the hypercall. This latter suggestion is preferred, however each platform would have different valid address ranges, and privcmd is platform independent. Its discussed how a function could be created to return valid rages for your given platform, but this is not considered a element solution. The third parameter of the dm op could be array of ranges, where common case for virtual addresses may be 0-3GB, but for physical addresses, it might be quite fragmented. A further ideas is proposed to extend the dm op, to have a fixed part, to have an array of guest handles, the kernel can audit. The arguments would be: Arg1: Dom ID: Arg2: Guests handle array of tuples(address, size) Arg3: Number guest handles. The first element of the array could be the DM op structure itself, containing the DM Op code, and othe argument to the perticular op. The Privcmd driver would only pass though what is provided by the user. Any extra elements would be ignored by the hypercall, and if there where insufficient, the hypercall code would see a NULL, and be able to gracefully fail. The initial block (of dm arguments) passed in the array would be copied into pre-zeroed memory of max op size, having checked the size is not greater then this. No need to check minimum, buffer initialised to zero, so zero length
Re: [Xen-devel] Device model operation hypercall (DMOP, re qemu depriv)
On 01/08/16 12:32, Ian Jackson wrote: I think we need to introduce a new hypercall (which I will call DMOP for now) which may augment or replace some of HVMCTL. Let me explain: I believe the new 'DMOP' hypercall is a good idea, but following on from discussions, I propose a revised design, which I present below. Please let me know what you think. Thanks, Jenny. DMOP (multi-buffer variant) Introduction A previous proposal for a 'DMOP' was put forward by Ian Jackson on the 1st of August. This proposal seem very promising, however a problem was identified with it, that this proposal addresses. The aim of DMOP, as before, is to prevent a compromised device model from compromising domains other then the one it is associated with. (And is therefore likely already compromised) The previous proposal adds a DMOP hypercall, for use by the device models, which places the domain ID in a fixed place, within the calling args, such that the priv driver can always find it, and not need to know any further details about the particular DMOP in order to validate it against previously set (vie ioctl) domain. The problem occurs when you have a DMOP with references to other user memory objects, such as with Track dirty VRAM (As used with the VGA buffer). Is this case, the address of this other user object needs to be vetted, to ensure it is not within a restricted address ranges, such as kernel memory. The real problem comes down to how you would vet this address - the idea place to do this is within the privcmd driver, since it would have knowledge of the address space involved. However, with a principal goal of DMOP to allow privcmd to be free from any knowledge of DMOP's sub ops, it would have no way to identify any user buffer addresses that need checking. The alternative of allowing the hypervisor to vet the address is also problematic, since it has no knowledge of the guest memory layout. The Design -- As with the previous design, we provide a new restriction ioctl, which takes a domid parameter. After that restriction ioctl is called, the privcmd driver will permit only DMOP hypercalls, and only with the specified target domid. In the previous design, a DMOP consisted of one buffer, containing all of the DMOP parameters, which may include further explicit references to more buffers. In this design, an array of buffers with length is presented, with the first one containing the DMOP parameters, which could implicitly reference to further buffers from within in the array. Here, the only user buffers passed, are that found with the array, and so all can be audited from privcmd. With the passing of the length of the buffers array, privcmd does not need to know which DMOP it is to audit them. If the hypervisor ends up with the wrong number of buffers, it can reject the DMOP at that point. The following code illustrates this idea: typedef struct dm_op_buffer { XEN_GUEST_HANDLE(void) h; size_t len; } dm_op_buffer_t; int HYPERVISOR_device_model_op( domid_t domid, unsigned int nr_buffers, XEN_GUEST_HANDLE_PARAM(dm_op_buffer_t) buffers) @domid: the domain the hypercall operates on. @nr_buffers; the number of buffers in the @buffers array. @buffers: an array of buffers. @buffers[0] contains device_model_op - the structure describing the sub-op and its parameters. @buffers[1], @buffers[2] etc. may be used by a sub-op for passing additional buffers. struct device_model_op { uint32_t op; union { struct op_1 op1; struct op_2 op2; /* etc... */ } u; }; It is forbidden for the above struct (device_model_op) to contain any guest handles - if they are needed, they should instead be in HYPERVISOR_device_model_op->buffers. Validation by privcmd driver -- If the privcmd driver has been restricted to specific domain (using a new ioctl), when it received an op, it will: 1. Check hypercall is DMOP. 2. Check domid == restricted domid. 3. For each @nr_buffers in @buffers: Check @h and @len give a buffer wholey in the user space part of the virtual address space. (e.g., on Linux use access_ok()). Xen Implementation -- Since a DMOP sub of may need to copy or return a buffer from the guest, as well as the DMOP itself to git the initial buffer, functions for doing this would be written as below. Note that care is taken to prevent damage from buffer under or over run situations. If the DMOP is called with insufficient buffers, zeros will be read, while extra is ignored. int copy_dm_buffer_from_guest( void *dst,/* Kernel destination buffer */ size_t max_len, /* Size of destination buffer */ XEN_GUEST_HANDLE_PARAM(dm_op_buffer_t) buffers, /* dm_op_buffers passed into DMOP */ unsigned int nr_buffers, /* total number of dm_op_buffers */
[Xen-devel] XenProject/XenServer QEMU working group, Friday 8th July, 2016, 15:00.
XenProject/XenServer QEMU working group, Friday 8th July, 2016, 15:00. Date and Attendees == XenProject/XenServer QEMU working group, held on Friday the 8th of July, 2016 at 15:00. The Following were present: Ian Jackson [platform team] David Vrabel [ring0] Andrew Cooper [ring0] Jennifer Herbert [ring0] Purpose of meeting == Both XenServer (currently using qemu-trad with de-priv) and XenProject (using QEMU (upstream) without dep-priv), would like to move to using QEMU (upstream) de-privelaged. This meeting was intended to restart XenServer/ring0 and XenProject's collaboration. Agenda includes: * Discuss requirements * Review the our status and strategy on achieving these requirements. * Agree next steps. Meeting Actions === Ian: Find all the Xenstore keys used by QEMU, and evaluate how much work it would be to either read before privileges are dropped, or otherwise stop reading/writing them when de-privelaged. Ian: Write up DM_Opp design (discussed during meeting), talk to Jan about this. XenServer: Go though All the priv-cmd ops, check how they would fit within the dm-opp design. David: Post Event channel / priv command patch to upstream. (Now done) Meeting Transcript - abridged = Discussed Goals --- Everyone agrees we want to stop anything gaining control of QEMU from also gaining access to the platform. XenProject would like to use depriv by default. Should not preclude other configurations such as PCI pass though. Ian: There is a project in progress to use stub domains with QEMU – would expect a user to run depriv or stubdoms, but not at the same time. XS: states that is does not want to go down the stub domains route, as is not considered scalable. (Boot storms etc;) Ian: In upstream we expect to pursue both depriv qemu, and stub qemu, as implementation strategies. Depriv qemu is probably deliverable sooner. Stub qemu is more secure but not suitable for all users (eg, maybe not for XenServer as David suggested). Going though list technology areas listed in document shared beforehand. Disk / Nsetwork / Host IO - Can be addressed by running as unprivileged user. Andrew: Rather linux centric - not posix solution for all these problem: Ian: There are similar interfaces on other platforms. In upstream we expect to at least be able to run as an unprivileged user, which is indeed portable. Mem map XenServer has a solution. Ian: Would like to see this shared immediately. Andrew: Consider rest of design first. HyperCalls -- Three options: 1: XSM 2: Fix ABIan: Re-arrange priv-cmd to make it easier to parse and restrict. 3: Loadable pattern matching solution. All agree this would be horrible to maintain. Ian: Difference between not having Stable API, and the instability of ABI stopping the determination of domain. Ian: Version compatibility out of scope. Andrew: Disagree, should do both together. Andrew: Requirement: – no interdependence between QEMU and the kernel, such that you need to update both together. All: Agree xen interface should not be embedded in the kernel. Jen: Asked why Ian was against using XSM. Ian: XSM wholesale replace existing checks, and current XSM polices are unproven. Andrew: XS is experimenting, and there are problems. Ian & Andrew: Agree would need auditing. Ian: Reluctant to link xsm to this work. David: Can do both – XSM and 'Fix ABI'. Ian: With both, wouldn't need the nesting. David: Would still need to nesting to wrap domains. Ian: Doesn't have to do all …. David: David: (wearing a linux hat): Sceptical for stable API – don't want to have a new kernel when API new features added. Ian: Suggest ideas of DM-Space – D-ctrl space. David: Would need to be a strong component. Both ends would want commitment ~ no snow flakes. Ian: Shouldn't be a ploblem. Andrew: PVH would relpcaes QEMU. Ian: Not relevant. Idea of DM-ops is discussed more. Dm-op restricted to domain No need to version this. Important fields (domain) fixed, such that DM-ops can be added changed, without affecting kernel interface. Andrew bring up XenGT. Is this considered part of QEMU. QEMU shouldn't set pci bars. Question if DM-OP would just be HVM-OP. David: Should enumerate hyper-calls, make sure in DM-ops. Andrew & Ian : would need to discuss with Jan. Some things may need to be in both dm-op & other things, ie guest interface. - This is ok. dm-ops is tentatively agreed on as a way forward. XenStore Restrict. -- Jen : Upstream discussing maybe dropping sockets. (Necessary for restrict) David: Problem, is it gives the same permissions as the guest – does this restrict too much? Should look though all uses, see if they can be used before it drops privileges. Ian: Already looke
Re: [Xen-devel] [PATCHv1 3/4] spinlock: shrink struct lock_debug
On 18/12/15 16:58, Jan Beulich wrote: On 18.12.15 at 15:09,wrote: --- a/xen/common/spinlock.c +++ b/xen/common/spinlock.c @@ -16,7 +16,7 @@ static atomic_t spin_debug __read_mostly = ATOMIC_INIT(0); static void check_lock(struct lock_debug *debug) { -int irq_safe = !local_irq_is_enabled(); +s16 irq_safe = !local_irq_is_enabled(); if ( unlikely(atomic_read(_debug) <= 0) ) return; I can't figure out why this odd looking change is needed. Jan This patch shrinks struct lock_debug's irq_safe member. Since you end up with many instances of this embedded in other structures such as struct domain, this become a useful saving in memory. At one point, I had an issue with struct domain exceeding 4k, which caused trouble. Given the change to struct_lock debug, this local variable irq_safe, is compared with it, and used in cmpxchg with it it, and therefore should match type. - Jenny ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCHv1 4/4] spinlock: add fair read-write locks
Hi Jan, Thank you for your comments. Sorry for the slow response - xmas and all... Responses below... On 22/12/15 13:54, Jan Beulich wrote: +/** + * rspin_until_writer_unlock - inc reader count & spin until writer is gone Stale comment - the function doesn't increment anything. Also throughout the file, with Linux coding style converted to Xen style, comment style should be made Xen-like too. Oh yes, missed that - will fix. +/* + * Readers come here when they cannot get the lock without waiting + */ +if ( unlikely(in_irq()) ) +{ +/* + * Readers in interrupt context will spin until the lock is + * available without waiting in the queue. + */ +smp_rmb(); +cnts = atomic_read(>cnts); +rspin_until_writer_unlock(lock, cnts); +return; +} I can't immediately see the reason for this re-introduction of unfairness - can you say a word on this, or perhaps extend the comment? We haven't found a reason this was introduced to Linux, but assume this was to reduce interrupt latency. I had thought to leave it there, in case we want to use it in the future, but now feel it would probably better to remove it, and deal with any irq related issues, if and when we use rw locks from irq handlers. -x = lock->lock & ~RW_WRITE_FLAG; -} -preempt_disable(); -} - -void _write_lock_irq(rwlock_t *lock) -{ -uint32_t x; +cnts = atomic_read(>cnts); +if ( !(cnts & _QW_WMASK) && + (atomic_cmpxchg(>cnts, cnts, + cnts | _QW_WAITING) == cnts) ) +break; Considering that (at least on x86) cmpxchg is relatively expensive, and that atomic OR would be carried out by some cmpxchg-like mechanism on most other arches anyway, can't this be an atomic OR, followed by a read to check for another active writer? I wanted to port know proven code. We have now run this code in xen for quite a while, and while I think you may well be correct, I'd rather get this version in, and then consider further optimisation testing in future patches. +unlock: +spin_unlock(>lock); Labels indented by at least one space please. Also - are you using any of the static functions in spinlock.c? If not, putting the rwlock code in a new rwlock.c would help review quite a bit, since code removal and code addition would then be separate rather than intermixed. Great idea. I think the reasons for keeping them together must just be historical. I'll try and split that out. +/* + * Writer states & reader shift and bias + */ +#define_QW_WAITING 1 /* A writer is waiting */ +#define_QW_LOCKED 0xff/* A writer holds the lock */ +#define_QW_WMASK0xff/* Writer mask */ +#define_QR_SHIFT8 /* Reader count shift */ +#define_QR_BIAS (1U << _QR_SHIFT) Is there a particular reason for the 8-bit writer mask (a 2-bit one would seem to suffice)? You picked up on optimisation in your later comment - I should add a comment here. +static inline int _write_trylock(rwlock_t *lock) +{ +u32 cnts; + +cnts = atomic_read(>cnts); +if ( unlikely(cnts) ) +return 0; + +return likely(atomic_cmpxchg(>cnts, + cnts, cnts | _QW_LOCKED) == cnts); The | is pointless here considering that cnts is zero. I theorised that this was like this to aid the readability of the code, although I don't know if it does. I'd happily change it over, and replace the cnts with 0s. +static inline void _write_unlock(rwlock_t *lock) +{ +/* + * If the writer field is atomic, it can be cleared directly. + * Otherwise, an atomic subtraction will be used to clear it. + */ +atomic_sub(_QW_LOCKED, >cnts); +} Ah, I guess the comment here is the explanation for the 8-bit write mask. Yes. A comment on the declaration would no doubt help too. +static inline int _rw_is_write_locked(rwlock_t *lock) +{ +return atomic_read(>cnts) & _QW_WMASK; +} This returns true for write-locked or writer-waiting - is this intended? It was, but having thought about it a bit more, it would only have been "more useful" like this in unsafe (or at lest ugly) code, and so for that reason I should change it. I don't think this would have affected the safe cases. Jan Cheers -jenny ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V2] libxc: Prevent NULL pointer dereference in stdiostream_vmessage()
Unlikely that it may seem localtime_r could fail, which would result in a null pointer dereference. In this case, it shoud log the errno, (instead of the date/time), and and continue its logging, as this is still useful. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xtl_logger_stdio.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/tools/libxc/xtl_logger_stdio.c b/tools/libxc/xtl_logger_stdio.c index d8646e0..5569c50 100644 --- a/tools/libxc/xtl_logger_stdio.c +++ b/tools/libxc/xtl_logger_stdio.c @@ -61,10 +61,13 @@ static void stdiostream_vmessage(xentoollog_logger *logger_in, struct tm lt_buf; time_t now = time(0); struct tm *lt= localtime_r(now, lt_buf); -fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s , -lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday, -lt-tm_hour, lt-tm_min, lt-tm_sec, -tzname[!!lt-tm_isdst]); +if (lt != NULL) +fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s , +lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday, +lt-tm_hour, lt-tm_min, lt-tm_sec, +tzname[!!lt-tm_isdst]); +else +fprintf(lg-f, [localtime_r failed: %d] , errno); } if (lg-flags XTL_STDIOSTREAM_SHOW_PID) fprintf(lg-f, [%lu] , (unsigned long)getpid()); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V3] libxc: Prevent NULL pointer dereference in stdiostream_vmessage()
Unlikely that it may seem localtime_r could fail, which would result in a null pointer dereference. In this case, it shoud log the errno, (instead of the date/time), and and continue its logging, as this is still useful. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xtl_logger_stdio.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/tools/libxc/xtl_logger_stdio.c b/tools/libxc/xtl_logger_stdio.c index d8646e0..b28ef73 100644 --- a/tools/libxc/xtl_logger_stdio.c +++ b/tools/libxc/xtl_logger_stdio.c @@ -61,10 +61,13 @@ static void stdiostream_vmessage(xentoollog_logger *logger_in, struct tm lt_buf; time_t now = time(0); struct tm *lt= localtime_r(now, lt_buf); -fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s , -lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday, -lt-tm_hour, lt-tm_min, lt-tm_sec, -tzname[!!lt-tm_isdst]); +if (lt != NULL) +fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s , +lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday, +lt-tm_hour, lt-tm_min, lt-tm_sec, +tzname[!!lt-tm_isdst]); +else +fprintf(lg-f, [localtime_r failed: %d] , errno); } if (lg-flags XTL_STDIOSTREAM_SHOW_PID) fprintf(lg-f, [%lu] , (unsigned long)getpid()); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V2] libxc: Prevent NULL pointer dereference in stdiostream_vmessage()
I guess I didn't build what I thought I built. V3 coming up On 07/07/15 16:51, Andrew Cooper wrote: On 07/07/15 16:48, Jennifer Herbert wrote: Unlikely that it may seem localtime_r could fail, which would result in a null pointer dereference. In this case, it shoud log the errno, (instead of the date/time), and and continue its logging, as this is still useful. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xtl_logger_stdio.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/tools/libxc/xtl_logger_stdio.c b/tools/libxc/xtl_logger_stdio.c index d8646e0..5569c50 100644 --- a/tools/libxc/xtl_logger_stdio.c +++ b/tools/libxc/xtl_logger_stdio.c @@ -61,10 +61,13 @@ static void stdiostream_vmessage(xentoollog_logger *logger_in, struct tm lt_buf; time_t now = time(0); struct tm *lt= localtime_r(now, lt_buf); -fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s , -lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday, -lt-tm_hour, lt-tm_min, lt-tm_sec, -tzname[!!lt-tm_isdst]); +if (lt != NULL) +fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s , +lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday, +lt-tm_hour, lt-tm_min, lt-tm_sec, +tzname[!!lt-tm_isdst]); +else +fprintf(lg-f, [localtime_r failed: %d] , errno); lg-f ? ~Andrew } if (lg-flags XTL_STDIOSTREAM_SHOW_PID) fprintf(lg-f, [%lu] , (unsigned long)getpid()); ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V2] libxc: Prevent NULL pointer dereference in stdiostream_vmessage()
On 07/07/15 17:05, Ian Jackson wrote: Andrew Cooper writes (Re: [Xen-devel] [PATCH V2] libxc: Prevent NULL pointer dereference in stdiostream_vmessage()): On 07/07/15 16:48, Jennifer Herbert wrote: +else +fprintf(lg-f, [localtime_r failed: %d] , errno); lg-f ? I didn't spot this. I guess the patch wasn't build-tested... Not sure what happened there - something was built tested, but I presumably without this patch. Sorry about that. -jenny ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/7] libxc: Use const pointer in local_file_dump()
On 03/07/15 16:27, Ian Jackson wrote: Jennifer Herbert writes ([Xen-devel] [PATCH 2/7] libxc: Use const pointer in local_file_dump()): By adding the const keyword, it is clearer to people and static analysis tools that no changes to the data are to be made. Would it be wrong for a future patch to add a field to dump_args which gets modified ? AFAICT the answer is `no'. So I don't understand why it ought to be const. Does your Coverity instance complain about every instance where a struct exists which is not marked const but which could be ? Coverity see da-fd being passed to local_file_dump, as a modifiable entity and concludes this function may be overwriting this, and end up leaking the handle. It wouldn't be wrong to modify a new field, I just hadn't considered that likely. Since maybe it is, I'll just mark as false positive. -jenny ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 5/7] libxc: Removing dead code from xc_dom_allocate()
The only place that jumps to 'err:' does so because !dom, which is rechecked in 'err:'. This patch simplifies, giving the same result. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xc_dom_core.c |7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c index b100ce1..cde943c 100644 --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -720,7 +720,7 @@ struct xc_dom_image *xc_dom_allocate(xc_interface *xch, __FUNCTION__, cmdline, features); dom = malloc(sizeof(*dom)); if ( !dom ) -goto err; +return NULL; memset(dom, 0, sizeof(*dom)); dom-xch = xch; @@ -742,11 +742,6 @@ struct xc_dom_image *xc_dom_allocate(xc_interface *xch, dom-alloc_malloc += sizeof(*dom); return dom; - - err: -if ( dom ) -xc_dom_release(dom); -return NULL; } int xc_dom_kernel_max_size(struct xc_dom_image *dom, size_t sz) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 7/7] libxc: Prevent NULL pointer dereference in stdiostream_vmessage()
Unlikely that it may seem localtime_r could fail, which would result in a null pointer dereference. In this case, one can simply just skip logging the date/time, and logging anything is more useful then nothing. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xtl_logger_stdio.c |9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/tools/libxc/xtl_logger_stdio.c b/tools/libxc/xtl_logger_stdio.c index d8646e0..74d66a5 100644 --- a/tools/libxc/xtl_logger_stdio.c +++ b/tools/libxc/xtl_logger_stdio.c @@ -61,10 +61,11 @@ static void stdiostream_vmessage(xentoollog_logger *logger_in, struct tm lt_buf; time_t now = time(0); struct tm *lt= localtime_r(now, lt_buf); -fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s , -lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday, -lt-tm_hour, lt-tm_min, lt-tm_sec, -tzname[!!lt-tm_isdst]); +if (lt != NULL) +fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s , +lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday, +lt-tm_hour, lt-tm_min, lt-tm_sec, +tzname[!!lt-tm_isdst]); } if (lg-flags XTL_STDIOSTREAM_SHOW_PID) fprintf(lg-f, [%lu] , (unsigned long)getpid()); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/7] libxc: Use const pointer in local_file_dump()
By adding the const keyword, it is clearer to people and static analysis tools that no changes to the data are to be made. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xc_core.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/libxc/xc_core.c b/tools/libxc/xc_core.c index dfa424b..f759fd6 100644 --- a/tools/libxc/xc_core.c +++ b/tools/libxc/xc_core.c @@ -926,7 +926,7 @@ struct dump_args { static int local_file_dump(xc_interface *xch, void *args, char *buffer, unsigned int length) { -struct dump_args *da = args; +const struct dump_args *da = args; if ( write_exact(da-fd, buffer, length) == -1 ) { -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 3/7] libxc: Fix uninitialized valiables in xc_cpuid_hvm_policy()
If xc_hvm_param_get fails, is_pae and/or is_nestedhvm are left undefined. This patch Indicates error and defaults to false. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xc_cpuid_x86.c | 18 +- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c index 847b701..cc6f18a 100644 --- a/tools/libxc/xc_cpuid_x86.c +++ b/tools/libxc/xc_cpuid_x86.c @@ -268,11 +268,16 @@ static void xc_cpuid_hvm_policy( { DECLARE_DOMCTL; char brand[13]; -uint64_t val; -int is_pae, is_nestedhvm; +uint64_t val = 0; +int is_pae; +int is_nestedhvm = 0; uint64_t xfeature_mask; +int rc; + +rc = xc_hvm_param_get(xch, domid, HVM_PARAM_PAE_ENABLED, val); +if ( rc 0 ) + ERROR(xc_hvm_param_get failed to get PAE_ENABLED with rc %d\n, rc); -xc_hvm_param_get(xch, domid, HVM_PARAM_PAE_ENABLED, val); is_pae = !!val; /* Detecting Xen's atitude towards XSAVE */ @@ -282,8 +287,11 @@ static void xc_cpuid_hvm_policy( do_domctl(xch, domctl); xfeature_mask = domctl.u.vcpuextstate.xfeature_mask; -xc_hvm_param_get(xch, domid, HVM_PARAM_NESTEDHVM, val); -is_nestedhvm = !!val; +rc = xc_hvm_param_get(xch, domid, HVM_PARAM_NESTEDHVM, val); +if ( rc = 0 ) +is_nestedhvm = !!val; +else +ERROR(xc_hvm_param_get failed to get NESTEDHVM with rc %d\n, rc); switch ( input[0] ) { -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 6/7] libxc: Fix misleading use of strncpy code in build_hvm_info()
hvm_info-signature is not a string, but an 64 bit int, and is not NULL terminated. The use of strncpy to populate it is inappropriate and potentially misleading. A cursory glance might have you thinking someone had miscounted the length of the string literal - not realising it was intentionally cropping of the null termination. Also, since we wish to initialise all of hvm_info-signature, and certainly no more, the use of sizeof is safer. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xc_hvm_build_x86.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c index 003ea06..ec5ef4d 100644 --- a/tools/libxc/xc_hvm_build_x86.c +++ b/tools/libxc/xc_hvm_build_x86.c @@ -99,7 +99,7 @@ static void build_hvm_info(void *hvm_info_page, memset(hvm_info_page, 0, PAGE_SIZE); /* Fill in the header. */ -strncpy(hvm_info-signature, HVM INFO, 8); +memcpy(hvm_info-signature, HVM INFO, sizeof(hvm_info-signature)); hvm_info-length = sizeof(struct hvm_info_table); /* Sensible defaults: these can be overridden by the caller. */ -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 4/7] libxc: Prevent dereferencing NULL pointers returned from xc_dom_allocate()
The return from xc_dom_allocate is not checked for a NULL value. This patch fixes this, causing it to return from the function with an error. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xc_dom_compat_linux.c |6 ++ 1 file changed, 6 insertions(+) diff --git a/tools/libxc/xc_dom_compat_linux.c b/tools/libxc/xc_dom_compat_linux.c index 2c14a0f..617cd96 100644 --- a/tools/libxc/xc_dom_compat_linux.c +++ b/tools/libxc/xc_dom_compat_linux.c @@ -91,6 +91,8 @@ int xc_linux_build_mem(xc_interface *xch, uint32_t domid, xc_dom_loginit(xch); dom = xc_dom_allocate(xch, cmdline, features); +if (dom == NULL) +return -1; if ( (rc = xc_dom_kernel_mem(dom, image_buffer, image_size)) != 0 ) goto out; if ( initrd ((rc = xc_dom_ramdisk_mem(dom, initrd, initrd_len)) != 0) ) @@ -123,6 +125,8 @@ int xc_linux_build(xc_interface *xch, uint32_t domid, xc_dom_loginit(xch); dom = xc_dom_allocate(xch, cmdline, features); +if (dom == NULL) +return -1; if ( (rc = xc_dom_kernel_file(dom, image_name)) != 0 ) goto out; if ( initrd_name strlen(initrd_name) @@ -146,6 +150,8 @@ int xc_get_bit_size(xc_interface *xch, int rc; *bit_size = 0; dom = xc_dom_allocate(xch, cmdline, features); +if (dom == NULL) +return -1; if ( (rc = xc_dom_kernel_file(dom, image_name)) != 0 ) goto out; if ( (rc = xc_dom_parse_image(dom)) != 0 ) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 0/7] libxc: Fix a number of coverity issues.
Fix a number of coverity issues in libxc. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/7] libxc: fix uninitialized variable in xc_cpuid_pv_policy()
If xc_domain_get_guest_width were to fail, guest_width is not set, and hence guest_64bit becomes undefined. Fix is to initialise to 0, and report error if call fails. Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com --- tools/libxc/xc_cpuid_x86.c |8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c index c97f91a..847b701 100644 --- a/tools/libxc/xc_cpuid_x86.c +++ b/tools/libxc/xc_cpuid_x86.c @@ -437,14 +437,16 @@ static void xc_cpuid_pv_policy( { DECLARE_DOMCTL; unsigned int guest_width; -int guest_64bit; +int guest_64bit = 0; char brand[13]; uint64_t xfeature_mask; xc_cpuid_brand_get(brand); -xc_domain_get_guest_width(xch, domid, guest_width); -guest_64bit = (guest_width == 8); +if (xc_domain_get_guest_width(xch, domid, guest_width) == 0) +guest_64bit = (guest_width == 8); +else +ERROR(Could not read guest word width.); /* Detecting Xen's atitude towards XSAVE */ memset(domctl, 0, sizeof(domctl)); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCHv3 0/4] Use ticket locks for spinlocks
On 21/04/15 11:22, Jan Beulich wrote: Despite both pictures saying micro-seconds at the respective axis (and hence the problem not being _as bad_) - did the data collection reveal where these IRQ disable regions are, so we could look into eliminating them? (ISTR there being some open coded IRQ-disable, lock, unlock, IRQ-restore/enable sequences, but I'm not sure whether they got eliminated already.) Jan The location of the locks was not collected, (Other then if it was from a spinlock or not) but I'm certainly considering doing this, as another graph of duration against cumulative time, shows the majority of time spent with interrupts disabled, is during code that keeps it disabled for long contiguous periods, all using raw (not spinlock) Interrupt disable. (A few occurrences, but very long when they happen) It would certainly be interesting to know where these come from. -jenny ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel