Re: [Xen-devel] [PATCH v3 2/3] Introduce migration precopy policy

2017-09-26 Thread Jennifer Herbert


On 26/09/17 12:02, Andrew Cooper wrote:

On 25/09/17 19:55, Jennifer Herbert wrote:

@@ -46,7 +60,22 @@ struct save_callbacks {
   */
  int (*suspend)(void* data);
  
-/* Called after the guest's dirty pages have been

+/*
+ * Called before and after every batch of page data sent during
+ * the precopy phase of a live migration to ask the caller what
+ * to do next based on the current state of the precopy migration.

How is the callback supposed to determine whether it is before or ahead
of the data batch?

As far as I can tell, its not possible.

~Andrew


You look at the dirty_count.  It works well for me.
I can add that info to the comment.

-jenny


+ *
+ * Should return one of the values listed below:
+ */
+#define XGS_POLICY_ABORT  (-1) /* Abandon the migration entirely
+* and tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY1  /* Immediately suspend and transmit the
+* remaining dirty pages. */
+precopy_policy_t precopy_policy;
+
+/*
+ * Called after the guest's dirty pages have been
   *  copied into an output buffer.
   * Callback function resumes the guest & the device model,
   *  returns to xc_domain_save.




___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 2/3] Introduce migration precopy policy

2017-09-26 Thread Jennifer Herbert

On 26/09/17 11:58, Andrew Cooper wrote:

On 25/09/17 19:55, Jennifer Herbert wrote:

+/*
+ * A precopy_policy callback may not be running in the same address
+ * space as libxc an so precopy_stats is passed by value.
+ */

Please take a step back and thing about what is written here...

As I've said repeatedly, the structure vs pointer argument here is
entirely unrelated to the IPC boundary.

I am also unhappy that, after multiple review requests saying "turn this
into a pointer", its remained being passed by value.  It is not ok to
hack things like this up simply because its the easier cause of action.


Passing tiny structures by value is not a hack - it is perfectly valid 
C.  Bare in mind that a pointer is not a zero
sized entity - it itself needs to be copied.  Passing by value here is 
quite possibly faster, and use less memory.   Furthermore, the multiple 
reviews concluded that this was related to IPC boundary, and that the 
tiny struct size was acceptable.


Maybe you could elaborate on why you do not think it is related to the 
IPC boundary.


Instead, we could pass each element of the structure, as an individual 
parameters.  This would seem a retrograde

step to me, but  would address your sensitivities.

-jenny

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 3/3] RFC: migration: defer precopy policy to libxl

2017-09-26 Thread Jennifer Herbert

On 26/09/17 09:48, Wei Liu wrote:

On Mon, Sep 25, 2017 at 07:55:35PM +0100, Jennifer Herbert wrote:

Provide an implementation of the old policy as a callback in
libxl and plumb it through the IPC machinery to libxc.

This serves as an example for defining a libxl policy,
and provides no advantage over the default policy in libxc.

Signed-off-by: Joshua Otto <jto...@uwaterloo.ca>
Reviewed-by: Roger Pau Monné <roger@citrix.com>

The code looks plausible but it is not really usable by libxl users
as-is, because a public API is still missing.


This is why I excluded it to start with, and now labelled  it RFC.
Its only the first two patches I would like to see pulled atm.

Cheers

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 2/3] Introduce migration precopy policy

2017-09-25 Thread Jennifer Herbert
This Patch allows a migration precopy policy to be specified.

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
  mechanism of the precopy stage
- it can't take into account facts external to the immediate
  migration context, such external state transfer state, interactive
  user input, or the passage of wall-clock time.
- it does not permit the user to change their mind, over time, about
  what to do at the end of the precopy (they get an unconditional
  transition into the stop-and-copy phase of the migration)

To permit callers to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
  callbacks
- during the precopy phase of live migrations, consult this policy after
  each batch of pages transmitted and take the dictated action, which
  may be to a) abort the migration entirely, b) continue with the
  precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy, used when
  precopy_policy callback  is not provided.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
Signed-off-by: Joshua Otto <jto...@uwaterloo.ca>
Reviewed-by: Paul Durrant <paul.durr...@citrix.com>

---

v3:

Have changed/added comments, removed iteration parmeter from
update_progress_string and made a few formatting corrections.

v2:

Have made a few formatting corrections, added typedef as suggested.

This is updated/modified subset of patch 7/20, part of
Joshua Otto's "Add postcopy live migration support." patch,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this this within the XenServer product.
I hope this will aid Josh in pushing the remainder of his series.
---
 tools/libxc/include/xenguest.h |  37 +--
 tools/libxc/xc_sr_common.h |   6 +--
 tools/libxc/xc_sr_save.c   | 105 -
 3 files changed, 109 insertions(+), 39 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 6626f0c..de97f1b 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -39,6 +39,20 @@
  */
 struct xenevtchn_handle;
 
+/* For save's precopy_policy(). */
+struct precopy_stats
+{
+unsigned int iteration;
+unsigned int total_written;
+long dirty_count; /* -1 if unknown */
+};
+
+/*
+ * A precopy_policy callback may not be running in the same address
+ * space as libxc an so precopy_stats is passed by value.
+ */
+typedef int (*precopy_policy_t)(struct precopy_stats, void *);
+
 /* callbacks provided by xc_domain_save */
 struct save_callbacks {
 /* Called after expiration of checkpoint interval,
@@ -46,7 +60,22 @@ struct save_callbacks {
  */
 int (*suspend)(void* data);
 
-/* Called after the guest's dirty pages have been
+/*
+ * Called before and after every batch of page data sent during
+ * the precopy phase of a live migration to ask the caller what
+ * to do next based on the current state of the precopy migration.
+ *
+ * Should return one of the values listed below:
+ */
+#define XGS_POLICY_ABORT  (-1) /* Abandon the migration entirely
+* and tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY1  /* Immediately suspend and transmit the
+* remaining dirty pages. */
+precopy_policy_t precopy_policy;
+
+/*
+ * Called after the guest's dirty pages have been
  *  copied into an output buffer.
  * Callback function resumes the guest & the device model,
  *  returns to xc_domain_save.
@@ -55,7 +84,8 @@ struct save_callbacks {
  */
 int (*postcopy)(void* data);
 
-/* Called after the memory checkpoint has been flushed
+/*
+ * Called after the memory checkpoint has been flushed
  * out into the network. Typical actions performed in this
  * callback include:
  *   (a) send the saved device model state (for HVM guests),
@@ -65,7 +95,8 @@ struct save_callbacks {
  *
  * returns:
  * 0: terminate checkpointing gracefully
- * 1: take another checkpoint */
+ * 1: take another checkpoint
+ */
 int (*checkpoint)(void* data);
 
 /*
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22a..3635704 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -198,12 +198,10 @@ struct xc_sr_context
 

[Xen-devel] [PATCH v3 3/3] RFC: migration: defer precopy policy to libxl

2017-09-25 Thread Jennifer Herbert
Provide an implementation of the old policy as a callback in
libxl and plumb it through the IPC machinery to libxc.

This serves as an example for defining a libxl policy,
and provides no advantage over the default policy in libxc.

Signed-off-by: Joshua Otto 
Reviewed-by: Roger Pau Monné 
---
 tools/libxl/libxl_dom_save.c   | 20 
 tools/libxl/libxl_save_msgs_gen.pl |  4 +++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 77fe30e..6d28cce 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -328,6 +328,25 @@ int 
libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
 return rc;
 }
 
+/*
+ * This is the live migration precopy policy - it's called periodically during
+ * the precopy phase of live migrations, and is responsible for deciding when
+ * the precopy phase should terminate and what should be done next.
+ *
+ * The policy implemented here behaves identically to the policy previously
+ * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of
+ * the live migration when there are either fewer than 50 dirty pages, or more
+ * than 5 precopy rounds have completed.
+ */
+static int libxl__save_live_migration_simple_precopy_policy(
+struct precopy_stats stats, void *user)
+{
+return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
+stats.iteration >= 5)
+? XGS_POLICY_STOP_AND_COPY
+: XGS_POLICY_CONTINUE_PRECOPY;
+}
+
 /*- main code for saving, in order of execution -*/
 
 void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
@@ -401,6 +420,7 @@ void libxl__domain_save(libxl__egc *egc, 
libxl__domain_save_state *dss)
 if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE)
 callbacks->suspend = libxl__domain_suspend_callback;
 
+callbacks->precopy_policy = 
libxl__save_live_migration_simple_precopy_policy;
 callbacks->switch_qemu_logdirty = 
libxl__domain_suspend_common_switch_qemu_logdirty;
 
 dss->sws.ao  = dss->ao;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl 
b/tools/libxl/libxl_save_msgs_gen.pl
index 3ae7373..bb1d4e9 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -33,6 +33,7 @@ our @msgs = (
   'xen_pfn_t', 'console_gfn'] ],
 [  9, 'srW',"complete",  [qw(int retval
  int errnoval)] ],
+[ 10, 'scxW',   "precopy_policy", ['struct precopy_stats', 'stats'] ]
 );
 
 #
@@ -141,7 +142,8 @@ static void bytes_put(unsigned char *const buf, int *len,
 
 END
 
-foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long', 
'xen_pfn_t') {
+foreach my $simpletype (qw(int uint16_t uint32_t unsigned),
+'unsigned long', 'xen_pfn_t', 'struct precopy_stats') {
 my $typeid = typeid($simpletype);
 $out_body{'callout'} .= <

[Xen-devel] [PATCH v3 0/3] Introduce migration precopy policy

2017-09-25 Thread Jennifer Herbert
Hi all,

v3:

In patch 2/3, have changed/added comments, removed iteration parmeter from
update_progress_string and made a few formatting corrections.

v2:

Have tidied some formatting, s.o.b Joshua, and added a RFC patch
showing how to use this in  libxl.

v1:

Here I present a updated/modified subset of patch 7/20, part of
Joshua Otto's "Add postcopy live migration support." patch,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this this within the XenServer product,
and hence am trying to get this subset  pulled in now. I also
hope this will aid Josh in pushing the remainder of his series.

Here I present two patches, the first which does some tidy up, removing
unused and unhelpful paramaters to xc_domain_save(), and the second
which allows a precopy callback to be specified, providing the
test for when to end the live phase of migration should end.
If none is provided, a default policy of the current behaviour
is used.


Jennifer Herbert (3):
  Tidy libxc xc_domain_save
  Introduce migration precopy policy
  RFC: migration: defer precopy policy to libxl

 tools/libxc/include/xenguest.h |  41 --
 tools/libxc/xc_nomigrate.c |   3 +-
 tools/libxc/xc_sr_common.h |   6 +-
 tools/libxc/xc_sr_save.c   | 113 +
 tools/libxl/libxl_dom_save.c   |  20 +++
 tools/libxl/libxl_save_callout.c   |   4 +-
 tools/libxl/libxl_save_helper.c|   7 +--
 tools/libxl/libxl_save_msgs_gen.pl |   4 +-
 8 files changed, 142 insertions(+), 56 deletions(-)

-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3 1/3] Tidy libxc xc_domain_save

2017-09-25 Thread Jennifer Herbert
Tidy up libxc's xc_domain_save, removing unused paramaters
max_iters and max_factor, making matching changes to libxl.

Signed-off-by: Joshua Otto <jto...@uwaterloo.ca>
Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
Reviewed-by: Paul Durrant <paul.durr...@citrix.com>
Acked-by: Wei Liu <wei.l...@citrix.com>
---
 tools/libxc/include/xenguest.h   | 4 ++--
 tools/libxc/xc_nomigrate.c   | 3 +--
 tools/libxc/xc_sr_save.c | 8 +++-
 tools/libxl/libxl_save_callout.c | 4 ++--
 tools/libxl/libxl_save_helper.c  | 7 ++-
 5 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 5cd8111..6626f0c 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -100,8 +100,8 @@ typedef enum {
  *doesn't use checkpointing
  * @return 0 on success, -1 on failure
  */
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
-   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
+   uint32_t flags /* XCFLAGS_xxx */,
struct save_callbacks* callbacks, int hvm,
xc_migration_stream_t stream_type, int recv_fd);
 
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index 317c8ce..fe8f68c 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -20,8 +20,7 @@
 #include 
 #include 
 
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
-   uint32_t max_factor, uint32_t flags,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
struct save_callbacks* callbacks, int hvm,
xc_migration_stream_t stream_type, int recv_fd)
 {
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index ca6913b..1e7502d 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -916,9 +916,8 @@ static int save(struct xc_sr_context *ctx, uint16_t 
guest_type)
 };
 
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
-   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
-   struct save_callbacks* callbacks, int hvm,
-   xc_migration_stream_t stream_type, int recv_fd)
+   uint32_t flags, struct save_callbacks* callbacks,
+   int hvm, xc_migration_stream_t stream_type, int recv_fd)
 {
 struct xc_sr_context ctx =
 {
@@ -955,8 +954,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t 
dom,
 if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
 assert(callbacks->wait_checkpoint);
 
-DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d",
-io_fd, dom, max_iters, max_factor, flags, hvm);
+DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm);
 
 if ( xc_domain_getinfo(xch, dom, 1, ) != 1 )
 {
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 891c669..6452d70 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -89,8 +89,8 @@ void libxl__xc_domain_save(libxl__egc *egc, 
libxl__domain_save_state *dss,
 libxl__srm_callout_enumcallbacks_save(>callbacks.save.a);
 
 const unsigned long argnums[] = {
-dss->domid, 0, 0, dss->xcflags, dss->hvm,
-cbflags, dss->checkpointed_stream,
+dss->domid, dss->xcflags, dss->hvm, cbflags,
+dss->checkpointed_stream,
 };
 
 shs->ao = ao;
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 1dece23..38089a0 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -251,8 +251,6 @@ int main(int argc, char **argv)
 io_fd = atoi(NEXTARG);
 recv_fd =   atoi(NEXTARG);
 uint32_t dom =  strtoul(NEXTARG,0,10);
-uint32_t max_iters =strtoul(NEXTARG,0,10);
-uint32_t max_factor =   strtoul(NEXTARG,0,10);
 uint32_t flags =strtoul(NEXTARG,0,10);
 int hvm =   atoi(NEXTARG);
 unsigned cbflags =  strtoul(NEXTARG,0,10);
@@ -264,9 +262,8 @@ int main(int argc, char **argv)
 startup("save");
 setup_signals(save_signal_handler);
 
-r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
-   _save_callbacks, hvm, stream_type,
-   recv_fd);
+r = xc_domain_save(xch, io_fd, dom, flags, _save_callbacks,
+   hvm, stream_type, recv_fd);
 complete(r);
 
 } else if (!strcmp(mode,"--restore-domain&q

Re: [Xen-devel] [PATCH v2 2/3] Introduce migration precopy policy

2017-09-20 Thread Jennifer Herbert

On 20/09/17 11:20, Roger Pau Monné wrote:

On Tue, Sep 19, 2017 at 07:06:26PM +0100, Jennifer Herbert wrote:

This Patch allows a migration precopy policy to be specified.

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
   mechanism of the precopy stage
- it can't take into account facts external to the immediate
   migration context, such external state transfer state, interactive
   user input, or the passage of wall-clock time.
- it does not permit the user to change their mind, over time, about
   what to do at the end of the precopy (they get an unconditional
   transition into the stop-and-copy phase of the migration)

To permit callers to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
   callbacks
- during the precopy phase of live migrations, consult this policy after
   each batch of pages transmitted and take the dictated action, which
   may be to a) abort the migration entirely, b) continue with the
   precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy, used when
   precopy_policy callback  is not provided.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
Signed-off-by: Joshua Otto <jto...@uwaterloo.ca>

---

v2:

Have made a few formatting corrections, added typedef as suggested.

v1:

This is updated/modified subset of patch 7/20, part of
Joshua Otto's "Add postcopy live migration support." patch,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this this within the XenServer product.
I hope this will aid Josh in pushing the remainder of his series.

---
  tools/libxc/include/xenguest.h | 31 --
  tools/libxc/xc_sr_common.h |  6 +--
  tools/libxc/xc_sr_save.c   | 97 +-
  3 files changed, 97 insertions(+), 37 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 6626f0c..a2a654c 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -39,6 +39,16 @@
   */
  struct xenevtchn_handle;
  
+/* For save's precopy_policy(). */

+struct precopy_stats
+{
+unsigned int iteration;
+unsigned int total_written;
+long dirty_count; /* -1 if unknown */
+};
+
+typedef int (*precopy_policy_t)(struct precopy_stats, void *);

Shouldn't precopy_stats be a pointer (const pointer probably seeing
it's usage)?


In April Joshua described how he did it like this to help with IPC 
plumbing into libxl.
Ian Jackson explained that you can't pass a pointer across the IPC 
boundary. The two

bits of code run in different processes, with different address spaces.

Since the precopy_stats structure is tiny, it was concluded it would 
have very small impact on performance.
I'd also agree since a pointer to this structure would almost half as 
big at the structure itself.



I think I'll add a comment above the line to explain the decision.


+
  /* callbacks provided by xc_domain_save */
  struct save_callbacks {
  /* Called after expiration of checkpoint interval,
@@ -46,7 +56,20 @@ struct save_callbacks {
   */
  int (*suspend)(void* data);
  
-/* Called after the guest's dirty pages have been

+/*
+ * Called after every batch of page data sent during the precopy
+ * phase of a live migration to ask the caller what to do next
+ * based on the current state of the precopy migration.

I would add:

"Should return one of the values listed below:"


+ */
+#define XGS_POLICY_ABORT  (-1) /* Abandon the migration entirely
+* and tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY1  /* Immediately suspend and transmit the
+* remaining dirty pages. */
+precopy_policy_t precopy_policy;
+
+/*
+ * Called after the guest's dirty pages have been
   *  copied into an output buffer.
   * Callback function resumes the guest & the device model,
   *  returns to xc_domain_save.
@@ -55,7 +78,8 @@ struct save_callbacks {
   */
  int (*postcopy)(void* data);
  
-/* Called after the memory checkpoint has been flushed

+/*
+ * Called after the memory checkpoint has been flushed
   * out into the network. Typical actions performed in this
   * callback include:
   *   (a) send the saved device model state (for HVM guests),
@@ 

[Xen-devel] [PATCH v2 3/3] RFC: migration: defer precopy policy to libxl

2017-09-19 Thread Jennifer Herbert
Provide an implementation of the old policy as a callback in
libxl and plumb it through the IPC machinery to libxc.

This serves as an example for defining a libxl policy,
and provides no advantage over the default policy in libxc.

Signed-off-by: Joshua Otto 
---

I have included this patch, as rfc, as requested by Ian, to show how
libxl can provide a migration precopy policy. This was part of the
same larger patch from Joshua - I have not changed or tested it.

 tools/libxl/libxl_dom_save.c   | 20 
 tools/libxl/libxl_save_msgs_gen.pl |  4 +++-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_dom_save.c b/tools/libxl/libxl_dom_save.c
index 77fe30e..6d28cce 100644
--- a/tools/libxl/libxl_dom_save.c
+++ b/tools/libxl/libxl_dom_save.c
@@ -328,6 +328,25 @@ int 
libxl__save_emulator_xenstore_data(libxl__domain_save_state *dss,
 return rc;
 }
 
+/*
+ * This is the live migration precopy policy - it's called periodically during
+ * the precopy phase of live migrations, and is responsible for deciding when
+ * the precopy phase should terminate and what should be done next.
+ *
+ * The policy implemented here behaves identically to the policy previously
+ * hard-coded into xc_domain_save() - it proceeds to the stop-and-copy phase of
+ * the live migration when there are either fewer than 50 dirty pages, or more
+ * than 5 precopy rounds have completed.
+ */
+static int libxl__save_live_migration_simple_precopy_policy(
+struct precopy_stats stats, void *user)
+{
+return ((stats.dirty_count >= 0 && stats.dirty_count < 50) ||
+stats.iteration >= 5)
+? XGS_POLICY_STOP_AND_COPY
+: XGS_POLICY_CONTINUE_PRECOPY;
+}
+
 /*- main code for saving, in order of execution -*/
 
 void libxl__domain_save(libxl__egc *egc, libxl__domain_save_state *dss)
@@ -401,6 +420,7 @@ void libxl__domain_save(libxl__egc *egc, 
libxl__domain_save_state *dss)
 if (dss->checkpointed_stream == LIBXL_CHECKPOINTED_STREAM_NONE)
 callbacks->suspend = libxl__domain_suspend_callback;
 
+callbacks->precopy_policy = 
libxl__save_live_migration_simple_precopy_policy;
 callbacks->switch_qemu_logdirty = 
libxl__domain_suspend_common_switch_qemu_logdirty;
 
 dss->sws.ao  = dss->ao;
diff --git a/tools/libxl/libxl_save_msgs_gen.pl 
b/tools/libxl/libxl_save_msgs_gen.pl
index 3ae7373..bb1d4e9 100755
--- a/tools/libxl/libxl_save_msgs_gen.pl
+++ b/tools/libxl/libxl_save_msgs_gen.pl
@@ -33,6 +33,7 @@ our @msgs = (
   'xen_pfn_t', 'console_gfn'] ],
 [  9, 'srW',"complete",  [qw(int retval
  int errnoval)] ],
+[ 10, 'scxW',   "precopy_policy", ['struct precopy_stats', 'stats'] ]
 );
 
 #
@@ -141,7 +142,8 @@ static void bytes_put(unsigned char *const buf, int *len,
 
 END
 
-foreach my $simpletype (qw(int uint16_t uint32_t unsigned), 'unsigned long', 
'xen_pfn_t') {
+foreach my $simpletype (qw(int uint16_t uint32_t unsigned),
+'unsigned long', 'xen_pfn_t', 'struct precopy_stats') {
 my $typeid = typeid($simpletype);
 $out_body{'callout'} .= <

[Xen-devel] [PATCH v2 2/3] Introduce migration precopy policy

2017-09-19 Thread Jennifer Herbert
This Patch allows a migration precopy policy to be specified.

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
  mechanism of the precopy stage
- it can't take into account facts external to the immediate
  migration context, such external state transfer state, interactive
  user input, or the passage of wall-clock time.
- it does not permit the user to change their mind, over time, about
  what to do at the end of the precopy (they get an unconditional
  transition into the stop-and-copy phase of the migration)

To permit callers to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
  callbacks
- during the precopy phase of live migrations, consult this policy after
  each batch of pages transmitted and take the dictated action, which
  may be to a) abort the migration entirely, b) continue with the
  precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy, used when
  precopy_policy callback  is not provided.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
Signed-off-by: Joshua Otto <jto...@uwaterloo.ca>

---

v2:

Have made a few formatting corrections, added typedef as suggested.

v1:

This is updated/modified subset of patch 7/20, part of
Joshua Otto's "Add postcopy live migration support." patch,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this this within the XenServer product.
I hope this will aid Josh in pushing the remainder of his series.

---
 tools/libxc/include/xenguest.h | 31 --
 tools/libxc/xc_sr_common.h |  6 +--
 tools/libxc/xc_sr_save.c   | 97 +-
 3 files changed, 97 insertions(+), 37 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 6626f0c..a2a654c 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -39,6 +39,16 @@
  */
 struct xenevtchn_handle;
 
+/* For save's precopy_policy(). */
+struct precopy_stats
+{
+unsigned int iteration;
+unsigned int total_written;
+long dirty_count; /* -1 if unknown */
+};
+
+typedef int (*precopy_policy_t)(struct precopy_stats, void *);
+
 /* callbacks provided by xc_domain_save */
 struct save_callbacks {
 /* Called after expiration of checkpoint interval,
@@ -46,7 +56,20 @@ struct save_callbacks {
  */
 int (*suspend)(void* data);
 
-/* Called after the guest's dirty pages have been
+/*
+ * Called after every batch of page data sent during the precopy
+ * phase of a live migration to ask the caller what to do next
+ * based on the current state of the precopy migration.
+ */
+#define XGS_POLICY_ABORT  (-1) /* Abandon the migration entirely
+* and tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY1  /* Immediately suspend and transmit the
+* remaining dirty pages. */
+precopy_policy_t precopy_policy;
+
+/*
+ * Called after the guest's dirty pages have been
  *  copied into an output buffer.
  * Callback function resumes the guest & the device model,
  *  returns to xc_domain_save.
@@ -55,7 +78,8 @@ struct save_callbacks {
  */
 int (*postcopy)(void* data);
 
-/* Called after the memory checkpoint has been flushed
+/*
+ * Called after the memory checkpoint has been flushed
  * out into the network. Typical actions performed in this
  * callback include:
  *   (a) send the saved device model state (for HVM guests),
@@ -65,7 +89,8 @@ struct save_callbacks {
  *
  * returns:
  * 0: terminate checkpointing gracefully
- * 1: take another checkpoint */
+ * 1: take another checkpoint 
+ */
 int (*checkpoint)(void* data);
 
 /*
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22a..3635704 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -198,12 +198,10 @@ struct xc_sr_context
 /* Further debugging information in the stream. */
 bool debug;
 
-/* Parameters for tweaking live migration. */
-unsigned max_iterations;
-unsigned dirty_threshold;
-
 unsigned long p2m_size;
 
+struct precopy_stats stats;
+
 xen_pfn_t *batch_pfns;
 unsigned nr_batch_pfns;
 unsigned long *

[Xen-devel] [PATCH v2 1/3] Tidy libxc xc_domain_save

2017-09-19 Thread Jennifer Herbert
Tidy up libxc's xc_domain_save, removing unused paramaters
max_iters and max_factor, making matching changes to libxl.

Signed-off-by: Joshua Otto <jto...@uwaterloo.ca>
Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
Reviewed-by: Paul Durrant <paul.durr...@citrix.com>
Acked-by: Wei Liu <wei.l...@citrix.com>
---
 tools/libxc/include/xenguest.h   | 4 ++--
 tools/libxc/xc_nomigrate.c   | 3 +--
 tools/libxc/xc_sr_save.c | 8 +++-
 tools/libxl/libxl_save_callout.c | 4 ++--
 tools/libxl/libxl_save_helper.c  | 7 ++-
 5 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 5cd8111..6626f0c 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -100,8 +100,8 @@ typedef enum {
  *doesn't use checkpointing
  * @return 0 on success, -1 on failure
  */
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
-   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
+   uint32_t flags /* XCFLAGS_xxx */,
struct save_callbacks* callbacks, int hvm,
xc_migration_stream_t stream_type, int recv_fd);
 
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index 317c8ce..fe8f68c 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -20,8 +20,7 @@
 #include 
 #include 
 
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
-   uint32_t max_factor, uint32_t flags,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
struct save_callbacks* callbacks, int hvm,
xc_migration_stream_t stream_type, int recv_fd)
 {
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index ca6913b..1e7502d 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -916,9 +916,8 @@ static int save(struct xc_sr_context *ctx, uint16_t 
guest_type)
 };
 
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
-   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
-   struct save_callbacks* callbacks, int hvm,
-   xc_migration_stream_t stream_type, int recv_fd)
+   uint32_t flags, struct save_callbacks* callbacks,
+   int hvm, xc_migration_stream_t stream_type, int recv_fd)
 {
 struct xc_sr_context ctx =
 {
@@ -955,8 +954,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t 
dom,
 if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
 assert(callbacks->wait_checkpoint);
 
-DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d",
-io_fd, dom, max_iters, max_factor, flags, hvm);
+DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm);
 
 if ( xc_domain_getinfo(xch, dom, 1, ) != 1 )
 {
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 891c669..6452d70 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -89,8 +89,8 @@ void libxl__xc_domain_save(libxl__egc *egc, 
libxl__domain_save_state *dss,
 libxl__srm_callout_enumcallbacks_save(>callbacks.save.a);
 
 const unsigned long argnums[] = {
-dss->domid, 0, 0, dss->xcflags, dss->hvm,
-cbflags, dss->checkpointed_stream,
+dss->domid, dss->xcflags, dss->hvm, cbflags,
+dss->checkpointed_stream,
 };
 
 shs->ao = ao;
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 1dece23..38089a0 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -251,8 +251,6 @@ int main(int argc, char **argv)
 io_fd = atoi(NEXTARG);
 recv_fd =   atoi(NEXTARG);
 uint32_t dom =  strtoul(NEXTARG,0,10);
-uint32_t max_iters =strtoul(NEXTARG,0,10);
-uint32_t max_factor =   strtoul(NEXTARG,0,10);
 uint32_t flags =strtoul(NEXTARG,0,10);
 int hvm =   atoi(NEXTARG);
 unsigned cbflags =  strtoul(NEXTARG,0,10);
@@ -264,9 +262,8 @@ int main(int argc, char **argv)
 startup("save");
 setup_signals(save_signal_handler);
 
-r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
-   _save_callbacks, hvm, stream_type,
-   recv_fd);
+r = xc_domain_save(xch, io_fd, dom, flags, _save_callbacks,
+   hvm, stream_type, recv_fd);
 complete(r);
 
 } else if (!strcmp(mode,"--restore-domain&q

[Xen-devel] [PATCH v2 0/3] Introduce migration precopy policy

2017-09-19 Thread Jennifer Herbert
Hi all,

v2:

Have tidied some formatting, s.o.b Joshua, and added a RFC patch
showing how to use this in  libxl.

v1:

Here I present a updated/modified subset of patch 7/20, part of
Joshua Otto's "Add postcopy live migration support." patch,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this this within the XenServer product,
and hence am trying to get this subset  pulled in now. I also
hope this will aid Josh in pushing the remainder of his series.

Here I present two patches, the first which does some tidy up, removing
unused and unhelpful paramaters to xc_domain_save(), and the second
which allows a precopy callback to be specified, providing the
test for when to end the live phase of migration should end.
If none is provided, a default policy of the current behaviour
is used.

Jennifer Herbert (3):
  Tidy libxc xc_domain_save
  Introduce migration precopy policy
  RFC: migration: defer precopy policy to libxl

 tools/libxc/include/xenguest.h |  35 +++--
 tools/libxc/xc_nomigrate.c |   3 +-
 tools/libxc/xc_sr_common.h |   6 +--
 tools/libxc/xc_sr_save.c   | 105 -
 tools/libxl/libxl_dom_save.c   |  20 +++
 tools/libxl/libxl_save_callout.c   |   4 +-
 tools/libxl/libxl_save_helper.c|   7 +--
 tools/libxl/libxl_save_msgs_gen.pl |   4 +-
 8 files changed, 130 insertions(+), 54 deletions(-)

-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/2] Introduce migration precopy policy

2017-09-18 Thread Jennifer Herbert


On 18/09/17 14:30, Ian Jackson wrote:

Jennifer Herbert writes ("[PATCH 2/2] Introduce migration precopy policy"):

This Patch allows a migration precopy policy to be specified.

But only for direct libxc users.  How do you think this should be
exposed via libxl ?


The original patch had a small amount of code to pass this though such 
that libxl could provide one.  I'm not entity sure how the perl code 
worked, but I've not made any changes that wouldn't stop this working.  
That section would form a follow on patch.  Since I don't use or have 
much understanding of libxl, I didn't really feel I was the best person 
to do this.


-jenny


The general approach, so far, seems sound.

Thanks,
Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 1/2] Tidy libxc xc_domain_save

2017-09-14 Thread Jennifer Herbert
Tidy up libxc's xc_domain_save, removing unused paramaters
max_iters and max_factor, making matching changes to libxl.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
---
 tools/libxc/include/xenguest.h   | 4 ++--
 tools/libxc/xc_nomigrate.c   | 3 +--
 tools/libxc/xc_sr_save.c | 8 +++-
 tools/libxl/libxl_save_callout.c | 4 ++--
 tools/libxl/libxl_save_helper.c  | 7 ++-
 5 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 5cd8111..6626f0c 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -100,8 +100,8 @@ typedef enum {
  *doesn't use checkpointing
  * @return 0 on success, -1 on failure
  */
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
-   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
+   uint32_t flags /* XCFLAGS_xxx */,
struct save_callbacks* callbacks, int hvm,
xc_migration_stream_t stream_type, int recv_fd);
 
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index 317c8ce..fe8f68c 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -20,8 +20,7 @@
 #include 
 #include 
 
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
-   uint32_t max_factor, uint32_t flags,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
struct save_callbacks* callbacks, int hvm,
xc_migration_stream_t stream_type, int recv_fd)
 {
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index ca6913b..1e7502d 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -916,9 +916,8 @@ static int save(struct xc_sr_context *ctx, uint16_t 
guest_type)
 };
 
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
-   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
-   struct save_callbacks* callbacks, int hvm,
-   xc_migration_stream_t stream_type, int recv_fd)
+   uint32_t flags, struct save_callbacks* callbacks,
+   int hvm, xc_migration_stream_t stream_type, int recv_fd)
 {
 struct xc_sr_context ctx =
 {
@@ -955,8 +954,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t 
dom,
 if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
 assert(callbacks->wait_checkpoint);
 
-DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d",
-io_fd, dom, max_iters, max_factor, flags, hvm);
+DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm);
 
 if ( xc_domain_getinfo(xch, dom, 1, ) != 1 )
 {
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 891c669..6452d70 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -89,8 +89,8 @@ void libxl__xc_domain_save(libxl__egc *egc, 
libxl__domain_save_state *dss,
 libxl__srm_callout_enumcallbacks_save(>callbacks.save.a);
 
 const unsigned long argnums[] = {
-dss->domid, 0, 0, dss->xcflags, dss->hvm,
-cbflags, dss->checkpointed_stream,
+dss->domid, dss->xcflags, dss->hvm, cbflags,
+dss->checkpointed_stream,
 };
 
 shs->ao = ao;
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 1dece23..38089a0 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -251,8 +251,6 @@ int main(int argc, char **argv)
 io_fd = atoi(NEXTARG);
 recv_fd =   atoi(NEXTARG);
 uint32_t dom =  strtoul(NEXTARG,0,10);
-uint32_t max_iters =strtoul(NEXTARG,0,10);
-uint32_t max_factor =   strtoul(NEXTARG,0,10);
 uint32_t flags =strtoul(NEXTARG,0,10);
 int hvm =   atoi(NEXTARG);
 unsigned cbflags =  strtoul(NEXTARG,0,10);
@@ -264,9 +262,8 @@ int main(int argc, char **argv)
 startup("save");
 setup_signals(save_signal_handler);
 
-r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
-   _save_callbacks, hvm, stream_type,
-   recv_fd);
+r = xc_domain_save(xch, io_fd, dom, flags, _save_callbacks,
+   hvm, stream_type, recv_fd);
 complete(r);
 
 } else if (!strcmp(mode,"--restore-domain")) {
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 1/2] Tidy libxc xc_domain_save

2017-09-14 Thread Jennifer Herbert
Tidy up libxc's xc_domain_save, removing unused paramaters
max_iters and max_factor, making matching changes to libxl.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
---
 tools/libxc/include/xenguest.h   | 4 ++--
 tools/libxc/xc_nomigrate.c   | 3 +--
 tools/libxc/xc_sr_save.c | 8 +++-
 tools/libxl/libxl_save_callout.c | 4 ++--
 tools/libxl/libxl_save_helper.c  | 7 ++-
 5 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 5cd8111..6626f0c 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -100,8 +100,8 @@ typedef enum {
  *doesn't use checkpointing
  * @return 0 on success, -1 on failure
  */
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
-   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
+   uint32_t flags /* XCFLAGS_xxx */,
struct save_callbacks* callbacks, int hvm,
xc_migration_stream_t stream_type, int recv_fd);
 
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c
index 317c8ce..fe8f68c 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -20,8 +20,7 @@
 #include 
 #include 
 
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
-   uint32_t max_factor, uint32_t flags,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t flags,
struct save_callbacks* callbacks, int hvm,
xc_migration_stream_t stream_type, int recv_fd)
 {
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index ca6913b..1e7502d 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -916,9 +916,8 @@ static int save(struct xc_sr_context *ctx, uint16_t 
guest_type)
 };
 
 int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
-   uint32_t max_iters, uint32_t max_factor, uint32_t flags,
-   struct save_callbacks* callbacks, int hvm,
-   xc_migration_stream_t stream_type, int recv_fd)
+   uint32_t flags, struct save_callbacks* callbacks,
+   int hvm, xc_migration_stream_t stream_type, int recv_fd)
 {
 struct xc_sr_context ctx =
 {
@@ -955,8 +954,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t 
dom,
 if ( ctx.save.checkpointed == XC_MIG_STREAM_COLO )
 assert(callbacks->wait_checkpoint);
 
-DPRINTF("fd %d, dom %u, max_iters %u, max_factor %u, flags %u, hvm %d",
-io_fd, dom, max_iters, max_factor, flags, hvm);
+DPRINTF("fd %d, dom %u, flags %u, hvm %d", io_fd, dom, flags, hvm);
 
 if ( xc_domain_getinfo(xch, dom, 1, ) != 1 )
 {
diff --git a/tools/libxl/libxl_save_callout.c b/tools/libxl/libxl_save_callout.c
index 891c669..6452d70 100644
--- a/tools/libxl/libxl_save_callout.c
+++ b/tools/libxl/libxl_save_callout.c
@@ -89,8 +89,8 @@ void libxl__xc_domain_save(libxl__egc *egc, 
libxl__domain_save_state *dss,
 libxl__srm_callout_enumcallbacks_save(>callbacks.save.a);
 
 const unsigned long argnums[] = {
-dss->domid, 0, 0, dss->xcflags, dss->hvm,
-cbflags, dss->checkpointed_stream,
+dss->domid, dss->xcflags, dss->hvm, cbflags,
+dss->checkpointed_stream,
 };
 
 shs->ao = ao;
diff --git a/tools/libxl/libxl_save_helper.c b/tools/libxl/libxl_save_helper.c
index 1dece23..38089a0 100644
--- a/tools/libxl/libxl_save_helper.c
+++ b/tools/libxl/libxl_save_helper.c
@@ -251,8 +251,6 @@ int main(int argc, char **argv)
 io_fd = atoi(NEXTARG);
 recv_fd =   atoi(NEXTARG);
 uint32_t dom =  strtoul(NEXTARG,0,10);
-uint32_t max_iters =strtoul(NEXTARG,0,10);
-uint32_t max_factor =   strtoul(NEXTARG,0,10);
 uint32_t flags =strtoul(NEXTARG,0,10);
 int hvm =   atoi(NEXTARG);
 unsigned cbflags =  strtoul(NEXTARG,0,10);
@@ -264,9 +262,8 @@ int main(int argc, char **argv)
 startup("save");
 setup_signals(save_signal_handler);
 
-r = xc_domain_save(xch, io_fd, dom, max_iters, max_factor, flags,
-   _save_callbacks, hvm, stream_type,
-   recv_fd);
+r = xc_domain_save(xch, io_fd, dom, flags, _save_callbacks,
+   hvm, stream_type, recv_fd);
 complete(r);
 
 } else if (!strcmp(mode,"--restore-domain")) {
-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 2/2] Introduce migration precopy policy

2017-09-14 Thread Jennifer Herbert
This Patch allows a migration precopy policy to be specified.

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
  mechanism of the precopy stage
- it can't take into account facts external to the immediate
  migration context, such external state transfer state, interactive
  user input, or the passage of wall-clock time.
- it does not permit the user to change their mind, over time, about
  what to do at the end of the precopy (they get an unconditional
  transition into the stop-and-copy phase of the migration)

To permit callers to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
  callbacks
- during the precopy phase of live migrations, consult this policy after
  each batch of pages transmitted and take the dictated action, which
  may be to a) abort the migration entirely, b) continue with the
  precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy, used when
  precopy_policy callback  is not provided.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>

---

This is updated/modified subset of patch 7/20, part of
Joshua Otto's "Add postcopy live migration support." patch,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this this within the XenServer product.
I hope this will aid Josh in pushing the remainder of his series.
---
 tools/libxc/include/xenguest.h |  19 
 tools/libxc/xc_sr_common.h |   7 ++-
 tools/libxc/xc_sr_save.c   | 102 +
 3 files changed, 94 insertions(+), 34 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 6626f0c..d5908dc 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -39,6 +39,14 @@
  */
 struct xenevtchn_handle;
 
+/* For save's precopy_policy(). */
+struct precopy_stats
+{
+unsigned iteration;
+unsigned total_written;
+long dirty_count; /* -1 if unknown */
+};
+
 /* callbacks provided by xc_domain_save */
 struct save_callbacks {
 /* Called after expiration of checkpoint interval,
@@ -46,6 +54,17 @@ struct save_callbacks {
  */
 int (*suspend)(void* data);
 
+/* Called after every batch of page data sent during the precopy phase of a
+ * live migration to ask the caller what to do next based on the current
+ * state of the precopy migration.
+ */
+#define XGS_POLICY_ABORT  (-1) /* Abandon the migration entirely and
+* tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY1  /* Immediately suspend and transmit the
+* remaining dirty pages. */
+int (*precopy_policy)(struct precopy_stats stats, void *data);
+
 /* Called after the guest's dirty pages have been
  *  copied into an output buffer.
  * Callback function resumes the guest & the device model,
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22a..2bc261b 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -198,12 +198,11 @@ struct xc_sr_context
 /* Further debugging information in the stream. */
 bool debug;
 
-/* Parameters for tweaking live migration. */
-unsigned max_iterations;
-unsigned dirty_threshold;
-
 unsigned long p2m_size;
 
+struct precopy_stats stats;
+int policy_decision;
+
 xen_pfn_t *batch_pfns;
 unsigned nr_batch_pfns;
 unsigned long *deferred_pages;
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 1e7502d..03dfa61 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -452,8 +452,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
 xc_interface *xch = ctx->xch;
 char *new_str = NULL;
 
-if ( asprintf(_str, "Frames iteration %u of %u",
-  iter, ctx->save.max_iterations) == -1 )
+if ( asprintf(_str, "Frames iteration %u", iter) == -1 )
 {
 PERROR("Unable to allocate new progress string");
 return -1;
@@ -467,6 +466,25 @@ static int update_progress_string(struct xc_sr_context 
*ctx,
 }
 
 /*
+ * This is the live migration precopy policy - it's called periodically during
+ * the precopy phase of live migrations, and is responsible for deciding whe

[Xen-devel] [PATCH 0/2] Introduce migration precopy policy

2017-09-14 Thread Jennifer Herbert
Hi all,

Here I present a updated/modified subset of patch 7/20, part of
Joshua Otto's "Add postcopy live migration support." patch,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this this within the XenServer product,
and hence am trying to get this subset  pulled in now. I also
hope this will aid Josh in pushing the remainder of his series.

Here I present two patches, the first which does some tidy up, removing
unused and unhelpful paramaters to xc_domain_save(), and the second
which allows a precopy callback to be specified, providing the
test for when to end the live phase of migration should end.
If none is provided, a default policy of the current behaviour
is used.

Cheers,

Jennifer Herbert (2):
  Tidy libxc xc_domain_save
  Introduce migration precopy policy

 tools/libxc/include/xenguest.h   |  23 +++-
 tools/libxc/xc_nomigrate.c   |   3 +-
 tools/libxc/xc_sr_common.h   |   7 ++-
 tools/libxc/xc_sr_save.c | 110 ++-
 tools/libxl/libxl_save_callout.c |   4 +-
 tools/libxl/libxl_save_helper.c  |   7 +--
 6 files changed, 104 insertions(+), 50 deletions(-)

-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 0/2] Introduce migration precopy policy

2017-09-14 Thread Jennifer Herbert
Hi all,

Here I present a updated/modified subset of patch 7/20, part of
Joshua Otto's "Add postcopy live migration support." patch,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this this within the XenServer product,
and hence am trying to get this subset  pulled in now. I also
hope this will aid Josh in pushing the remainder of his series.

Here I present two patches, the first which does some tidy up, removing
unused and unhelpful paramaters to xc_domain_save(), and the second
which allows a precopy callback to be specified, providing the
test for when to end the live phase of migration should end.
If none is provided, a default policy of the current behaviour
is used.

Cheers,

Jennifer Herbert (2):
  Tidy libxc xc_domain_save
  Introduce migration precopy policy

 tools/libxc/include/xenguest.h   |  23 +++-
 tools/libxc/xc_nomigrate.c   |   3 +-
 tools/libxc/xc_sr_common.h   |   7 ++-
 tools/libxc/xc_sr_save.c | 110 ++-
 tools/libxl/libxl_save_callout.c |   4 +-
 tools/libxl/libxl_save_helper.c  |   7 +--
 6 files changed, 104 insertions(+), 50 deletions(-)

-- 
1.8.3.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 2/2] Introduce migration precopy policy

2017-09-14 Thread Jennifer Herbert
This Patch allows a migration precopy policy to be specified.

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
  mechanism of the precopy stage
- it can't take into account facts external to the immediate
  migration context, such external state transfer state, interactive
  user input, or the passage of wall-clock time.
- it does not permit the user to change their mind, over time, about
  what to do at the end of the precopy (they get an unconditional
  transition into the stop-and-copy phase of the migration)

To permit callers to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
  callbacks
- during the precopy phase of live migrations, consult this policy after
  each batch of pages transmitted and take the dictated action, which
  may be to a) abort the migration entirely, b) continue with the
  precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy, used when
  precopy_policy callback  is not provided.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>

---

This is updated/modified subset of patch 7/20, part of
Joshua Otto's "Add postcopy live migration support." patch,
dated 27th March 2017.  As indicated on the original thread,
I wish to make use of this this within the XenServer product.
I hope this will aid Josh in pushing the remainder of his series.
---
 tools/libxc/include/xenguest.h |  19 
 tools/libxc/xc_sr_common.h |   7 ++-
 tools/libxc/xc_sr_save.c   | 102 +
 3 files changed, 94 insertions(+), 34 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 6626f0c..d5908dc 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -39,6 +39,14 @@
  */
 struct xenevtchn_handle;
 
+/* For save's precopy_policy(). */
+struct precopy_stats
+{
+unsigned iteration;
+unsigned total_written;
+long dirty_count; /* -1 if unknown */
+};
+
 /* callbacks provided by xc_domain_save */
 struct save_callbacks {
 /* Called after expiration of checkpoint interval,
@@ -46,6 +54,17 @@ struct save_callbacks {
  */
 int (*suspend)(void* data);
 
+/* Called after every batch of page data sent during the precopy phase of a
+ * live migration to ask the caller what to do next based on the current
+ * state of the precopy migration.
+ */
+#define XGS_POLICY_ABORT  (-1) /* Abandon the migration entirely and
+* tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY1  /* Immediately suspend and transmit the
+* remaining dirty pages. */
+int (*precopy_policy)(struct precopy_stats stats, void *data);
+
 /* Called after the guest's dirty pages have been
  *  copied into an output buffer.
  * Callback function resumes the guest & the device model,
diff --git a/tools/libxc/xc_sr_common.h b/tools/libxc/xc_sr_common.h
index a83f22a..2bc261b 100644
--- a/tools/libxc/xc_sr_common.h
+++ b/tools/libxc/xc_sr_common.h
@@ -198,12 +198,11 @@ struct xc_sr_context
 /* Further debugging information in the stream. */
 bool debug;
 
-/* Parameters for tweaking live migration. */
-unsigned max_iterations;
-unsigned dirty_threshold;
-
 unsigned long p2m_size;
 
+struct precopy_stats stats;
+int policy_decision;
+
 xen_pfn_t *batch_pfns;
 unsigned nr_batch_pfns;
 unsigned long *deferred_pages;
diff --git a/tools/libxc/xc_sr_save.c b/tools/libxc/xc_sr_save.c
index 1e7502d..03dfa61 100644
--- a/tools/libxc/xc_sr_save.c
+++ b/tools/libxc/xc_sr_save.c
@@ -452,8 +452,7 @@ static int update_progress_string(struct xc_sr_context *ctx,
 xc_interface *xch = ctx->xch;
 char *new_str = NULL;
 
-if ( asprintf(_str, "Frames iteration %u of %u",
-  iter, ctx->save.max_iterations) == -1 )
+if ( asprintf(_str, "Frames iteration %u", iter) == -1 )
 {
 PERROR("Unable to allocate new progress string");
 return -1;
@@ -467,6 +466,25 @@ static int update_progress_string(struct xc_sr_context 
*ctx,
 }
 
 /*
+ * This is the live migration precopy policy - it's called periodically during
+ * the precopy phase of live migrations, and is responsible for deciding whe

[Xen-devel] QEMU XenServer/XenProject Working group meeting 10th May 2017

2017-06-05 Thread Jennifer Herbert

QEMU XenServer/XenProject Working group meeting 5th May 2017
=

Attendees:
* Paul Durrant
* Andrew Cooper
* Ian Jackson
* Jenny Herbert
* Igor Druzhinin
* Simon Crow
* Marcus Granado

Reviewed previous action points

* Paul to carry on with Xen device model work. - Done.
* Ian - No progress with re factoring libxl split, XenStore restrictions
  or using and FD for QMP cd insert and emulated hotplug, due to feature
  freeze on 4.8.9.
* Andrew to look over Ian's patch series to see how bad extra XenStore
  permission would be.  - No.
* Jenny – XenServer is continuing to make slow progress. We have XenCentre
  patch that means it can talk to QEMU trad, working on some bugs.  Various
  other small bit of work done.

Andrew to double check dm-op sub ops for continuation support.

XenServer uses XenStore to communicate the VNC QEMU clipboard to the
guest and back.  It was concluded this isnt nice, as has caused
veriose security problems in the past.  New plan is to implement this
using a shared page.

Next XenStore restriction was discussed. Including an approach
involving using a shared ring buffer and removing the previous
xs-restrict.  Instead implementing an xs restriction prefixing command
within the xenbus command.  Slightly fiddly, but small number of
parts.  Its doable and not controversial.

Andrew had a diffident plan, to remove any use of XenStore from QEMU –
specifically for the hvm case.  Currently little us in the hvm case.
Two uses should be trivial to implement in QMP, and the other is
physmap, which Igor has been working on.

Its agreed the phymap key needs removing, and that removing the last
few uses of XenStore would be a good idea, however, if it seemed that
dealing with the physmap issue where to take a long time, the
xs-ristrict could be used as an intermidiate plan, which would allow
this project to move on while physmap or any other complications are
being sorted out.

Andrew is not convinced that allowing a guest to fiddle with the
phymap keys (as wolud have to be permitted under the xs-restict
scheme) would be safe.  Certainly the guest could destroy itself,
which in itself is permit-able, but as it changes the interval virtual
memory layout for QEMU, there might be some exploit hidden there –
this would need to be checked.

The conversation turns to the physmap keys, being the biggest XenStore
concern.  Andrew suggests that the correct way to fix it in a
compatible way would be if grant map foreign took in a pointer that
would be nmaped. The pointer would allow the range to be mapped
exactly where it pointed.  A new library call to xen foreign map could
be written, but the complication of needing compatibility for QEMU
with older versions of Xen was raised.

Ian suggested that this wasn’t really a problem, since we could do the
old thing with the old versions.  Libxl already tests for the version
of QEMU, and so if it finds this too old, it does the phymap keys
thing, otherwise, it can use QEMU-depriv including the the new physmap
mechanism.

The new physmap mechanism would not require a substitute for the
physmap keys, as QEMU already has all the information it needs.  The
keys where entirely to allow two parts of startup logic in QEMU to be
reversed.  Andrew  says that Igor has a solution but doestn think its
upstreamable.

Ian suggests that given the size of the xs-restrict patch queue, if
can fix the physmap issue relatively easily, we should, and then he
could drop most of the xs-restict patchque.  Ian offers to help with
the libxl part of the phymap work.

Igor explains an approach he tried hit a block of QXL, which
initialised certain registers and write then into various memory
locations.  He tried in each place to insert an  'if Xen and
runstate=resuming', doent touch.   But the fix is very intrusive in
terms of QXL code, and so he didn’t try to upstream it.

The idea of using helper functions was discussed.  Other helper
functions have been used before for similar problems.  A function
could be created to access vram, instead of just writing to a pointer.
All the access parts of QXL code could eb redirected though this
helper function, and that would be likely ustreamable.  In particular,
its been suggested before that helper function could be used for range
checking.  They could be added for range checking, and then also
modified for with the 'if xen and resuming' clauses.

Igor explains how another approach he had been looking at was to
change the order of QEMU startup, and move the memory map, and
actually map the memory where QEMU expects it to be mapped.  Here
again, there was the issue of compatibility.  The compact layer is
still needed to work with old versions of libxc.  It was discussed how
it would have QEMU would have to be able to work with both skews.  It
could be decided at compile time, as you at this point if you have the
new function call.  It the new function call is not available, it
would 

Re: [Xen-devel] [PATCH v8 for-4.9 1/5] hvm/dmop: Box dmop_args rather than passing multiple parameters around

2017-04-21 Thread Jennifer Herbert

Hi Julien,

This is extending an existing feature.
Once 4.9 is released, the existing feature will be frozen, and the only 
way to later get the
extra functionality would be to created a completely new dm_op, which 
does something very similar
to an existing one.  Although not the end of the world, this wouldnt 
look so nice.


The benefits of the feature are that a VM can request multiple extents 
to be marked as modified at once,
without having to loop though them, calling the existing call many many 
times.  This will be more efficient and faster.
As an extra, additional accessors have been created for dm_op 
operations, which new dm_ops can take advantage of.


The benefits of introducing the feature for 4.9 as opposed to later is 
that we wont' have to support the same feature, with multiple dm_opts 
with varying parameters - which as well as looking less good, also 
unnesseserily bloats the code.


I think risks are low, with a minor, affecting dm_op operations only.  
The core change, in 5/5 will only affect the modified memory call, which 
has been tested.  The remaining patches are to tidy up and fix existing 
behaviour.


-jenny

On 21/04/17 15:17, Julien Grall wrote:

Hi Jennifer,

I don't see any cover letter for this series, so I will answer here.

Looking at the code, it looks like a new feature rather than a bug 
fix. Am I right?


Could you explain what would be the benefits and risks to get this 
code in the release?


I also like to hear the opinion of the x86 maintainers about getting 
this code in Xen 4.9.


Cheers,

On 21/04/17 15:05, jennifer.herb...@citrix.com wrote:

From: Jennifer Herbert <jennifer.herb...@citrix.com>

No functional change.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
Reviewed-by: Jan Beulich <jbeul...@suse.com>
Reviewed-by: Paul Durrant <paul.durr...@citrix.com>

--
CC: Paul Durrant <paul.durr...@citrix.com>
CC: Andrew Cooper <andrew.coop...@citrix.com>
CC: Jan Beulich <jbeul...@suse.com>
CC: Julien Grall <julien.gr...@arm.com>
---
No change.
---
 xen/arch/x86/hvm/dm.c | 49 
+

 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index d72b7bd..e583e41 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -25,6 +25,13 @@

 #include 

+struct dmop_args {
+domid_t domid;
+unsigned int nr_bufs;
+/* Reserve enough buf elements for all current hypercalls. */
+struct xen_dm_op_buf buf[2];
+};
+
 static bool copy_buf_from_guest(const xen_dm_op_buf_t bufs[],
 unsigned int nr_bufs, void *dst,
 unsigned int idx, size_t dst_size)
@@ -56,7 +63,7 @@ static bool copy_buf_to_guest(const xen_dm_op_buf_t 
bufs[],

 }

 static int track_dirty_vram(struct domain *d, xen_pfn_t first_pfn,
-unsigned int nr, struct xen_dm_op_buf *buf)
+unsigned int nr, const struct 
xen_dm_op_buf *buf)

 {
 if ( nr > (GB(1) >> PAGE_SHIFT) )
 return -EINVAL;
@@ -287,16 +294,14 @@ static int inject_event(struct domain *d,
 return 0;
 }

-static int dm_op(domid_t domid,
- unsigned int nr_bufs,
- xen_dm_op_buf_t bufs[])
+static int dm_op(const struct dmop_args *op_args)
 {
 struct domain *d;
 struct xen_dm_op op;
 bool const_op = true;
 long rc;

-rc = rcu_lock_remote_domain_by_id(domid, );
+rc = rcu_lock_remote_domain_by_id(op_args->domid, );
 if ( rc )
 return rc;

@@ -307,7 +312,7 @@ static int dm_op(domid_t domid,
 if ( rc )
 goto out;

-if ( !copy_buf_from_guest(bufs, nr_bufs, , 0, sizeof(op)) )
+if ( !copy_buf_from_guest(_args->buf[0], op_args->nr_bufs, 
, 0, sizeof(op)) )

 {
 rc = -EFAULT;
 goto out;
@@ -466,10 +471,10 @@ static int dm_op(domid_t domid,
 if ( data->pad )
 break;

-if ( nr_bufs < 2 )
+if ( op_args->nr_bufs < 2 )
 break;

-rc = track_dirty_vram(d, data->first_pfn, data->nr, [1]);
+rc = track_dirty_vram(d, data->first_pfn, data->nr, 
_args->buf[1]);

 break;
 }

@@ -564,7 +569,7 @@ static int dm_op(domid_t domid,

 if ( (!rc || rc == -ERESTART) &&
  !const_op &&
- !copy_buf_to_guest(bufs, nr_bufs, 0, , sizeof(op)) )
+ !copy_buf_to_guest(_args->buf[0], op_args->nr_bufs, 0, 
, sizeof(op)) )

 rc = -EFAULT;

  out:
@@ -587,20 +592,21 @@ CHECK_dm_op_set_mem_type;
 CHECK_dm_op_inject_event;
 CHECK_dm_op_inject_msi;

-#define MAX_NR_BUFS 2
-
 int compat_dm_op(domid_t domid,
  unsigned int nr_bufs,
  XEN_GUEST_HANDLE_PARAM(void) bufs)
 {
-struct xen_dm_op_buf nat[MAX_NR_

Re: [Xen-devel] [PATCH 2/4] hvm/dmop: Implement copy_{to, from}_guest_buf() in terms of raw accessors

2017-04-21 Thread Jennifer Herbert



On 21/04/17 11:38, Jan Beulich wrote:

On 21.04.17 at 12:11, <jennifer.herb...@citrix.com> wrote:

On 21/04/17 10:46, Jan Beulich wrote:

On 21.04.17 at 11:11, <andrew.coop...@citrix.com> wrote:

On 21/04/2017 09:54, Andrew Cooper wrote:

On 21/04/2017 08:27, Jan Beulich wrote:

On 20.04.17 at 19:59, <jennifer.herb...@citrix.com> wrote:

From: Jennifer Herbert <jennifer.herb...@citrix.com>

Is this correct, considering that iirc the patch was new in v5 and ...


This also allows the usual cases to be simplified, by omitting an
unnecessary
buf parameters, and because the macros can appropriately size the object.

This makes copying to or from a buf that isn't big enough an error.
If the buffer isnt big enough, trying to carry on regardless
can only cause trouble later on.

Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>

... this sequence of S-o-b-s?


--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -32,36 +32,47 @@ struct dmop_args {
   struct xen_dm_op_buf buf[2];
   };
   
-static bool copy_buf_from_guest(const xen_dm_op_buf_t bufs[],

-unsigned int nr_bufs, void *dst,
-unsigned int idx, size_t dst_size)
+static bool _raw_copy_from_guest_buf(void *dst,
+ const struct dmop_args *args,
+ unsigned int buf_idx,
+ size_t dst_bytes)
   {
-size_t size;
+size_t buf_bytes;
   
-if ( idx >= nr_bufs )

+if ( buf_idx >= args->nr_bufs )
   return false;
   
-memset(dst, 0, dst_size);

+buf_bytes =  args->buf[buf_idx].size;
   
-size = min_t(size_t, dst_size, bufs[idx].size);

+if ( dst_bytes > buf_bytes )
+return false;

While this behavioral change is now being mentioned in the
description, I'm not sure I buy the argument of basically being
guaranteed to cause trouble down the road. Did you consider the
forward compatibility aspect here, allowing us to extend interface
structures by adding fields to their ends without breaking old
callers? Paul, what are your thoughts here?

DMOP is a stable ABI.  There is no legal extending of any objects.

The previous semantics are guaranteed to break the ABI with future
subops, which is why I removed it.  In the case that the guest
originally passed an overly long buffer, and someone tried be "clever"
here passing the same object here expecting _raw_copy_from_guest_buf()
to DTRT, the function will end up copying too much data from the guest,
and you will end up with something the guest wasn't intending to be part
of the structure replacing the zero extension.

New subops which take a longer object should use a brand new object.

$MAGIC compatibility logic like you want has no business living in the
copy helper.  Had I spotted the intention during the original dmop
series, I would have rejected it during review.

Actually, the existing behaviour is already broken.  If a guest passes
an overly short buf[0], the dmop logic won't get a failure, and instead
get a truncated structure which has been zero extended.

This is very definitely the wrong thing to do, because such a truncated
structure might actually contain some legitimate operations.

So what, if that's what the caller meant to happen?

Considering this is a controversial change, I think it is a bad idea
to merge this into the otherwise uncontroversial change here.

How about we move the min(..) and memset out of the helper function, and
into dm_op?

Wouldn't that result in different buffers being treated differently,
which I'd rather not see happening?


I think in the general case, its wrong to assume that its ok to truncate 
a buffer.
In specific cases, such as buf[0], we don't really know how big the buf 
needs to be until we start parsing it. In this case, we want to say 
'give me what you have, upto this maximum'.  This is the special case, 
not the general one, an as such does not belong in the general case 
accesser helper.
Certainly when we get to updating the accesser helper to deal with 
offsets, its get more confused.  If we give it an offset greater then 
the size of the buffer, do we just memset?  That would seem wrong.  With 
an offset of 0, we want it to do the memset thing, to deal with buf[0].  
But what about if we have an offset.  I'd think the calling function 
would really want to know if its got nonsense.


Another solution would be to have two variants of the helper function. A 
'normal' one, and the best effort one.


-jenny


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/4] hvm/dmop: Implement copy_{to, from}_guest_buf() in terms of raw accessors

2017-04-21 Thread Jennifer Herbert



On 21/04/17 10:46, Jan Beulich wrote:

On 21.04.17 at 11:11, <andrew.coop...@citrix.com> wrote:

On 21/04/2017 09:54, Andrew Cooper wrote:

On 21/04/2017 08:27, Jan Beulich wrote:

On 20.04.17 at 19:59, <jennifer.herb...@citrix.com> wrote:

From: Jennifer Herbert <jennifer.herb...@citrix.com>

Is this correct, considering that iirc the patch was new in v5 and ...


This also allows the usual cases to be simplified, by omitting an
unnecessary
buf parameters, and because the macros can appropriately size the object.

This makes copying to or from a buf that isn't big enough an error.
If the buffer isnt big enough, trying to carry on regardless
can only cause trouble later on.

Signed-off-by: Andrew Cooper <andrew.coop...@citrix.com>
Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>

... this sequence of S-o-b-s?


--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -32,36 +32,47 @@ struct dmop_args {
  struct xen_dm_op_buf buf[2];
  };
  
-static bool copy_buf_from_guest(const xen_dm_op_buf_t bufs[],

-unsigned int nr_bufs, void *dst,
-unsigned int idx, size_t dst_size)
+static bool _raw_copy_from_guest_buf(void *dst,
+ const struct dmop_args *args,
+ unsigned int buf_idx,
+ size_t dst_bytes)
  {
-size_t size;
+size_t buf_bytes;
  
-if ( idx >= nr_bufs )

+if ( buf_idx >= args->nr_bufs )
  return false;
  
-memset(dst, 0, dst_size);

+buf_bytes =  args->buf[buf_idx].size;
  
-size = min_t(size_t, dst_size, bufs[idx].size);

+if ( dst_bytes > buf_bytes )
+return false;

While this behavioral change is now being mentioned in the
description, I'm not sure I buy the argument of basically being
guaranteed to cause trouble down the road. Did you consider the
forward compatibility aspect here, allowing us to extend interface
structures by adding fields to their ends without breaking old
callers? Paul, what are your thoughts here?

DMOP is a stable ABI.  There is no legal extending of any objects.

The previous semantics are guaranteed to break the ABI with future
subops, which is why I removed it.  In the case that the guest
originally passed an overly long buffer, and someone tried be "clever"
here passing the same object here expecting _raw_copy_from_guest_buf()
to DTRT, the function will end up copying too much data from the guest,
and you will end up with something the guest wasn't intending to be part
of the structure replacing the zero extension.

New subops which take a longer object should use a brand new object.

$MAGIC compatibility logic like you want has no business living in the
copy helper.  Had I spotted the intention during the original dmop
series, I would have rejected it during review.

Actually, the existing behaviour is already broken.  If a guest passes
an overly short buf[0], the dmop logic won't get a failure, and instead
get a truncated structure which has been zero extended.

This is very definitely the wrong thing to do, because such a truncated
structure might actually contain some legitimate operations.

So what, if that's what the caller meant to happen?

Considering this is a controversial change, I think it is a bad idea
to merge this into the otherwise uncontroversial change here.


How about we move the min(..) and memset out of the helper function, and 
into dm_op?



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 for-4.9 1/4] hvm/dmop: Box dmop_bufs rather than passing two parameters around

2017-04-10 Thread Jennifer Herbert


On 10/04/17 10:40, Paul Durrant wrote:

-Original Message-
From: Andrew Cooper
Sent: 10 April 2017 10:29
To: Paul Durrant <paul.durr...@citrix.com>; Xen-devel 
Cc: Jennifer Herbert <jennifer.herb...@citrix.com>; Jan Beulich
<jbeul...@suse.com>; Julien Grall <julien.gr...@arm.com>
Subject: Re: [PATCH v5 for-4.9 1/4] hvm/dmop: Box dmop_bufs rather than
passing two parameters around

On 10/04/17 10:04, Paul Durrant wrote:

-Original Message-
From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
Sent: 07 April 2017 20:36
To: Xen-devel <xen-devel@lists.xen.org>
Cc: Jennifer Herbert <jennifer.herb...@citrix.com>; Andrew Cooper
<andrew.coop...@citrix.com>; Jan Beulich <jbeul...@suse.com>; Paul
Durrant <paul.durr...@citrix.com>; Julien Grall <julien.gr...@arm.com>
Subject: [PATCH v5 for-4.9 1/4] hvm/dmop: Box dmop_bufs rather than
passing two parameters around

From: Jennifer Herbert <jennifer.herb...@citrix.com>

No functional change.


Why is this a good thing? Passing two parameters around allowed for them

to be in registers. I preferred the code as it was before.

a) It will always be inlined, so registers aren't relevant.

Why? I see nothing forcing the compiler to make it so.


  Even if
they were, all values are available directly with the pointer as a base,
so there is no reduction in expressiveness.  (i.e. the previous code
only increases register pressure).
b) passing multiple parameters like that is a recipe for mistakes, and
in this case, mistakes mean security vulnerabilities.

Given the locality of the code I don't buy that as an argument unless you're 
going to assert that passing more than one parameter is always wrong.


To two parameters should go everywhere together - they work together, 
and should never be separated.  But by having them separate, you end up 
forcing people to think about why do I need this nr_buffs, even though 
in many situations it looks superfluous. Its mostly there as a check, 
and not part of the core flow of how the code works.

It just confuses and clutters things to have it separate.

If its left separate, its possible someone will feed in a different, but 
similar variable for this parameter, and its purpose defeated.



c) It is necessary to make legible code for the later changes in this
series.


 From my reading I don't think it would have an overly negative effect on the 
subsequent patches if this patch were dropped.


No, but since we're having this debate, we clearly care about small 
negative effects too.


-jenny

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC 07/20] migration: defer precopy policy to libxl

2017-03-29 Thread Jennifer Herbert

Hi,

I would like to encourage this patch - as I have use for it outside
of your postcopy work.

Some things people will comment on:
You've used 'unsigned' without the int keyword, which people don't like.
Also on line 324, your missing space between 'if (' and 
'ctx->save.policy_decision'.


Also, I'm not a fan of your CONSULT_POLICY macro, which you've defined at
a odd point in your function, and I think could be done more elegantly.
Worst of all ... its a macro - which I think should generally be avoided 
unless
there is little choice.   I'm sure you could write a helper function to 
replace this.


Cheers,

-jenny

On 27/03/17 10:06, Joshua Otto wrote:

The precopy phase of the xc_domain_save() live migration algorithm has
historically been implemented to run until either a) (almost) no pages
are dirty or b) some fixed, hard-coded maximum number of precopy
iterations has been exceeded.  This policy and its implementation are
less than ideal for a few reasons:
- the logic of the policy is intertwined with the control flow of the
   mechanism of the precopy stage
- it can't take into account facts external to the immediate
   migration context, such as interactive user input or the passage of
   wall-clock time
- it does not permit the user to change their mind, over time, about
   what to do at the end of the precopy (they get an unconditional
   transition into the stop-and-copy phase of the migration)

To permit users to implement arbitrary higher-level policies governing
when the live migration precopy phase should end, and what should be
done next:
- add a precopy_policy() callback to the xc_domain_save() user-supplied
   callbacks
- during the precopy phase of live migrations, consult this policy after
   each batch of pages transmitted and take the dictated action, which
   may be to a) abort the migration entirely, b) continue with the
   precopy, or c) proceed to the stop-and-copy phase.
- provide an implementation of the old policy as such a callback in
   libxl and plumb it through the IPC machinery to libxc, effectively
   maintaing the old policy for now

Signed-off-by: Joshua Otto 
---
  tools/libxc/include/xenguest.h |  23 -
  tools/libxc/xc_nomigrate.c |   3 +-
  tools/libxc/xc_sr_common.h |   7 +-
  tools/libxc/xc_sr_save.c   | 194 ++---
  tools/libxl/libxl_dom_save.c   |  20 
  tools/libxl/libxl_save_callout.c   |   3 +-
  tools/libxl/libxl_save_helper.c|   7 +-
  tools/libxl/libxl_save_msgs_gen.pl |   4 +-
  8 files changed, 189 insertions(+), 72 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index aa8cc8b..30ffb6f 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -39,6 +39,14 @@
   */
  struct xenevtchn_handle;
  
+/* For save's precopy_policy(). */

+struct precopy_stats
+{
+unsigned iteration;
+unsigned total_written;
+long dirty_count; /* -1 if unknown */
+};
+
  /* callbacks provided by xc_domain_save */
  struct save_callbacks {
  /* Called after expiration of checkpoint interval,
@@ -46,6 +54,17 @@ struct save_callbacks {
   */
  int (*suspend)(void* data);
  
+/* Called after every batch of page data sent during the precopy phase of a

+ * live migration to ask the caller what to do next based on the current
+ * state of the precopy migration.
+ */
+#define XGS_POLICY_ABORT  (-1) /* Abandon the migration entirely and
+* tidy up. */
+#define XGS_POLICY_CONTINUE_PRECOPY 0  /* Remain in the precopy phase. */
+#define XGS_POLICY_STOP_AND_COPY1  /* Immediately suspend and transmit the
+* remaining dirty pages. */
+int (*precopy_policy)(struct precopy_stats stats, void *data);
+
  /* Called after the guest's dirty pages have been
   *  copied into an output buffer.
   * Callback function resumes the guest & the device model,
@@ -100,8 +119,8 @@ typedef enum {
   *doesn't use checkpointing
   * @return 0 on success, -1 on failure
   */
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t 
max_iters,
-   uint32_t max_factor, uint32_t flags /* XCFLAGS_xxx */,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom,
+   uint32_t flags /* XCFLAGS_xxx */,
 struct save_callbacks* callbacks, int hvm,
 xc_migration_stream_t stream_type, int recv_fd);
  
diff --git a/tools/libxc/xc_nomigrate.c b/tools/libxc/xc_nomigrate.c

index 15c838f..2af64e4 100644
--- a/tools/libxc/xc_nomigrate.c
+++ b/tools/libxc/xc_nomigrate.c
@@ -20,8 +20,7 @@
  #include 
  #include 
  
-int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iters,

-   uint32_t max_factor, uint32_t flags,
+int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, 

[Xen-devel] [PATCH v4] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-29 Thread Jennifer Herbert
From: Jennifer Herbert <jennifer.herb...@citrix.com>

This new lib devicemodel call allows multiple extents of pages to be
marked as modified in a single call.  This is something needed for a
usecase I'm working on.

The xen side of the modified_memory call has been modified to accept
an array of extents.  The devicemodle library either provides an array
of length 1, to support the original library function, or a second
function allows an array to be provided.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
---
Changes as discussed on V3.

Cc: Jan Beulich <jbeul...@suse.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Paul Durrant <paul.durr...@citrix.com>
---
 tools/libs/devicemodel/core.c   |   30 --
 tools/libs/devicemodel/include/xendevicemodel.h |   19 +++-
 xen/arch/x86/hvm/dm.c   |  124 ---
 xen/include/public/hvm/dm_op.h  |   19 +++-
 4 files changed, 140 insertions(+), 52 deletions(-)

diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index a85cb49..f9e37a5 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -434,22 +434,36 @@ int xendevicemodel_track_dirty_vram(
  dirty_bitmap, (size_t)(nr + 7) / 8);
 }
 
-int xendevicemodel_modified_memory(
-xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn,
-uint32_t nr)
+int xendevicemodel_modified_memory_bulk(
+xendevicemodel_handle *dmod, domid_t domid,
+struct xen_dm_op_modified_memory_extent *extents, uint32_t nr)
 {
 struct xen_dm_op op;
-struct xen_dm_op_modified_memory *data;
+struct xen_dm_op_modified_memory *header;
+size_t extents_size = nr * sizeof(struct xen_dm_op_modified_memory_extent);
 
 memset(, 0, sizeof(op));
 
 op.op = XEN_DMOP_modified_memory;
-data = _memory;
+header = _memory;
 
-data->first_pfn = first_pfn;
-data->nr = nr;
+header->nr_extents = nr;
+header->pfns_processed = 0;
 
-return xendevicemodel_op(dmod, domid, 1, , sizeof(op));
+return xendevicemodel_op(dmod, domid, 2, , sizeof(op),
+ extents, extents_size);
+}
+
+int xendevicemodel_modified_memory(
+xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn,
+uint32_t nr)
+{
+struct xen_dm_op_modified_memory_extent extent;
+
+extent.first_pfn = first_pfn;
+extent.nr = nr;
+
+return xendevicemodel_modified_memory_bulk(dmod, domid, , 1);
 }
 
 int xendevicemodel_set_mem_type(
diff --git a/tools/libs/devicemodel/include/xendevicemodel.h 
b/tools/libs/devicemodel/include/xendevicemodel.h
index b3f600e..9c62bf9 100644
--- a/tools/libs/devicemodel/include/xendevicemodel.h
+++ b/tools/libs/devicemodel/include/xendevicemodel.h
@@ -236,8 +236,8 @@ int xendevicemodel_track_dirty_vram(
 uint32_t nr, unsigned long *dirty_bitmap);
 
 /**
- * This function notifies the hypervisor that a set of domain pages
- * have been modified.
+ * This function notifies the hypervisor that a set of contiguous
+ * domain pages have been modified.
  *
  * @parm dmod a handle to an open devicemodel interface.
  * @parm domid the domain id to be serviced
@@ -250,6 +250,21 @@ int xendevicemodel_modified_memory(
 uint32_t nr);
 
 /**
+ * This function notifies the hypervisor that a set of discontiguous
+ * domain pages have been modified.
+ *
+ * @parm dmod a handle to an open devicemodel interface.
+ * @parm domid the domain id to be serviced
+ * @parm extents an array of extent structs, which each hold
+ a start_pfn and nr (number of pfns).
+ * @parm nr the number of extents in the array
+ * @return 0 on success, -1 on failure.
+ */
+int xendevicemodel_modified_memory_bulk(
+xendevicemodel_handle *dmod, domid_t domid,
+struct xen_dm_op_modified_memory_extent extents[], uint32_t nr);
+
+/**
  * This function notifies the hypervisor that a set of domain pages
  * are to be treated in a specific way. (See the definition of
  * hvmmem_type_t).
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 2122c45..c4fa704 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -118,57 +118,108 @@ static int set_isa_irq_level(struct domain *d, uint8_t 
isa_irq,
 return 0;
 }
 
-static int modified_memory(struct domain *d,
-   struct xen_dm_op_modified_memory *data)
+
+int copy_extent_from_guest_array(struct xen_dm_op_modified_memory_extent* 
extent,
+   xen_dm_op_buf_t* buf, unsigned int index)
+
 {
-xen_pfn_t last_pfn = data->first_pfn + data->nr - 1;
-unsigned int iter = 0;
-int rc = 0;
+if ( (buf->size / sizeof(struct xen_dm_op_modified_memory_extent)) <=
+ index )
+return -EINVAL;
 
-if ( (data->fir

Re: [Xen-devel] [PATCH v3] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-29 Thread Jennifer Herbert

On 29/03/17 15:54, Jan Beulich wrote:

On 28.03.17 at 15:18,  wrote:

@@ -441,13 +481,8 @@ static int dm_op(domid_t domid,
   struct xen_dm_op_modified_memory *data =
   _memory;
   
-const_op = false;

-
-rc = -EINVAL;
-if ( data->pad )
-break;
-
-rc = modified_memory(d, data);
+rc = modified_memory(d, data, [1]);
+const_op = (rc != 0);

Isn't this wrong now, i.e. don't you need to copy back the
header now in all cases?

I only define what I'll set nr_extents to in case of error, and of
course opaque
is opaque.

Well, but you do need the opaque value for the continuation,
don't you? In which case you need to also write back on
-ERESTART. And as you say you need to write back in case
of error. So I'd expect

const_op = !rc;



Quite right, see you point now - I didn't notice I'd inverted the logic.

-jenny


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-29 Thread Jennifer Herbert



On 29/03/17 11:38, Jan Beulich wrote:

On 28.03.17 at 15:18,  wrote:
Perhaps drop "already"? Personally I also wouldn't mind you dropping 
the variable altogether and using header->opaque directly, but I 
guess that's too "opaque" for your taste? 


It would make the code too opaque for my taste, so I'll just drop the 
'already' bit.



@@ -441,13 +481,8 @@ static int dm_op(domid_t domid,
  struct xen_dm_op_modified_memory *data =
  _memory;
  
-const_op = false;

-
-rc = -EINVAL;
-if ( data->pad )
-break;
-
-rc = modified_memory(d, data);
+rc = modified_memory(d, data, [1]);
+const_op = (rc != 0);

Isn't this wrong now, i.e. don't you need to copy back the
header now in all cases?


I only define what I'll set nr_extents to in case of error, and of 
course opaque

is opaque.
If I where to write back, I'd be writing back 0 to nr_extents - which 
wouldn’t really
mean anything since I’m not defining the order for which I’m processing 
them.
In fact the only thing it tells you is that extent 0 is the last one 
processed, which

I don't think its all that useful.

Ideally I'd prefer to leave it untouched on success, but the original 
value is lost on

continuation, this would be more involved.

By only writing back on error, I hoped to improve efficiency for the 
common case,
(especially for existing use with calls of one extent).  (I know its 
only a small difference)
If you want me to write back - what do you want me to write back for 
success?



Cheers,

-jenny


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-28 Thread Jennifer Herbert
From: Jennifer Herbert <jennifer.herb...@citrix.com>

This new lib devicemodel call allows multiple extents of pages to be
marked as modified in a single call.  This is something needed for a
usecase I'm working on.

The xen side of the modified_memory call has been modified to accept
an array of extents.  The devicemodle library either provides an array
of length 1, to support the original library function, or a second
function allows an array to be provided.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Paul Durrant <paul.durr...@citrix.com>
---
Changes as discussed on thread.

 tools/libs/devicemodel/core.c   |   30 --
 tools/libs/devicemodel/include/xendevicemodel.h |   19 +++-
 xen/arch/x86/hvm/dm.c   |  115 +++
 xen/include/public/hvm/dm_op.h  |   22 -
 4 files changed, 133 insertions(+), 53 deletions(-)

diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index a85cb49..f9e37a5 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -434,22 +434,36 @@ int xendevicemodel_track_dirty_vram(
  dirty_bitmap, (size_t)(nr + 7) / 8);
 }
 
-int xendevicemodel_modified_memory(
-xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn,
-uint32_t nr)
+int xendevicemodel_modified_memory_bulk(
+xendevicemodel_handle *dmod, domid_t domid,
+struct xen_dm_op_modified_memory_extent *extents, uint32_t nr)
 {
 struct xen_dm_op op;
-struct xen_dm_op_modified_memory *data;
+struct xen_dm_op_modified_memory *header;
+size_t extents_size = nr * sizeof(struct xen_dm_op_modified_memory_extent);
 
 memset(, 0, sizeof(op));
 
 op.op = XEN_DMOP_modified_memory;
-data = _memory;
+header = _memory;
 
-data->first_pfn = first_pfn;
-data->nr = nr;
+header->nr_extents = nr;
+header->pfns_processed = 0;
 
-return xendevicemodel_op(dmod, domid, 1, , sizeof(op));
+return xendevicemodel_op(dmod, domid, 2, , sizeof(op),
+ extents, extents_size);
+}
+
+int xendevicemodel_modified_memory(
+xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn,
+uint32_t nr)
+{
+struct xen_dm_op_modified_memory_extent extent;
+
+extent.first_pfn = first_pfn;
+extent.nr = nr;
+
+return xendevicemodel_modified_memory_bulk(dmod, domid, , 1);
 }
 
 int xendevicemodel_set_mem_type(
diff --git a/tools/libs/devicemodel/include/xendevicemodel.h 
b/tools/libs/devicemodel/include/xendevicemodel.h
index b3f600e..9c62bf9 100644
--- a/tools/libs/devicemodel/include/xendevicemodel.h
+++ b/tools/libs/devicemodel/include/xendevicemodel.h
@@ -236,8 +236,8 @@ int xendevicemodel_track_dirty_vram(
 uint32_t nr, unsigned long *dirty_bitmap);
 
 /**
- * This function notifies the hypervisor that a set of domain pages
- * have been modified.
+ * This function notifies the hypervisor that a set of contiguous
+ * domain pages have been modified.
  *
  * @parm dmod a handle to an open devicemodel interface.
  * @parm domid the domain id to be serviced
@@ -250,6 +250,21 @@ int xendevicemodel_modified_memory(
 uint32_t nr);
 
 /**
+ * This function notifies the hypervisor that a set of discontiguous
+ * domain pages have been modified.
+ *
+ * @parm dmod a handle to an open devicemodel interface.
+ * @parm domid the domain id to be serviced
+ * @parm extents an array of extent structs, which each hold
+ a start_pfn and nr (number of pfns).
+ * @parm nr the number of extents in the array
+ * @return 0 on success, -1 on failure.
+ */
+int xendevicemodel_modified_memory_bulk(
+xendevicemodel_handle *dmod, domid_t domid,
+struct xen_dm_op_modified_memory_extent extents[], uint32_t nr);
+
+/**
  * This function notifies the hypervisor that a set of domain pages
  * are to be treated in a specific way. (See the definition of
  * hvmmem_type_t).
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 2122c45..b5031ce 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -119,56 +119,96 @@ static int set_isa_irq_level(struct domain *d, uint8_t 
isa_irq,
 }
 
 static int modified_memory(struct domain *d,
-   struct xen_dm_op_modified_memory *data)
+   struct xen_dm_op_modified_memory *header,
+   xen_dm_op_buf_t* buf)
 {
-xen_pfn_t last_pfn = data->first_pfn + data->nr - 1;
-unsigned int iter = 0;
-int rc = 0;
-
-if ( (data->first_pfn > last_pfn) ||
- (last_pfn > domain_get_maximum_gpfn(d)) )
-return -EINVAL;
+/* Process maximum of 256 pfns before checking for continuation */
+const un

Re: [Xen-devel] [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-23 Thread Jennifer Herbert



On 23/03/17 15:58, Jan Beulich wrote:

On 22.03.17 at 20:55,  wrote:
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -119,56 +119,89 @@ static int set_isa_irq_level(struct domain *d, uint8_t



--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -237,13 +237,24 @@ struct xen_dm_op_set_pci_link_route {
 * XEN_DMOP_modified_memory: Notify that a set of pages were modified by
 *   an emulator.
 *
- * NOTE: In the event of a continuation, the @first_pfn is set to the
- *   value of the pfn of the remaining set of pages and @nr reduced
- *   to the size of the remaining set.
+ * DMOP buf 1 contains an array of xen_dm_op_modified_memory_extent with
+ * @nr_extent entries.  The value of  @nr_extent will be reduced on
+ *  continuation.
+ *
+ * @pfns_processed is used for continuation, and not intended to be usefull
+ * to the caller.  It gives the number of pfns already processed (and can
+ * be skipped) in the last extent. This should always be set to zero.
 */
#define XEN_DMOP_modified_memory 11

struct xen_dm_op_modified_memory {

+/* IN - number of extents. */
+uint32_t nr_extents;
+/* IN - should be set to 0, and is updated on continuation. */
+uint32_t pfns_processed;

I'd prefer if the field (if to be kept) wasn't named after its current
purpose, nor should we state anything beyond it needing to be
zero when first invoking the call. These are implementation details
which callers should not rely on.

Assuming we keep it, how about calling it "reserved", with a comment in
dm.c explaining what its actually for?

Elsewhere we use "opaque", but "reserved" is probably fine too.
However, we may want to name it as an OUT value, for the
error-on-extent indication mentioned above (of course another
option would be to make nr_extent IN/OUT). As an OUT, we're
free to use it for whatever intermediate value, just so long as
upon final return to the caller it has the intended value. (It's
debatable whether it should be IN/OUT, due to the need for it
to be set to zero.)

Having an indication of which extent failed seem a sensible idea. We'd
need that
parameter to be initially set to something that can represent none of
the extents,
such that if there is an error before we get to precessing the extents,
this is clear.

I don't think this is a requirement - failure outside of any extent
processing can reasonably be attached to the first extent. The
more that the actual processing of the extent (after reading the
coordinates from guest memory) can't actually fail. With that
observation I'm in fact no longer convinced we really need such
an indication - I simply didn't pay attention to this aspect when
first suggesting it. The more that your backwards processing
would invalidate a common conclusion a caller might draw: Error
on extent N means all extents lower than N were processed
successfully.

So if you wanted to stick to providing this information I'd say
use the opaque (or whatever it's going to be named) field to
provide that status. Switch back to processing extents forwards,
having the opaque field starting out as zero point to the first
extent as the failed one initially. Initial processing during
continuation handling can't fail unless the caller has fiddled with
the hypercall arguments, so I wouldn't see anything wrong with
not providing a reliable error indicator in that case.



I was mostly considering it, from a debugging perspective.  Any failure 
would be due to bad
arguments, which would indicate a serious bug, and trying to continue 
after such a bug

was discovered would seem most unwise.
If I copied back the buffer on event of error, then we could state that 
nr_extent would point
to one after extent that had the error, but say it is not defined which 
if any extents would have

succeeded.  This would be a trivial change, but aid with any debugging.

I can continue to count extents backwards, saving one parameter.
I can then have an opaque, which i say nothing about, other then it 
should be zero.

This i'd use for for pfns_processed.

How does that sound?


-jenny



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-23 Thread Jennifer Herbert

On 23/03/17 08:35, Jan Beulich wrote:

On 22.03.17 at 20:55,  wrote:
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -119,56 +119,89 @@ static int set_isa_irq_level(struct domain *d, uint8_t 
isa_irq,
   }
   
   static int modified_memory(struct domain *d,

-   struct xen_dm_op_modified_memory *data)
+   struct xen_dm_op_modified_memory *header,
+   xen_dm_op_buf_t* buf)
   {
-xen_pfn_t last_pfn = data->first_pfn + data->nr - 1;
-unsigned int iter = 0;
+/* Process maximum of 256 pfns before checking for continuation */
+const unsigned int cont_check_interval = 0xff;

Iirc the question was asked on v1 already: 256 (as the comment
says) or 255 (as the value enforces)?

aw shucks!  I had it as 255, and Paul said "255 or 256?" and I changed
it to 256, without thinking, assuming an off by one error, and that he
was right.  But, with a seconds thought, it should be 255, and Paul  was
actually referring to a comment later in the code, which was 256.

Yet batches of 256 would seem a little more "usual".


Ok, I can set cont_check_interval = 0x100; and leave the comment as 256.


--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -237,13 +237,24 @@ struct xen_dm_op_set_pci_link_route {
* XEN_DMOP_modified_memory: Notify that a set of pages were modified by
*   an emulator.
*
- * NOTE: In the event of a continuation, the @first_pfn is set to the
- *   value of the pfn of the remaining set of pages and @nr reduced
- *   to the size of the remaining set.
+ * DMOP buf 1 contains an array of xen_dm_op_modified_memory_extent with
+ * @nr_extent entries.  The value of  @nr_extent will be reduced on
+ *  continuation.
+ *
+ * @pfns_processed is used for continuation, and not intended to be usefull
+ * to the caller.  It gives the number of pfns already processed (and can
+ * be skipped) in the last extent. This should always be set to zero.
*/
   #define XEN_DMOP_modified_memory 11
   
   struct xen_dm_op_modified_memory {

+/* IN - number of extents. */
+uint32_t nr_extents;
+/* IN - should be set to 0, and is updated on continuation. */
+uint32_t pfns_processed;

I'd prefer if the field (if to be kept) wasn't named after its current
purpose, nor should we state anything beyond it needing to be
zero when first invoking the call. These are implementation details
which callers should not rely on.

Assuming we keep it, how about calling it "reserved", with a comment in
dm.c explaining what its actually for?

Elsewhere we use "opaque", but "reserved" is probably fine too.
However, we may want to name it as an OUT value, for the
error-on-extent indication mentioned above (of course another
option would be to make nr_extent IN/OUT). As an OUT, we're
free to use it for whatever intermediate value, just so long as
upon final return to the caller it has the intended value. (It's
debatable whether it should be IN/OUT, due to the need for it
to be set to zero.)

Jan


Having an indication of which extent failed seem a sensible idea. We'd 
need that
parameter to be initially set to something that can represent none of 
the extents,
such that if there is an error before we get to precessing the extents, 
this is clear.


If we used nr_extents - the caller can check its the same value they 
passed in.
If the value is the same, then the error occurred before processing an 
extent.


If we used  error_on_extent (previously known as pfns_processed), we 
could not
use zero, as it might be the first extent to fail.  In this case, we'd 
need to specify
it was defaulted to 0x.  Should probably define a constant - 
NO_EXTENT.
Internally, we could just invert the value for using as a pfn_processed 
intermediately.


However, both of these ideas suffer a problem, that if a continuation 
fails before processing
the remaining extents, the returned value would be incorrect.  For 
reusing nr_extents, it
would appear to be the previous extent - potentially very confusing.   
Using
error_on_extent would give you a totally different value, although very 
likely, one that

is greater then the number of extents passed in.

Both of these schemes can be solved by only allowing 31 bit number of 
extents and pfns,

and having the fields signed, to make it more obvious.
For both schemes, you could return -extent_nr, where only a negative 
value indicates
and extent number.  This would allow you to have the initial value of 0 
for error_on_extent.


Alternatively, for the error_on_extent scheme, the pfns_processed could 
be stored as a negative,
which would mean that if a negative value got returned it was was no 
specific extent.

The value would need initialised by the caller as -1.

An alternative to using signed values would be to use a combination of 
the two values.
If nr_extents == 0, then error_on_extent hold the extent number. 

Re: [Xen-devel] [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-22 Thread Jennifer Herbert

On 22/03/17 10:33, Jan Beulich wrote:

On 21.03.17 at 14:59,  wrote:

This version of this patch removes the need for the 'offset' parameter,
by instead reducing nr_extents, and working backwards from the end of
the array.

This patch also removes the need to ever write back the passed array of
extents to the guest, but instead putting its partial progress in a
'pfns_processed' parameter, which replaces the old 'offset' parameter.
As with the 'offest' parameter, 'pfns_processed' should be set to 0 on
calling.

So how is the need for 'pfns_processed' better than the prior need
for 'offset'? I continue to not see why you'd need any extra field, as
you can write back to 'first_pfn' and 'nr' just like the old code did. (But
note that I'm not outright objecting to this solution, I'd first like to
understand why it was chosen.)



With v1 of the patch, on continuation, two buffers get written back to 
guest, one
to update 'offset', and one to update first_pfn (in the array). Although 
I can't think of an example
where this would be a problem, I think that semi-randomly altering items 
in the passed
array may not be expected, and if that data was re-used for something, 
could cause a bug.


There is precedence for changing the op buffer,  and using 
pfns_processed means we never have
to change this array, which I think is much cleaner, intuitive, and 
halving the copy backs.


I considered your suggestion of modifying the handle array, but I think 
this would make the code much harder to follow,  and still require data 
written back to the guest - just in a different place. I thought just 
reducing the nr_extents a cleaner solution.


I can't see the problem with having an argument for continuation - the 
xen_dm_op will remain the same size, if we use the argument space or 
not.  If its just about how the API looks, I find this highly subjective 
- its no secret that continuations happen, and data gets stashed in 
these parameters - and so i see no reason why having a dedicated 
parameter for this is a problem.


If we want to hide this continuation information, we could define:

struct xen_dm_op_modified_memory {
/* IN - number of extents. */
uint32_t nr_extents;
}

struct xen_dm_op_modified_memory modified_memory_expanded {
struct xen_dm_op_modified_memory modified_memory;
uint32_t  pfns_processed
}

and include both structures in the xen_dm_op union.
The caller would use xen_dm_op_modified_memory modified_memory, and due 
to the memset(, 0, sizeof(op));, the value for pfns_processed would 
end up as zero.  Xen could use expanded structure, and make use of 
pfns_processed.


Or, we could re-purpose the 'pad' member of struct xen_dm_op, call it 
continuation_data, and have this as a general purpose continuation 
variable, that the caller wouldn't pay any attention to.


What do you think?


--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -119,56 +119,89 @@ static int set_isa_irq_level(struct domain *d, uint8_t 
isa_irq,
  }
  
  static int modified_memory(struct domain *d,

-   struct xen_dm_op_modified_memory *data)
+   struct xen_dm_op_modified_memory *header,
+   xen_dm_op_buf_t* buf)
  {
-xen_pfn_t last_pfn = data->first_pfn + data->nr - 1;
-unsigned int iter = 0;
+/* Process maximum of 256 pfns before checking for continuation */
+const unsigned int cont_check_interval = 0xff;

Iirc the question was asked on v1 already: 256 (as the comment
says) or 255 (as the value enforces)?


aw shucks!  I had it as 255, and Paul said "255 or 256?" and I changed 
it to 256, without thinking, assuming an off by one error, and that he 
was right.  But, with a seconds thought, it should be 255, and Paul  was 
actually referring to a comment later in the code, which was 256.



  int rc = 0;
  if ( copy_from_guest_offset(, buf->h, rem_extents - 1, 1) )
+return -EFAULT;
+
+pfn = extent.first_pfn + header->pfns_processed;
+batch_nr = extent.nr - header->pfns_processed;

Even if this looks to be harmless to the hypervisor, I dislike the
overflows you allow for here.


I'll add a check for (header->pfns_processed > extent.nr), returning 
-EINVAL.



@@ -441,13 +474,8 @@ static int dm_op(domid_t domid,
  struct xen_dm_op_modified_memory *data =
  _memory;
  
-const_op = false;

-
-rc = -EINVAL;
-if ( data->pad )
-break;

Where did this check go?


data->pad is undefined here.  But I could add a check for extent.pad in 
the modified_memory function.



--- a/xen/include/public/hvm/dm_op.h
+++ b/xen/include/public/hvm/dm_op.h
@@ -237,13 +237,24 @@ struct xen_dm_op_set_pci_link_route {
   * XEN_DMOP_modified_memory: Notify that a set of pages were modified by
   *   an emulator.
   *
- * NOTE: In the event of a continuation, the @first_pfn is set to the
- *   value of the 

[Xen-devel] [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-21 Thread Jennifer Herbert
From: Jennifer Herbert <jennifer.herb...@citrix.com>

This new lib devicemodel call allows multiple extents of pages to be
marked as modified in a single call.  This is something needed for a
usecase I'm working on.

The xen side of the modified_memory call has been modified to accept
an array of extents.  The devicemodle library either provides an array
of length 1, to support the original library function, or a second
function allows an array to be provided.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Paul Durrant <paul.durr...@citrix.com>
---

This version of this patch removes the need for the 'offset' parameter,
by instead reducing nr_extents, and working backwards from the end of
the array.

This patch also removes the need to ever write back the passed array of
extents to the guest, but instead putting its partial progress in a 
'pfns_processed' parameter, which replaces the old 'offset' parameter.
As with the 'offest' parameter, 'pfns_processed' should be set to 0 on 
calling.

 tools/libs/devicemodel/core.c   |   30 +--
 tools/libs/devicemodel/include/xendevicemodel.h |   19 +++-
 xen/arch/x86/hvm/dm.c   |  110 ++-
 xen/include/public/hvm/dm_op.h  |   17 +++-
 4 files changed, 122 insertions(+), 54 deletions(-)

diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index a85cb49..f9e37a5 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -434,22 +434,36 @@ int xendevicemodel_track_dirty_vram(
  dirty_bitmap, (size_t)(nr + 7) / 8);
 }
 
-int xendevicemodel_modified_memory(
-xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn,
-uint32_t nr)
+int xendevicemodel_modified_memory_bulk(
+xendevicemodel_handle *dmod, domid_t domid,
+struct xen_dm_op_modified_memory_extent *extents, uint32_t nr)
 {
 struct xen_dm_op op;
-struct xen_dm_op_modified_memory *data;
+struct xen_dm_op_modified_memory *header;
+size_t extents_size = nr * sizeof(struct xen_dm_op_modified_memory_extent);
 
 memset(, 0, sizeof(op));
 
 op.op = XEN_DMOP_modified_memory;
-data = _memory;
+header = _memory;
 
-data->first_pfn = first_pfn;
-data->nr = nr;
+header->nr_extents = nr;
+header->pfns_processed = 0;
 
-return xendevicemodel_op(dmod, domid, 1, , sizeof(op));
+return xendevicemodel_op(dmod, domid, 2, , sizeof(op),
+ extents, extents_size);
+}
+
+int xendevicemodel_modified_memory(
+xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn,
+uint32_t nr)
+{
+struct xen_dm_op_modified_memory_extent extent;
+
+extent.first_pfn = first_pfn;
+extent.nr = nr;
+
+return xendevicemodel_modified_memory_bulk(dmod, domid, , 1);
 }
 
 int xendevicemodel_set_mem_type(
diff --git a/tools/libs/devicemodel/include/xendevicemodel.h 
b/tools/libs/devicemodel/include/xendevicemodel.h
index b3f600e..9c62bf9 100644
--- a/tools/libs/devicemodel/include/xendevicemodel.h
+++ b/tools/libs/devicemodel/include/xendevicemodel.h
@@ -236,8 +236,8 @@ int xendevicemodel_track_dirty_vram(
 uint32_t nr, unsigned long *dirty_bitmap);
 
 /**
- * This function notifies the hypervisor that a set of domain pages
- * have been modified.
+ * This function notifies the hypervisor that a set of contiguous
+ * domain pages have been modified.
  *
  * @parm dmod a handle to an open devicemodel interface.
  * @parm domid the domain id to be serviced
@@ -250,6 +250,21 @@ int xendevicemodel_modified_memory(
 uint32_t nr);
 
 /**
+ * This function notifies the hypervisor that a set of discontiguous
+ * domain pages have been modified.
+ *
+ * @parm dmod a handle to an open devicemodel interface.
+ * @parm domid the domain id to be serviced
+ * @parm extents an array of extent structs, which each hold
+ a start_pfn and nr (number of pfns).
+ * @parm nr the number of extents in the array
+ * @return 0 on success, -1 on failure.
+ */
+int xendevicemodel_modified_memory_bulk(
+xendevicemodel_handle *dmod, domid_t domid,
+struct xen_dm_op_modified_memory_extent extents[], uint32_t nr);
+
+/**
  * This function notifies the hypervisor that a set of domain pages
  * are to be treated in a specific way. (See the definition of
  * hvmmem_type_t).
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 2122c45..c90af64 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -119,56 +119,89 @@ static int set_isa_irq_level(struct domain *d, uint8_t 
isa_irq,
 }
 
 static int modified_memory(struct domain *d,
-   struct xen_dm_op_modified_memory *data)
+ 

[Xen-devel] [PATCH] dm_op: Add xendevicemodel_modified_memory_bulk.

2017-03-16 Thread Jennifer Herbert
From: Jennifer Herbert <jennifer.herb...@citrix.com>

This new lib devicemodel call allows multiple extents of pages to be
marked as modified in a single call.  This is something needed for a
usecase I'm working on.

The xen side of the modified_memory call has been modified to accept
an array of extents.  The devicemodle library either provides an array
of length 1, to support the original library function, or a second
function allows an array to be provided.

Signed-off-by: Jennifer Herbert <jennifer.herb...@citrix.com>
---
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Wei Liu <wei.l...@citrix.com>
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: Paul Durrant <paul.durr...@citrix.com>
---
 tools/libs/devicemodel/core.c   |   30 --
 tools/libs/devicemodel/include/xendevicemodel.h |   20 +++-
 xen/arch/x86/hvm/dm.c   |  117 +++
 xen/include/public/hvm/dm_op.h  |   12 ++-
 4 files changed, 125 insertions(+), 54 deletions(-)

diff --git a/tools/libs/devicemodel/core.c b/tools/libs/devicemodel/core.c
index a85cb49..1f7a9dc 100644
--- a/tools/libs/devicemodel/core.c
+++ b/tools/libs/devicemodel/core.c
@@ -434,22 +434,36 @@ int xendevicemodel_track_dirty_vram(
  dirty_bitmap, (size_t)(nr + 7) / 8);
 }
 
-int xendevicemodel_modified_memory(
-xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn,
-uint32_t nr)
+int xendevicemodel_modified_memory_bulk(
+xendevicemodel_handle *dmod, domid_t domid,
+struct xen_dm_op_modified_memory_extent *extents, uint32_t nr)
 {
 struct xen_dm_op op;
-struct xen_dm_op_modified_memory *data;
+struct xen_dm_op_modified_memory *header;
+size_t extents_size = nr * sizeof(struct xen_dm_op_modified_memory_extent);
 
 memset(, 0, sizeof(op));
 
 op.op = XEN_DMOP_modified_memory;
-data = _memory;
+header = _memory;
 
-data->first_pfn = first_pfn;
-data->nr = nr;
+header->nr_extents = nr;
+header->offset = 0;
 
-return xendevicemodel_op(dmod, domid, 1, , sizeof(op));
+return xendevicemodel_op(dmod, domid, 2, , sizeof(op),
+ extents, extents_size);
+}
+
+int xendevicemodel_modified_memory(
+xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn,
+uint32_t nr)
+{
+struct xen_dm_op_modified_memory_extent extent;
+
+extent.first_pfn = first_pfn;
+extent.nr = nr;
+
+return xendevicemodel_modified_memory_bulk(dmod, domid, , 1);
 }
 
 int xendevicemodel_set_mem_type(
diff --git a/tools/libs/devicemodel/include/xendevicemodel.h 
b/tools/libs/devicemodel/include/xendevicemodel.h
index b3f600e..b88d73c 100644
--- a/tools/libs/devicemodel/include/xendevicemodel.h
+++ b/tools/libs/devicemodel/include/xendevicemodel.h
@@ -236,8 +236,8 @@ int xendevicemodel_track_dirty_vram(
 uint32_t nr, unsigned long *dirty_bitmap);
 
 /**
- * This function notifies the hypervisor that a set of domain pages
- * have been modified.
+ * This function notifies the hypervisor that a set of contiguous
+ * domain pages have been modified.
  *
  * @parm dmod a handle to an open devicemodel interface.
  * @parm domid the domain id to be serviced
@@ -249,6 +249,22 @@ int xendevicemodel_modified_memory(
 xendevicemodel_handle *dmod, domid_t domid, uint64_t first_pfn,
 uint32_t nr);
 
+
+/**
+ * This function notifies the hypervisor that a set of discontiguous
+ * domain pages have been modified.
+ *
+ * @parm dmod a handle to an open devicemodel interface.
+ * @parm domid the domain id to be serviced
+ * @parm extents an array of extent structs, which each hold
+ a start_pfn and nr (number of pfns).
+ * @parm nr the number of extents in the array
+ * @return 0 on success, -1 on failure.
+ */
+int xendevicemodel_modified_memory_bulk(
+xendevicemodel_handle *dmod, domid_t domid,
+struct xen_dm_op_modified_memory_extent extents[], uint32_t nr);
+
 /**
  * This function notifies the hypervisor that a set of domain pages
  * are to be treated in a specific way. (See the definition of
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 2122c45..0c4a820 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -119,56 +119,96 @@ static int set_isa_irq_level(struct domain *d, uint8_t 
isa_irq,
 }
 
 static int modified_memory(struct domain *d,
-   struct xen_dm_op_modified_memory *data)
+   struct xen_dm_op_modified_memory *header,
+   xen_dm_op_buf_t* buf)
 {
-xen_pfn_t last_pfn = data->first_pfn + data->nr - 1;
-unsigned int iter = 0;
-int rc = 0;
+/* Process maximum of 255 pfns before checking for continuation */
+const uint32_t max_pfns = 0xff;
 
-if ( (data->first_pfn > last_pfn) ||
- (last_pfn > domain_get_maximum_gpf

Re: [Xen-devel] QEMU XenServer/XenProject Working group meeting 10th February 2017

2017-02-28 Thread Jennifer Herbert

QEMU XenServer/XenProject Working group meeting 10th February 2017
==

Attendance
--

Andrew Cooper
Ian Jackson
Paul Durrent
Jennifer Herbert
George Dunlap
Igor Druzhinin
Sergey Dyasli


First topic: DMop
-

After an introduction, Paul started by describing his recent progress 
with Dmop.
He has posted a privcmd series, including the dmop hypervisor parts.  
This adds the dmop ioctl and the restriction ioctl.  This restrict 
ioctl, restricts access of the dmop to that of the given domain, and 
prevents further access via the privcmd.


Paul has also written all the code for libdevicemodel -  4 patches. This 
code is linux specific, and will detect if its running on an OS it doest 
support (either old linux or windows), or where the privcmd driver 
doesnt support the dm ioctl.
He next intends add a compatibility layer  to libxenctl, which reroutes 
old xc calls to the new way.


This would make the make calls switch transparently, but you'd know at 
the time you made the restriction call, if it succeeded or not, and can 
use local policy to determine if to continue or not.

Following this, QEMU needs to be ported onto this new library.

Next was a lengthy discussion QEMU's mapcahe, and on how to secure mmap 
calls.  It was eventual concluded that Paul could just extend his 
current patches to also include the mmap calls.


It was asked if using this library would mean having two fds,  but this 
was said to be only true if you continued to keep xenctl open – which 
you shouldn’t since the new restricted ioctl fd should be able to do 
everything you might need the old one for.


Paul detailed how the privcmd driver now has a domid associated with 
each fd, which defaults to 'invalid'.  The restrict ioctl sets this 
domid, and from then on, every call from that fd is checked against that 
domain.  The restrict ioctl won't allow the domid to be changed again 
later.


It was suggested other calls should be checked, but Paul didn't think 
there where any others not already discussed.



Next topic: How to prevent QEMU running as root
---

There is a patch to so make QEMU change away from root user.  This is 
expected to make a number of things stop working:


* CD-ROM insert  -   Eject is expected to work, if you could insert one.
* Emulated hotplug - unsure.
* PCI-passthough – This is deemed out of scope since pci-passthough 
inherently unsafe anyway.  It should give a useful error message 
suggesting turning off de-priv.  This could be made to work in future.


It is thought that some of these problems could be solved by passing fds 
over QMP.  Furthermore, it was thought that DavidV. had looked, and 
found you could already do this.  If so, only toolstack changes would be 
needed.  Making emulated network hotplug or IDE hotplug work is probably 
not important.


NOTE: Need to check QMP and it applicability to this problem.


Next Topic: Xenstore


Juergen has a proposal for xenstored, to have something similar to dm-op 
ioctl.  Each command can have a prefix which requests it's executed with 
a specified domain security context.


Its also necessary to run the multilpe QEMUs, such that the PV back-ends 
can be run in a separate instance from the emulators.  In doing this, 
you also need to separate out the xenstore paths, such that each QEMU 
uses its own xenstore branch.
Part of the reason is that the backend paths that you are using to 
control the emulated deprivileged QEMU, have to be writeable by the 
guest. This is because the emulated QEMU is going to have to write to them.


The emulator patch ID stuff is already upstream – and can be invoked.   
But toostack has to be able to cope.  Ian wrote a LibXL patch series 
(20~30 patches) 18 months ago, but that has bit rotted.  This patch was 
said to be surprisingly long.


Andrew is concerned that giving a guest permission to write to the 
xenstore paths used by its QEMU may cause a significant security issue, 
while others think it would merely offer a new way for it to crash 
itself.  Andrew to inspect Ian's patch series to evaluate this risk.


There are only a hand full of xenstore paths needed by the emulated QEMU 
– which will only drop as QMP use increased.
Another thing to consider is you'll have to be able to talk to multiple 
QEMU's via QMP – knowing which is which!


There was a suggestion that you could differentiate between QEMU pv and 
QEMU emulated by how your communicating with it – but this wouldn't 
really be possible, as QEMU upstream won't accept anything using 
xenstore, if it might be useful to more then just the xen community.



XenServer is currently working towards moving to QEMU upstream, with 
it's core functionally currently working using temporary python shims, 
translating the API calls.  If depriv is unstreamed in time, XenServer 
won't have to transition the APIs twice

[Xen-devel] QEMU XenServer/XenProject Working group meeting 29th September 2016

2016-10-14 Thread Jennifer Herbert

QEMU XenServer/XenProject Working group meeting 29th September 2016
===

Attendees
-

David Vrabel
Jennifer Herbert
Ian Jackson
Andrew Cooper
Paul Durrant
Lars Kurth

QEMU depriv
===

DMOP


There has been agreement on list on the DMOP proposal.  The HVMCTL
patch series, which was proposed  should need only mechanical changes
to use it as a basis for DMOP.

Privcmd
---

The privcmd changes should be fairly trivial to implement. Libxc
would need changing, but this code is also in the HVMCTL patch
series.  This mean only thing needed for QEMU it to call the restrict
ioctl, to enable it.  If restrict ioctl missing, an error would be
returned.  QEMU would probably want an option to it, to indicate
de-priv is required.  Given this option, the QEMU would raise an error
if the restrict ioctl was not present.

In order to avoid accidents due to ABI instability, old DMOP numbers would
be retired when a DMOP in changed in an ABI-incompatible way - there is
no shortage of new DMOP numbers.

Eventchan
-

Eventchan has resections in 4.7, but the libxc parts need to be done.
This should not be much work.

XenStore


For the non-pv part of QEMU, XenStore is only used in two places.
There is the DM state, and the physmap mechanism.  Although there is a
vague plan for replacing the physmap mechanism, it is some way off.

The DM state key is used for knowing when the qemu process is running
etcetera, QMP would seem to be an option to replace it - however there
is no (nice) way to wait on a socket until it has been opened.  One
solution might be to use Xenstore to let you know the QMP sockets
where available, before QEMU drops privileges,  and then QMP could be
used to know QEMU is in the running state.

To avoid the need to use xs-restrict, you would need to both replace
physmap and rework qemu startup procedure. The use of xs-restrict would
be more expedient, and does not look to need that much work.

Discussion was had over how secure it would be to allow a guest access
to these Xenstore keys - it was concluded that a guest could mostly
only mess itself up.  If I guest attempted to prevent itself from being
migrated, the tool stack time it out, and could kill it.

There followed a discussion on the Xenbus protocol, and additions
needed.  The aim is to merely restrict the permission for the command,
to that of the guest who's domID you provide.  It was proposed that
it uses the header as is, with its  16 bytes, with the command
'one-time-restrict' , and then the payload would have two additional
field at the start.  These two field would correspond to the domid to
restrict as, and the real command. Transaction ID and tags would be
taken from the real header.

Although inter domain xs-restrict is not specifically needed for this
project, it is thought it might be a blocking items for upstream
acceptance.  It it thoughts these changes would not require that much
work to implement, and may be useful in use use cases. Only a few
changes to QEMU would be needed, and libxl should be able to track
QEMU versions.  Ian Jackson volunteered to look at this, with David
helping  with the kernel bits.  Ian won't have time to look at this
until after Xen 4.8 is released.

There discussion about what may fail once privileges are taken away,
which would include CDs and PCI pass though.  It is thought the full
list can only be known by trying.  Not everything needs to work for
acceptance upstream, such as PCI pass though.   If such an
incompatible feature is needed, restrictions can be turned off.  These
problems can be fixed in a later phase, with CDs likely being at teh
top of the list.


Action items


Hypervisor bits really needed first, but can't be done until 4.8 has
been.

Ian to look at the Xenstore items David is to look at the kernel
items.  Paul is to audit the HVMops, checking parameters etc;

It is too late to get this in 4.8, but it is desired to get this in
early into 4.9 so that there can be a period of stabilisation.  With
the release of 4.8 imminent, little work will happen until after that.
However Paul, David and Ian are asked to have a think about their
respective areas, and have a plan for when they can be done.  They are
welcome to start tackling them if they have time.



disaggregation
=

A disaggregation proposal which had previously been posted to a QEMU
forum was discussed.  It was not previously accepted by all. The big
question was how to separate the device models from the machine, with
a particular point of contention being around PIIX and the idea of
starting a QEMU instance without one.  The general desire from us is
we want to have a specific device emulated and nothing else.  It is
suggested you would have a software interface between each device that
looked a software version of PCI.  The PIIX device could be attached to
CPU this pseudo PCI interface.  This would fit in well

[Xen-devel] XenProject/XenServer QEMU working group minutes, 30th August 2016

2016-09-09 Thread Jennifer Herbert

QEMU XenServer/XenProject Working group meeting 30th August 2016


Attendance
--

Andrew Cooper
Ian Jackson
Paul Durrant
David Vrabel
Jennifer Herbert

Introduction


Introduced Paul Durrant to the working group.

Started by recapping our purpose: A way to make it possible for qemu
to be able to make hypercalls without too much privilege, in a way
which is up streamable.Dom0 guest must not be able abuse interface
to compromise dom0 kernel.

QEMU Hypercalls – DM op
---

Has been much discussion on XenDevel -  a problem identified is when
you have operations with references to  other user memory objects,
such as with Track dirty VRAM (As used with the VGA buffer)  At the
moment, apparently there is only that one, but others may emerge.

Most obvious solution would involve the guest kernel validating the
virtual address passed, however that would would rely on the guest
kernel knowing where to objects where.   This is to be avoided.

Ian recounts how there where variose proposals, on XenDevel involving,
essentially, informing the hypervisor, or some way providing the
information about which virtual addresses where being talked about by
the hypercall to the hypervisor.  Many of these involved this
information being transmitted via a different channel.

Ian suggest the idea provide a way for the kernel to tell the
hypervisor is user virtual rages, dm op allowed memory. And there
would be a flag, in the dm op, in a fixed location, that would tell
the hypervisor, this only talks about special dm pre-approved memory.


A scheme of pre-nominating an area in QEMU, maybe using hypercall
buffers is briefly discussed,as well as a few other ideas, but
concludes that doesn’t really address the problem of future DM ops –
of which there could easily be.  Even if we can avoid the problem by
special cases for our current set-up, we still need a story for how to
add future interfaces with handles, without the need to change the
kernel interface.  Once we come up with story, we wouldn't necessarily
have to implement it.


The concept of using physical addressed hypercall buffers was
discussed.  Privcmd could allocate you a place, and mmap it into user
ram, and this is the only memory that would be used with the
hypercalls.  A hypercall would tell you the buffer range. Each qemu
would need to be associated with the correct set of physical buffers.

A recent AMD proposal was discussed, which would use only physical
addresses, no virtual address.  The upshot being we should come up
with a solution that is not incompatible this.

Ideas further discussed: User code could just put stuff in mmaped
memory, and only refer to offset within that buffer.  The privcmd
driver would fill in physical details. All dm ops would have 3
arguments:  dm op, pointer to to struct, and optional pointer to
restriction array – the last of which is filled in my privcmd driver.
It is discussed how privcmd driver must not look at the dm op number,
in particular, to know how to validate addresses, as it must be
independent from the API.

A scheme where qemu calls an ioctl before it drops privileges, to set
up restrictions ahead of time, is discussed.  One scheme may work by
setting up a rang for a given domain or VCPU.

The assumption is that all device model, running in the same domain,
have the same virtual address layout.  Then there would be a flag, in
the stable bit of the api, if to apply that restriction  - any kernel
dm op would not apply that restriction.

The idea can be extended – to have more one address range, or can have
range explicitly provided in the hypercall.  This latter suggestion is
preferred, however each platform would have different valid address
ranges, and privcmd is platform independent.  Its discussed how a
function could be created to return valid rages for your given
platform, but this is not considered a element solution. The third
parameter of the dm op could be array of ranges, where common case for
virtual addresses may be 0-3GB, but for physical addresses, it might
be quite fragmented.


A further ideas is proposed to extend the dm op, to have a fixed part,
to have an array of guest handles, the kernel can audit. The
arguments would be:

Arg1: Dom ID:
Arg2: Guests handle array of tuples(address, size)
Arg3: Number guest handles.

The first element of the array could be the DM op structure itself,
containing the DM Op code, and othe argument to the perticular op.
The Privcmd driver would only pass though what is provided by the
user.  Any extra elements would be ignored by the hypercall, and if
there where insufficient, the hypercall code would see a NULL, and be
able to gracefully fail.

The initial block (of dm arguments) passed in the array would be
copied into pre-zeroed memory of max op size, having checked the size
is not greater then this.  No need to check minimum, buffer
initialised to zero, so zero length

Re: [Xen-devel] Device model operation hypercall (DMOP, re qemu depriv)

2016-09-09 Thread Jennifer Herbert

On 01/08/16 12:32, Ian Jackson wrote:

I think we need to introduce a new hypercall (which I will call DMOP
for now) which may augment or replace some of HVMCTL.  Let me explain:



I believe the new 'DMOP' hypercall is a good idea,  but following on
from discussions, I propose a revised design, which I present below.
Please let me know what you think.

Thanks,
Jenny.


DMOP  (multi-buffer variant)



Introduction


A previous proposal for a 'DMOP' was put forward by Ian Jackson on the 1st
of August. This proposal seem very promising, however a problem was
identified with it, that this proposal addresses.

The aim of DMOP, as before, is to prevent a compromised device model from
compromising domains other then the one it is associated with. (And is
therefore likely already compromised)

The previous proposal adds a DMOP hypercall, for use by the device models,
which places the domain ID in a fixed place, within the calling args,
such that the priv driver can always find it, and not need to know any
further details about the particular DMOP in order to validate it against
previously set (vie ioctl) domain.

The problem occurs when you have a DMOP with references to other user memory
objects, such as with Track dirty VRAM (As used with the VGA buffer).
Is this case, the address of this other user object needs to be vetted,
to ensure it is not within a restricted address ranges, such as kernel
memory. The real problem comes down to how you would vet this address -
the idea place to do this is within the privcmd driver, since it would have
knowledge of the address space involved. However, with a principal goal
of DMOP to allow privcmd to be free from any knowledge of DMOP's sub ops,
it would have no way to identify any user buffer  addresses that need
checking.  The alternative of allowing the hypervisor to vet the address
is also problematic, since it has no knowledge of the guest memory layout.


The Design
--

As with the previous design, we provide a new restriction ioctl, which
takes a domid parameter.  After that restriction ioctl is called, the
privcmd driver will permit only DMOP hypercalls, and only with the
specified target domid.

In the previous design, a DMOP consisted of one buffer, containing all of
the DMOP parameters, which may include further explicit references to
more buffers.  In this design, an array of buffers with length is presented,
with the first one containing the DMOP parameters, which could implicitly
reference to further buffers from within in the array. Here, the only
user buffers passed, are that found with the array, and so all can be 
audited

from privcmd.  With the passing of the length of the buffers array, privcmd
does not need to know which DMOP it is to audit them.

If the hypervisor ends up with the wrong number of buffers, it can reject
the DMOP at that point.

The following code illustrates this idea:

typedef struct dm_op_buffer {
XEN_GUEST_HANDLE(void) h;
size_t len;
} dm_op_buffer_t;

int
HYPERVISOR_device_model_op(
domid_t domid,
unsigned int nr_buffers,
XEN_GUEST_HANDLE_PARAM(dm_op_buffer_t) buffers)

@domid: the domain the hypercall operates on.
@nr_buffers; the number of buffers in the @buffers array.

@buffers: an array of buffers.  @buffers[0] contains device_model_op - the
structure describing the sub-op and its parameters. @buffers[1], @buffers[2]
etc. may be used by a sub-op for passing additional buffers.

struct device_model_op {
uint32_t op;
union {
 struct op_1 op1;
 struct op_2 op2;
 /* etc... */
} u;
};

It is forbidden for the above struct (device_model_op) to contain any
guest handles - if they are needed, they should instead be in
HYPERVISOR_device_model_op->buffers.

Validation by privcmd driver
--

If the privcmd driver has been restricted to specific domain (using a
 new ioctl), when it received an op, it will:

1. Check hypercall is DMOP.

2. Check domid == restricted domid.

3. For each @nr_buffers in @buffers: Check @h and @len give a buffer
   wholey in the user space part of the virtual address space. (e.g.,
   on Linux use access_ok()).


Xen Implementation
--

Since a DMOP sub of may need to copy or return a buffer from the guest,
as well as the DMOP itself to git the initial buffer, functions for doing
this would be written as below.  Note that care is taken to prevent
damage from buffer under or over run situations.  If the DMOP is called
with insufficient buffers, zeros will be read, while extra is ignored.


int copy_dm_buffer_from_guest(
void *dst,/* Kernel destination buffer  */
size_t max_len,   /* Size of destination buffer */
XEN_GUEST_HANDLE_PARAM(dm_op_buffer_t) buffers,
  /* dm_op_buffers passed into DMOP */
unsigned int nr_buffers,  /* total number of dm_op_buffers  */
  

[Xen-devel] XenProject/XenServer QEMU working group, Friday 8th July, 2016, 15:00.

2016-07-28 Thread Jennifer Herbert


XenProject/XenServer QEMU working group, Friday 8th July, 2016, 15:00.


Date and Attendees
==

XenProject/XenServer QEMU working group, held on Friday the 8th of July,
2016 at 15:00.

The Following were present:

Ian Jackson [platform team]
David Vrabel [ring0]
Andrew Cooper [ring0]
Jennifer Herbert [ring0]


Purpose of meeting
==

Both XenServer (currently using qemu-trad with de-priv) and XenProject
(using QEMU (upstream) without dep-priv), would like to move to using
QEMU (upstream) de-privelaged.

This meeting was intended to restart XenServer/ring0 and XenProject's
collaboration.

Agenda includes:

* Discuss requirements
* Review the our status and strategy on achieving these requirements.
* Agree next steps.

Meeting Actions
===

Ian: Find all the Xenstore keys used by QEMU, and evaluate how much work
 it would be to either read before privileges are dropped, or
 otherwise stop reading/writing them when de-privelaged.

Ian: Write up DM_Opp design (discussed during meeting), talk to Jan
 about this.

XenServer: Go though All the priv-cmd ops, check how they would fit
 within the dm-opp design.

David: Post Event channel / priv command patch to upstream.  (Now done)


Meeting Transcript - abridged
=


Discussed Goals
---

Everyone agrees we want to stop anything gaining control of QEMU from
also gaining access to the platform.

XenProject would like to use depriv by default.  Should not preclude
other configurations such as PCI pass though.

Ian: There is a project in progress to use stub domains with QEMU –
 would expect a user to run depriv or stubdoms, but not at the same
 time.
XS: states that is does not want to go down the stub domains route, as
is not considered scalable. (Boot storms etc;)

Ian: In upstream we expect to pursue both depriv qemu, and stub qemu, as
   implementation strategies.  Depriv qemu is probably deliverable
   sooner.  Stub qemu is more secure but not suitable for all users
   (eg, maybe not for XenServer as David suggested).


Going though list technology areas listed in document shared beforehand.

Disk / Nsetwork / Host IO
-

Can be addressed by running as unprivileged user.  Andrew: Rather
linux centric - not posix solution for all these problem:  Ian: There
are similar interfaces on other platforms.  In upstream we expect to at
least be able to run as an unprivileged user, which is indeed portable.

Mem map


XenServer has a solution.  Ian: Would like to see this shared
immediately.  Andrew: Consider rest of design first.

HyperCalls
--

Three options:
1: XSM
2: Fix ABIan:  Re-arrange priv-cmd to make it easier to parse and
   restrict.
3: Loadable pattern matching solution.  All agree this would be
   horrible to maintain.

Ian: Difference between not having Stable API, and the instability of
 ABI stopping the determination of domain.
Ian: Version compatibility out of scope.  Andrew:  Disagree, should do
 both together.

Andrew: Requirement: – no interdependence between QEMU and the kernel,
such that you need to update both together.

All: Agree xen interface should not be embedded in the kernel.

Jen: Asked why Ian was against using XSM.
Ian: XSM wholesale replace existing checks, and current XSM polices are
 unproven.
Andrew: XS is experimenting, and there are problems.
Ian & Andrew: Agree would need auditing.
Ian: Reluctant to link xsm to this work.
David: Can do both – XSM and 'Fix ABI'.
Ian: With both, wouldn't need the nesting.
David: Would still need to nesting to wrap domains.
Ian: Doesn't have to do all ….
David: 
David: (wearing a linux hat): Sceptical for stable API – don't want to
have a new kernel when API new features added.
Ian: Suggest ideas of DM-Space – D-ctrl space.
David: Would need to be a strong component.  Both ends would want
   commitment ~ no snow flakes.
Ian: Shouldn't be a ploblem.
Andrew: PVH would relpcaes QEMU.   Ian:  Not relevant.

Idea of DM-ops is discussed more.
Dm-op restricted to domain
No need to version this.
Important fields (domain) fixed, such that DM-ops can be added
changed, without affecting kernel interface.

Andrew bring up XenGT.   Is this considered part of QEMU.  QEMU
shouldn't set pci bars.
Question if DM-OP would just be HVM-OP.
David: Should enumerate hyper-calls, make sure in DM-ops.
Andrew & Ian : would need to discuss with Jan.
Some things may need to be in both dm-op & other things, ie guest
interface. - This is ok.

dm-ops is tentatively agreed on as a way forward.

XenStore Restrict.
--

Jen : Upstream discussing maybe dropping sockets.  (Necessary for
restrict)
David: Problem, is it gives the same permissions as the guest – does
this restrict too much?

Should look though all uses, see if they can be used before it drops
privileges.

Ian: Already looke

Re: [Xen-devel] [PATCHv1 3/4] spinlock: shrink struct lock_debug

2016-01-19 Thread Jennifer Herbert


On 18/12/15 16:58, Jan Beulich wrote:

On 18.12.15 at 15:09,  wrote:

--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -16,7 +16,7 @@ static atomic_t spin_debug __read_mostly = ATOMIC_INIT(0);
  
  static void check_lock(struct lock_debug *debug)

  {
-int irq_safe = !local_irq_is_enabled();
+s16 irq_safe = !local_irq_is_enabled();
  
  if ( unlikely(atomic_read(_debug) <= 0) )

  return;

I can't figure out why this odd looking change is needed.

Jan




This patch shrinks struct lock_debug's irq_safe member.  Since you end 
up with many instances of this embedded in other structures such as 
struct domain, this become a useful saving in memory.  At one point, I 
had an issue with struct domain exceeding 4k, which caused trouble.
Given the change to struct_lock debug,  this local variable irq_safe, is 
compared with it, and used in cmpxchg with it it, and therefore should 
match type.


- Jenny

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv1 4/4] spinlock: add fair read-write locks

2016-01-18 Thread Jennifer Herbert

Hi Jan,

Thank you for your comments.  Sorry for the slow response - xmas and all...
Responses below...


On 22/12/15 13:54, Jan Beulich wrote:

+/**
+ * rspin_until_writer_unlock - inc reader count & spin until writer is gone

Stale comment - the function doesn't increment anything.

Also throughout the file, with Linux coding style converted to Xen
style, comment style should be made Xen-like too.


Oh yes, missed that - will fix.




+/*
+ * Readers come here when they cannot get the lock without waiting
+ */
+if ( unlikely(in_irq()) )
+{
+/*
+ * Readers in interrupt context will spin until the lock is
+ * available without waiting in the queue.
+ */
+smp_rmb();
+cnts = atomic_read(>cnts);
+rspin_until_writer_unlock(lock, cnts);
+return;
+}

I can't immediately see the reason for this re-introduction of
unfairness - can you say a word on this, or perhaps extend the
comment?


We haven't found a reason this was introduced to Linux, but assume this 
was to reduce  interrupt latency.  I had thought to leave it there, in 
case we want to use it in the future, but now feel it would probably 
better to remove it, and deal with any irq related issues, if and when 
we use rw locks from irq handlers.




-x = lock->lock & ~RW_WRITE_FLAG;
-}
-preempt_disable();
-}
-
-void _write_lock_irq(rwlock_t *lock)
-{
-uint32_t x;
+cnts = atomic_read(>cnts);
+if ( !(cnts & _QW_WMASK) &&
+ (atomic_cmpxchg(>cnts, cnts,
+ cnts | _QW_WAITING) == cnts) )
+break;

Considering that (at least on x86) cmpxchg is relatively expensive,
and that atomic OR would be carried out by some cmpxchg-like
mechanism on most other arches anyway, can't this be an atomic
OR, followed by a read to check for another active writer?


I wanted to port know proven code.  We have now run this code in xen for 
quite a while, and while I think you may well be correct, I'd rather get 
this version in, and then consider further optimisation testing in 
future patches.



+unlock:
+spin_unlock(>lock);

Labels indented by at least one space please.

Also - are you using any of the static functions in spinlock.c? If not,
putting the rwlock code in a new rwlock.c would help review quite a
bit, since code removal and code addition would then be separate
rather than intermixed.


Great idea. I think the reasons for keeping them together must just be 
historical.  I'll try and split that out.




+/*
+ * Writer states & reader shift and bias
+ */
+#define_QW_WAITING  1   /* A writer is waiting */
+#define_QW_LOCKED   0xff/* A writer holds the lock */
+#define_QW_WMASK0xff/* Writer mask */
+#define_QR_SHIFT8   /* Reader count shift  */
+#define_QR_BIAS (1U << _QR_SHIFT)

Is there a particular reason for the 8-bit writer mask (a 2-bit one
would seem to suffice)?


You picked up on optimisation in your later comment - I should add a 
comment here.



+static inline int _write_trylock(rwlock_t *lock)
+{
+u32 cnts;
+
+cnts = atomic_read(>cnts);
+if ( unlikely(cnts) )
+return 0;
+
+return likely(atomic_cmpxchg(>cnts,
+ cnts, cnts | _QW_LOCKED) == cnts);

The | is pointless here considering that cnts is zero.


I theorised that this was like this to aid the readability of the code, 
although I don't know if it does.  I'd happily change it over, and 
replace the cnts with 0s.



+static inline void _write_unlock(rwlock_t *lock)
+{
+/*
+ * If the writer field is atomic, it can be cleared directly.
+ * Otherwise, an atomic subtraction will be used to clear it.
+ */
+atomic_sub(_QW_LOCKED, >cnts);
+}

Ah, I guess the comment here is the explanation for the 8-bit
write mask.


Yes. A comment on the declaration would no doubt help too.


+static inline int _rw_is_write_locked(rwlock_t *lock)
+{
+return atomic_read(>cnts) & _QW_WMASK;
+}

This returns true for write-locked or writer-waiting - is this intended?


It was, but having thought about it a bit more, it would only have been 
"more useful" like this in unsafe (or at lest ugly) code, and so for 
that reason I should change it.  I don't think this would have affected 
the safe cases.

Jan


Cheers

-jenny


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V2] libxc: Prevent NULL pointer dereference in stdiostream_vmessage()

2015-07-07 Thread Jennifer Herbert
Unlikely that it may seem localtime_r could fail, which would result in a
null pointer dereference.  In this case, it shoud log the errno, (instead of
the date/time), and and continue its logging, as this is still useful.

Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com
---
 tools/libxc/xtl_logger_stdio.c |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xtl_logger_stdio.c b/tools/libxc/xtl_logger_stdio.c
index d8646e0..5569c50 100644
--- a/tools/libxc/xtl_logger_stdio.c
+++ b/tools/libxc/xtl_logger_stdio.c
@@ -61,10 +61,13 @@ static void stdiostream_vmessage(xentoollog_logger 
*logger_in,
 struct tm lt_buf;
 time_t now = time(0);
 struct tm *lt= localtime_r(now, lt_buf);
-fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s ,
-lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday,
-lt-tm_hour, lt-tm_min, lt-tm_sec,
-tzname[!!lt-tm_isdst]);
+if (lt != NULL)
+fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s ,
+lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday,
+lt-tm_hour, lt-tm_min, lt-tm_sec,
+tzname[!!lt-tm_isdst]);
+else
+fprintf(lg-f, [localtime_r failed: %d] , errno);
 }
 if (lg-flags  XTL_STDIOSTREAM_SHOW_PID)
 fprintf(lg-f, [%lu] , (unsigned long)getpid());
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V3] libxc: Prevent NULL pointer dereference in stdiostream_vmessage()

2015-07-07 Thread Jennifer Herbert
Unlikely that it may seem localtime_r could fail, which would result in a
null pointer dereference.  In this case, it shoud log the errno, (instead of
the date/time), and and continue its logging, as this is still useful.

Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com
---
 tools/libxc/xtl_logger_stdio.c |   11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xtl_logger_stdio.c b/tools/libxc/xtl_logger_stdio.c
index d8646e0..b28ef73 100644
--- a/tools/libxc/xtl_logger_stdio.c
+++ b/tools/libxc/xtl_logger_stdio.c
@@ -61,10 +61,13 @@ static void stdiostream_vmessage(xentoollog_logger 
*logger_in,
 struct tm lt_buf;
 time_t now = time(0);
 struct tm *lt= localtime_r(now, lt_buf);
-fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s ,
-lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday,
-lt-tm_hour, lt-tm_min, lt-tm_sec,
-tzname[!!lt-tm_isdst]);
+if (lt != NULL)
+fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s ,
+lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday,
+lt-tm_hour, lt-tm_min, lt-tm_sec,
+tzname[!!lt-tm_isdst]);
+else
+fprintf(lg-f, [localtime_r failed: %d] , errno);
 }
 if (lg-flags  XTL_STDIOSTREAM_SHOW_PID)
 fprintf(lg-f, [%lu] , (unsigned long)getpid());
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V2] libxc: Prevent NULL pointer dereference in stdiostream_vmessage()

2015-07-07 Thread Jennifer Herbert

I guess I didn't build what I thought I built.
V3 coming up

On 07/07/15 16:51, Andrew Cooper wrote:

On 07/07/15 16:48, Jennifer Herbert wrote:

Unlikely that it may seem localtime_r could fail, which would result in a
null pointer dereference.  In this case, it shoud log the errno, (instead of
the date/time), and and continue its logging, as this is still useful.

Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com
---
  tools/libxc/xtl_logger_stdio.c |   11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xtl_logger_stdio.c b/tools/libxc/xtl_logger_stdio.c
index d8646e0..5569c50 100644
--- a/tools/libxc/xtl_logger_stdio.c
+++ b/tools/libxc/xtl_logger_stdio.c
@@ -61,10 +61,13 @@ static void stdiostream_vmessage(xentoollog_logger 
*logger_in,
  struct tm lt_buf;
  time_t now = time(0);
  struct tm *lt= localtime_r(now, lt_buf);
-fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s ,
-lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday,
-lt-tm_hour, lt-tm_min, lt-tm_sec,
-tzname[!!lt-tm_isdst]);
+if (lt != NULL)
+fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s ,
+lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday,
+lt-tm_hour, lt-tm_min, lt-tm_sec,
+tzname[!!lt-tm_isdst]);
+else
+fprintf(lg-f, [localtime_r failed: %d] , errno);

lg-f ?

~Andrew


  }
  if (lg-flags  XTL_STDIOSTREAM_SHOW_PID)
  fprintf(lg-f, [%lu] , (unsigned long)getpid());



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V2] libxc: Prevent NULL pointer dereference in stdiostream_vmessage()

2015-07-07 Thread Jennifer Herbert

On 07/07/15 17:05, Ian Jackson wrote:

Andrew Cooper writes (Re: [Xen-devel] [PATCH V2] libxc: Prevent NULL pointer 
dereference in stdiostream_vmessage()):

On 07/07/15 16:48, Jennifer Herbert wrote:

+else
+fprintf(lg-f, [localtime_r failed: %d] , errno);

lg-f ?

I didn't spot this.  I guess the patch wasn't build-tested...



Not sure what happened there - something was built tested, but I 
presumably without this patch.

Sorry about that.

-jenny

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/7] libxc: Use const pointer in local_file_dump()

2015-07-07 Thread Jennifer Herbert

On 03/07/15 16:27, Ian Jackson wrote:

Jennifer Herbert writes ([Xen-devel] [PATCH 2/7] libxc: Use const pointer in 
local_file_dump()):

By adding the const keyword, it is clearer to people and static analysis
tools that no changes to the data are to be made.

Would it be wrong for a future patch to add a field to dump_args which
gets modified ?  AFAICT the answer is `no'.  So I don't understand why
it ought to be const.

Does your Coverity instance complain about every instance where a
struct exists which is not marked const but which could be ?


Coverity see da-fd being passed to local_file_dump, as a modifiable 
entity and concludes  this function may be overwriting this, and end up 
leaking the handle.


It wouldn't be wrong to modify a new field, I just hadn't considered 
that likely. Since maybe it is, I'll just mark as false positive.


-jenny

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 5/7] libxc: Removing dead code from xc_dom_allocate()

2015-07-01 Thread Jennifer Herbert
The only place that jumps to 'err:' does so because !dom, which is
rechecked in 'err:'.  This patch simplifies, giving the same result.

Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com
---
 tools/libxc/xc_dom_core.c |7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index b100ce1..cde943c 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -720,7 +720,7 @@ struct xc_dom_image *xc_dom_allocate(xc_interface *xch,
   __FUNCTION__, cmdline, features);
 dom = malloc(sizeof(*dom));
 if ( !dom )
-goto err;
+return NULL;
 
 memset(dom, 0, sizeof(*dom));
 dom-xch = xch;
@@ -742,11 +742,6 @@ struct xc_dom_image *xc_dom_allocate(xc_interface *xch,
 
 dom-alloc_malloc += sizeof(*dom);
 return dom;
-
- err:
-if ( dom )
-xc_dom_release(dom);
-return NULL;
 }
 
 int xc_dom_kernel_max_size(struct xc_dom_image *dom, size_t sz)
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 7/7] libxc: Prevent NULL pointer dereference in stdiostream_vmessage()

2015-07-01 Thread Jennifer Herbert
Unlikely that it may seem localtime_r could fail, which would result in a
null pointer dereference.  In this case, one can simply just skip logging the
date/time, and logging anything is more useful then nothing.

Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com
---
 tools/libxc/xtl_logger_stdio.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xtl_logger_stdio.c b/tools/libxc/xtl_logger_stdio.c
index d8646e0..74d66a5 100644
--- a/tools/libxc/xtl_logger_stdio.c
+++ b/tools/libxc/xtl_logger_stdio.c
@@ -61,10 +61,11 @@ static void stdiostream_vmessage(xentoollog_logger 
*logger_in,
 struct tm lt_buf;
 time_t now = time(0);
 struct tm *lt= localtime_r(now, lt_buf);
-fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s ,
-lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday,
-lt-tm_hour, lt-tm_min, lt-tm_sec,
-tzname[!!lt-tm_isdst]);
+if (lt != NULL)
+fprintf(lg-f, %04d-%02d-%02d %02d:%02d:%02d %s ,
+lt-tm_year+1900, lt-tm_mon+1, lt-tm_mday,
+lt-tm_hour, lt-tm_min, lt-tm_sec,
+tzname[!!lt-tm_isdst]);
 }
 if (lg-flags  XTL_STDIOSTREAM_SHOW_PID)
 fprintf(lg-f, [%lu] , (unsigned long)getpid());
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 2/7] libxc: Use const pointer in local_file_dump()

2015-07-01 Thread Jennifer Herbert
By adding the const keyword, it is clearer to people and static analysis
tools that no changes to the data are to be made.

Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com
---
 tools/libxc/xc_core.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxc/xc_core.c b/tools/libxc/xc_core.c
index dfa424b..f759fd6 100644
--- a/tools/libxc/xc_core.c
+++ b/tools/libxc/xc_core.c
@@ -926,7 +926,7 @@ struct dump_args {
 static int local_file_dump(xc_interface *xch,
void *args, char *buffer, unsigned int length)
 {
-struct dump_args *da = args;
+const struct dump_args *da = args;
 
 if ( write_exact(da-fd, buffer, length) == -1 )
 {
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 3/7] libxc: Fix uninitialized valiables in xc_cpuid_hvm_policy()

2015-07-01 Thread Jennifer Herbert
If xc_hvm_param_get fails, is_pae and/or is_nestedhvm are left undefined.
This patch Indicates error and defaults to false.

Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com
---
 tools/libxc/xc_cpuid_x86.c |   18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index 847b701..cc6f18a 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -268,11 +268,16 @@ static void xc_cpuid_hvm_policy(
 {
 DECLARE_DOMCTL;
 char brand[13];
-uint64_t val;
-int is_pae, is_nestedhvm;
+uint64_t val = 0;
+int is_pae;
+int is_nestedhvm = 0;
 uint64_t xfeature_mask;
+int rc;
+
+rc = xc_hvm_param_get(xch, domid, HVM_PARAM_PAE_ENABLED, val);
+if ( rc  0 )
+ ERROR(xc_hvm_param_get failed to get PAE_ENABLED with rc %d\n, rc);
 
-xc_hvm_param_get(xch, domid, HVM_PARAM_PAE_ENABLED, val);
 is_pae = !!val;
 
 /* Detecting Xen's atitude towards XSAVE */
@@ -282,8 +287,11 @@ static void xc_cpuid_hvm_policy(
 do_domctl(xch, domctl);
 xfeature_mask = domctl.u.vcpuextstate.xfeature_mask;
 
-xc_hvm_param_get(xch, domid, HVM_PARAM_NESTEDHVM, val);
-is_nestedhvm = !!val;
+rc = xc_hvm_param_get(xch, domid, HVM_PARAM_NESTEDHVM, val);
+if ( rc = 0 )
+is_nestedhvm = !!val;
+else
+ERROR(xc_hvm_param_get failed to get NESTEDHVM with rc %d\n, rc);
 
 switch ( input[0] )
 {
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 6/7] libxc: Fix misleading use of strncpy code in build_hvm_info()

2015-07-01 Thread Jennifer Herbert
hvm_info-signature is not a string, but an 64 bit int, and is not
NULL terminated.  The use of strncpy to populate it is inappropriate and
potentially misleading.  A cursory glance might have you thinking someone
had miscounted the length of the string literal - not realising it was
intentionally cropping of the null termination.
Also, since we wish to initialise all of hvm_info-signature, and
certainly no more, the use of sizeof is safer.

Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com
---
 tools/libxc/xc_hvm_build_x86.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 003ea06..ec5ef4d 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -99,7 +99,7 @@ static void build_hvm_info(void *hvm_info_page,
 memset(hvm_info_page, 0, PAGE_SIZE);
 
 /* Fill in the header. */
-strncpy(hvm_info-signature, HVM INFO, 8);
+memcpy(hvm_info-signature, HVM INFO, sizeof(hvm_info-signature));
 hvm_info-length = sizeof(struct hvm_info_table);
 
 /* Sensible defaults: these can be overridden by the caller. */
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 4/7] libxc: Prevent dereferencing NULL pointers returned from xc_dom_allocate()

2015-07-01 Thread Jennifer Herbert
The return from xc_dom_allocate is not checked for a NULL value.
This patch fixes this, causing it to return from the function with an error.

Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com
---
 tools/libxc/xc_dom_compat_linux.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/libxc/xc_dom_compat_linux.c 
b/tools/libxc/xc_dom_compat_linux.c
index 2c14a0f..617cd96 100644
--- a/tools/libxc/xc_dom_compat_linux.c
+++ b/tools/libxc/xc_dom_compat_linux.c
@@ -91,6 +91,8 @@ int xc_linux_build_mem(xc_interface *xch, uint32_t domid,
 
 xc_dom_loginit(xch);
 dom = xc_dom_allocate(xch, cmdline, features);
+if (dom == NULL)
+return -1;
 if ( (rc = xc_dom_kernel_mem(dom, image_buffer, image_size)) != 0 )
 goto out;
 if ( initrd  ((rc = xc_dom_ramdisk_mem(dom, initrd, initrd_len)) != 0) )
@@ -123,6 +125,8 @@ int xc_linux_build(xc_interface *xch, uint32_t domid,
 
 xc_dom_loginit(xch);
 dom = xc_dom_allocate(xch, cmdline, features);
+if (dom == NULL)
+return -1;
 if ( (rc = xc_dom_kernel_file(dom, image_name)) != 0 )
 goto out;
 if ( initrd_name  strlen(initrd_name) 
@@ -146,6 +150,8 @@ int xc_get_bit_size(xc_interface *xch,
 int rc;
 *bit_size = 0;
 dom = xc_dom_allocate(xch, cmdline, features);
+if (dom == NULL)
+return -1;
 if ( (rc = xc_dom_kernel_file(dom, image_name)) != 0 )
 goto out;
 if ( (rc = xc_dom_parse_image(dom)) != 0 )
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 0/7] libxc: Fix a number of coverity issues.

2015-07-01 Thread Jennifer Herbert
Fix a number of coverity issues in libxc.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 1/7] libxc: fix uninitialized variable in xc_cpuid_pv_policy()

2015-07-01 Thread Jennifer Herbert
If xc_domain_get_guest_width were to fail, guest_width is not set, and
hence guest_64bit becomes undefined.
Fix is to initialise to 0, and report error if call fails.

Signed-off-by: Jennifer Herbert jennifer.herb...@citrix.com
---
 tools/libxc/xc_cpuid_x86.c |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index c97f91a..847b701 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -437,14 +437,16 @@ static void xc_cpuid_pv_policy(
 {
 DECLARE_DOMCTL;
 unsigned int guest_width;
-int guest_64bit;
+int guest_64bit = 0;
 char brand[13];
 uint64_t xfeature_mask;
 
 xc_cpuid_brand_get(brand);
 
-xc_domain_get_guest_width(xch, domid, guest_width);
-guest_64bit = (guest_width == 8);
+if (xc_domain_get_guest_width(xch, domid, guest_width) == 0)
+guest_64bit = (guest_width == 8);
+else
+ERROR(Could not read guest word width.);
 
 /* Detecting Xen's atitude towards XSAVE */
 memset(domctl, 0, sizeof(domctl));
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv3 0/4] Use ticket locks for spinlocks

2015-04-21 Thread Jennifer Herbert


On 21/04/15 11:22, Jan Beulich wrote:
Despite both pictures saying micro-seconds at the respective axis (and 
hence the problem not being _as bad_) - did the data collection reveal 
where these IRQ disable regions are, so we could look into eliminating 
them? (ISTR there being some open coded IRQ-disable, lock, unlock, 
IRQ-restore/enable sequences, but I'm not sure whether they got 
eliminated already.) Jan 


The location of the locks was not collected, (Other then if it was from 
a spinlock or not) but I'm certainly considering doing this, as another 
graph of duration against cumulative time, shows the majority of time 
spent with interrupts disabled, is during code that keeps it disabled 
for long contiguous periods, all using raw (not spinlock) Interrupt 
disable.  (A few occurrences, but very long when they happen)  It would 
certainly be interesting to know where these come from.


-jenny

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel