Re: [Xen-devel] [PATCH v4 31/31] libxl: allow the creation of HVM domains without a device model.

2015-08-10 Thread Paul Durrant
> -Original Message-
> From: xen-devel-boun...@lists.xen.org [mailto:xen-devel-
> boun...@lists.xen.org] On Behalf Of Andrew Cooper
> Sent: 07 August 2015 19:42
> To: Wei Liu; Roger Pau Monne
> Cc: xen-de...@lists.xenproject.org; Ian Jackson; Ian Campbell; Stefano
> Stabellini
> Subject: Re: [Xen-devel] [PATCH v4 31/31] libxl: allow the creation of HVM
> domains without a device model.
> 
> On 07/08/15 17:24, Wei Liu wrote:
> > On Fri, Aug 07, 2015 at 05:51:02PM +0200, Roger Pau Monné wrote:
> > [...]
>   It is recommended to accept the default value for new guests.  If
>  diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
>  index 1599de4..d67feb0 100644
>  --- a/tools/libxc/xc_dom_x86.c
>  +++ b/tools/libxc/xc_dom_x86.c
>  @@ -1269,6 +1269,13 @@ static int meminit_hvm(struct
> xc_dom_image *dom)
>   if ( nr_pages > target_pages )
>   memflags |= XENMEMF_populate_on_demand;
> 
>  +/* Make sure there's a MMIO hole for the special pages. */
>  +if ( dom->mmio_size == 0 )
>  +{
>  +dom->mmio_size = NR_SPECIAL_PAGES << PAGE_SHIFT;
>  +dom->mmio_start = special_pfn(0);
>  +}
>  +
> >>> Better to just assert(dom->mmio_size != 0);
> >>>
> >>> It's really libxl's responsibility to generate memory layout for guest.
> >>> Libxc doesn't have all information to make the decision.
> >> As said in a previous email, libxl doesn't know the size or position of
> >> the special pages created by libxc code, so right now it's impossible
> >> for libxl to create a correct mmio hole for a HVMlite guest.
> >>
> > Then your change here doesn't solve the real problem. You can't guarantee
> > when dom->mmio_size != 0, 1) the hole is large enough to accommodate
> all
> > special pages, 2) special pages don't clash with real mmio pages.
> >
> > I still think there should be only one entity that controls what guest
> > memory layout looks like. And that entity should be the one which has
> > all the information available. In this case, libxl should be the one who
> > decides.

Good luck convincing QEMU of that! Whatever is decided for domains without a 
QEMU still needs to be applicable when QEMU is there.

  Paul

> 
> Layout and runtime management of guests has been in a very poor state
> since forever.
> 
> This results from a combination of things not having been written down
> to start with, new features bolted on the side, and bits moving around.
> Even at the London Hackathon in 2013, a group of us couldn't even work
> out whether it was possible for a guest with certain combinations of
> features to perform correct calculates not to exhaust its PoD pool and
> suffer a domian_crash().
> 
> This seems like a good opportunity to take a step back and reconsider
> things from scratch with the benefit of hindsight, in the hopes of
> finding a way forward which gets us into a better position.
> 
> Funnily enough, there happens to be a large collection of people
> happening very shortly in Seattle, and a rumour of some whiteboards.
> 
> We should consider:
> 
> * What there is (potentially) in a guests physical address space
> ** MMIO holes (including high), VGA hole, RMRR holes, magic emulator
> pages, magic Xen pages, ACPI reported regions, etc.
> ** Ancillary bits such as the PoD pool, Shadow pool, etc.
> * What are the architectural and ABI restrictions which exist
> * What limits exist, which are static, which are dynamic
> * What needs to be known by each entity in the system
> ** including what shouldn't be known by certain entities.
> 
> This will hopefully present a (more) clear picture of which entity
> should be making things like layout decisions, and what extra
> information they need to know.
> 
> It will also hopefully show how to go about fixing the existing runtime
> memory management issues.
> 
> ~Andrew
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.6 v2 4/4] oxenstored: move sd_notify_ready out of main loop

2015-08-10 Thread Wei Liu
Oxenstored only needs to notify systemd its readiness state once. Move
sd_notify_ready out of main loop.

Signed-off-by: Wei Liu 
Acked-by: Dave Scott 
---
For 4.6: avoid wasting CPU cycles, easy to reason its correctness.

There is a small risk that either I wrote the wrong code or I
misunderstand the usage of systemd API. However I've tested the modified
oxenstored it worked fine.
---
 tools/ocaml/xenstored/xenstored.ml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/ocaml/xenstored/xenstored.ml 
b/tools/ocaml/xenstored/xenstored.ml
index f484024..42b8183 100644
--- a/tools/ocaml/xenstored/xenstored.ml
+++ b/tools/ocaml/xenstored/xenstored.ml
@@ -428,11 +428,11 @@ let _ =
process_domains store cons domains
in
 
+   if Systemd.launched_by_systemd () then
+   Systemd.sd_notify_ready ();
while not !quit
do
try
-if Systemd.launched_by_systemd() then
-Systemd.sd_notify_ready ();
main_loop ()
with exc ->
error "caught exception %s" (Printexc.to_string exc);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.6 v2 3/4] oxenstored: fix systemd socket activation

2015-08-10 Thread Wei Liu
Use the correct API sd_listen_fds to determine whether the process is
started by systemd.

Change sd_booted to launched_by_systemd to avoid confusion with
systemd's API.

Signed-off-by: Wei Liu 
Acked-by: Dave Scott 
Acked-by: Ian Campbell 
---

For 4.6: without this oxenstored is broken when running on a system with
systemd but not started by systemd.

v2: booted -> lanuched, no functional change so I keep Dave's ack.
---
 tools/ocaml/xenstored/systemd.ml  | 2 +-
 tools/ocaml/xenstored/systemd.mli | 4 ++--
 tools/ocaml/xenstored/systemd_stubs.c | 6 +++---
 tools/ocaml/xenstored/utils.ml| 2 +-
 tools/ocaml/xenstored/xenstored.ml| 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/ocaml/xenstored/systemd.ml b/tools/ocaml/xenstored/systemd.ml
index 2aa39ea..732446d 100644
--- a/tools/ocaml/xenstored/systemd.ml
+++ b/tools/ocaml/xenstored/systemd.ml
@@ -13,5 +13,5 @@
  *)
 
 external sd_listen_fds: string -> Unix.file_descr = "ocaml_sd_listen_fds"
-external sd_booted: unit -> bool = "ocaml_sd_booted"
+external launched_by_systemd: unit -> bool = "ocaml_launched_by_systemd"
 external sd_notify_ready: unit -> unit = "ocaml_sd_notify_ready"
diff --git a/tools/ocaml/xenstored/systemd.mli 
b/tools/ocaml/xenstored/systemd.mli
index 85c9f2e..538fc5e 100644
--- a/tools/ocaml/xenstored/systemd.mli
+++ b/tools/ocaml/xenstored/systemd.mli
@@ -17,8 +17,8 @@
  *  us do sanity checks on the expected sockets *)
 val sd_listen_fds: string -> Unix.file_descr
 
-(** Tells us whether or not systemd support was compiled in *)
-val sd_booted: unit -> bool
+(** Tells us whether the process is launched by systemd *)
+val launched_by_systemd: unit -> bool
 
 (** Tells systemd we're ready *)
 external sd_notify_ready: unit -> unit = "ocaml_sd_notify_ready"
diff --git a/tools/ocaml/xenstored/systemd_stubs.c 
b/tools/ocaml/xenstored/systemd_stubs.c
index d924ff1..1bd5dea 100644
--- a/tools/ocaml/xenstored/systemd_stubs.c
+++ b/tools/ocaml/xenstored/systemd_stubs.c
@@ -92,14 +92,14 @@ CAMLprim value ocaml_sd_listen_fds(value connect_to)
CAMLreturn(sock_ret);
 }
 
-CAMLprim value ocaml_sd_booted(value ignore)
+CAMLprim value ocaml_launched_by_systemd(value ignore)
 {
CAMLparam1(ignore);
CAMLlocal1(ret);
 
ret = Val_false;
 
-   if (sd_booted())
+   if (sd_listen_fds(0) > 0)
ret = Val_true;
 
CAMLreturn(ret);
@@ -129,7 +129,7 @@ CAMLprim value ocaml_sd_listen_fds(value connect_to)
CAMLreturn(sock_ret);
 }
 
-CAMLprim value ocaml_sd_booted(value ignore)
+CAMLprim value ocaml_launched_by_systemd(value ignore)
 {
CAMLparam1(ignore);
CAMLlocal1(ret);
diff --git a/tools/ocaml/xenstored/utils.ml b/tools/ocaml/xenstored/utils.ml
index 61321c6..9f82c1c 100644
--- a/tools/ocaml/xenstored/utils.ml
+++ b/tools/ocaml/xenstored/utils.ml
@@ -84,7 +84,7 @@ let create_regular_unix_socket name =
 sock
 
 let create_unix_socket name =
-if Systemd.sd_booted() then
+if Systemd.launched_by_systemd() then
 Systemd.sd_listen_fds name
 else
 create_regular_unix_socket name
diff --git a/tools/ocaml/xenstored/xenstored.ml 
b/tools/ocaml/xenstored/xenstored.ml
index bfe689b..f484024 100644
--- a/tools/ocaml/xenstored/xenstored.ml
+++ b/tools/ocaml/xenstored/xenstored.ml
@@ -431,7 +431,7 @@ let _ =
while not !quit
do
try
-if Systemd.sd_booted() then
+if Systemd.launched_by_systemd() then
 Systemd.sd_notify_ready ();
main_loop ()
with exc ->
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.6 v2 0/4] Patches for c/oxenstored

2015-08-10 Thread Wei Liu
Wei Liu (4):
  cxenstored: fix systemd socket activation
  cxenstored: document a bunch of short options in help string
  oxenstored: fix systemd socket activation
  oxenstored: move sd_notify_ready out of main loop

 tools/ocaml/xenstored/systemd.ml  |  2 +-
 tools/ocaml/xenstored/systemd.mli |  4 +--
 tools/ocaml/xenstored/systemd_stubs.c |  6 ++--
 tools/ocaml/xenstored/utils.ml|  2 +-
 tools/ocaml/xenstored/xenstored.ml|  4 +--
 tools/xenstore/xenstored_core.c   | 59 +--
 6 files changed, 44 insertions(+), 33 deletions(-)

-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.6 v2 2/4] cxenstored: document a bunch of short options in help string

2015-08-10 Thread Wei Liu
Signed-off-by: Wei Liu 
Acked-by: Ian Campbell 
---
For 4.6: pure doc changes, risk free.
---
 tools/xenstore/xenstored_core.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 87cb715..25a548d 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1886,21 +1886,21 @@ static void usage(void)
 "\n"
 "where options may include:\n"
 "\n"
-"  --no-domain-initto state that xenstored should not initialise dom0,\n"
-"  --pid-filegiving a file for the daemon's pid to be written,\n"
-"  --help  to output this message,\n"
-"  --no-fork   to request that the daemon does not fork,\n"
-"  --output-pidto request that the pid of the daemon is output,\n"
-"  --trace-file  giving the file for logging, and\n"
-"  --entry-nb  limit the number of entries per domain,\n"
-"  --entry-size  limit the size of entry per domain, and\n"
-"  --watch-nb  limit the number of watches per domain,\n"
-"  --transaction   limit the number of transaction allowed per domain,\n"
-"  --no-recovery   to request that no recovery should be attempted when\n"
-"  the store is corrupted (debug only),\n"
-"  --internal-db   store database in memory, not on disk\n"
-"  --preserve-localto request that /local is preserved on start-up,\n"
-"  --verbose   to request verbose execution.\n");
+"  -D, --no-domain-initto state that xenstored should not initialise 
dom0,\n"
+"  -F, --pid-filegiving a file for the daemon's pid to be written,\n"
+"  -H, --help  to output this message,\n"
+"  -N, --no-fork   to request that the daemon does not fork,\n"
+"  -P, --output-pidto request that the pid of the daemon is output,\n"
+"  -T, --trace-file  giving the file for logging, and\n"
+"  -E, --entry-nb  limit the number of entries per domain,\n"
+"  -S, --entry-size  limit the size of entry per domain, and\n"
+"  -W, --watch-nb  limit the number of watches per domain,\n"
+"  -t, --transaction   limit the number of transaction allowed per 
domain,\n"
+"  -R, --no-recovery   to request that no recovery should be attempted 
when\n"
+"  the store is corrupted (debug only),\n"
+"  -I, --internal-db   store database in memory, not on disk\n"
+"  -L, --preserve-localto request that /local is preserved on start-up,\n"
+"  -V, --verbose   to request verbose execution.\n");
 }
 
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.6 v2 1/4] cxenstored: fix systemd socket activation

2015-08-10 Thread Wei Liu
There were two problems with original code:

1. sd_booted() was used to determined if the process was started by
   systemd, which was wrong.
2. Exit with error if pidfile was specified, which was too harsh.

These two combined made cxenstored unable to start by hand if it ran
on a system which had systemd.

Fix issues with following changes:

1. Use sd_listen_fds to determine if the process is started by systemd.
2. Don't exit if pidfile is specified.

Rename function and restructure code to make things clearer.

A side effect of this patch is that gcc 4.8 with -Wmaybe-uninitialized
in non-debug build spits out spurious warning about sock and ro_sock
might be uninitialized. Since CentOS 7 ships gcc 4.8, we need to work
around that by setting sock and ro_sock to NULL at the beginning of
main.

Signed-off-by: Wei Liu 
Tested-by: George Dunlap 
---
For 4.6: without this cxenstored is broken when running on a system with
systemd but not started by systemd.

v2: keep tested-by because there is no functional change from v1, drop
acked-by because a workaround for gcc 4.8 is introduced.
---
 tools/xenstore/xenstored_core.c | 29 -
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index b7e4936..87cb715 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1781,7 +1781,10 @@ static int xs_validate_active_socket(const char 
*connect_to)
return xs_get_sd_fd(connect_to);
 }
 
-static void xen_claim_active_sockets(int **psock, int **pro_sock)
+/* Return true if started by systemd and false if not. Exit with
+ * error if things go wrong.
+ */
+static bool systemd_checkin(int **psock, int **pro_sock)
 {
int *sock, *ro_sock;
const char *soc_str = xs_daemon_socket();
@@ -1789,7 +1792,11 @@ static void xen_claim_active_sockets(int **psock, int 
**pro_sock)
int n;
 
n = sd_listen_fds(0);
-   if (n <= 0) {
+
+   if (n == 0)
+   return false;
+
+   if (n < 0) {
sd_notifyf(0, "STATUS=Failed to get any active sockets: %s\n"
   "ERRNO=%i",
   strerror(errno),
@@ -1816,6 +1823,8 @@ static void xen_claim_active_sockets(int **psock, int 
**pro_sock)
 
talloc_set_destructor(sock, destroy_fd);
talloc_set_destructor(ro_sock, destroy_fd);
+
+   return true;
 }
 #endif
 
@@ -1922,13 +1931,16 @@ int priv_domid = 0;
 
 int main(int argc, char *argv[])
 {
-   int opt, *sock, *ro_sock;
+   int opt, *sock = NULL, *ro_sock = NULL;
int sock_pollfd_idx = -1, ro_sock_pollfd_idx = -1;
bool dofork = true;
bool outputpid = false;
bool no_domain_init = false;
const char *pidfile = NULL;
int timeout;
+#if defined(XEN_SYSTEMD_ENABLED)
+   bool systemd;
+#endif
 
while ((opt = getopt_long(argc, argv, "DE:F:HNPS:t:T:RLVW:", options,
  NULL)) != -1) {
@@ -1990,10 +2002,11 @@ int main(int argc, char *argv[])
barf("%s: No arguments desired", argv[0]);
 
 #if defined(XEN_SYSTEMD_ENABLED)
-   if (sd_booted()) {
+   systemd = systemd_checkin(&sock, &ro_sock);
+   if (systemd) {
dofork = false;
if (pidfile)
-   barf("%s: PID file not needed on systemd", argv[0]);
+   xprintf("%s: PID file not needed on systemd", argv[0]);
pidfile = NULL;
}
 #endif
@@ -2020,9 +2033,7 @@ int main(int argc, char *argv[])
signal(SIGPIPE, SIG_IGN);
 
 #if defined(XEN_SYSTEMD_ENABLED)
-   if (sd_booted())
-   xen_claim_active_sockets(&sock, &ro_sock);
-   else
+   if (!systemd)
 #endif
init_sockets(&sock, &ro_sock);
 
@@ -2057,7 +2068,7 @@ int main(int argc, char *argv[])
xenbus_notify_running();
 
 #if defined(XEN_SYSTEMD_ENABLED)
-   if (sd_booted()) {
+   if (systemd) {
sd_notify(1, "READY=1");
fprintf(stderr, SD_NOTICE "xenstored is ready\n");
}
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 v2 4/4] oxenstored: move sd_notify_ready out of main loop

2015-08-10 Thread Andrew Cooper
On 10/08/2015 09:00, Wei Liu wrote:
> Oxenstored only needs to notify systemd its readiness state once. Move
> sd_notify_ready out of main loop.
>
> Signed-off-by: Wei Liu 
> Acked-by: Dave Scott 
> ---
> For 4.6: avoid wasting CPU cycles, easy to reason its correctness.
>
> There is a small risk that either I wrote the wrong code or I
> misunderstand the usage of systemd API. However I've tested the modified
> oxenstored it worked fine.
> ---
>  tools/ocaml/xenstored/xenstored.ml | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/ocaml/xenstored/xenstored.ml 
> b/tools/ocaml/xenstored/xenstored.ml
> index f484024..42b8183 100644
> --- a/tools/ocaml/xenstored/xenstored.ml
> +++ b/tools/ocaml/xenstored/xenstored.ml
> @@ -428,11 +428,11 @@ let _ =
>   process_domains store cons domains
>   in
>  
> + if Systemd.launched_by_systemd () then
> + Systemd.sd_notify_ready ();
>   while not !quit
>   do
>   try
> -if Systemd.launched_by_systemd() then
> -Systemd.sd_notify_ready ();

You have tabs/spaces issues here.

However, the two oxenstored patches are Tested-by: Andrew Cooper
.  XenServer testing over the weekend has
shown no regressions.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 v2 4/4] oxenstored: move sd_notify_ready out of main loop

2015-08-10 Thread Wei Liu
On Mon, Aug 10, 2015 at 09:04:20AM +0100, Andrew Cooper wrote:
> On 10/08/2015 09:00, Wei Liu wrote:
> > Oxenstored only needs to notify systemd its readiness state once. Move
> > sd_notify_ready out of main loop.
> >
> > Signed-off-by: Wei Liu 
> > Acked-by: Dave Scott 
> > ---
> > For 4.6: avoid wasting CPU cycles, easy to reason its correctness.
> >
> > There is a small risk that either I wrote the wrong code or I
> > misunderstand the usage of systemd API. However I've tested the modified
> > oxenstored it worked fine.
> > ---
> >  tools/ocaml/xenstored/xenstored.ml | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/ocaml/xenstored/xenstored.ml 
> > b/tools/ocaml/xenstored/xenstored.ml
> > index f484024..42b8183 100644
> > --- a/tools/ocaml/xenstored/xenstored.ml
> > +++ b/tools/ocaml/xenstored/xenstored.ml
> > @@ -428,11 +428,11 @@ let _ =
> > process_domains store cons domains
> > in
> >  
> > +   if Systemd.launched_by_systemd () then
> > +   Systemd.sd_notify_ready ();
> > while not !quit
> > do
> > try
> > -if Systemd.launched_by_systemd() then
> > -Systemd.sd_notify_ready ();
> 
> You have tabs/spaces issues here.
> 

Yeah. I know that. But that's what it used to be, not introduced by me.
Furthermore, it's removal, not addition, so I didn't bother sending out
another patch to adjust that.

> However, the two oxenstored patches are Tested-by: Andrew Cooper
> .  XenServer testing over the weekend has
> shown no regressions.
> 

Thanks.

Wei.

> ~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server

2015-08-10 Thread Wei Liu
On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote:
> Currently in ioreq server, guest write-protected ram pages are
> tracked in the same rangeset with device mmio resources. Yet
> unlike device mmio, which can be in big chunks, the guest write-
> protected pages may be discrete ranges with 4K bytes each.
> 
> This patch uses a seperate rangeset for the guest ram pages.
> And a new ioreq type, IOREQ_TYPE_MEM, is defined.
> 
> Note: Previously, a new hypercall or subop was suggested to map
> write-protected pages into ioreq server. However, it turned out
> handler of this new hypercall would be almost the same with the
> existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's
> already a type parameter in this hypercall. So no new hypercall
> defined, only a new type is introduced.
> 
> Signed-off-by: Yu Zhang 
> ---
>  tools/libxc/include/xenctrl.h| 39 +++---
>  tools/libxc/xc_domain.c  | 59 
> ++--

FWIW the hypercall wrappers look correct to me.

> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index 014546a..9106cb9 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -329,8 +329,9 @@ struct xen_hvm_io_range {
>  ioservid_t id;   /* IN - server id */
>  uint32_t type;   /* IN - type of range */
>  # define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
> -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
> +# define HVMOP_IO_RANGE_MMIO   1 /* MMIO range */
>  # define HVMOP_IO_RANGE_PCI2 /* PCI segment/bus/dev/func range */
> +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */

This looks problematic. Maybe you can get away with this because this is
a toolstack-only interface?

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-unstable test] 60639: tolerable FAIL

2015-08-10 Thread osstest service owner
flight 60639 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60639/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail 
like 60624
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 16 
guest-localmigrate/x10 fail like 60624
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail like 60624
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 60624
 test-armhf-armhf-xl-rtds 11 guest-start  fail   like 60624

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-armhf-armhf-xl-vhd   9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-raw  9 debian-di-installfail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never 
pass
 test-armhf-armhf-xl-qcow2 9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-qcow2  9 debian-di-installfail never pass
 test-armhf-armhf-libvirt-vhd  9 debian-di-installfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail  never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-checkfail never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 guest-saverestorefail   never pass
 test-armhf-armhf-libvirt 12 migrate-support-checkfail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-raw   9 debian-di-installfail   never pass
 test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass

version targeted for testing:
 xen  201eac83831d94ba2e9a63a7eed4c128633fafb1
baseline version:
 xen  201eac83831d94ba2e9a63a7eed4c128633fafb1

Last test of basis60639  2015-08-09 04:41:18 Z1 days
Testing same since0  1970-01-01 00:00:00 Z 16657 days0 attempts

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-oldkern  pass
 build-i386-oldkern   pass 

[Xen-devel] OSSTest-- Leases::check_ip() ignore guest free IP

2015-08-10 Thread Hu, Robert
Hi,

Say such a leases file:
lease 192.168.199.124 {
  starts 4 2015/08/06 08:46:50;
  ends 4 2015/08/06 08:58:50;
  cltt 4 2015/08/06 08:46:50;
  binding state active;
  next binding state free;
  hardware ethernet 5e:36:0e:f5:00:02;
  uid "\001^6\016\365\000\002";
}
lease 192.168.199.242 {
  starts 4 2015/08/06 08:46:55;
  ends 4 2015/08/06 08:58:55;
  cltt 4 2015/08/06 08:46:55;
  binding state active;
  next binding state free;
  hardware ethernet 00:04:23:e8:dd:5a;
  uid "\001\000\004#\350\335Z";
}
...
lease 192.168.199.124 {
  starts 4 2015/08/06 08:46:50;
  ends 4 2015/08/06 08:58:50;
  tstp 4 2015/08/06 08:58:50;
  cltt 4 2015/08/06 08:46:50;
  binding state free;
  hardware ethernet 5e:36:0e:f5:00:02;
  uid "\001^6\016\365\000\002";
}

In Leases::check_ip(), it will ignore the last entry of guest freeing IP
and return last active IP lease entry. Then OSSTest will try to ping that IP 
and fails.

Best Regards,
Robert Ho


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server

2015-08-10 Thread Paul Durrant
> -Original Message-
> From: Wei Liu [mailto:wei.l...@citrix.com]
> Sent: 10 August 2015 09:26
> To: Yu Zhang
> Cc: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; 
> Ian
> Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper;
> Kevin Tian; zhiyuan...@intel.com
> Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq
> server
> 
> On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote:
> > Currently in ioreq server, guest write-protected ram pages are
> > tracked in the same rangeset with device mmio resources. Yet
> > unlike device mmio, which can be in big chunks, the guest write-
> > protected pages may be discrete ranges with 4K bytes each.
> >
> > This patch uses a seperate rangeset for the guest ram pages.
> > And a new ioreq type, IOREQ_TYPE_MEM, is defined.
> >
> > Note: Previously, a new hypercall or subop was suggested to map
> > write-protected pages into ioreq server. However, it turned out
> > handler of this new hypercall would be almost the same with the
> > existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's
> > already a type parameter in this hypercall. So no new hypercall
> > defined, only a new type is introduced.
> >
> > Signed-off-by: Yu Zhang 
> > ---
> >  tools/libxc/include/xenctrl.h| 39 +++---
> >  tools/libxc/xc_domain.c  | 59
> ++--
> 
> FWIW the hypercall wrappers look correct to me.
> 
> > diff --git a/xen/include/public/hvm/hvm_op.h
> b/xen/include/public/hvm/hvm_op.h
> > index 014546a..9106cb9 100644
> > --- a/xen/include/public/hvm/hvm_op.h
> > +++ b/xen/include/public/hvm/hvm_op.h
> > @@ -329,8 +329,9 @@ struct xen_hvm_io_range {
> >  ioservid_t id;   /* IN - server id */
> >  uint32_t type;   /* IN - type of range */
> >  # define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
> > -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
> > +# define HVMOP_IO_RANGE_MMIO   1 /* MMIO range */
> >  # define HVMOP_IO_RANGE_PCI2 /* PCI segment/bus/dev/func range
> */
> > +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */
> 
> This looks problematic. Maybe you can get away with this because this is
> a toolstack-only interface?
> 

Indeed, the old name is a bit problematic. Presumably re-use like this would 
require an interface version change and some if-defery.

  Paul

> Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen 4.6 retrospective] More public/easy to find information about the release schedule

2015-08-10 Thread Wei Liu
On Fri, Aug 07, 2015 at 05:36:57PM +0200, Roger Pau Monné wrote:
> = Issue / Observation =
> 
> The information about the release schedule is not clearly published
> anywhere apart from the mailing lists, which makes it hard for
> non-developers (or even for developers) given that the mailing list
> traffic for xen-devel is high.
> 
> = Possible Solution / Improvement =
> 
> Publish the release schedule in a web page with a concrete schedule,
> like the FreeBSD Release Engineering Team does:
> 
> https://www.freebsd.org/releng/
> 
> They even have the schedule for the 11.0-RELEASE, which is not expected
> until a year from now. Also each step/date contains an explanation of
> what's happening and what it means from a developer point of view.
> 

This is a good idea.

> Roger.
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 1/4] HVM x86 deprivileged mode: Page allocation helper

2015-08-10 Thread Tim Deegan
Hi,

At 10:57 +0100 on 07 Aug (1438945038), Ben Catterall wrote:
> On 06/08/15 20:22, Andrew Cooper wrote:
> > On 06/08/15 17:45, Ben Catterall wrote:
> >> This allocation function is used by the deprivileged mode initialisation 
> >> code
> >> to allocate pages for the new page table mappings and page frames on the 
> >> HAP
> >> page heap.
> >>
> >> Signed-off-by: Ben Catterall 
> > This is fine for your test box, but isn't fine for systems out there
> > without hardware EPT/NPT support.  For older systems like that (or in
> > certain specific workloads), shadow paging is used instead.
> >
> > This feature is applicable to any HVM domain, which means that it
> > shouldn't depend on HAP or shadow paging.
> >
> > How much memory is allocated for the depriv area, and what exactly is
> > allocated in total?
> So, per-vcpu:
> - a user mode stack which, from your comments in [RFC 2/4], can be 2 pages
> - local data (may or may not be needed, depends on the device) which 
> will be around
>a page or two.
> 
> Text segment: as per your comments in RFC 2/4, this will be changed to 
> be an alias
> so no extra memory.
> > I expect it isn't very much, and would suggest using
> > d->arch.paging.alloc_page() instead (which is the generic "get me some
> > memory accounted against the domain" helper) which looks as if it should
> > suffice.

Whie I agree that it would be good to account this to the domain,
paging->alloc_page() is an internal _paging assistance_ helper. :)
This new allocation is nothing to do with mm/paging-assistance, so
either it should find its own memory or the hap/shadow pool needs to
be made more generic.

Cheers,

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 1/4] HVM x86 deprivileged mode: Page allocation helper

2015-08-10 Thread Tim Deegan
At 09:50 +0100 on 10 Aug (1439200241), Tim Deegan wrote:
> Hi,
> 
> At 10:57 +0100 on 07 Aug (1438945038), Ben Catterall wrote:
> > On 06/08/15 20:22, Andrew Cooper wrote:
> > > On 06/08/15 17:45, Ben Catterall wrote:
> > >> This allocation function is used by the deprivileged mode initialisation 
> > >> code
> > >> to allocate pages for the new page table mappings and page frames on the 
> > >> HAP
> > >> page heap.
> > >>
> > >> Signed-off-by: Ben Catterall 
> > > This is fine for your test box, but isn't fine for systems out there
> > > without hardware EPT/NPT support.  For older systems like that (or in
> > > certain specific workloads), shadow paging is used instead.
> > >
> > > This feature is applicable to any HVM domain, which means that it
> > > shouldn't depend on HAP or shadow paging.
> > >
> > > How much memory is allocated for the depriv area, and what exactly is
> > > allocated in total?
> > So, per-vcpu:
> > - a user mode stack which, from your comments in [RFC 2/4], can be 2 pages
> > - local data (may or may not be needed, depends on the device) which 
> > will be around
> >a page or two.
> > 
> > Text segment: as per your comments in RFC 2/4, this will be changed to 
> > be an alias
> > so no extra memory.
> > > I expect it isn't very much, and would suggest using
> > > d->arch.paging.alloc_page() instead (which is the generic "get me some
> > > memory accounted against the domain" helper) which looks as if it should
> > > suffice.
> 
> Whie I agree that it would be good to account this to the domain,
> paging->alloc_page() is an internal _paging assistance_ helper. :)
> This new allocation is nothing to do with mm/paging-assistance, so
> either it should find its own memory or the hap/shadow pool needs to
> be made more generic.

...at which point other HVM overheads - VMCx pages, bitmaps &c - could
be allocated from it as well.

Cheers,

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 1/4] HVM x86 deprivileged mode: Page allocation helper

2015-08-10 Thread Andrew Cooper
On 10/08/2015 09:52, Tim Deegan wrote:
> At 09:50 +0100 on 10 Aug (1439200241), Tim Deegan wrote:
>> Hi,
>>
>> At 10:57 +0100 on 07 Aug (1438945038), Ben Catterall wrote:
>>> On 06/08/15 20:22, Andrew Cooper wrote:
 On 06/08/15 17:45, Ben Catterall wrote:
> This allocation function is used by the deprivileged mode initialisation 
> code
> to allocate pages for the new page table mappings and page frames on the 
> HAP
> page heap.
>
> Signed-off-by: Ben Catterall 
 This is fine for your test box, but isn't fine for systems out there
 without hardware EPT/NPT support.  For older systems like that (or in
 certain specific workloads), shadow paging is used instead.

 This feature is applicable to any HVM domain, which means that it
 shouldn't depend on HAP or shadow paging.

 How much memory is allocated for the depriv area, and what exactly is
 allocated in total?
>>> So, per-vcpu:
>>> - a user mode stack which, from your comments in [RFC 2/4], can be 2 pages
>>> - local data (may or may not be needed, depends on the device) which 
>>> will be around
>>>a page or two.
>>>
>>> Text segment: as per your comments in RFC 2/4, this will be changed to 
>>> be an alias
>>> so no extra memory.
 I expect it isn't very much, and would suggest using
 d->arch.paging.alloc_page() instead (which is the generic "get me some
 memory accounted against the domain" helper) which looks as if it should
 suffice.
>> Whie I agree that it would be good to account this to the domain,
>> paging->alloc_page() is an internal _paging assistance_ helper. :)
>> This new allocation is nothing to do with mm/paging-assistance, so
>> either it should find its own memory or the hap/shadow pool needs to
>> be made more generic.
> ...at which point other HVM overheads - VMCx pages, bitmaps &c - could
> be allocated from it as well.

 I agree very much in principle, but I believe other threads have
settles on all allocations being global, or per-pcpu, which means no
per-domain allocation.

(Not that we shouldn't have a general per-domain pool longterm)

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Linux 4.2-rc5:

2015-08-10 Thread Ross Lagerwall

On 08/06/2015 08:51 PM, li...@eikelenboom.it wrote:

Hi Ross,

On my dom0 with a linux 4.2-rc5 kernel i encoutered the splat below.
It's probably related to your patch that went in just for 4.2-rc5:
"xen/events/fifo: Handle linked events when closing a port"

--
Sander

[   49.020173] [ cut here ]
[   49.020187] WARNING: CPU: 0 PID: 1 at
drivers/xen/events/events_fifo.c:395 evtchn_fifo_close+0xbd/0xc0()
[   49.020191] Modules linked in:
[   49.020198] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.2.0-rc5-20150804-linus-doflr+ #1
[   49.020200] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS
V1.8B1 09/13/2010
[   49.020208]  81faaae8 880059b9bad8 81aed513

[   49.020214]   880059b9bb18 810c7280
0041
[   49.020219]  0041  880059807ca8
880059807c00
[   49.020220] Call Trace:
[   49.020233]  [] dump_stack+0x45/0x57
[   49.020240]  [] warn_slowpath_common+0x80/0xc0
[   49.020245]  [] warn_slowpath_null+0x15/0x20
[   49.020249]  [] evtchn_fifo_close+0xbd/0xc0
[   49.020278]  [] xen_evtchn_close+0x1d/0x60
[   49.020281]  [] ? irq_get_irq_data+0x9/0x20
[   49.020282]  [] shutdown_pirq+0x4b/0x70
[   49.020283]  [] irq_shutdown+0x34/0x70
[   49.020285]  [] __free_irq+0x19d/0x1e0
[   49.020286]  [] free_irq+0x48/0xb0
[   49.020287]  [] i8042_probe+0x38f/0x693
[   49.020291]  [] platform_drv_probe+0x2f/0x90
[   49.020292]  [] driver_probe_device+0x1af/0x2d0
[   49.020293]  [] __driver_attach+0x8b/0x90
[   49.020294]  [] ? driver_probe_device+0x2d0/0x2d0
[   49.020296]  [] bus_for_each_dev+0x5f/0x90
[   49.020297]  [] driver_attach+0x19/0x20
[   49.020298]  [] bus_add_driver+0x1ab/0x220
[   49.020299]  [] driver_register+0x5b/0xe0
[   49.020300]  [] __platform_driver_register+0x45/0x50
[   49.020301]  [] __platform_driver_probe+0x31/0xe0
[   49.020303]  [] __platform_create_bundle+0xa3/0xd0
[   49.020304]  [] ? i8042_toggle_aux+0x6c/0x6c
[   49.020305]  [] ? i8042_probe+0x693/0x693
[   49.020306]  [] i8042_init+0x3d0/0x3f6
[   49.020308]  [] do_one_initcall+0x87/0x1d0
[   49.020310]  [] kernel_init_freeable+0x1db/0x263
[   49.020312]  [] ? rest_init+0x80/0x80
[   49.020314]  [] kernel_init+0x9/0xe0
[   49.020315]  [] ret_from_fork+0x3f/0x70
[   49.020317]  [] ? rest_init+0x80/0x80
[   49.020320] ---[ end trace 64c385518fcbbfa1 ]---



Thanks.

This means that the event channel is being closed with interrupts 
disabled, so it cannot guarantee that the event is not linked in. This 
is not a regression in behavior -- previously this was _never_ 
guaranteed and just silently ignored. However, we should find a way to 
fix this completely, to avoid warning spam.


Regards,
--
Ross Lagerwall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen 4.6 retrospective] More public/easy to find information about the release schedule

2015-08-10 Thread Lars Kurth

> On 10 Aug 2015, at 09:33, Wei Liu  wrote:
> 
> On Fri, Aug 07, 2015 at 05:36:57PM +0200, Roger Pau Monné wrote:
>> = Issue / Observation =
>> 
>> The information about the release schedule is not clearly published
>> anywhere apart from the mailing lists, which makes it hard for
>> non-developers (or even for developers) given that the mailing list
>> traffic for xen-devel is high.

This is not entirely true: see 
http://wiki.xenproject.org/wiki/Xen_Project_Hypervisor_Roadmap/4.6
However, I think https://www.freebsd.org/releng/ and also the odd mail on 
announce@ would make sense

Lars
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Can I xc_await_suspend() for a suspend event caused by another application?

2015-08-10 Thread Razvan Cojocaru
> I've noticed that the xc_suspend_evtchn_init() functions in xenguest.h
> connect the client application to a guest suspend event channel, and
> that it's possible to subscribe to these events, in theory even if you
> never signal the channel (i.e. even if you don't issue a suspend request).
> 
> But all the in-tree examples I've read seem to first signal the channel
> and then wait on the same channel for the confirmation that the guest is
> suspending.
> 
> Can the event channel be used solely to inform a monitoring application
> that _another_ application (for example, xl) has requested a suspend?

Looking at the code and what documentation I could find, it turns out
that not only xc_suspend_evtchn_init() has not been designed to do what
I am after, but it additionally only works for (some) PV guests.

I've also looked into monitoring writes to ~/control/shutdown, but that
of course also only applies to PV domains.

What I need is to be able to know that any domain (but mostly HVMs) is
about to be suspended, so that I can do some hooks cleanup in the guest
while it's still running, and I'm looking for a way to do that without
modifying Xen at all, otherwise I'd need to send out a new kind of
vm_event or something similar, and obviously simpler is better. Could
maybe someone kindly point out a reliable way to do that with the
current Xen code, if it exists and I just haven't been able to find it?


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen 4.6 retrospective] More public/easy to find information about the release schedule

2015-08-10 Thread Fabio Fantoni

Il 10/08/2015 11:06, Lars Kurth ha scritto:

On 10 Aug 2015, at 09:33, Wei Liu  wrote:

On Fri, Aug 07, 2015 at 05:36:57PM +0200, Roger Pau Monné wrote:

= Issue / Observation =

The information about the release schedule is not clearly published
anywhere apart from the mailing lists, which makes it hard for
non-developers (or even for developers) given that the mailing list
traffic for xen-devel is high.

This is not entirely true: see 
http://wiki.xenproject.org/wiki/Xen_Project_Hypervisor_Roadmap/4.6
Hi, I take a look to the wiki page, can be good mention also the "add of 
ahci disk controller support for hvm domUs"? I saw that is missed in 
features list.

However, I think https://www.freebsd.org/releng/ and also the odd mail on 
announce@ would make sense

Lars
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode

2015-08-10 Thread Tim Deegan
Hi,

At 17:45 +0100 on 06 Aug (1438883118), Ben Catterall wrote:
> The process to switch into and out of deprivileged mode can be likened to
> setjmp/longjmp.
> 
> To enter deprivileged mode, we take a copy of the stack from the guest's
> registers up to the current stack pointer.

This copy is pretty unfortunate, but I can see that avoiding it will
be a bit complex.  Could we do something with more stacks?  AFAICS
there have to be three stacks anyway:

 - one to hold the depriv execution context;
 - one to hold the privileged execution context; and
 - one to take interrupts on.

So maybe we could do some fiddling to make Xen take interrupts on a
different stack while we're depriv'd?

If we do have to copy, we could track whether the original stack has
been clobbered by an interrupt, and so avoid (at least some of) the
copy back afterwards?

One nit in the assembler - if I've followed correctly, this saved IP:

> +/* Perform a near call to push rip onto the stack */
> +call   1f

is returned to (with adjustments) here:

> +/* Go to user mode return code */
> +jmp*(%rsi)

It would be good to make this a matched pair of call/ret if we can;
the CPU has special branch prediction tracking for function calls that
gets confused by a call that's not returned to.

Cheers,

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Can I xc_await_suspend() for a suspend event caused by another application?

2015-08-10 Thread Andrew Cooper
On 10/08/15 10:28, Razvan Cojocaru wrote:
>> I've noticed that the xc_suspend_evtchn_init() functions in xenguest.h
>> connect the client application to a guest suspend event channel, and
>> that it's possible to subscribe to these events, in theory even if you
>> never signal the channel (i.e. even if you don't issue a suspend request).
>>
>> But all the in-tree examples I've read seem to first signal the channel
>> and then wait on the same channel for the confirmation that the guest is
>> suspending.
>>
>> Can the event channel be used solely to inform a monitoring application
>> that _another_ application (for example, xl) has requested a suspend?
> Looking at the code and what documentation I could find, it turns out
> that not only xc_suspend_evtchn_init() has not been designed to do what
> I am after, but it additionally only works for (some) PV guests.
>
> I've also looked into monitoring writes to ~/control/shutdown, but that
> of course also only applies to PV domains.
>
> What I need is to be able to know that any domain (but mostly HVMs) is
> about to be suspended, so that I can do some hooks cleanup in the guest
> while it's still running, and I'm looking for a way to do that without
> modifying Xen at all, otherwise I'd need to send out a new kind of
> vm_event or something similar, and obviously simpler is better. Could
> maybe someone kindly point out a reliable way to do that with the
> current Xen code, if it exists and I just haven't been able to find it?

What point of suspend do you need to be before?

Hooking the actual point of suspend is quite easy - hook
SCHEDOP_shutdown (for domains doing PV suspend themselves) and
SCHEDOP_remote_shutdown (for qemu suspending a guest on behalf of a non
PV action).

Off the top of my head, the following methods of starting a suspend from
outside of the guest are:

* ~/control/shutdown, but a guest (including PV aware HVM) can ignore this
* ~/control/sysrq, to send a sysrq key
* Inject an ACPI "power button" or "lid closed" GPE, but neither of
these might result in suspend.

Furthermore, hooking those doesn't catch an internal attempt to suspend.

I think your best bet is to actually hook the suspend/shutdown path
inside the guest.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-08-10 Thread Julien Grall

Hi Wei,

On 08/08/2015 15:55, Wei Liu wrote:

  struct xenvif_rx_meta {
int id;
@@ -80,16 +81,18 @@ struct xenvif_rx_meta {
  /* Discriminate from any valid pending_idx value. */
  #define INVALID_PENDING_IDX 0x

-#define MAX_BUFFER_OFFSET PAGE_SIZE
+#define MAX_BUFFER_OFFSET XEN_PAGE_SIZE

  #define MAX_PENDING_REQS XEN_NETIF_TX_RING_SIZE

+#define MAX_XEN_SKB_FRAGS (65536 / XEN_PAGE_SIZE + 1)
+


It might be clearer if you add a comment saying the maximum number of
frags is derived from the page size of the grant page, which happens to
be XEN_PAGE_SIZE at the moment.


Will do.



In the future we need to figure out the page size of grant page in a
dynamic way. We shall cross the bridge when we get there.


Right, there is few other places where we would need to do that too (see 
MAX_BUFFER_OFFSET for instance).



[..]


+   info.page = page;
+   gnttab_foreach_grant_in_range(page, offset, bytes,
+ xenvif_gop_frag_copy_grant,
+ &info);


Looks like I need to at least wait until the API is settle before giving
my ack.


size -= bytes;
+   offset = 0;


This looks wrong. Should be offset += bytes.


With the new implementation of the loop, each iteration will be on a 
different page.

So only the first page has an offset different than zero.





-   /* Next frame */
-   if (offset == PAGE_SIZE && size) {
+   /* Next page */
+   if (size) {
BUG_ON(!PageCompound(page));
page++;
-   offset = 0;


And this should not be deleted, I think.

What is the reason for changing offset calculation? I think there is
still compound page when using 64K page.


The compound pages are still working ... gnttab_foreach_grant_in_range 
is called once per page. So the offset can be reset to 0 every time. No 
need to add code which would make the result less clear.


We only need to know if the size is not 0 to get the next page.

The patch may not be clear enough to see it's working so I've copied the 
result loop below:


while (size > 0) {
BUG_ON(offset >= PAGE_SIZE);

bytes = PAGE_SIZE - offset;
if (bytes > size)
bytes = size;

info.page = page;
gnttab_foreach_grant_in_range(page, offset, bytes,
 xenvif_gop_frag_copy_grant,
  &info);
size -= bytes;
offset = 0;

/* Next page */
if (size) {
BUG_ON(!PageCompound(page));
page++;
}
}

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 9/9] xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn

2015-08-10 Thread Julien Grall

Hi Boris,

On 07/08/2015 22:33, Boris Ostrovsky wrote:

On 08/07/2015 12:34 PM, Julien Grall wrote:

The variable xen_store_mfn is effectively storing a GFN and not an MFN.

Signed-off-by: Julien Grall 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 

 I think that the assignation of xen_start_info in
 xenstored_local_init is pointless. Although I haven't drop it just
 in case.


I think so too (but that would be a separate patch if you decide to do it).


I will send a separate patch to drop it.



Reviewed-by: Boris Ostrovsky 


Thank you!

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Can I xc_await_suspend() for a suspend event caused by another application?

2015-08-10 Thread Razvan Cojocaru
On 08/10/2015 12:52 PM, Andrew Cooper wrote:
> On 10/08/15 10:28, Razvan Cojocaru wrote:
>>> I've noticed that the xc_suspend_evtchn_init() functions in xenguest.h
>>> connect the client application to a guest suspend event channel, and
>>> that it's possible to subscribe to these events, in theory even if you
>>> never signal the channel (i.e. even if you don't issue a suspend request).
>>>
>>> But all the in-tree examples I've read seem to first signal the channel
>>> and then wait on the same channel for the confirmation that the guest is
>>> suspending.
>>>
>>> Can the event channel be used solely to inform a monitoring application
>>> that _another_ application (for example, xl) has requested a suspend?
>> Looking at the code and what documentation I could find, it turns out
>> that not only xc_suspend_evtchn_init() has not been designed to do what
>> I am after, but it additionally only works for (some) PV guests.
>>
>> I've also looked into monitoring writes to ~/control/shutdown, but that
>> of course also only applies to PV domains.
>>
>> What I need is to be able to know that any domain (but mostly HVMs) is
>> about to be suspended, so that I can do some hooks cleanup in the guest
>> while it's still running, and I'm looking for a way to do that without
>> modifying Xen at all, otherwise I'd need to send out a new kind of
>> vm_event or something similar, and obviously simpler is better. Could
>> maybe someone kindly point out a reliable way to do that with the
>> current Xen code, if it exists and I just haven't been able to find it?
> 
> What point of suspend do you need to be before?
> 
> Hooking the actual point of suspend is quite easy - hook
> SCHEDOP_shutdown (for domains doing PV suspend themselves) and
> SCHEDOP_remote_shutdown (for qemu suspending a guest on behalf of a non
> PV action).
> 
> Off the top of my head, the following methods of starting a suspend from
> outside of the guest are:
> 
> * ~/control/shutdown, but a guest (including PV aware HVM) can ignore this
> * ~/control/sysrq, to send a sysrq key
> * Inject an ACPI "power button" or "lid closed" GPE, but neither of
> these might result in suspend.
> 
> Furthermore, hooking those doesn't catch an internal attempt to suspend.
> 
> I think your best bet is to actually hook the suspend/shutdown path
> inside the guest.

Thanks for the reply! I haven't been clear, sorry - it's not the guest
that I need to be aware of a suspend beforehand, but the monitoring
application that lives in dom0 or a similarly privileged domain.

When xl, or XenCenter via XAPI, issues a request that results in a guest
suspend (for example, 'xl save'), I'd like the monitoring application
(the one doing introspection, subscribed to vm_events) to be able to
know while the guest is still running, so that it can have a chance to
do some cleanup specific to this case.

The way I do it now, I've subscribed to @releaseDomain xenstore events,
but when these come the guest has already become history. This is good
enough for "regular" guest shutdowns, but it gets trickier with 'xl
save'-type scenarios.

So the question is, basically, is there currenly a way for a dom0
application to know that somebody issued 'xl save' on an interesting
guest, via xenstore or some other mechanism, _before_ @releaseDomain
comes (i.e. while the guest is still alive)?


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 0/6] xen/PMU: PMU support for Xen PV(H) guests

2015-08-10 Thread Stefano Stabellini
On Sun, 9 Aug 2015, Boris Ostrovsky wrote:
> Changes in v6:
> * Fix ARM builds (as suggested by Julien):
>   o Make XEN_SYMS depend on X86 (patch 1)
>   o Add CONFIG_XEN_HAVE_PVMMU and use it in drivers/xen/sys-hypervisor.c
> (patch 2)
> * Adjust release dates in Documentation/ABI/testing/sysfs-hypervisor-pmu
>   (patch 2)

I confirm that it compiles just fine on ARM now.


> Boris Ostrovsky (6):
>   xen: xensyms support
>   xen/PMU: Sysfs interface for setting Xen PMU mode
>   xen/PMU: Initialization code for Xen PMU
>   xen/PMU: Describe vendor-specific PMU registers
>   xen/PMU: Intercept PMU-related MSR and APIC accesses
>   xen/PMU: PMU emulation code
> 
>  Documentation/ABI/testing/sysfs-hypervisor-pmu |  23 +
>  arch/x86/include/asm/xen/hypercall.h   |   6 +
>  arch/x86/include/asm/xen/interface.h   | 123 ++
>  arch/x86/xen/Kconfig   |   1 +
>  arch/x86/xen/Makefile  |   2 +-
>  arch/x86/xen/apic.c|   6 +
>  arch/x86/xen/enlighten.c   |  13 +-
>  arch/x86/xen/pmu.c | 572 
> +
>  arch/x86/xen/pmu.h |  15 +
>  arch/x86/xen/smp.c |  29 +-
>  arch/x86/xen/suspend.c |  23 +-
>  drivers/xen/Kconfig|  11 +
>  drivers/xen/sys-hypervisor.c   | 136 +-
>  drivers/xen/xenfs/Makefile |   1 +
>  drivers/xen/xenfs/super.c  |   3 +
>  drivers/xen/xenfs/xenfs.h  |   1 +
>  drivers/xen/xenfs/xensyms.c| 152 +++
>  include/xen/interface/platform.h   |  18 +
>  include/xen/interface/xen.h|   2 +
>  include/xen/interface/xenpmu.h |  94 
>  20 files changed, 1220 insertions(+), 11 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu
>  create mode 100644 arch/x86/xen/pmu.c
>  create mode 100644 arch/x86/xen/pmu.h
>  create mode 100644 drivers/xen/xenfs/xensyms.c
>  create mode 100644 include/xen/interface/xenpmu.h
> 
> -- 
> 1.8.1.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 1/4] HVM x86 deprivileged mode: Page allocation helper

2015-08-10 Thread Tim Deegan
At 09:55 +0100 on 10 Aug (1439200516), Andrew Cooper wrote:
> On 10/08/2015 09:52, Tim Deegan wrote:
> >> Whie I agree that it would be good to account this to the domain,
> >> paging->alloc_page() is an internal _paging assistance_ helper. :)
> >> This new allocation is nothing to do with mm/paging-assistance, so
> >> either it should find its own memory or the hap/shadow pool needs to
> >> be made more generic.
> > ...at which point other HVM overheads - VMCx pages, bitmaps &c - could
> > be allocated from it as well.
> 
>  I agree very much in principle, but I believe other threads have
> settles on all allocations being global, or per-pcpu, which means no
> per-domain allocation.

Grand so. :)

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 4/4] HVM x86 deprivileged mode: Trap handlers for deprivileged mode

2015-08-10 Thread Tim Deegan
Hi,

> @@ -685,8 +685,17 @@ static int hap_page_fault(struct vcpu *v, unsigned long 
> va,
>  {
>  struct domain *d = v->domain;
>  
> +/* If we get a page fault whilst in HVM security user mode */
> +if( v->user_mode == 1 )
> +{
> +printk("HVM: #PF (%u:%u) whilst in user mode\n",
> + d->domain_id, v->vcpu_id);
> +domain_crash_synchronous();
> +}
> +

This should happen in paging_fault() so it can guard the
shadow-pagetable paths too.  Once it's there, it'll need a check for
is_hvm_vcpu() as well as for user_mode.  Maybe have a helper function
'is_hvm_deprivileged_vcpu()' to do both checks, also used in
hvm_deprivileged_check_trap() &c.

>  HAP_ERROR("Intercepted a guest #PF (%u:%u) with HAP enabled.\n",
>d->domain_id, v->vcpu_id);
> +
>  domain_crash(d);
>  return 0;
>  }
> diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
> index 9f5a6c6..19d465f 100644
> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -74,6 +74,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * opt_nmi: one of 'ignore', 'dom0', or 'fatal'.
> @@ -500,6 +501,11 @@ static void do_guest_trap(
>  struct trap_bounce *tb;
>  const struct trap_info *ti;
>  
> +/* If we take the trap whilst in HVM deprivileged mode
> + * then we should crash the domain.
> + */
> +hvm_deprivileged_check_trap(__FUNCTION__);

I wonder whether it would be better to switch to an IDT with all
unacceptable traps stubbed out, rather than have to blacklist them all
separately.  Probably not - this check is cheap, and maintaining the
parallel tables would be a pain. 

Or maybe there's some single point upstream of here, in the asm
handlers, that would catch all the cases where this check is needed?

In any case, the check needs to return an error code so the caller
knows to return without running the rest of the handler (and likewise
elsewhere).

Cheers,

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 9/9] xen/xenbus: Rename the variable xen_store_mfn to xen_store_gfn

2015-08-10 Thread Stefano Stabellini
On Fri, 7 Aug 2015, Julien Grall wrote:
> The variable xen_store_mfn is effectively storing a GFN and not an MFN.
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 


> Cc: Konrad Rzeszutek Wilk 
> Cc: Boris Ostrovsky 
> Cc: David Vrabel 
> 
> I think that the assignation of xen_start_info in
> xenstored_local_init is pointless. Although I haven't drop it just
> in case.
> 
> Changes in v3:
> - Patch added.
> ---
>  drivers/xen/xenbus/xenbus_probe.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/xen/xenbus/xenbus_probe.c 
> b/drivers/xen/xenbus/xenbus_probe.c
> index b3870f4..3cbe055 100644
> --- a/drivers/xen/xenbus/xenbus_probe.c
> +++ b/drivers/xen/xenbus/xenbus_probe.c
> @@ -75,7 +75,7 @@ EXPORT_SYMBOL_GPL(xen_store_interface);
>  enum xenstore_init xen_store_domain_type;
>  EXPORT_SYMBOL_GPL(xen_store_domain_type);
>  
> -static unsigned long xen_store_mfn;
> +static unsigned long xen_store_gfn;
>  
>  static BLOCKING_NOTIFIER_HEAD(xenstore_chain);
>  
> @@ -711,7 +711,7 @@ static int __init xenstored_local_init(void)
>   if (!page)
>   goto out_err;
>  
> - xen_store_mfn = xen_start_info->store_mfn = virt_to_gfn((void *)page);
> + xen_store_gfn = xen_start_info->store_mfn = virt_to_gfn((void *)page);
>  
>   /* Next allocate a local port which xenstored can bind to */
>   alloc_unbound.dom= DOMID_SELF;
> @@ -785,12 +785,12 @@ static int __init xenbus_init(void)
>   err = xenstored_local_init();
>   if (err)
>   goto out_error;
> - xen_store_interface = gfn_to_virt(xen_store_mfn);
> + xen_store_interface = gfn_to_virt(xen_store_gfn);
>   break;
>   case XS_PV:
>   xen_store_evtchn = xen_start_info->store_evtchn;
> - xen_store_mfn = xen_start_info->store_mfn;
> - xen_store_interface = gfn_to_virt(xen_store_mfn);
> + xen_store_gfn = xen_start_info->store_mfn;
> + xen_store_interface = gfn_to_virt(xen_store_gfn);
>   break;
>   case XS_HVM:
>   err = hvm_get_parameter(HVM_PARAM_STORE_EVTCHN, &v);
> @@ -800,9 +800,9 @@ static int __init xenbus_init(void)
>   err = hvm_get_parameter(HVM_PARAM_STORE_PFN, &v);
>   if (err)
>   goto out_error;
> - xen_store_mfn = (unsigned long)v;
> + xen_store_gfn = (unsigned long)v;
>   xen_store_interface =
> - xen_remap(xen_store_mfn << PAGE_SHIFT, PAGE_SIZE);
> + xen_remap(xen_store_gfn << PAGE_SHIFT, PAGE_SIZE);
>   break;
>   default:
>   pr_warn("Xenstore state unknown\n");
> -- 
> 2.1.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V5 3/7] libxl: add pvusb API

2015-08-10 Thread George Dunlap
On 08/07/2015 03:31 AM, Chun Yan Liu wrote:
>> +("devid", libxl_devid),   
>> +("version", integer),   
>> +("ports", integer),   
>> +("backend_domid", libxl_domid),   
>> +("backend_domname", string),   
>> +   ])   
>> +   
>> +libxl_device_usb = Struct("device_usb", [   
>> +("ctrl", libxl_devid),   
>> +("port", integer),   
>> +("hostbus",   integer),   
>> +("hostaddr",  integer),   
>> +])   
>>  
>> I think we do want to plan for the future here by doing something like this: 
>>  
>> libxl_device_usb = Struct("device_usb", [ 
>> ("ctrl", libxl_devid), 
>> ("port", integer), 
>> ("u", KeyedUnion(None, libxl_device_usb_type, "devtype", 
>>   [("hostdev", Struct(None, [ 
>>  ("hostbus",   integer), 
>>  ("hostaddr",  integer) ])) 
>>])) 
>>  ]) 
>>  
> 
> Yes, that's the future look. For pvusb, currenlty with kernel pvusb driver, 
> the
> devtype is not really necessary. But I can add 'devtype' if it is preferred 
> now.

Yes, I think as much as possible we want the interface which is actually
checked in to be forward-compatible.

Thanks!
 -George


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 02/20] arm/xen: Drop pte_mfn and mfn_pte

2015-08-10 Thread Stefano Stabellini
On Fri, 7 Aug 2015, Julien Grall wrote:
> They are not used in common code expect in one place in balloon.c which is
> only compiled when Linux is using PV MMU. It's not the case on ARM.
> 
> Rather than worrying how to handle the 64KB case, drop them.
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 


> Stefano Stabellini 
> Russell King 
> 
> Changes in v3:
> - Patch added
> ---
>  arch/arm/include/asm/xen/page.h | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
> index 1279563..98c9fc3 100644
> --- a/arch/arm/include/asm/xen/page.h
> +++ b/arch/arm/include/asm/xen/page.h
> @@ -13,9 +13,6 @@
>  
>  #define phys_to_machine_mapping_valid(pfn) (1)
>  
> -#define pte_mfn  pte_pfn
> -#define mfn_pte  pfn_pte
> -
>  /* Xen machine address */
>  typedef struct xmaddr {
>   phys_addr_t maddr;
> -- 
> 2.1.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode

2015-08-10 Thread Andrew Cooper
On 10/08/15 10:49, Tim Deegan wrote:
> Hi,
>
> At 17:45 +0100 on 06 Aug (1438883118), Ben Catterall wrote:
>> The process to switch into and out of deprivileged mode can be likened to
>> setjmp/longjmp.
>>
>> To enter deprivileged mode, we take a copy of the stack from the guest's
>> registers up to the current stack pointer.
> This copy is pretty unfortunate, but I can see that avoiding it will
> be a bit complex.  Could we do something with more stacks?  AFAICS
> there have to be three stacks anyway:
>
>  - one to hold the depriv execution context;
>  - one to hold the privileged execution context; and
>  - one to take interrupts on.
>
> So maybe we could do some fiddling to make Xen take interrupts on a
> different stack while we're depriv'd?

That should happen naturally by virtue of the privilege level change
involved in taking the interrupt.  Conceptually, taking interrupts from
depriv mode is no different to taking them in a PV guest.

Some complications which come to mind (none insurmountable):

* Under this model, PV exception handlers should copy themselves onto
the privileged execution stack.
* Currently, the IST handlers  copy themselves onto the primary stack if
they interrupt guest context.
* AMD Task Register on vmexit.  (this old gem)

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 3.2 110/110] x86/ldt: Make modify_ldt synchronous

2015-08-10 Thread Ben Hutchings
3.2.71-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Andy Lutomirski 

commit 37868fe113ff2ba814b3b4eb12df214df555f8dc upstream.

modify_ldt() has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

This fixes some fallout from the CVE-2015-5157 fixes.

Signed-off-by: Andy Lutomirski 
Reviewed-by: Borislav Petkov 
Cc: Andrew Cooper 
Cc: Andy Lutomirski 
Cc: Boris Ostrovsky 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Sasha Levin 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Cc: secur...@kernel.org 
Cc: xen-devel 
Link: 
http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2:
 - Adjust context
 - Drop comment changes in switch_mm()
 - Drop changes to get_segment_base() in arch/x86/kernel/cpu/perf_event.c
 - Open-code lockless_dereference(), smp_store_release(), on_each_cpu_mask()]
Signed-off-by: Ben Hutchings 
---
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -277,21 +277,6 @@ static inline void clear_LDT(void)
set_ldt(NULL, 0);
 }
 
-/*
- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-   set_ldt(pc->ldt, pc->size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-   preempt_disable();
-   load_LDT_nolock(pc);
-   preempt_enable();
-}
-
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) 
<< 24));
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
  * we put the segment information here.
  */
 typedef struct {
-   void *ldt;
-   int size;
+   struct ldt_struct *ldt;
 
 #ifdef CONFIG_X86_64
/* True if mm supports a task running in 32 bit compatibility mode. */
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -16,6 +16,51 @@ static inline void paravirt_activate_mm(
 #endif /* !CONFIG_PARAVIRT */
 
 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+   /*
+* Xen requires page-aligned LDTs with special permissions.  This is
+* needed to prevent us from installing evil descriptors such as
+* call gates.  On native, we could merge the ldt_struct and LDT
+* allocations, but it's not worth trying to optimize.
+*/
+   struct desc_struct *entries;
+   int size;
+};
+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+   struct ldt_struct *ldt;
+
+   /* smp_read_barrier_depends synchronizes with barrier in install_ldt */
+   ldt = ACCESS_ONCE(mm->context.ldt);
+   smp_read_barrier_depends();
+
+   /*
+* Any change to mm->context.ldt is followed by an IPI to all
+* CPUs with the mm active.  The LDT will not be freed until
+* after the IPI is handled by all such CPUs.  This means that,
+* if the ldt_struct changes before we return, the values we see
+* will be safe, and the new values will be loaded before we run
+* any user code.
+*
+* NB: don't try to convert this to use RCU without extreme care.
+* We would still need IRQs off, because we don't want to change
+* the local LDT after an IPI loaded a newer value than the one
+* that we can see.
+*/
+
+   if (unlikely(ldt))
+   set_ldt(ldt->entries, ldt->size);
+   else
+   clear_LDT();
+
+   DEBUG_LOCKS_WARN_ON(preemptible());
+}
+
+/*
  * Used for LDT copy/destruction.
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
@@ -52,7 +97,7 @@ static inline void switch_mm(struct mm_s
 * load the LDT, if the LDT is different:
 */
if (unlikely(prev->context.ldt != next->context.ldt))
-   load_LDT_nolock(&next->context);
+   load_mm_ldt(next);
}
 #ifdef CONFIG_SMP
else {
@@ -65,7 +110,7 @@ static inline void switch_mm(struct mm_s
 * to make sure to use no freed page tables.
 */
load_cr3(next->pgd);
-   load_LDT_nolock(&next->context);
+   load_mm_ldt(next);
}
}
 #endif
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1225,7 +1225,7 @@ void __cpuinit cpu_init(void)
 

[Xen-devel] [PATCH 3.2 109/110] x86/xen: Probe target addresses in set_aliased_prot() before the hypercall

2015-08-10 Thread Ben Hutchings
3.2.71-rc1 review patch.  If anyone has any objections, please let me know.

--

From: Andy Lutomirski 

commit aa1acff356bbedfd03b544051f5b371746735d89 upstream.

The update_va_mapping hypercall can fail if the VA isn't present
in the guest's page tables.  Under certain loads, this can
result in an OOPS when the target address is in unpopulated vmap
space.

While we're at it, add comments to help explain what's going on.

This isn't a great long-term fix.  This code should probably be
changed to use something like set_memory_ro.

Signed-off-by: Andy Lutomirski 
Cc: Andrew Cooper 
Cc: Andy Lutomirski 
Cc: Boris Ostrovsky 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: David Vrabel 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Jan Beulich 
Cc: Konrad Rzeszutek Wilk 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Sasha Levin 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Cc: secur...@kernel.org 
Cc: xen-devel 
Link: 
http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.l...@kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings 
---
 arch/x86/xen/enlighten.c | 40 
 1 file changed, 40 insertions(+)

--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -321,6 +321,7 @@ static void set_aliased_prot(void *v, pg
pte_t pte;
unsigned long pfn;
struct page *page;
+   unsigned char dummy;
 
ptep = lookup_address((unsigned long)v, &level);
BUG_ON(ptep == NULL);
@@ -330,6 +331,32 @@ static void set_aliased_prot(void *v, pg
 
pte = pfn_pte(pfn, prot);
 
+   /*
+* Careful: update_va_mapping() will fail if the virtual address
+* we're poking isn't populated in the page tables.  We don't
+* need to worry about the direct map (that's always in the page
+* tables), but we need to be careful about vmap space.  In
+* particular, the top level page table can lazily propagate
+* entries between processes, so if we've switched mms since we
+* vmapped the target in the first place, we might not have the
+* top-level page table entry populated.
+*
+* We disable preemption because we want the same mm active when
+* we probe the target and when we issue the hypercall.  We'll
+* have the same nominal mm, but if we're a kernel thread, lazy
+* mm dropping could change our pgd.
+*
+* Out of an abundance of caution, this uses __get_user() to fault
+* in the target address just in case there's some obscure case
+* in which the target address isn't readable.
+*/
+
+   preempt_disable();
+
+   pagefault_disable();/* Avoid warnings due to being atomic. */
+   __get_user(dummy, (unsigned char __user __force *)v);
+   pagefault_enable();
+
if (HYPERVISOR_update_va_mapping((unsigned long)v, pte, 0))
BUG();
 
@@ -341,6 +368,8 @@ static void set_aliased_prot(void *v, pg
BUG();
} else
kmap_flush_unused();
+
+   preempt_enable();
 }
 
 static void xen_alloc_ldt(struct desc_struct *ldt, unsigned entries)
@@ -348,6 +377,17 @@ static void xen_alloc_ldt(struct desc_st
const unsigned entries_per_page = PAGE_SIZE / LDT_ENTRY_SIZE;
int i;
 
+   /*
+* We need to mark the all aliases of the LDT pages RO.  We
+* don't need to call vm_flush_aliases(), though, since that's
+* only responsible for flushing aliases out the TLBs, not the
+* page tables, and Xen will flush the TLB for us if needed.
+*
+* To avoid confusing future readers: none of this is necessary
+* to load the LDT.  The hypervisor only checks this when the
+* LDT is faulted in due to subsequent descriptor access.
+*/
+
for(i = 0; i < entries; i += entries_per_page)
set_aliased_prot(ldt + i, PAGE_KERNEL_RO);
 }


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V6 0/7] xen pvusb toolstack work

2015-08-10 Thread Chunyan Liu
This patch series is to add pvusb toolstack work, supporting hot add|remove
USB device to|from guest and specify USB device in domain configuration file.

Changes to V5:
* Address George's comments on libxl API and Ian's comments on
  libxl_read_sysfs_file_content

V5 is here:
http://lists.xen.org/archives/html/xen-devel/2015-06/msg04052.html

V4 is here:
http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg01327.html

Related Discussion Threads:
http://www.redhat.com/archives/libvir-list/2014-June/msg00038.html
http://lists.xen.org/archives/html/xen-devel/2014-06/msg00086.html

  <<< pvusb work introduction >>>

1. Overview

There are two general methods for passing through individual host
devices to a guest. The first is via an emulated USB device
controller; the second is PVUSB.

Additionally, there are two ways to add USB devices to a guest: via
the config file at domain creation time, and via hot-plug while the VM
is running.

* Emulated USB

In emulated USB, the device model (qemu) presents an emulated USB
controller to the guest. The device model process then grabs control
of the device from domain 0 and and passes the USB commands between
the guest OS and the host USB device.

This method is only available to HVM domains, and is not available for
domains running with device model stubdomains.

* PVUSB

PVUSB uses a paravirtialized front-end/back-end interface, similar to
the traditional Xen PV network and disk protocols. In order to use
PVUSB, you need usbfront in your guest OS, and usbback in dom0 (or
your USB driver domain).

2. Specifying a host USB device

QEMU qmp commands allows USB devices to be specified either by their
bus address (in the form bus.device) or their device tag (in the form
vendorid:deviceid).

Each way of specifying has its advantages:

Specifying by device tag will always get the same device,
regardless of where the device ends up in the USB bus topology.
However, if there are two identical devices, it will not allow you to
specify which one.

Specifying by bus address will always allow you to choose a
specific device, even if you have duplicates. However, the bus address
may change depending on which port you plugged the device into, and
possibly also after a reboot.

To avoid duplication of vendorid:deviceid, we'll use bus address to
specify host USB device in xl toolstack.

You can use lsusb to list the USB devices on the system:

Bus 001 Device 003: ID 0424:2514 Standard Microsystems Corp. USB 2.0
Hub
Bus 003 Device 002: ID f617:0905
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 004: ID 0424:2640 Standard Microsystems Corp. USB 2.0
Hub
Bus 001 Device 005: ID 0424:4060 Standard Microsystems Corp. Ultra
Fast Media Reader
Bus 001 Device 006: ID 046d:c016 Logitech, Inc. Optical Wheel Mouse

To pass through the Logitec mouse, for instance, you could specify
1.6 (remove leading zeroes).

Note: USB hubs can not be assigned to guest.

3. PVUSB toolstack

* Specify USB device in xl config file

You can just specify usb devices, like:
usbdev=['1.6']

Then it will create a USB controller automatically and attach the USB
device to the first available USB controller:port.

or, you can explicitly specify usb controllers and usb devices, like:
usbctrl=['verison=1, ports=4', 'version=2, ports=8', ]
usbdev=['1.6, controller=0, port=1']

Then it will create two USB controllers as you specified.
And if controller and port are specified in usb config, then it will
attach the USB device to that controller:port. About the controller
and port value:
Each USB controller has a index (or called devid) based on 0. The 1st
controller has index 0, the 2nd controller has index 1, ...
Under controller, each port has a port number based on 1. In above
configuration, the 1st controller will have port 1,2,3,4.

* Hot-Plug USB device

To attach a USB device, you should first create a USB controller.
e.g.
xl usb-ctrl-attach domain [version=1|2] [ports=value]
By default, it will create a USB2.0 controller with 8 ports.

Then you could attach a USB device.
e.g.
xl usb-attach domain 1.6 [controller=index port=number]
By default, it will find the 1st available controller:port to attach
the USB device.

You could view USB device status of the domain by usb-list.
e.g.
xl usb-list domain
It will list USB controllers and USB devices under each controller.

You could detach a USB device with usb-detach command.
e.g.
xl usb-detach domain 1.6

You can also remove the whole USB controller by usb-ctrl-detach
command.
e.g.
xl usb-ctrl-detach domain 0
It will remove the USB controller with index 0 and all USB devices
under it.

4. PVUSB Libxl implementation

* usb-ctrl-attach
To create a usb controller, we need:
1) generate usb controler related information
2) write usb controller frontend/backend info to xenstore
PVUSB frontend and backend driver will probe xenstore paths and build
connection between frontend and backend.

* usb-ctrl-detach
To remove a usb c

[Xen-devel] [PATCH V6 4/7] libxl: add libxl_device_usb_assignable_list API

2015-08-10 Thread Chunyan Liu
Add API for listing assignable USB devices info.
Assignable USB device means the USB device type is assignable and
it's not assigned to any guest yet.

Signed-off-by: Chunyan Liu 
---
This could be squashed with previous patch. Split because there is
some dispute on this. If this is acceptable, could be squashed,
otherwise could be removed.

 tools/libxl/libxl.h   |  3 +++
 tools/libxl/libxl_pvusb.c | 53 +++
 2 files changed, 56 insertions(+)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 05b6331..d1360ce 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1433,6 +1433,9 @@ int libxl_device_usbctrl_getinfo(libxl_ctx *ctx, uint32_t 
domid,
  libxl_usbctrlinfo *usbctrlinfo);
 
 /* USB Devices */
+libxl_device_usb *
+libxl_device_usb_assignable_list(libxl_ctx *ctx, int *num);
+
 int libxl_device_usb_add(libxl_ctx *ctx, uint32_t domid, libxl_device_usb *usb,
  const libxl_asyncop_how *ao_how)
  LIBXL_EXTERNAL_CALLERS_ONLY;
diff --git a/tools/libxl/libxl_pvusb.c b/tools/libxl/libxl_pvusb.c
index d4c4c03..e56fa07 100644
--- a/tools/libxl/libxl_pvusb.c
+++ b/tools/libxl/libxl_pvusb.c
@@ -552,6 +552,59 @@ static bool is_usb_assignable(libxl__gc *gc, 
libxl_device_usb *usb)
 return classcode != USBHUB_CLASS_CODE;
 }
 
+libxl_device_usb *
+libxl_device_usb_assignable_list(libxl_ctx *ctx, int *num)
+{
+GC_INIT(ctx);
+libxl_device_usb *usbs = NULL;
+libxl_device_usb *assigned;
+int num_assigned;
+struct dirent *de;
+DIR *dir;
+
+*num = 0;
+
+if (libxl__device_usb_assigned_list(gc, &assigned, &num_assigned) < 0)
+goto out;
+
+if (!(dir = opendir(SYSFS_USB_DEV)))
+goto out;
+
+while ((de = readdir(dir))) {
+libxl_device_usb *usb;
+int bus = -1, addr = -1;
+
+if (!de->d_name)
+continue;
+
+usb_busaddr_from_busid(gc, de->d_name, &bus, &addr);
+if (bus < 1 || addr < 1)
+continue;
+
+GCNEW(usb);
+usb->u.hostdev.hostbus = bus;
+usb->u.hostdev.hostaddr = addr;
+
+if (!is_usb_assignable(gc, usb))
+continue;
+
+if (is_usb_in_array(assigned, num_assigned, usb))
+continue;
+
+usbs = libxl__realloc(NOGC, usbs, sizeof(*usbs) * (*num + 1));
+libxl_device_usb_init(usbs + *num);
+usbs[*num].u.hostdev.hostbus = bus;
+usbs[*num].u.hostdev.hostaddr = addr;
+(*num)++;
+}
+
+closedir(dir);
+
+out:
+GC_FREE;
+return usbs;
+}
+
 /* get usb devices under certain usb controller */
 static int
 libxl__device_usb_list_per_usbctrl(libxl__gc *gc, uint32_t domid,
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V6 1/7] libxl: export some functions for pvusb use

2015-08-10 Thread Chunyan Liu
Signed-off-by: Chunyan Liu 
Signed-off-by: Simon Cao 

---
 tools/libxl/libxl.c  | 4 ++--
 tools/libxl/libxl_internal.h | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 083f099..006e8da 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1995,7 +1995,7 @@ out:
 }
 
 /* common function to get next device id */
-static int libxl__device_nextid(libxl__gc *gc, uint32_t domid, char *device)
+int libxl__device_nextid(libxl__gc *gc, uint32_t domid, char *device)
 {
 char *dompath, **l;
 unsigned int nb;
@@ -2014,7 +2014,7 @@ static int libxl__device_nextid(libxl__gc *gc, uint32_t 
domid, char *device)
 return nextid;
 }
 
-static int libxl__resolve_domid(libxl__gc *gc, const char *name,
+int libxl__resolve_domid(libxl__gc *gc, const char *name,
 uint32_t *domid)
 {
 if (!name)
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6ea6c83..6013628 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1152,6 +1152,9 @@ _hidden int libxl__init_console_from_channel(libxl__gc 
*gc,
  libxl__device_console *console,
  int dev_num,
  libxl_device_channel *channel);
+_hidden int libxl__device_nextid(libxl__gc *gc, uint32_t domid, char *device);
+_hidden int libxl__resolve_domid(libxl__gc *gc, const char *name,
+ uint32_t *domid);
 
 /*
  * For each aggregate type which can be used as an input we provide:
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V6 3/7] libxl: add pvusb API

2015-08-10 Thread Chunyan Liu
Add pvusb APIs, including:
 - attach/detach (create/destroy) virtual usb controller.
 - attach/detach usb device
 - list usb controller and usb devices
 - some other helper functions

Signed-off-by: Chunyan Liu 
Signed-off-by: Simon Cao 

---
changes:
  - Address George's comments:
  * Update libxl_device_usb_getinfo to read ctrl/port only and
get other information.
  * Update backend path according to xenstore frontend 'xxx/backend'
entry instead of using TOOLSTACK_DOMID.
  * Use 'type' to indicate qemu/pv instead of previous naming 'protocol'.
  * Add USB 'devtype' union, currently only includes "hostdev"

 tools/libxl/Makefile |2 +-
 tools/libxl/libxl.c  |   53 ++
 tools/libxl/libxl.h  |   65 ++
 tools/libxl/libxl_device.c   |4 +
 tools/libxl/libxl_internal.h |   20 +-
 tools/libxl/libxl_osdeps.h   |   13 +
 tools/libxl/libxl_pvusb.c| 1320 ++
 tools/libxl/libxl_types.idl  |   59 ++
 tools/libxl/libxl_types_internal.idl |1 +
 tools/libxl/libxl_utils.c|   16 +
 tools/libxl/libxl_utils.h|5 +
 11 files changed, 1556 insertions(+), 2 deletions(-)
 create mode 100644 tools/libxl/libxl_pvusb.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 9036076..cdb50fe 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -103,7 +103,7 @@ LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o 
libxl_pci.o \
libxl_stream_read.o libxl_stream_write.o \
libxl_save_callout.o _libxl_save_msgs_callout.o \
libxl_qmp.o libxl_event.o libxl_fork.o \
-   libxl_dom_suspend.o $(LIBXL_OBJS-y)
+   libxl_dom_suspend.o libxl_pvusb.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
 LIBXL_OBJS += _libxl_types.o libxl_flask.o _libxl_types_internal.o
 
diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 006e8da..35843a8 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -4179,11 +4179,54 @@ DEFINE_DEVICE_REMOVE(vtpm, destroy, 1)
 
 
/**/
 
+/* Macro for defining device remove/destroy functions for usbctrl */
+/* Following functions are defined:
+ * libxl_device_usbctrl_remove
+ * libxl_device_usbctrl_destroy
+ */
+
+#define DEFINE_DEVICE_REMOVE_EXT(type, removedestroy, f)\
+int libxl_device_##type##_##removedestroy(libxl_ctx *ctx,   \
+uint32_t domid, libxl_device_##type *type,  \
+const libxl_asyncop_how *ao_how)\
+{   \
+AO_CREATE(ctx, domid, ao_how);  \
+libxl__device *device;  \
+libxl__ao_device *aodev;\
+int rc; \
+\
+GCNEW(device);  \
+rc = libxl__device_from_##type(gc, domid, type, device);\
+if (rc != 0) goto out;  \
+\
+GCNEW(aodev);   \
+libxl__prepare_ao_device(ao, aodev);\
+aodev->action = LIBXL__DEVICE_ACTION_REMOVE;\
+aodev->dev = device;\
+aodev->callback = device_addrm_aocomplete;  \
+aodev->force = f;   \
+libxl__initiate_device_##type##_remove(egc, aodev); \
+\
+out:\
+if (rc) return AO_CREATE_FAIL(rc);  \
+return AO_INPROGRESS;   \
+}
+
+
+DEFINE_DEVICE_REMOVE_EXT(usbctrl, remove, 0)
+DEFINE_DEVICE_REMOVE_EXT(usbctrl, destroy, 1)
+
+#undef DEFINE_DEVICE_REMOVE_EXT
+
+/**/
+
 /* Macro for defining device addition functions in a compact way */
 /* The following functions are defined:
  * libxl_device_disk_add
  * libxl_device_nic_add
  * libxl_device_vtpm_add
+ * libxl_device_usbctrl_add
+ * libxl_device_usb_add
  */
 
 #define DEFINE_DEVICE_ADD(type) \
@@ -4215,6 +4258,12 @@ DEFINE_DEVICE_ADD(nic)
 /* vtpm */
 DEFINE_DEVICE_ADD(vtpm)
 
+/* usbctrl */
+DEFINE_DEVICE_ADD(usbctrl)
+
+/* usb */
+DEFINE_DEVICE_

[Xen-devel] [PATCH V6 6/7] xl: add usb-assignable-list command

2015-08-10 Thread Chunyan Liu
Add xl usb-assignable-list command to list assignable USB devices.
Assignable USB device means the USB device type is assignable and
it's not assigned to any guest yet.

Signed-off-by: Chunyan Liu 
---
  Same as "libxl: add libxl_device_usb_assignable_list API" patch,
  this patch could be sqaushed to previous one. Split because of
  some dispute. Could be squashed if acceptable, otherwise could
  be removed.

 tools/libxl/xl.h  |  1 +
 tools/libxl/xl_cmdimpl.c  | 27 +++
 tools/libxl/xl_cmdtable.c |  4 
 3 files changed, 32 insertions(+)

diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index e136fdf..e579ecc 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -85,6 +85,7 @@ int main_blockdetach(int argc, char **argv);
 int main_vtpmattach(int argc, char **argv);
 int main_vtpmlist(int argc, char **argv);
 int main_vtpmdetach(int argc, char **argv);
+int main_usbassignable_list(int argc, char **argv);
 int main_usbctrl_attach(int argc, char **argv);
 int main_usbctrl_detach(int argc, char **argv);
 int main_usbattach(int argc, char **argv);
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 3e4d93a..e33871c 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3322,6 +3322,33 @@ int main_cd_insert(int argc, char **argv)
 return 0;
 }
 
+static void usb_assignable_list(void)
+{
+libxl_device_usb *usbs;
+int num, i;
+
+usbs = libxl_device_usb_assignable_list(ctx, &num);
+
+for (i = 0; i < num; i++) {
+printf("%d.%d\n", usbs[i].u.hostdev.hostbus,
+   usbs[i].u.hostdev.hostaddr);
+}
+
+libxl_device_usb_list_free(usbs, num);
+}
+
+int main_usbassignable_list(int argc, char **argv)
+{
+int opt;
+
+SWITCH_FOREACH_OPT(opt, "", NULL, "usb-assignable-list", 0) {
+/* No options */
+}
+
+usb_assignable_list();
+return 0;
+}
+
 int main_usbctrl_attach(int argc, char **argv)
 {
 uint32_t domid;
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 46f276e..ba51331 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -576,6 +576,10 @@ struct cmd_spec cmd_table[] = {
   "List information about USB devices for a domain",
   "",
 },
+{ "usb-assignable-list",
+  &main_usbassignable_list, 0, 0,
+  "List all assignable USB devices",
+},
 };
 
 int cmdtable_len = sizeof(cmd_table)/sizeof(struct cmd_spec);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V6 7/7] domcreate: support pvusb in configuration file

2015-08-10 Thread Chunyan Liu
Add code to support pvusb in domain config file. One could specify
usbctrl and usb in domain's configuration file and create domain,
then usb controllers will be created and usb device would be attached
to guest automatically.

One could specify usb controllers and usb devices in config file
like this:
usbctrl=['version=2,ports=4', 'version=1, ports=4', ]
usbdev=['2.1,controller=0,port=1', ]

Signed-off-by: Chunyan Liu 
Signed-off-by: Simon Cao 
---
 docs/man/xl.cfg.pod.5|  75 +
 tools/libxl/libxl_create.c   |  73 ++--
 tools/libxl/libxl_device.c   |   4 ++
 tools/libxl/libxl_internal.h |   8 
 tools/libxl/xl_cmdimpl.c | 112 ++-
 5 files changed, 268 insertions(+), 4 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 80e51bb..45f3ff3 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -709,6 +709,81 @@ Note this may be overridden by rdm_policy option in PCI 
device configuration.
 
 =back
 
+=item B
+
+Specifies the USB controllers created for this guest. Each
+B has the form C where:
+
+=over 4
+
+=item B
+
+Possible Bs are:
+
+=over 4
+
+=item B
+
+Specifies the protocol to implement USB controller, could be "pv" (indicates
+PVUSB) or "qemu" (indicates QEMU emulated). Currently only "pv" is supported.
+
+=item B
+
+Specifies version of the USB controller, could be 1 (USB1.1) or 2 (USB2.0).
+Default is 2 (USB2.0).
+
+=item B
+
+Specifies port number of the USB controller. Default is 8.
+
+Each USB controller will have an index starting from 0. On the same
+controller, each port will have an index starting from 1.
+
+E.g.
+usbctrl=["version=1,ports=4", "version=2,ports=8",]
+The first controller has:
+controller index = 0, and port 1,2,3,4.
+The second controller has:
+controller index = 1, and port 1,2,3,4,5,6,7,8.
+
+=back
+
+=back
+
+=item B
+
+Specifies the host USB devices to passthrough to this guest. Each
+B has the form C where:
+
+=over 4
+
+=item B
+
+Identifies the busnum.devnum of the USB device from the host perspective.
+This is the same scheme as used in the output of C for the device in
+question.
+
+=item B
+
+Possible Bs are:
+
+=over 4
+
+=item B
+
+Specifies USB controller index, to which controller the USB device is attached.
+
+=item B
+
+Specifies USB port index, to which port the USB device is attached. 
B
+is valid only when B is specified. Without
+B, it will find the first available USB controller:port
+and use it. If there is no controller at all, it will create one.
+
+=back
+
+=back
+
 =item B
 
 Specifies the host PCI devices to passthrough to this guest. Each 
B
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 2348ffc..2988991 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -729,6 +729,10 @@ static void domcreate_launch_dm(libxl__egc *egc, 
libxl__multidev *aodevs,
 
 static void domcreate_attach_vtpms(libxl__egc *egc, libxl__multidev *multidev,
int ret);
+static void domcreate_attach_usbctrls(libxl__egc *egc,
+  libxl__multidev *multidev, int ret);
+static void domcreate_attach_usbs(libxl__egc *egc, libxl__multidev *multidev,
+   int ret);
 static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *aodevs,
  int ret);
 static void domcreate_attach_dtdev(libxl__egc *egc,
@@ -1385,13 +1389,13 @@ static void domcreate_attach_vtpms(libxl__egc *egc,
if (d_config->num_vtpms > 0) {
/* Attach vtpms */
libxl__multidev_begin(ao, &dcs->multidev);
-   dcs->multidev.callback = domcreate_attach_pci;
+   dcs->multidev.callback = domcreate_attach_usbctrls;
libxl__add_vtpms(egc, ao, domid, d_config, &dcs->multidev);
libxl__multidev_prepared(egc, &dcs->multidev, 0);
return;
}
 
-   domcreate_attach_pci(egc, multidev, 0);
+   domcreate_attach_usbctrls(egc, multidev, 0);
return;
 
 error_out:
@@ -1399,6 +1403,69 @@ error_out:
domcreate_complete(egc, dcs, ret);
 }
 
+static void domcreate_attach_usbctrls(libxl__egc *egc,
+  libxl__multidev *multidev, int ret)
+{
+libxl__domain_create_state *dcs = CONTAINER_OF(multidev, *dcs, multidev);
+STATE_AO_GC(dcs->ao);
+int domid = dcs->guest_domid;
+
+libxl_domain_config *const d_config = dcs->guest_config;
+
+if (ret) {
+LOG(ERROR, "unable to add vtpm devices");
+goto error_out;
+}
+
+if (d_config->num_usbctrls > 0) {
+/* Attach usbctrls */
+libxl__multidev_begin(ao, &dcs->multidev);
+dcs->multidev.callback = domcreate_attach_usbs;
+libxl__add_usbctrls(egc, ao, domid, d_config, &dcs->multidev);
+libxl__multidev_prepared(egc, &dcs->multidev, 0);
+return;
+}
+
+domcreate_attach_usbs(egc, multidev, 0);
+return;
+
+

[Xen-devel] [PATCH V6 2/7] libxl_read_file_contents: add new entry to read sysfs file

2015-08-10 Thread Chunyan Liu
Sysfs file has size=4096 but actual file content is less than that.
Current libxl_read_file_contents will treat it as error when file size
and actual file content differs, so reading sysfs file content with
this function always fails.

Add a new entry libxl_read_sysfs_file_contents to handle sysfs file
specially. It would be used in later pvusb work.

Signed-off-by: Chunyan Liu 

---
Changes:
  - read one more byte to check bigger size problem.

 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_utils.c| 51 ++--
 2 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6013628..f98f089 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -4001,6 +4001,8 @@ void libxl__bitmap_copy_best_effort(libxl__gc *gc, 
libxl_bitmap *dptr,
 
 int libxl__count_physical_sockets(libxl__gc *gc, int *sockets);
 #endif
+_hidden int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char 
*filename,
+   void **data_r, int *datalen_r);
 
 /*
  * Local variables:
diff --git a/tools/libxl/libxl_utils.c b/tools/libxl/libxl_utils.c
index bfc9699..9234efb 100644
--- a/tools/libxl/libxl_utils.c
+++ b/tools/libxl/libxl_utils.c
@@ -322,8 +322,10 @@ out:
 return rc;
 }
 
-int libxl_read_file_contents(libxl_ctx *ctx, const char *filename,
- void **data_r, int *datalen_r) {
+static int libxl_read_file_contents_core(libxl_ctx *ctx, const char *filename,
+ void **data_r, int *datalen_r,
+ bool tolerate_shrinking_file)
+{
 GC_INIT(ctx);
 FILE *f = 0;
 uint8_t *data = 0;
@@ -359,20 +361,34 @@ int libxl_read_file_contents(libxl_ctx *ctx, const char 
*filename,
 datalen = stab.st_size;
 
 if (stab.st_size && data_r) {
-data = malloc(datalen);
+data = malloc(datalen + 1);
 if (!data) goto xe;
 
-rs = fread(data, 1, datalen, f);
-if (rs != datalen) {
-if (ferror(f))
+rs = fread(data, 1, datalen + 1, f);
+if (rs > datalen) {
+LOG(ERROR, "%s increased size while we were reading it",
+filename);
+goto xe;
+}
+
+if (rs < datalen) {
+if (ferror(f)) {
 LOGE(ERROR, "failed to read %s", filename);
-else if (feof(f))
-LOG(ERROR, "%s changed size while we were reading it",
-   filename);
-else
+goto xe;
+} else if (feof(f)) {
+if (tolerate_shrinking_file) {
+datalen = rs;
+} else {
+LOG(ERROR, "%s shrunk size while we were reading it",
+filename);
+goto xe;
+}
+} else {
 abort();
-goto xe;
+}
 }
+
+data = realloc(data, datalen);
 }
 
 if (fclose(f)) {
@@ -396,6 +412,19 @@ int libxl_read_file_contents(libxl_ctx *ctx, const char 
*filename,
 return e;
 }
 
+int libxl_read_file_contents(libxl_ctx *ctx, const char *filename,
+ void **data_r, int *datalen_r)
+{
+return libxl_read_file_contents_core(ctx, filename, data_r, datalen_r, 0);
+}
+
+int libxl_read_sysfs_file_contents(libxl_ctx *ctx, const char *filename,
+   void **data_r, int *datalen_r)
+{
+return libxl_read_file_contents_core(ctx, filename, data_r, datalen_r, 1);
+}
+
+
 #define READ_WRITE_EXACTLY(rw, zero_is_eof, constdata)\
   \
   int libxl_##rw##_exactly(libxl_ctx *ctx, int fd, \
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH V6 5/7] xl: add pvusb commands

2015-08-10 Thread Chunyan Liu
Add pvusb commands: usb-ctrl-attach, usb-ctrl-detach, usb-list,
usb-attach and usb-detach.

To attach a usb device to guest through pvusb, one could follow
following example:

 #xl usb-ctrl-attach test_vm version=1 num_ports=8

 #xl usb-list test_vm
 will show the usb controllers and port usage under the domain.

 #xl usb-attach test_vm 1.6
 will find the first usable controller:port, and attach usb
 device whose bus address is 1.6 (busnum is 1, devnum is 6)
 to it. One could also specify which  and which .

 #xl usb-detach test_vm 0 1
 will detach USB device under controller 0 port 1.

 #xl usb-ctrl-detach test_vm dev_id
 will destroy the controller with specified dev_id. Dev_id
 can be traced in usb-list info.

Signed-off-by: Chunyan Liu 
Signed-off-by: Simon Cao 
---
 docs/man/xl.pod.1 |  40 
 tools/libxl/xl.h  |   5 +
 tools/libxl/xl_cmdimpl.c  | 230 ++
 tools/libxl/xl_cmdtable.c |  25 +
 4 files changed, 300 insertions(+)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index f22c3f3..4c92c78 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1345,6 +1345,46 @@ List pass-through pci devices for a domain.
 
 =back
 
+=head1 USB PASS-THROUGH
+
+=over 4
+
+=item B I I[] [I] 
[I]
+
+Create a new USB controller for the specified domain.
+B is the usb controller type, currently only support 'pv'.
+B is the usb controller version, could be 1 (USB1.1) or 2 
(USB2.0).
+B is the total ports of the usb controller.
+By default, it will create a USB2.0 controller with 8 ports.
+
+=item B I I
+
+Destroy a USB controller from the specified domain.
+B is devid of the USB controller.
+
+If B<-f> is specified, B is going to forcefully remove the device even
+without guest's collaboration.
+
+=item B I I [I 
[I]]
+
+Hot-plug a new pass-through USB device to the specified domain.
+B is the busnum.devnum of the physical USB device to pass-through.
+B B is the USB controller:port to hotplug the
+USB device to. By default, it will find the first available controller:port
+and use it; if there is no controller, it will create one.
+
+=item B I I I
+
+Hot-unplug a previously assigned USB device from a domain.
+B and B is USB controller:port in guest where 
the
+USB device is attached to.
+
+=item B I
+
+List pass-through usb devices for a domain.
+
+=back
+
 =head1 TMEM
 
 =over 4
diff --git a/tools/libxl/xl.h b/tools/libxl/xl.h
index 13bccba..e136fdf 100644
--- a/tools/libxl/xl.h
+++ b/tools/libxl/xl.h
@@ -85,6 +85,11 @@ int main_blockdetach(int argc, char **argv);
 int main_vtpmattach(int argc, char **argv);
 int main_vtpmlist(int argc, char **argv);
 int main_vtpmdetach(int argc, char **argv);
+int main_usbctrl_attach(int argc, char **argv);
+int main_usbctrl_detach(int argc, char **argv);
+int main_usbattach(int argc, char **argv);
+int main_usbdetach(int argc, char **argv);
+int main_usblist(int argc, char **argv);
 int main_uptime(int argc, char **argv);
 int main_claims(int argc, char **argv);
 int main_tmem_list(int argc, char **argv);
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 499a05c..3e4d93a 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3322,6 +3322,236 @@ int main_cd_insert(int argc, char **argv)
 return 0;
 }
 
+int main_usbctrl_attach(int argc, char **argv)
+{
+uint32_t domid;
+int opt, rc = 1;
+char *oparg;
+libxl_device_usbctrl usbctrl;
+
+SWITCH_FOREACH_OPT(opt, "", NULL, "usb-ctrl-attach", 1) {
+/* No options */
+}
+
+domid = find_domain(argv[optind++]);
+
+libxl_device_usbctrl_init(&usbctrl);
+
+while (argc > optind) {
+if (MATCH_OPTION("type", argv[optind], oparg)) {
+if (!strcmp(oparg, "pv")) {
+usbctrl.type = LIBXL_USBCTRL_TYPE_PV;
+} else {
+fprintf(stderr, "unsupported type `%s'\n", oparg);
+goto out;
+}
+} else if (MATCH_OPTION("version", argv[optind], oparg)) {
+usbctrl.version = atoi(oparg);
+if (usbctrl.version != 1 && usbctrl.version != 2) {
+fprintf(stderr, "unsupported version `%s'\n", oparg);
+goto out;
+}
+} else if (MATCH_OPTION("ports", argv[optind], oparg)) {
+usbctrl.ports = atoi(oparg);
+if (usbctrl.ports < 1 || usbctrl.ports > 31) {
+fprintf(stderr, "unsupported ports `%s'\n", oparg);
+goto out;
+}
+} else {
+fprintf(stderr, "unrecognized argument `%s'\n", argv[optind]);
+goto out;
+}
+optind++;
+}
+
+rc = libxl_device_usbctrl_add(ctx, domid, &usbctrl, 0);
+if (rc)
+fprintf(stderr, "libxl_device_usbctrl_add failed.\n");
+
+out:
+libxl_device_usbctrl_dispose(&usbctrl);
+return rc;
+}
+
+int main_usbctrl_detach(int argc, char **argv)
+{
+uint32_t domid;
+int opt, devid, rc;
+libxl_

Re: [Xen-devel] [PATCH v3 04/20] xen/grant: Introduce helpers to split a page into grant

2015-08-10 Thread Stefano Stabellini
On Fri, 7 Aug 2015, Julien Grall wrote:
> Currently, a grant is always based on the Xen page granularity (i.e
> 4KB). When Linux is using a different page granularity, a single page
> will be split between multiple grants.
> 
> The new helpers will be in charge to split the Linux page into grants and
^ of splitting

> call a function given by the caller on each grant.
> 
> Also provide an helper to count the number of grants within a given
> contiguous region.
> 
> Note that the x86/include/asm/xen/page.h is now including
> xen/interface/grant_table.h rather than xen/grant_table.h. It's
> necessary because xen/grant_table.h depends on asm/xen/page.h and will
> break the compilation. Furthermore, only definition in
> interface/grant_table.h was required.
   ^ is

> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 


> ---
> Cc: Konrad Rzeszutek Wilk 
> Cc: Boris Ostrovsky 
> Cc: David Vrabel 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> Cc: x...@kernel.org
> 
> Changes in v3:
> - Fix error reported by checkpatch.pl
> - Typoes
> - s/pfn/xen_pfn/ in gnttab_foreach_grant
> - Drop the possibility to use less data. The complexity is moved
> in netback which is the only user
> - Rename gnttab_foreach_grant into gnttab_foreach_grant_in_range
> - s/offset/start/ in gnttab_count_grant and update the
> description of the parameter
> - s/mfn/gfn base on the new terminologies
> - Add EXPORT_SYMBOL_GPL for gnttab_foreach_grant_in_range
> - Use xen_offset_in_page and XEN_PFN_DOWN whenever it's possible
> - Fix compilation on x86.
> 
> Changes in v2:
> - Patch added
> ---
>  arch/x86/include/asm/xen/page.h |  2 +-
>  drivers/xen/grant-table.c   | 26 +
>  include/xen/grant_table.h   | 42 
> +
>  3 files changed, 69 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
> index 0b762f6..501479e 100644
> --- a/arch/x86/include/asm/xen/page.h
> +++ b/arch/x86/include/asm/xen/page.h
> @@ -12,7 +12,7 @@
>  #include 
>  
>  #include 
> -#include 
> +#include 
>  #include 
>  
>  /* Xen machine address */
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 62f591f..94ae0fd 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -776,6 +776,32 @@ void gnttab_batch_copy(struct gnttab_copy *batch, 
> unsigned count)
>  }
>  EXPORT_SYMBOL_GPL(gnttab_batch_copy);
>  
> +void gnttab_foreach_grant_in_range(struct page *page,
> +unsigned int offset,
> +unsigned int len,
> +xen_grant_fn_t fn,
> +void *data)
> +{
> + unsigned int goffset;
> + unsigned int glen;
> + unsigned long xen_pfn;
> +
> + len = min_t(unsigned int, PAGE_SIZE - offset, len);
> + goffset = xen_offset_in_page(offset);
> +
> + xen_pfn = xen_page_to_pfn(page) + XEN_PFN_DOWN(offset);
> +
> + while (len) {
> + glen = min_t(unsigned int, XEN_PAGE_SIZE - goffset, len);
> + fn(pfn_to_gfn(xen_pfn), goffset, glen, data);
> +
> + goffset = 0;
> + xen_pfn++;
> + len -= glen;
> + }
> +}
> +EXPORT_SYMBOL_GPL(gnttab_foreach_grant_in_range);
> +
>  int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>   struct gnttab_map_grant_ref *kmap_ops,
>   struct page **pages, unsigned int count)
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 4478f4b..2a8ebe8 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -45,8 +45,10 @@
>  #include 
>  
>  #include 
> +#include 
>  #include 
>  #include 
> +#include 
>  
>  #define GNTTAB_RESERVED_XENSTORE 1
>  
> @@ -224,4 +226,44 @@ static inline struct xen_page_foreign 
> *xen_page_foreign(struct page *page)
>  #endif
>  }
>  
> +/* Split Linux page in chunk of the size of the grant and call fn
> + *
> + * Parameters of fn:
> + *   gfn: guest frame number
> + *   offset: offset in the grant
> + *   len: length of the data in the grant.
> + *   data: internal information
> + */
> +typedef void (*xen_grant_fn_t)(unsigned long gfn, unsigned int offset,
> +unsigned int len, void *data);
> +
> +void gnttab_foreach_grant_in_range(struct page *page,
> +unsigned int offset,
> +unsigned int len,
> +xen_grant_fn_t fn,
> +void *data);
> +
> +/* Helper to get to call fn only on the first "grant chunk" */
> +static inline void gnttab_one_grant(struct page *page, unsigned int offset,
> + unsig

Re: [Xen-devel] [PATCH v3 03/20] xen: Add Xen specific page definition

2015-08-10 Thread Stefano Stabellini
On Fri, 7 Aug 2015, Julien Grall wrote:
> The Xen hypercall interface is always using 4K page granularity on ARM
> and x86 architecture.
> 
> With the incoming support of 64K page granularity for ARM64 guest, it
> won't be possible to re-use the Linux page definition in Xen drivers.
> 
> Introduce Xen page definition helpers based on the Linux page
> definition. They have exactly the same name but prefixed with
> XEN_/xen_ prefix.
> 
> Also modify xen_page_to_gfn to use new Xen page definition.
> 
> Signed-off-by: Julien Grall 
> Reviewed-by: Stefano Stabellini 
> 
> ---
> Cc: Konrad Rzeszutek Wilk 
> Cc: Boris Ostrovsky 
> Cc: David Vrabel 
> 
> Changes in v3:
> - Fix errors reported by checkpatch.pl
> - Rename pfn to xen_pfn in xen_pfn_to_page
> - Add a comment that we assume PAGE_SIZE to be a multiple of
> XEN_PAGE_SIZE
> - s/MFN/GFN/ according to new naming
> - Add Stefano's reviewed-by
> 
> Changes in v2:
> - Add XEN_PFN_UP
> - Add a comment describing the behavior of page_to_pfn
> ---
>  include/xen/page.h | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/include/xen/page.h b/include/xen/page.h
> index f202992..dac1b26 100644
> --- a/include/xen/page.h
> +++ b/include/xen/page.h
> @@ -1,11 +1,36 @@
>  #ifndef _XEN_PAGE_H
>  #define _XEN_PAGE_H
>  
> +#include 
> +
> +/* The hypercall interface supports only 4KB page */
> +#define XEN_PAGE_SHIFT   12
> +#define XEN_PAGE_SIZE(_AC(1, UL) << XEN_PAGE_SHIFT)
> +#define XEN_PAGE_MASK(~(XEN_PAGE_SIZE-1))
> +#define xen_offset_in_page(p)((unsigned long)(p) & ~XEN_PAGE_MASK)
> +
> +/*
> + * We asume that PAGE_SIZE is a multiple of XEN_PAGE_SIZE
  ^ assume

> + * XXX: Add a BUILD_BUG_ON?
> + */
> +
> +#define xen_pfn_to_page(xen_pfn) \
> + ((pfn_to_page(((unsigned long)(xen_pfn) << XEN_PAGE_SHIFT) >> 
> PAGE_SHIFT)))
> +#define xen_page_to_pfn(page)\
> + (((page_to_pfn(page)) << PAGE_SHIFT) >> XEN_PAGE_SHIFT)
> +
> +#define XEN_PFN_PER_PAGE (PAGE_SIZE / XEN_PAGE_SIZE)
> +
> +#define XEN_PFN_DOWN(x)  ((x) >> XEN_PAGE_SHIFT)
> +#define XEN_PFN_UP(x)(((x) + XEN_PAGE_SIZE-1) >> XEN_PAGE_SHIFT)
> +#define XEN_PFN_PHYS(x)  ((phys_addr_t)(x) << XEN_PAGE_SHIFT)
> +
>  #include 
>  
> +/* Return the GFN associated to the first 4KB of the page */
>  static inline unsigned long xen_page_to_gfn(struct page *page)
>  {
> - return pfn_to_gfn(page_to_pfn(page));
> + return pfn_to_gfn(xen_page_to_pfn(page));
>  }
>  
>  struct xen_memory_region {
> -- 
> 2.1.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 09/20] xen/biomerge: Don't allow biovec to be merge when Linux is not using 4KB page

2015-08-10 Thread Stefano Stabellini
On Fri, 7 Aug 2015, Julien Grall wrote:
> On ARM all dma-capable devices on a same platform may not be protected
> by an IOMMU. The DMA requests have to use the BFN (i.e MFN on ARM) in
> order to use correctly the device.
> 
> While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
> mapping will screw this contiguous mapping.
> 
> When Linux is using 64KB page granularitary, the page may be split
> accross multiple non-contiguous MFN (Xen is using 4KB page
> granularity). Therefore a DMA request will likely fail.
> 
> Checking that a 64KB page is using contiguous MFN is tedious. For
> now, always says that biovec are not mergeable.
> 
> Signed-off-by: Julien Grall 

Please fix the grammar in the subject line.

Reviewed-by: Stefano Stabellini 


> ---
> Cc: Konrad Rzeszutek Wilk 
> Cc: Boris Ostrovsky 
> Cc: David Vrabel 
> 
> There is some ideas to check whether two biovec could be merged
> (see [1]) but it's not critical and can be consider as a performance
> improvement.
> 
> Changes in v3:
> - Update commit message
> - s/mfn/bfn/ base on the new renaming
> - Update TODO
> 
> Changes in v2:
> - Remove the workaround and check if the Linux page granularity
> is the same as Xen or not
> 
> [1] https://lkml.org/lkml/2015/7/17/418
> ---
>  drivers/xen/biomerge.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/xen/biomerge.c b/drivers/xen/biomerge.c
> index 8ae2fc90..4da69db 100644
> --- a/drivers/xen/biomerge.c
> +++ b/drivers/xen/biomerge.c
> @@ -6,10 +6,18 @@
>  bool xen_biovec_phys_mergeable(const struct bio_vec *vec1,
>  const struct bio_vec *vec2)
>  {
> +#if XEN_PAGE_SIZE == PAGE_SIZE
>   unsigned long bfn1 = pfn_to_bfn(page_to_pfn(vec1->bv_page));
>   unsigned long bfn2 = pfn_to_bfn(page_to_pfn(vec2->bv_page));
>  
>   return __BIOVEC_PHYS_MERGEABLE(vec1, vec2) &&
>   ((bfn1 == bfn2) || ((bfn1+1) == bfn2));
> +#else
> + /*
> +  * XXX: Add support for merging bio_vec when using different page
> +  * size in Xen and Linux.
   ^ sizes

> +  */
> + return 0;
> +#endif
>  }
>  EXPORT_SYMBOL(xen_biovec_phys_mergeable);
> -- 
> 2.1.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server

2015-08-10 Thread Andrew Cooper
On 10/08/15 09:33, Paul Durrant wrote:
>> -Original Message-
>> From: Wei Liu [mailto:wei.l...@citrix.com]
>> Sent: 10 August 2015 09:26
>> To: Yu Zhang
>> Cc: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; 
>> Ian
>> Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper;
>> Kevin Tian; zhiyuan...@intel.com
>> Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq
>> server
>>
>> On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote:
>>> Currently in ioreq server, guest write-protected ram pages are
>>> tracked in the same rangeset with device mmio resources. Yet
>>> unlike device mmio, which can be in big chunks, the guest write-
>>> protected pages may be discrete ranges with 4K bytes each.
>>>
>>> This patch uses a seperate rangeset for the guest ram pages.
>>> And a new ioreq type, IOREQ_TYPE_MEM, is defined.
>>>
>>> Note: Previously, a new hypercall or subop was suggested to map
>>> write-protected pages into ioreq server. However, it turned out
>>> handler of this new hypercall would be almost the same with the
>>> existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's
>>> already a type parameter in this hypercall. So no new hypercall
>>> defined, only a new type is introduced.
>>>
>>> Signed-off-by: Yu Zhang 
>>> ---
>>>  tools/libxc/include/xenctrl.h| 39 +++---
>>>  tools/libxc/xc_domain.c  | 59
>> ++--
>>
>> FWIW the hypercall wrappers look correct to me.
>>
>>> diff --git a/xen/include/public/hvm/hvm_op.h
>> b/xen/include/public/hvm/hvm_op.h
>>> index 014546a..9106cb9 100644
>>> --- a/xen/include/public/hvm/hvm_op.h
>>> +++ b/xen/include/public/hvm/hvm_op.h
>>> @@ -329,8 +329,9 @@ struct xen_hvm_io_range {
>>>  ioservid_t id;   /* IN - server id */
>>>  uint32_t type;   /* IN - type of range */
>>>  # define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
>>> -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
>>> +# define HVMOP_IO_RANGE_MMIO   1 /* MMIO range */
>>>  # define HVMOP_IO_RANGE_PCI2 /* PCI segment/bus/dev/func range
>> */
>>> +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */
>> This looks problematic. Maybe you can get away with this because this is
>> a toolstack-only interface?
>>
> Indeed, the old name is a bit problematic. Presumably re-use like this would 
> require an interface version change and some if-defery.

I assume it is an interface used by qemu, so this patch in its currently
state will break things.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq server

2015-08-10 Thread Paul Durrant
> -Original Message-
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: 10 August 2015 11:56
> To: Paul Durrant; Wei Liu; Yu Zhang
> Cc: xen-devel@lists.xen.org; Ian Jackson; Stefano Stabellini; Ian Campbell;
> Keir (Xen.org); jbeul...@suse.com; Kevin Tian; zhiyuan...@intel.com
> Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by ioreq
> server
> 
> On 10/08/15 09:33, Paul Durrant wrote:
> >> -Original Message-
> >> From: Wei Liu [mailto:wei.l...@citrix.com]
> >> Sent: 10 August 2015 09:26
> >> To: Yu Zhang
> >> Cc: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini;
> Ian
> >> Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper;
> >> Kevin Tian; zhiyuan...@intel.com
> >> Subject: Re: [PATCH v3 1/2] Differentiate IO/mem resources tracked by
> ioreq
> >> server
> >>
> >> On Mon, Aug 10, 2015 at 11:33:40AM +0800, Yu Zhang wrote:
> >>> Currently in ioreq server, guest write-protected ram pages are
> >>> tracked in the same rangeset with device mmio resources. Yet
> >>> unlike device mmio, which can be in big chunks, the guest write-
> >>> protected pages may be discrete ranges with 4K bytes each.
> >>>
> >>> This patch uses a seperate rangeset for the guest ram pages.
> >>> And a new ioreq type, IOREQ_TYPE_MEM, is defined.
> >>>
> >>> Note: Previously, a new hypercall or subop was suggested to map
> >>> write-protected pages into ioreq server. However, it turned out
> >>> handler of this new hypercall would be almost the same with the
> >>> existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's
> >>> already a type parameter in this hypercall. So no new hypercall
> >>> defined, only a new type is introduced.
> >>>
> >>> Signed-off-by: Yu Zhang 
> >>> ---
> >>>  tools/libxc/include/xenctrl.h| 39 +++---
> >>>  tools/libxc/xc_domain.c  | 59
> >> ++--
> >>
> >> FWIW the hypercall wrappers look correct to me.
> >>
> >>> diff --git a/xen/include/public/hvm/hvm_op.h
> >> b/xen/include/public/hvm/hvm_op.h
> >>> index 014546a..9106cb9 100644
> >>> --- a/xen/include/public/hvm/hvm_op.h
> >>> +++ b/xen/include/public/hvm/hvm_op.h
> >>> @@ -329,8 +329,9 @@ struct xen_hvm_io_range {
> >>>  ioservid_t id;   /* IN - server id */
> >>>  uint32_t type;   /* IN - type of range */
> >>>  # define HVMOP_IO_RANGE_PORT   0 /* I/O port range */
> >>> -# define HVMOP_IO_RANGE_MEMORY 1 /* MMIO range */
> >>> +# define HVMOP_IO_RANGE_MMIO   1 /* MMIO range */
> >>>  # define HVMOP_IO_RANGE_PCI2 /* PCI segment/bus/dev/func
> range
> >> */
> >>> +# define HVMOP_IO_RANGE_MEMORY 3 /* MEMORY range */
> >> This looks problematic. Maybe you can get away with this because this is
> >> a toolstack-only interface?
> >>
> > Indeed, the old name is a bit problematic. Presumably re-use like this
> would require an interface version change and some if-defery.
> 
> I assume it is an interface used by qemu, so this patch in its currently
> state will break things.

If QEMU were re-built against the updated header, yes.

  Paul

> 
> ~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v2 0/5] Multi-queue support for xen-blkfront and xen-blkback

2015-08-10 Thread Bob Liu

On 08/10/2015 07:03 PM, Rafal Mielniczuk wrote:
> On 01/07/15 04:03, Jens Axboe wrote:
>> On 06/30/2015 08:21 AM, Marcus Granado wrote:
>>> Hi,
>>>
>>> Our measurements for the multiqueue patch indicate a clear improvement
>>> in iops when more queues are used.
>>>
>>> The measurements were obtained under the following conditions:
>>>
>>> - using blkback as the dom0 backend with the multiqueue patch applied to
>>> a dom0 kernel 4.0 on 8 vcpus.
>>>
>>> - using a recent Ubuntu 15.04 kernel 3.19 with multiqueue frontend
>>> applied to be used as a guest on 4 vcpus
>>>
>>> - using a micron RealSSD P320h as the underlying local storage on a Dell
>>> PowerEdge R720 with 2 Xeon E5-2643 v2 cpus.
>>>
>>> - fio 2.2.7-22-g36870 as the generator of synthetic loads in the guest.
>>> We used direct_io to skip caching in the guest and ran fio for 60s
>>> reading a number of block sizes ranging from 512 bytes to 4MiB. Queue
>>> depth of 32 for each queue was used to saturate individual vcpus in the
>>> guest.
>>>
>>> We were interested in observing storage iops for different values of
>>> block sizes. Our expectation was that iops would improve when increasing
>>> the number of queues, because both the guest and dom0 would be able to
>>> make use of more vcpus to handle these requests.
>>>
>>> These are the results (as aggregate iops for all the fio threads) that
>>> we got for the conditions above with sequential reads:
>>>
>>> fio_threads  io_depth  block_size   1-queue_iops  8-queue_iops
>>>  8   32   512   158K 264K
>>>  8   321K   157K 260K
>>>  8   322K   157K 258K
>>>  8   324K   148K 257K
>>>  8   328K   124K 207K
>>>  8   32   16K84K 105K
>>>  8   32   32K50K  54K
>>>  8   32   64K24K  27K
>>>  8   32  128K11K  13K
>>>
>>> 8-queue iops was better than single queue iops for all the block sizes.
>>> There were very good improvements as well for sequential writes with
>>> block size 4K (from 80K iops with single queue to 230K iops with 8
>>> queues), and no regressions were visible in any measurement performed.
>> Great results! And I don't know why this code has lingered for so long, 
>> so thanks for helping get some attention to this again.
>>
>> Personally I'd be really interested in the results for the same set of 
>> tests, but without the blk-mq patches. Do you have them, or could you 
>> potentially run them?
>>
> Hello,
> 
> We rerun the tests for sequential reads with the identical settings but with 
> Bob Liu's multiqueue patches reverted from dom0 and guest kernels.
> The results we obtained were *better* than the results we got with multiqueue 
> patches applied:
> 
> fio_threads  io_depth  block_size   1-queue_iops  8-queue_iops  
> *no-mq-patches_iops*
>  8   32   512   158K 264K 321K
>  8   321K   157K 260K 328K
>  8   322K   157K 258K 336K
>  8   324K   148K 257K 308K
>  8   328K   124K 207K 188K
>  8   32   16K84K 105K 82K
>  8   32   32K50K  54K 36K
>  8   32   64K24K  27K 16K
>  8   32  128K11K  13K 11K
> 
> We noticed that the requests are not merged by the guest when the multiqueue 
> patches are applied,
> which results in a regression for small block sizes (RealSSD P320h's optimal 
> block size is around 32-64KB).
> 
> We observed similar regression for the Dell MZ-5EA1000-0D3 100 GB 2.5" 
> Internal SSD
> 

Which block scheduler was used in domU?  Please try to "cat 
/sys/block/sdxxx/queue/scheduler".
How about the result if using "noop" scheduler?

Thanks,
Bob Liu

> As I understand blk-mq layer bypasses I/O scheduler which also effectively 
> disables merges.
> Could you explain why it is difficult to enable merging in the blk-mq layer?
> That could help closing the performance gap we observed.
> 
> Otherwise, the tests shows that the multiqueue patches does not improve the 
> performance,
> at least when it comes to sequential read/writes operations.
> 
> Rafal
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux

2015-08-10 Thread Stefano Stabellini
On Fri, 7 Aug 2015, Julien Grall wrote:
> For ARM64 guests, Linux is able to support either 64K or 4K page
> granularity. Although, the hypercall interface is always based on 4K
> page granularity.
> 
> With 64K page granularity, a single page will be spread over multiple
> Xen frame.
> 
> To avoid splitting the page into 4K frame, take advantage of the
> extent_order field to directly allocate/free chunk of the Linux page
> size.
> 
> Note that PVMMU is only used for PV guest (which is x86) and the page
> granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
> that because the code has not been modified.
> 
> Signed-off-by: Julien Grall 

This is much better than the previous version. Good idea using the
extent_order field.

I only have a minor comment below.


> ---
> Cc: Konrad Rzeszutek Wilk 
> Cc: Boris Ostrovsky 
> Cc: David Vrabel 
> Cc: Wei Liu 
> 
> Changes in v3:
> - Fix errors reported by checkpatch.pl
> - s/mfn/gfn/ based on the new naming
> - Rather than splitting the page into 4KB chunk, use the
> extent_order field to allocate directly a Linux page size. This
> is avoid lots of code for no benefits.
> 
> Changes in v2:
> - Use xen_apply_to_page to split a page in 4K chunk
> - It's not necessary to have a smaller frame list. Re-use
> PAGE_SIZE
> - Convert reserve_additional_memory to use XEN_... macro
> ---
>  drivers/xen/balloon.c | 47 ---
>  1 file changed, 36 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c
> index 9734649..c8739c8 100644
> --- a/drivers/xen/balloon.c
> +++ b/drivers/xen/balloon.c
> @@ -70,6 +70,9 @@
>  #include 
>  #include 
>  
> +/* Use one extent per PAGE_SIZE to avoid break down into multiple frame */
> +#define EXTENT_ORDER (fls(XEN_PFN_PER_PAGE) - 1)
> +
>  /*
>   * balloon_process() state:
>   *
> @@ -230,6 +233,11 @@ static enum bp_state reserve_additional_memory(long 
> credit)
>   nid = memory_add_physaddr_to_nid(hotplug_start_paddr);
>  
>  #ifdef CONFIG_XEN_HAVE_PVMMU
> + /* We don't support PV MMU when Linux and Xen is using
> +  * different page granularity.
> +  */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
>  /*
>   * add_memory() will build page tables for the new memory so
>   * the p2m must contain invalid entries so the correct
> @@ -326,11 +334,11 @@ static enum bp_state reserve_additional_memory(long 
> credit)
>  static enum bp_state increase_reservation(unsigned long nr_pages)
>  {
>   int rc;
> - unsigned long  pfn, i;
> + unsigned long i;
>   struct page   *page;
>   struct xen_memory_reservation reservation = {
>   .address_bits = 0,
> - .extent_order = 0,
> + .extent_order = EXTENT_ORDER,
>   .domid= DOMID_SELF
>   };
>  
> @@ -352,7 +360,11 @@ static enum bp_state increase_reservation(unsigned long 
> nr_pages)
>   nr_pages = i;
>   break;
>   }
> - frame_list[i] = page_to_pfn(page);
> +
> + /* XENMEM_populate_physmap requires a PFN based on Xen
> +  * granularity.
> +  */
> + frame_list[i] = xen_page_to_pfn(page);
>   page = balloon_next_page(page);
>   }
>  
> @@ -366,10 +378,15 @@ static enum bp_state increase_reservation(unsigned long 
> nr_pages)
>   page = balloon_retrieve(false);
>   BUG_ON(page == NULL);
>  
> - pfn = page_to_pfn(page);
> -
>  #ifdef CONFIG_XEN_HAVE_PVMMU
> + /* We don't support PV MMU when Linux and Xen is using
> +  * different page granularity.
> +  */
> + BUILD_BUG_ON(XEN_PAGE_SIZE != PAGE_SIZE);
> +
>   if (!xen_feature(XENFEAT_auto_translated_physmap)) {
> + unsigned long pfn = page_to_pfn(page);
> +
>   set_phys_to_machine(pfn, frame_list[i]);
>  
>   /* Link back into the page tables if not highmem. */
> @@ -396,14 +413,15 @@ static enum bp_state increase_reservation(unsigned long 
> nr_pages)
>  static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>  {
>   enum bp_state state = BP_DONE;
> - unsigned long  pfn, i;
> + unsigned long i;
>   struct page   *page;
>   int ret;
>   struct xen_memory_reservation reservation = {
>   .address_bits = 0,
> - .extent_order = 0,
> + .extent_order = EXTENT_ORDER,
>   .domid= DOMID_SELF
>   };
> + static struct page *pages[ARRAY_SIZE(frame_list)];

This array can be rather large: I would try to avoid it, see below.


>  #ifdef CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
>   if (balloon_stats.hotplug_pages) {
> @@ -426,7 +444,9 @@ static enum bp_state decrease_reservation(unsigned lo

Re: [Xen-devel] [PATCH v3 09/20] xen/biomerge: Don't allow biovec to be merge when Linux is not using 4KB page

2015-08-10 Thread Julien Grall
Hi Stefano,

On 10/08/15 11:50, Stefano Stabellini wrote:
> On Fri, 7 Aug 2015, Julien Grall wrote:
>> On ARM all dma-capable devices on a same platform may not be protected
>> by an IOMMU. The DMA requests have to use the BFN (i.e MFN on ARM) in
>> order to use correctly the device.
>>
>> While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
>> mapping will screw this contiguous mapping.
>>
>> When Linux is using 64KB page granularitary, the page may be split
>> accross multiple non-contiguous MFN (Xen is using 4KB page
>> granularity). Therefore a DMA request will likely fail.
>>
>> Checking that a 64KB page is using contiguous MFN is tedious. For
>> now, always says that biovec are not mergeable.
>>
>> Signed-off-by: Julien Grall 
> 
> Please fix the grammar in the subject line.

If I made a mistake it's unlikely that I will find myself which one I made.

Anyway, I guess you mean to replace merge by merged? I don't see any
other in the subject.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 09/20] xen/biomerge: Don't allow biovec to be merge when Linux is not using 4KB page

2015-08-10 Thread Stefano Stabellini
On Mon, 10 Aug 2015, Julien Grall wrote:
> Hi Stefano,
> 
> On 10/08/15 11:50, Stefano Stabellini wrote:
> > On Fri, 7 Aug 2015, Julien Grall wrote:
> >> On ARM all dma-capable devices on a same platform may not be protected
> >> by an IOMMU. The DMA requests have to use the BFN (i.e MFN on ARM) in
> >> order to use correctly the device.
> >>
> >> While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
> >> mapping will screw this contiguous mapping.
> >>
> >> When Linux is using 64KB page granularitary, the page may be split
> >> accross multiple non-contiguous MFN (Xen is using 4KB page
> >> granularity). Therefore a DMA request will likely fail.
> >>
> >> Checking that a 64KB page is using contiguous MFN is tedious. For
> >> now, always says that biovec are not mergeable.
> >>
> >> Signed-off-by: Julien Grall 
> > 
> > Please fix the grammar in the subject line.
> 
> If I made a mistake it's unlikely that I will find myself which one I made.
> 
> Anyway, I guess you mean to replace merge by merged? I don't see any
> other in the subject.

yes and page/pages:

xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB 
pages

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux

2015-08-10 Thread Julien Grall
Hi Stefano,

On 10/08/15 12:18, Stefano Stabellini wrote:
>>  /* Link back into the page tables if not highmem. */
>> @@ -396,14 +413,15 @@ static enum bp_state increase_reservation(unsigned 
>> long nr_pages)
>>  static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp)
>>  {
>>  enum bp_state state = BP_DONE;
>> -unsigned long  pfn, i;
>> +unsigned long i;
>>  struct page   *page;
>>  int ret;
>>  struct xen_memory_reservation reservation = {
>>  .address_bits = 0,
>> -.extent_order = 0,
>> +.extent_order = EXTENT_ORDER,
>>  .domid= DOMID_SELF
>>  };
>> +static struct page *pages[ARRAY_SIZE(frame_list)];
> 
> This array can be rather large: I would try to avoid it, see below.

[..]

> 
> I would simply and avoid introducing a new array:
> pfn = (frame_list[i] << XEN_PAGE_SHIFT) >> PAGE_SHIFT;
> page = pfn_to_page(pfn);

Which won't work because the frame_list contains a gfn and not a pfn.
We need to translate back the gfn into a pfn and the into a page.

The cost of the translation may be big and I wanted to avoid anymore
XEN_PAGE_SHIFT in the code. In general we should avoid to deal with 4KB
PFN when it's not necessary, it make the code more confusing to read.

If your only concern is the size of the array, we could decrease the
number of frames by batch. Or allocation the variable once a boot time.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 09/20] xen/biomerge: Don't allow biovec to be merge when Linux is not using 4KB page

2015-08-10 Thread Julien Grall
On 10/08/15 12:25, Stefano Stabellini wrote:
> yes and page/pages:
> 
> xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB 
> pages

Why the ' in biovec's ? Shouldn't we says biovecs directly?

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-08-10 Thread Wei Liu
On Mon, Aug 10, 2015 at 10:57:48AM +0100, Julien Grall wrote:
[...]
> 
> >>+   info.page = page;
> >>+   gnttab_foreach_grant_in_range(page, offset, bytes,
> >>+ xenvif_gop_frag_copy_grant,
> >>+ &info);
> >
> >Looks like I need to at least wait until the API is settle before giving
> >my ack.
> >
> >>size -= bytes;
> >>+   offset = 0;
> >
> >This looks wrong. Should be offset += bytes.
> 
> With the new implementation of the loop, each iteration will be on a
> different page.
> So only the first page has an offset different than zero.
> 
> >
> >>
> >>-   /* Next frame */
> >>-   if (offset == PAGE_SIZE && size) {
> >>+   /* Next page */
> >>+   if (size) {
> >>BUG_ON(!PageCompound(page));
> >>page++;
> >>-   offset = 0;
> >
> >And this should not be deleted, I think.
> >
> >What is the reason for changing offset calculation? I think there is
> >still compound page when using 64K page.
> 
> The compound pages are still working ... gnttab_foreach_grant_in_range is
> called once per page. So the offset can be reset to 0 every time. No need to
> add code which would make the result less clear.
> 
> We only need to know if the size is not 0 to get the next page.
> 
> The patch may not be clear enough to see it's working so I've copied the
> result loop below:
> 
> while (size > 0) {
> BUG_ON(offset >= PAGE_SIZE);
> 
> bytes = PAGE_SIZE - offset;
> if (bytes > size)
> bytes = size;
> 
> info.page = page;
> gnttab_foreach_grant_in_range(page, offset, bytes,
>  xenvif_gop_frag_copy_grant,
>   &info);
> size -= bytes;
> offset = 0;
> 
> /* Next page */
> if (size) {
> BUG_ON(!PageCompound(page));
> page++;
> }
> }
> 

Right. That doesn't mean the original code was wrong or anything. But I
don't want to bikeshed about this.

Please add a comment saying that offset is always 0 starting from second
iteration because the gnttab_foreach_grant_in_range makes sure we handle
one page in one go.

Wei.


> Regards,
> 
> -- 
> Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-08-10 Thread Julien Grall
On 10/08/15 12:39, Wei Liu wrote:
> On Mon, Aug 10, 2015 at 10:57:48AM +0100, Julien Grall wrote:
>> while (size > 0) {
>> BUG_ON(offset >= PAGE_SIZE);
>>
>> bytes = PAGE_SIZE - offset;
>> if (bytes > size)
>> bytes = size;
>>
>> info.page = page;
>> gnttab_foreach_grant_in_range(page, offset, bytes,
>>  xenvif_gop_frag_copy_grant,
>>   &info);
>> size -= bytes;
>> offset = 0;
>>
>> /* Next page */
>> if (size) {
>> BUG_ON(!PageCompound(page));
>> page++;
>> }
>> }
>>
> 
> Right. That doesn't mean the original code was wrong or anything. But I
> don't want to bikeshed about this.

I never said the original code was wrong... The original code was
allowing the possibility to copy less data than the length contained in
page.

In the new version, it has been pushed with the callback
xenvif_gop_frag_copy_grant.

> Please add a comment saying that offset is always 0 starting from second
> iteration because the gnttab_foreach_grant_in_range makes sure we handle
> one page in one go.

I think this is superfluous. To be honest, the comment should have been
on the original version and not in the new one. The construction of the
loop was far from obvious that we copied less data.

In this new version, the reason is not because of
gnttab_foreach_grant_in_range is always a page but how the loop has been
constructed.

If you look how bytes has been defined, it will always contain

min(PAGE_SIZE - offset, size)

So for the first page, this will be PAGE_SIZE - offset. A the end of the
loop we reset the offset 0, indeed we copy all the data of the first
page. For the second page and onwards this will always be PAGE_SIZE
except for the last one where we took size.


Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 19/20] xen/privcmd: Add support for Linux 64KB page granularity

2015-08-10 Thread Stefano Stabellini
On Fri, 7 Aug 2015, Julien Grall wrote:
> The hypercall interface (as well as the toolstack) is always using 4KB
> page granularity. When the toolstack is asking for mapping a series of
> guest PFN in a batch, it expects to have the page map contiguously in
> its virtual memory.
> 
> When Linux is using 64KB page granularity, the privcmd driver will have
> to map multiple Xen PFN in a single Linux page.
> 
> Note that this solution works on page granularity which is a multiple of
> 4KB.
> 
> Signed-off-by: Julien Grall 
> 
> ---
> Cc: Konrad Rzeszutek Wilk 
> Cc: Boris Ostrovsky 
> Cc: David Vrabel 
> 
> I kept the hypercall arguments in remap_data to avoid allocating them on
> the stack every time that remap_pte_fn is called.
> I will keep like that unless someone is strongly disagree.
> 
> Changes in v3:
> - The function to split a Linux page in mutiple Xen page has
> been moved internally. It was the only use (not used anymore in
> the balloon) and it's not quite clear what should be the common
> interface. Differ the question until someone need to use it.
> - s/nr_pfn/numgfns/ to make clear that we are dealing with GFN
> - Use DIV_ROUND_UP rather round_up and fix the usage in
> xen_xlate_unmap_gfn_range
> 
> Changes in v2:
> - Use xen_apply_to_page
> ---
>  drivers/xen/privcmd.c   |   8 ++--
>  drivers/xen/xlate_mmu.c | 124 
> 
>  2 files changed, 89 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index c6deb87..c8798ee 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -446,7 +446,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
> int version)
>   return -EINVAL;
>   }
>  
> - nr_pages = m.num;
> + nr_pages = DIV_ROUND_UP(m.num, XEN_PFN_PER_PAGE);
>   if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
>   return -EINVAL;
>  
> @@ -494,7 +494,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
> int version)
>   goto out_unlock;
>   }
>   if (xen_feature(XENFEAT_auto_translated_physmap)) {
> - ret = alloc_empty_pages(vma, m.num);
> + ret = alloc_empty_pages(vma, nr_pages);
>   if (ret < 0)
>   goto out_unlock;
>   } else
> @@ -518,6 +518,7 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
> int version)
>   state.global_error  = 0;
>   state.version   = version;
>  
> + BUILD_BUG_ON(((PAGE_SIZE / sizeof(xen_pfn_t)) % XEN_PFN_PER_PAGE) != 0);
>   /* mmap_batch_fn guarantees ret == 0 */
>   BUG_ON(traverse_pages_block(m.num, sizeof(xen_pfn_t),
>   &pagelist, mmap_batch_fn, &state));
> @@ -582,12 +583,13 @@ static void privcmd_close(struct vm_area_struct *vma)
>  {
>   struct page **pages = vma->vm_private_data;
>   int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
> + int numgfns = (vma->vm_end - vma->vm_start) >> XEN_PAGE_SHIFT;
>   int rc;
>  
>   if (!xen_feature(XENFEAT_auto_translated_physmap) || !numpgs || !pages)
>   return;
>  
> - rc = xen_unmap_domain_gfn_range(vma, numpgs, pages);
> + rc = xen_unmap_domain_gfn_range(vma, numgfns, pages);
>   if (rc == 0)
>   free_xenballooned_pages(numpgs, pages);
>   else
> diff --git a/drivers/xen/xlate_mmu.c b/drivers/xen/xlate_mmu.c
> index cff2387..a1d3904 100644
> --- a/drivers/xen/xlate_mmu.c
> +++ b/drivers/xen/xlate_mmu.c
> @@ -38,31 +38,28 @@
>  #include 
>  #include 
>  
> -/* map fgfn of domid to lpfn in the current domain */
> -static int map_foreign_page(unsigned long lpfn, unsigned long fgfn,
> - unsigned int domid)
> -{
> - int rc;
> - struct xen_add_to_physmap_range xatp = {
> - .domid = DOMID_SELF,
> - .foreign_domid = domid,
> - .size = 1,
> - .space = XENMAPSPACE_gmfn_foreign,
> - };
> - xen_ulong_t idx = fgfn;
> - xen_pfn_t gpfn = lpfn;
> - int err = 0;
> +typedef void (*xen_gfn_fn_t)(unsigned long gfn, void *data);
>  
> - set_xen_guest_handle(xatp.idxs, &idx);
> - set_xen_guest_handle(xatp.gpfns, &gpfn);
> - set_xen_guest_handle(xatp.errs, &err);
> +/* Break down the pages in 4KB chunk and call fn for each gfn */
> +static void xen_for_each_gfn(struct page **pages, unsigned nr_gfn,
> +  xen_gfn_fn_t fn, void *data)
> +{
> + unsigned long xen_pfn = 0;
> + struct page *page;
> + int i;
>  
> - rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
> - return rc < 0 ? rc : err;
> + for (i = 0; i < nr_gfn; i++) {
> + if ((i % XEN_PFN_PER_PAGE) == 0) {
> + page = pages[i / XEN_PFN_PER_PAGE];

If this functi

[Xen-devel] Does Xen project have test suites for testing xc/xl/hypercall and so on?

2015-08-10 Thread Jinjian (Ken)

Hi all,
   I'm looking into xen's test suites now, and encounter some problems.
   Does xen have unit tests that can be executed by xen contributors to
validate their code-commits before sending a patch to xen-devel?
   In xen-4.5.1(and its upstream), there are sevral test cases in the 
'tests' directory, but these are much like tools, rather than test cases 
at my view. For example, it obtains physical address by hypercall and 
write relevant MSR in mce-test, but it didn't tell us whether the 
address obtained was correct? and is the register written successfully? 
There seems to have no method or expected results to check them.
   If it was unit test, it is obviously not enough for xen. Are there 
any more test methods to ensure the accuracy and stability of xen?


Thank you in advance.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] xen/pcifront: Use monotonic clock

2015-08-10 Thread Abhilash Jindal
Wall time obtained from do_gettimeofday is susceptible to sudden jumps due
to user setting the time or due to NTP.

Monotonic time is constantly increasing time better suited for comparing two
timestamps.
---
 drivers/pci/xen-pcifront.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index f7197a7..5ef3eb7 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -114,7 +114,7 @@ static int do_pci_op(struct pcifront_device *pdev,
struct xen_pci_op *op)
  evtchn_port_t port = pdev->evtchn;
  unsigned irq = pdev->irq;
  s64 ns, ns_timeout;
- struct timeval tv;
+ struct timespec tv;

  spin_lock_irqsave(&pdev->sh_info_lock, irq_flags);

@@ -131,8 +131,8 @@ static int do_pci_op(struct pcifront_device *pdev,
struct xen_pci_op *op)
  * (in the latter case we end up continually re-executing poll() with a
  * timeout in the past). 1s difference gives plenty of slack for error.
  */
- do_gettimeofday(&tv);
- ns_timeout = timeval_to_ns(&tv) + 2 * (s64)NSEC_PER_SEC;
+ ktime_get_ts(&tv);
+ ns_timeout = timespec_to_ns(&tv) + 2 * (s64)NSEC_PER_SEC;

  xen_clear_irq_pending(irq);

@@ -140,8 +140,8 @@ static int do_pci_op(struct pcifront_device *pdev,
struct xen_pci_op *op)
  (unsigned long *)&pdev->sh_info->flags)) {
  xen_poll_irq_timeout(irq, jiffies + 3*HZ);
  xen_clear_irq_pending(irq);
- do_gettimeofday(&tv);
- ns = timeval_to_ns(&tv);
+ ktime_get_ts(&tv);
+ ns = timespec_to_ns(&tv);
  if (ns > ns_timeout) {
  dev_err(&pdev->xdev->dev,
  "pciback not responding!!!\n");
-- 
1.7.9.5
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v2 0/5] Multi-queue support for xen-blkfront and xen-blkback

2015-08-10 Thread Rafal Mielniczuk
On 01/07/15 04:03, Jens Axboe wrote:
> On 06/30/2015 08:21 AM, Marcus Granado wrote:
>> Hi,
>>
>> Our measurements for the multiqueue patch indicate a clear improvement
>> in iops when more queues are used.
>>
>> The measurements were obtained under the following conditions:
>>
>> - using blkback as the dom0 backend with the multiqueue patch applied to
>> a dom0 kernel 4.0 on 8 vcpus.
>>
>> - using a recent Ubuntu 15.04 kernel 3.19 with multiqueue frontend
>> applied to be used as a guest on 4 vcpus
>>
>> - using a micron RealSSD P320h as the underlying local storage on a Dell
>> PowerEdge R720 with 2 Xeon E5-2643 v2 cpus.
>>
>> - fio 2.2.7-22-g36870 as the generator of synthetic loads in the guest.
>> We used direct_io to skip caching in the guest and ran fio for 60s
>> reading a number of block sizes ranging from 512 bytes to 4MiB. Queue
>> depth of 32 for each queue was used to saturate individual vcpus in the
>> guest.
>>
>> We were interested in observing storage iops for different values of
>> block sizes. Our expectation was that iops would improve when increasing
>> the number of queues, because both the guest and dom0 would be able to
>> make use of more vcpus to handle these requests.
>>
>> These are the results (as aggregate iops for all the fio threads) that
>> we got for the conditions above with sequential reads:
>>
>> fio_threads  io_depth  block_size   1-queue_iops  8-queue_iops
>>  8   32   512   158K 264K
>>  8   321K   157K 260K
>>  8   322K   157K 258K
>>  8   324K   148K 257K
>>  8   328K   124K 207K
>>  8   32   16K84K 105K
>>  8   32   32K50K  54K
>>  8   32   64K24K  27K
>>  8   32  128K11K  13K
>>
>> 8-queue iops was better than single queue iops for all the block sizes.
>> There were very good improvements as well for sequential writes with
>> block size 4K (from 80K iops with single queue to 230K iops with 8
>> queues), and no regressions were visible in any measurement performed.
> Great results! And I don't know why this code has lingered for so long, 
> so thanks for helping get some attention to this again.
>
> Personally I'd be really interested in the results for the same set of 
> tests, but without the blk-mq patches. Do you have them, or could you 
> potentially run them?
>
Hello,

We rerun the tests for sequential reads with the identical settings but with 
Bob Liu's multiqueue patches reverted from dom0 and guest kernels.
The results we obtained were *better* than the results we got with multiqueue 
patches applied:

fio_threads  io_depth  block_size   1-queue_iops  8-queue_iops  
*no-mq-patches_iops*
 8   32   512   158K 264K 321K
 8   321K   157K 260K 328K
 8   322K   157K 258K 336K
 8   324K   148K 257K 308K
 8   328K   124K 207K 188K
 8   32   16K84K 105K 82K
 8   32   32K50K  54K 36K
 8   32   64K24K  27K 16K
 8   32  128K11K  13K 11K

We noticed that the requests are not merged by the guest when the multiqueue 
patches are applied,
which results in a regression for small block sizes (RealSSD P320h's optimal 
block size is around 32-64KB).

We observed similar regression for the Dell MZ-5EA1000-0D3 100 GB 2.5" Internal 
SSD

As I understand blk-mq layer bypasses I/O scheduler which also effectively 
disables merges.
Could you explain why it is difficult to enable merging in the blk-mq layer?
That could help closing the performance gap we observed.

Otherwise, the tests shows that the multiqueue patches does not improve the 
performance,
at least when it comes to sequential read/writes operations.

Rafal



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [libvirt test] 60641: regressions - trouble: broken/fail/pass

2015-08-10 Thread osstest service owner
flight 60641 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60641/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt-raw  3 host-install(3) broken REGR. vs. 60629
 test-armhf-armhf-libvirt 11 guest-start   fail REGR. vs. 60629

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-vhd  9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-qcow2  9 debian-di-installfail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never 
pass
 test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-xsm 14 guest-saverestorefail   never pass
 test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  11 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-vhd  11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  f8fe8f03455783afcd62d79db7ce4120f514c629
baseline version:
 libvirt  82af954c527e88111b05d50953b80eb4afde4d9a

Last test of basis60629  2015-08-07 21:19:27 Z2 days
Testing same since60641  2015-08-09 09:09:00 Z1 days1 attempts


People who touched revisions under test:
  Laine Stump 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-armhf-armhf-libvirt-xsm fail
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-armhf-armhf-libvirt fail
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairfail
 test-amd64-i386-libvirt-pair fail
 test-amd64-amd64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   fail
 test-amd64-i386-libvirt-qcow2pass
 test-amd64-amd64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw broken  
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass
 test-armhf-armhf-libvirt-vhd fail
 test-amd64-i386-libvirt-vhd  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test 

Re: [Xen-devel] [PATCH v3 19/20] xen/privcmd: Add support for Linux 64KB page granularity

2015-08-10 Thread David Vrabel
On 10/08/15 13:03, Stefano Stabellini wrote:
> On Fri, 7 Aug 2015, Julien Grall wrote:
>> -rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
>> -return rc < 0 ? rc : err;
>> +for (i = 0; i < nr_gfn; i++) {
>> +if ((i % XEN_PFN_PER_PAGE) == 0) {
>> +page = pages[i / XEN_PFN_PER_PAGE];
> 
> If this function is going to be called very frequently you might want to
> consider using a shift instead.
> 
> page = pages[i >> 4];
> 
> With an appropriate macro of course.

This change isn't necessary.  Compilers already turn divides into
suitable shifts.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Does Xen project have test suites for testing xc/xl/hypercall and so on?

2015-08-10 Thread Andrew Cooper
On 07/08/15 09:47, Jinjian (Ken) wrote:
> Hi all,
>I'm looking into xen's test suites now, and encounter some problems.
>Does xen have unit tests that can be executed by xen contributors to
> validate their code-commits before sending a patch to xen-devel?
>In xen-4.5.1(and its upstream), there are sevral test cases in the
> 'tests' directory, but these are much like tools, rather than test
> cases at my view. For example, it obtains physical address by
> hypercall and write relevant MSR in mce-test, but it didn't tell us
> whether the address obtained was correct? and is the register written
> successfully? There seems to have no method or expected results to
> check them.
>If it was unit test, it is obviously not enough for xen. Are there
> any more test methods to ensure the accuracy and stability of xen?

There is basically nothing in the way of unit tests which I am aware
of.  We rely on code review and functional testing primarily.

I have some plans to introduce some functional tests from a guests
perspective.  I hope to have these ready in the 4.7 timeframe, but there
is nothing similar which I am aware of.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Does Xen project have test suites for testing xc/xl/hypercall and so on?

2015-08-10 Thread Lars Kurth
There is some very minimal functionality in Raisin for testing, which Stefano 
is working on. See 
https://blog.xenproject.org/2015/06/28/project-raisin-raise-xen/
Lars

> On 10 Aug 2015, at 13:18, Andrew Cooper  wrote:
> 
> On 07/08/15 09:47, Jinjian (Ken) wrote:
>> Hi all,
>>   I'm looking into xen's test suites now, and encounter some problems.
>>   Does xen have unit tests that can be executed by xen contributors to
>> validate their code-commits before sending a patch to xen-devel?
>>   In xen-4.5.1(and its upstream), there are sevral test cases in the
>> 'tests' directory, but these are much like tools, rather than test
>> cases at my view. For example, it obtains physical address by
>> hypercall and write relevant MSR in mce-test, but it didn't tell us
>> whether the address obtained was correct? and is the register written
>> successfully? There seems to have no method or expected results to
>> check them.
>>   If it was unit test, it is obviously not enough for xen. Are there
>> any more test methods to ensure the accuracy and stability of xen?
> 
> There is basically nothing in the way of unit tests which I am aware
> of.  We rely on code review and functional testing primarily.
> 
> I have some plans to introduce some functional tests from a guests
> perspective.  I hope to have these ready in the 4.7 timeframe, but there
> is nothing similar which I am aware of.
> 
> ~Andrew
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen 4.6 retrospective] More public/easy to find information about the release schedule

2015-08-10 Thread Lars Kurth

> On 10 Aug 2015, at 10:40, Fabio Fantoni  wrote:
> 
> Il 10/08/2015 11:06, Lars Kurth ha scritto:
>>> On 10 Aug 2015, at 09:33, Wei Liu  wrote:
>>> 
>>> On Fri, Aug 07, 2015 at 05:36:57PM +0200, Roger Pau Monné wrote:
 = Issue / Observation =
 
 The information about the release schedule is not clearly published
 anywhere apart from the mailing lists, which makes it hard for
 non-developers (or even for developers) given that the mailing list
 traffic for xen-devel is high.
>> This is not entirely true: see 
>> http://wiki.xenproject.org/wiki/Xen_Project_Hypervisor_Roadmap/4.6
> Hi, I take a look to the wiki page, can be good mention also the "add of ahci 
> disk controller support for hvm domUs"? I saw that is missed in features list.

Will add it. I scraped the information from the last 4.6 email. But there were 
some omissions, which have not yet been addressed. Was waiting for Wei's Xen 
Dev Summit presentation before I update. 

>> However, I think https://www.freebsd.org/releng/ and also the odd mail on 
>> announce@ would make sense
>> 
>> Lars
>> ___
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 09/20] xen/biomerge: Don't allow biovec to be merge when Linux is not using 4KB page

2015-08-10 Thread David Vrabel
On 10/08/15 12:32, Julien Grall wrote:
> On 10/08/15 12:25, Stefano Stabellini wrote:
>> yes and page/pages:
>>
>> xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB 
>> pages
> 
> Why the ' in biovec's ? Shouldn't we says biovecs directly?

Pluralizing named C structures with apostrophes is valid.  It makes it
clear we're talking about "struct biovec" and not "struct biovecs" objects.

You could also consider "biovec's" to be a contraction for "biovec
objects" so the apostrophe is "correct".

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 20/20] arm/xen: Add support for 64KB page granularity

2015-08-10 Thread Stefano Stabellini
On Fri, 7 Aug 2015, Julien Grall wrote:
> The hypercall interface is always using 4KB page granularity. This is
> requiring to use xen page definition macro when we deal with hypercall.
> 
> Note that pfn_to_gfn is working with a Xen pfn (i.e 4KB). We may want to
> rename pfn_gfn to make this explicit.
> 
> We also allocate a 64KB page for the shared page even though only the
> first 4KB is used. I don't think this is really important for now as it
> helps to have the pointer 4KB aligned (XENMEM_add_to_physmap is taking a
> Xen PFN).
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 


> ---
> Cc: Stefano Stabellini 
> Cc: Russell King 
> 
> Stefano, I've dropped your reviewed-by given I've updated the doc and do
> changes to avoid usage of XEN_PAGE_SHIFT
> 
> Changes in v3:
> - s/MFN/GFN/ base on the new naming
> - Use virt_to_gfn to avoid use XEN_PAGE_SHIFT
> - Drop Stefano's reviewed-by
> - Add some docs in arch/arm/asm/xen/page.h
> 
> Changes in v2
> - Add Stefano's reviewed-by
> ---
>  arch/arm/include/asm/xen/page.h | 15 +--
>  arch/arm/xen/enlighten.c|  6 +++---
>  2 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h
> index 98c9fc3..e3d94cf 100644
> --- a/arch/arm/include/asm/xen/page.h
> +++ b/arch/arm/include/asm/xen/page.h
> @@ -28,6 +28,17 @@ typedef struct xpaddr {
>  
>  #define INVALID_P2M_ENTRY  (~0UL)
>  
> +/*
> + * The pseudo-physical frame (pfn) used in all the helpers is always based
> + * on Xen page granularity (i.e 4KB).
> + *
> + * A Linux page may be split across multiple non-contiguous Xen page so we
> + * have to keep track with frame based on 4KB page granularity.
> + *
> + * PV drivers should never make a direct usage of those helpers (particularly
> + * pfn_to_gfn and gfn_to_pfn).
> + */
> +
>  unsigned long __pfn_to_mfn(unsigned long pfn);
>  extern struct rb_root phys_to_mach;
>  
> @@ -64,8 +75,8 @@ static inline unsigned long bfn_to_pfn(unsigned long bfn)
>  #define bfn_to_local_pfn(bfn)bfn_to_pfn(bfn)
>  
>  /* VIRT <-> GUEST conversion */
> -#define virt_to_gfn(v)   (pfn_to_gfn(virt_to_pfn(v)))
> -#define gfn_to_virt(m)   (__va(gfn_to_pfn(m) << PAGE_SHIFT))
> +#define virt_to_gfn(v)   (pfn_to_gfn(virt_to_phys(v) >> 
> XEN_PAGE_SHIFT))
> +#define gfn_to_virt(m)   (__va(gfn_to_pfn(m) << XEN_PAGE_SHIFT))
>  
>  /* Only used in PV code. But ARM guests are always HVM. */
>  static inline xmaddr_t arbitrary_virt_to_machine(void *vaddr)
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index eeeab07..50b4769 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -89,8 +89,8 @@ static void xen_percpu_init(void)
>   pr_info("Xen: initializing cpu%d\n", cpu);
>   vcpup = per_cpu_ptr(xen_vcpu_info, cpu);
>  
> - info.mfn = __pa(vcpup) >> PAGE_SHIFT;
> - info.offset = offset_in_page(vcpup);
> + info.mfn = virt_to_gfn(vcpup);
> + info.offset = xen_offset_in_page(vcpup);
>  
>   err = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
>   BUG_ON(err);
> @@ -213,7 +213,7 @@ static int __init xen_guest_init(void)
>   xatp.domid = DOMID_SELF;
>   xatp.idx = 0;
>   xatp.space = XENMAPSPACE_shared_info;
> - xatp.gpfn = __pa(shared_info_page) >> PAGE_SHIFT;
> + xatp.gpfn = virt_to_gfn(shared_info_page);
>   if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
>   BUG();
>  
> -- 
> 2.1.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux

2015-08-10 Thread Stefano Stabellini
On Mon, 10 Aug 2015, Julien Grall wrote:
> Hi Stefano,
> 
> On 10/08/15 12:18, Stefano Stabellini wrote:
> >>/* Link back into the page tables if not highmem. */
> >> @@ -396,14 +413,15 @@ static enum bp_state increase_reservation(unsigned 
> >> long nr_pages)
> >>  static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t 
> >> gfp)
> >>  {
> >>enum bp_state state = BP_DONE;
> >> -  unsigned long  pfn, i;
> >> +  unsigned long i;
> >>struct page   *page;
> >>int ret;
> >>struct xen_memory_reservation reservation = {
> >>.address_bits = 0,
> >> -  .extent_order = 0,
> >> +  .extent_order = EXTENT_ORDER,
> >>.domid= DOMID_SELF
> >>};
> >> +  static struct page *pages[ARRAY_SIZE(frame_list)];
> > 
> > This array can be rather large: I would try to avoid it, see below.
> 
> [..]
> 
> > 
> > I would simply and avoid introducing a new array:
> > pfn = (frame_list[i] << XEN_PAGE_SHIFT) >> PAGE_SHIFT;
> > page = pfn_to_page(pfn);
> 
> Which won't work because the frame_list contains a gfn and not a pfn.
> We need to translate back the gfn into a pfn and the into a page.
> 
> The cost of the translation may be big and I wanted to avoid anymore
> XEN_PAGE_SHIFT in the code. In general we should avoid to deal with 4KB
> PFN when it's not necessary, it make the code more confusing to read.

That is true


> If your only concern is the size of the array, we could decrease the
> number of frames by batch. Or allocation the variable once a boot time.

Yes, that is my only concern. Allocating only nr_pages new struct page*
would be good enough I guess.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 19/20] xen/privcmd: Add support for Linux 64KB page granularity

2015-08-10 Thread Stefano Stabellini
On Mon, 10 Aug 2015, David Vrabel wrote:
> On 10/08/15 13:03, Stefano Stabellini wrote:
> > On Fri, 7 Aug 2015, Julien Grall wrote:
> >> -  rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
> >> -  return rc < 0 ? rc : err;
> >> +  for (i = 0; i < nr_gfn; i++) {
> >> +  if ((i % XEN_PFN_PER_PAGE) == 0) {
> >> +  page = pages[i / XEN_PFN_PER_PAGE];
> > 
> > If this function is going to be called very frequently you might want to
> > consider using a shift instead.
> > 
> > page = pages[i >> 4];
> > 
> > With an appropriate macro of course.
> 
> This change isn't necessary.  Compilers already turn divides into
> suitable shifts.

The ARM compiler I used last time I tested this did not, but that was 1
or 2 years ago. In any case to be clear this change is not required.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Does Xen project have test suites for testing xc/xl/hypercall and so on?

2015-08-10 Thread Stefano Stabellini
Hi Jinjian,

validating changes made by contributors before submitting their patches
to xen-devel, is exactly the reason why I introduced "raise test" in
raisin. However the number of tests available is still very limited and
the functionality pretty immature.

Of course I would be happy to take patches to improve it :-)

Cheers,

Stefano

On Mon, 10 Aug 2015, Lars Kurth wrote:
> There is some very minimal functionality in Raisin for testing, which Stefano 
> is working on. See 
> https://blog.xenproject.org/2015/06/28/project-raisin-raise-xen/
> Lars
> 
> > On 10 Aug 2015, at 13:18, Andrew Cooper  wrote:
> > 
> > On 07/08/15 09:47, Jinjian (Ken) wrote:
> >> Hi all,
> >>   I'm looking into xen's test suites now, and encounter some problems.
> >>   Does xen have unit tests that can be executed by xen contributors to
> >> validate their code-commits before sending a patch to xen-devel?
> >>   In xen-4.5.1(and its upstream), there are sevral test cases in the
> >> 'tests' directory, but these are much like tools, rather than test
> >> cases at my view. For example, it obtains physical address by
> >> hypercall and write relevant MSR in mce-test, but it didn't tell us
> >> whether the address obtained was correct? and is the register written
> >> successfully? There seems to have no method or expected results to
> >> check them.
> >>   If it was unit test, it is obviously not enough for xen. Are there
> >> any more test methods to ensure the accuracy and stability of xen?
> > 
> > There is basically nothing in the way of unit tests which I am aware
> > of.  We rely on code review and functional testing primarily.
> > 
> > I have some plans to introduce some functional tests from a guests
> > perspective.  I hope to have these ready in the 4.7 timeframe, but there
> > is nothing similar which I am aware of.
> > 
> > ~Andrew
> > 
> > ___
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 19/20] xen/privcmd: Add support for Linux 64KB page granularity

2015-08-10 Thread Julien Grall
Hi Stefano,

On 10/08/15 13:57, Stefano Stabellini wrote:
> On Mon, 10 Aug 2015, David Vrabel wrote:
>> On 10/08/15 13:03, Stefano Stabellini wrote:
>>> On Fri, 7 Aug 2015, Julien Grall wrote:
 -  rc = HYPERVISOR_memory_op(XENMEM_add_to_physmap_range, &xatp);
 -  return rc < 0 ? rc : err;
 +  for (i = 0; i < nr_gfn; i++) {
 +  if ((i % XEN_PFN_PER_PAGE) == 0) {
 +  page = pages[i / XEN_PFN_PER_PAGE];
>>>
>>> If this function is going to be called very frequently you might want to
>>> consider using a shift instead.
>>>
>>> page = pages[i >> 4];
>>>
>>> With an appropriate macro of course.
>>
>> This change isn't necessary.  Compilers already turn divides into
>> suitable shifts.
> 
> The ARM compiler I used last time I tested this did not, but that was 1
> or 2 years ago. In any case to be clear this change is not required.

I gave a try on the compiler used by Debian Jessy (gcc 4.9.2). It turns
divides into suitable shifts.

Anyway, if it may happen that older ARM compiler doesn't do this change,
I sure we would have to modify many other places in order to make the
code efficient.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 12/20] xen/balloon: Don't rely on the page granularity is the same for Xen and Linux

2015-08-10 Thread Julien Grall
On 10/08/15 13:55, Stefano Stabellini wrote:
>> If your only concern is the size of the array, we could decrease the
>> number of frames by batch. Or allocation the variable once a boot time.
> 
> Yes, that is my only concern. Allocating only nr_pages new struct page*
> would be good enough I guess.

That would be even worst. We shouldn't allocate the array at every call,
but at boot time.

Note that frame_list is already a static variable use 64KB when 64KB
page is used. I guess this will be unlikely to remove that much frame in
a single batch. But I will keep this optimization for later.

Anyway, I'm wondering if we could re-use the lru field to link the page
when allocate them and retrieve in the second loop in order to avoid the
pages array.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3] xen-apic: Enable on domU as well

2015-08-10 Thread Jason A. Donenfeld
It turns out that domU also requires the Xen APIC driver. Otherwise we
get stuck in busy loops that never exit, such as in this stack trace:

(gdb) target remote localhost:
Remote debugging using localhost:
__xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
56  while (native_apic_mem_read(APIC_ICR) & APIC_ICR_BUSY)
(gdb) bt
 #0  __xapic_wait_icr_idle () at ./arch/x86/include/asm/ipi.h:56
 #1  __default_send_IPI_shortcut (shortcut=,
dest=, vector=) at
./arch/x86/include/asm/ipi.h:75
 #2  apic_send_IPI_self (vector=246) at arch/x86/kernel/apic/probe_64.c:54
 #3  0x81011336 in arch_irq_work_raise () at
arch/x86/kernel/irq_work.c:47
 #4  0x8114990c in irq_work_queue (work=0x88000fc0e400) at
kernel/irq_work.c:100
 #5  0x8110c29d in wake_up_klogd () at kernel/printk/printk.c:2633
 #6  0x8110ca60 in vprintk_emit (facility=0, level=, dict=0x0 , dictlen=,
fmt=, args=)
at kernel/printk/printk.c:1778
 #7  0x816010c8 in printk (fmt=) at
kernel/printk/printk.c:1868
 #8  0xc00013ea in ?? ()
 #9  0x in ?? ()

Mailing-list-thread: https://lkml.org/lkml/2015/8/4/755
Signed-off-by: Jason A. Donenfeld 
Cc: David Vrabel 
Cc: Ian Campbell 
Cc: 
---
 arch/x86/xen/Makefile  | 4 ++--
 arch/x86/xen/xen-ops.h | 7 ++-
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 7322755..4b6e29a 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -13,13 +13,13 @@ CFLAGS_mmu.o:= $(nostackp)
 obj-y  := enlighten.o setup.o multicalls.o mmu.o irq.o \
time.o xen-asm.o xen-asm_$(BITS).o \
grant-table.o suspend.o platform-pci-unplug.o \
-   p2m.o
+   p2m.o apic.o
 
 obj-$(CONFIG_EVENT_TRACING) += trace.o
 
 obj-$(CONFIG_SMP)  += smp.o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= spinlock.o
 obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o
-obj-$(CONFIG_XEN_DOM0) += apic.o vga.o
+obj-$(CONFIG_XEN_DOM0) += vga.o
 obj-$(CONFIG_SWIOTLB_XEN)  += pci-swiotlb-xen.o
 obj-$(CONFIG_XEN_EFI)  += efi.o
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index c20fe29..d0a543b 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -98,20 +98,17 @@ static inline void xen_uninit_lock_cpu(int cpu)
 #endif
 
 struct dom0_vga_console_info;
-
 #ifdef CONFIG_XEN_DOM0
 void __init xen_init_vga(const struct dom0_vga_console_info *, size_t size);
-void __init xen_init_apic(void);
 #else
 static inline void __init xen_init_vga(const struct dom0_vga_console_info 
*info,
   size_t size)
 {
 }
-static inline void __init xen_init_apic(void)
-{
-}
 #endif
 
+void __init xen_init_apic(void);
+
 #ifdef CONFIG_XEN_EFI
 extern void xen_efi_init(void);
 #else
-- 
2.5.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] xen-apic: Enable on domU as well

2015-08-10 Thread David Vrabel
On 10/08/15 14:40, Jason A. Donenfeld wrote:
> It turns out that domU also requires the Xen APIC driver. Otherwise we
> get stuck in busy loops that never exit, such as in this stack trace:

What's the difference between v3 and v2?

David


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] xen-apic: Enable on domU as well

2015-08-10 Thread Jason A. Donenfeld
On Mon, Aug 10, 2015 at 3:41 PM, David Vrabel  wrote:
> On 10/08/15 14:40, Jason A. Donenfeld wrote:
>> It turns out that domU also requires the Xen APIC driver. Otherwise we
>> get stuck in busy loops that never exit, such as in this stack trace:
>
> What's the difference between v3 and v2?

I did some silly things with vim in v2, and there's an extra
semicolon, some other formatting things, and a function is made
unstatic by accident. v3 is what I should have originally sent.
Functionally the same though.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Linux 4.2-rc5:

2015-08-10 Thread Sander Eikelenboom

Monday, August 10, 2015, 11:00:11 AM, you wrote:

> On 08/06/2015 08:51 PM, li...@eikelenboom.it wrote:
>> Hi Ross,
>>
>> On my dom0 with a linux 4.2-rc5 kernel i encoutered the splat below.
>> It's probably related to your patch that went in just for 4.2-rc5:
>> "xen/events/fifo: Handle linked events when closing a port"
>>
>> --
>> Sander
>>
>> [   49.020173] [ cut here ]
>> [   49.020187] WARNING: CPU: 0 PID: 1 at
>> drivers/xen/events/events_fifo.c:395 evtchn_fifo_close+0xbd/0xc0()
>> [   49.020191] Modules linked in:
>> [   49.020198] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
>> 4.2.0-rc5-20150804-linus-doflr+ #1
>> [   49.020200] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS
>> V1.8B1 09/13/2010
>> [   49.020208]  81faaae8 880059b9bad8 81aed513
>> 
>> [   49.020214]   880059b9bb18 810c7280
>> 0041
>> [   49.020219]  0041  880059807ca8
>> 880059807c00
>> [   49.020220] Call Trace:
>> [   49.020233]  [] dump_stack+0x45/0x57
>> [   49.020240]  [] warn_slowpath_common+0x80/0xc0
>> [   49.020245]  [] warn_slowpath_null+0x15/0x20
>> [   49.020249]  [] evtchn_fifo_close+0xbd/0xc0
>> [   49.020278]  [] xen_evtchn_close+0x1d/0x60
>> [   49.020281]  [] ? irq_get_irq_data+0x9/0x20
>> [   49.020282]  [] shutdown_pirq+0x4b/0x70
>> [   49.020283]  [] irq_shutdown+0x34/0x70
>> [   49.020285]  [] __free_irq+0x19d/0x1e0
>> [   49.020286]  [] free_irq+0x48/0xb0
>> [   49.020287]  [] i8042_probe+0x38f/0x693
>> [   49.020291]  [] platform_drv_probe+0x2f/0x90
>> [   49.020292]  [] driver_probe_device+0x1af/0x2d0
>> [   49.020293]  [] __driver_attach+0x8b/0x90
>> [   49.020294]  [] ? driver_probe_device+0x2d0/0x2d0
>> [   49.020296]  [] bus_for_each_dev+0x5f/0x90
>> [   49.020297]  [] driver_attach+0x19/0x20
>> [   49.020298]  [] bus_add_driver+0x1ab/0x220
>> [   49.020299]  [] driver_register+0x5b/0xe0
>> [   49.020300]  [] __platform_driver_register+0x45/0x50
>> [   49.020301]  [] __platform_driver_probe+0x31/0xe0
>> [   49.020303]  [] __platform_create_bundle+0xa3/0xd0
>> [   49.020304]  [] ? i8042_toggle_aux+0x6c/0x6c
>> [   49.020305]  [] ? i8042_probe+0x693/0x693
>> [   49.020306]  [] i8042_init+0x3d0/0x3f6
>> [   49.020308]  [] do_one_initcall+0x87/0x1d0
>> [   49.020310]  [] kernel_init_freeable+0x1db/0x263
>> [   49.020312]  [] ? rest_init+0x80/0x80
>> [   49.020314]  [] kernel_init+0x9/0xe0
>> [   49.020315]  [] ret_from_fork+0x3f/0x70
>> [   49.020317]  [] ? rest_init+0x80/0x80
>> [   49.020320] ---[ end trace 64c385518fcbbfa1 ]---
>>

> Thanks.

> This means that the event channel is being closed with interrupts 
> disabled, so it cannot guarantee that the event is not linked in. This 
> is not a regression in behavior -- previously this was _never_ 
> guaranteed and just silently ignored. However, we should find a way to 
> fix this completely, to avoid warning spam.

> Regards,

I assume that's feasible within the two weeks left until the 4.2 release ?

--
Sander


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] About Xen bridged pci devices and suspend/resume for the X10SAE motherboard

2015-08-10 Thread Konrad Rzeszutek Wilk
On Mon, Aug 10, 2015 at 02:11:38AM +0300, M. Ivanov wrote:
> Hello,
> 
> excuse me for bothering you, but I've read an old thread on a mailing
> list about X10SAE compatibility. 
> http://lists.xen.org/archives/html/xen-devel/2014-02/msg02111.html

CC-ing Xen devel.
> 
> Currently I own this board and am trying to use it with Xen and be able
> to suspend and resume.
> 
> But I am getting errors from the USB 3 Renesas controller about parity
> in my bios event log, and my system hangs on resume,
> so I was wondering if that is connected to the bridge(tundra) you've
> mentioned.

Did you update the BIOS to the latest version?
> 
> I will be very glad if you could share any information regarding this
> matter. 
> 
> Best regards,
> M. Ivanov



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v6 2/6] xen/PMU: Sysfs interface for setting Xen PMU mode

2015-08-10 Thread Konrad Rzeszutek Wilk
On Sun, Aug 09, 2015 at 09:31:43PM -0400, Boris Ostrovsky wrote:
> Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall.
> 
> Signed-off-by: Boris Ostrovsky 

Reviewed-by: Konrad Rzeszutek Wilk 
> ---
>  Documentation/ABI/testing/sysfs-hypervisor-pmu |  23 +
>  arch/x86/include/asm/xen/hypercall.h   |   6 ++
>  arch/x86/xen/Kconfig   |   1 +
>  drivers/xen/Kconfig|   3 +
>  drivers/xen/sys-hypervisor.c   | 136 
> -
>  include/xen/interface/xen.h|   1 +
>  include/xen/interface/xenpmu.h |  59 +++
>  7 files changed, 228 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu
>  create mode 100644 include/xen/interface/xenpmu.h
> 
> diff --git a/Documentation/ABI/testing/sysfs-hypervisor-pmu 
> b/Documentation/ABI/testing/sysfs-hypervisor-pmu
> new file mode 100644
> index 000..224faa1
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-hypervisor-pmu
> @@ -0,0 +1,23 @@
> +What:/sys/hypervisor/pmu/pmu_mode
> +Date:August 2015
> +KernelVersion:   4.3
> +Contact: Boris Ostrovsky 
> +Description:
> + Describes mode that Xen's performance-monitoring unit (PMU)
> + uses. Accepted values are
> + "off"  -- PMU is disabled
> + "self" -- The guest can profile itself
> + "hv"   -- The guest can profile itself and, if it is
> +   privileged (e.g. dom0), the hypervisor
> + "all" --  The guest can profile itself, the hypervisor
> +   and all other guests. Only available to
> +   privileged guests.
> +
> +What:   /sys/hypervisor/pmu/pmu_features
> +Date:   August 2015
> +KernelVersion:  4.3
> +Contact:Boris Ostrovsky 
> +Description:
> + Describes Xen PMU features (as an integer). A set bit indicates
> + that the corresponding feature is enabled. See
> + include/xen/interface/xenpmu.h for available features
> diff --git a/arch/x86/include/asm/xen/hypercall.h 
> b/arch/x86/include/asm/xen/hypercall.h
> index ca08a27..83aea80 100644
> --- a/arch/x86/include/asm/xen/hypercall.h
> +++ b/arch/x86/include/asm/xen/hypercall.h
> @@ -465,6 +465,12 @@ HYPERVISOR_tmem_op(
>   return _hypercall1(int, tmem_op, op);
>  }
>  
> +static inline int
> +HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
> +{
> + return _hypercall2(int, xenpmu_op, op, arg);
> +}
> +
>  static inline void
>  MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set)
>  {
> diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
> index e88fda8..049cdda 100644
> --- a/arch/x86/xen/Kconfig
> +++ b/arch/x86/xen/Kconfig
> @@ -7,6 +7,7 @@ config XEN
>   depends on PARAVIRT
>   select PARAVIRT_CLOCK
>   select XEN_HAVE_PVMMU
> + select XEN_HAVE_VPMU
>   depends on X86_64 || (X86_32 && X86_PAE)
>   depends on X86_TSC
>   help
> diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
> index 9367604..73708ac 100644
> --- a/drivers/xen/Kconfig
> +++ b/drivers/xen/Kconfig
> @@ -288,4 +288,7 @@ config XEN_SYMS
>Exports hypervisor symbols (along with their types and addresses) 
> via
>/proc/xen/xensyms file, similar to /proc/kallsyms
>  
> +config XEN_HAVE_VPMU
> +   bool
> +
>  endmenu
> diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
> index 96453f8..0907275 100644
> --- a/drivers/xen/sys-hypervisor.c
> +++ b/drivers/xen/sys-hypervisor.c
> @@ -20,6 +20,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_XEN_HAVE_VPMU
> +#include 
> +#endif
>  
>  #define HYPERVISOR_ATTR_RO(_name) \
>  static struct hyp_sysfs_attr  _name##_attr = __ATTR_RO(_name)
> @@ -368,6 +371,126 @@ static void xen_properties_destroy(void)
>   sysfs_remove_group(hypervisor_kobj, &xen_properties_group);
>  }
>  
> +#ifdef CONFIG_XEN_HAVE_VPMU
> +struct pmu_mode {
> + const char *name;
> + uint32_t mode;
> +};
> +
> +struct pmu_mode pmu_modes[] = {
> + {"off", XENPMU_MODE_OFF},
> + {"self", XENPMU_MODE_SELF},
> + {"hv", XENPMU_MODE_HV},
> + {"all", XENPMU_MODE_ALL}
> +};
> +
> +static ssize_t pmu_mode_store(struct hyp_sysfs_attr *attr,
> +   const char *buffer, size_t len)
> +{
> + int ret;
> + struct xen_pmu_params xp;
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(pmu_modes); i++) {
> + if (strncmp(buffer, pmu_modes[i].name, len - 1) == 0) {
> + xp.val = pmu_modes[i].mode;
> + break;
> + }
> + }
> +
> + if (i == ARRAY_SIZE(pmu_modes))
> + return -EINVAL;
> +
> + xp.version.maj = XENPMU_VER_MAJ;
> + xp.version.min = XENPMU_VER_MIN;
> + ret = HYPERVI

[Xen-devel] [PATCHv1] xen/events/fifo: Handle linked events when closing a PIRQ port

2015-08-10 Thread David Vrabel
Commit fcdf31a7c162de0c93a2bee51df4688ab0a348f8 (xen/events/fifo:
Handle linked events when closing a port) did not handle closing a
port bound to a PIRQ because these are closed from shutdown_pirq()
which is called with interrupts disabled.

Defer the close to a work queue where we can safely spin waiting for
the LINKED bit to clear.  For simplicity, the close is always deferred
even if it is not required (i.e., we're already in process context).

Signed-off-by: David Vrabel 
Cc: Ross Lagerwall 
---
Cc: Sander Eikelenboom 
---
 drivers/xen/events/events_2l.c   | 10 +++
 drivers/xen/events/events_base.c | 13 +
 drivers/xen/events/events_fifo.c | 52 +++-
 drivers/xen/events/events_internal.h |  5 ++--
 4 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/drivers/xen/events/events_2l.c b/drivers/xen/events/events_2l.c
index 7dd4631..82c90de 100644
--- a/drivers/xen/events/events_2l.c
+++ b/drivers/xen/events/events_2l.c
@@ -354,6 +354,15 @@ static void evtchn_2l_resume(void)
EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD);
 }
 
+static void evtchn_2l_close(unsigned int port, unsigned int cpu)
+{
+   struct evtchn_close close;
+
+   close.port = port;
+   if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
+   BUG();
+}
+
 static const struct evtchn_ops evtchn_ops_2l = {
.max_channels  = evtchn_2l_max_channels,
.nr_channels   = evtchn_2l_max_channels,
@@ -366,6 +375,7 @@ static const struct evtchn_ops evtchn_ops_2l = {
.unmask= evtchn_2l_unmask,
.handle_events = evtchn_2l_handle_events,
.resume= evtchn_2l_resume,
+   .close = evtchn_2l_close,
 };
 
 void __init xen_evtchn_2l_init(void)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 1495ecc..e3f0049 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -452,17 +452,6 @@ static void xen_free_irq(unsigned irq)
irq_free_desc(irq);
 }
 
-static void xen_evtchn_close(unsigned int port, unsigned int cpu)
-{
-   struct evtchn_close close;
-
-   xen_evtchn_op_close(port, cpu);
-
-   close.port = port;
-   if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
-   BUG();
-}
-
 static void pirq_query_unmask(int irq)
 {
struct physdev_irq_status_query irq_status;
@@ -546,7 +535,7 @@ out:
 
 err:
pr_err("irq%d: Failed to set port to irq mapping (%d)\n", irq, rc);
-   xen_evtchn_close(evtchn, NR_CPUS);
+   xen_evtchn_close(evtchn, 0);
return 0;
 }
 
diff --git a/drivers/xen/events/events_fifo.c b/drivers/xen/events/events_fifo.c
index 6df8aac..149e1e9 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -385,24 +386,51 @@ static void evtchn_fifo_resume(void)
event_array_pages = 0;
 }
 
+struct close_work {
+   struct work_struct work;
+   unsigned int port;
+};
+
+static void evtchn_fifo_close_work(struct work_struct *work)
+{
+   struct close_work *cw = container_of(work, struct close_work, work);
+   struct evtchn_close close;
+
+   while (evtchn_fifo_is_linked(cw->port))
+   cpu_relax();
+
+   close.port = cw->port;
+   if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
+   BUG();
+
+   kfree(cw);
+}
+
 static void evtchn_fifo_close(unsigned port, unsigned int cpu)
 {
-   if (cpu == NR_CPUS)
-   return;
+   struct close_work *cw;
 
-   get_online_cpus();
-   if (cpu_online(cpu)) {
-   if (WARN_ON(irqs_disabled()))
-   goto out;
+   /*
+* A port cannot be closed until the LINKED bit is clear.
+*
+* Reusing an already linked event may: a) cause the new event
+* to be raised on the wrong VCPU; or b) cause the event to be
+* lost (if the old VCPU is offline).
+*
+* If the VCPU is offline, its queues must be drained before
+* spinning for LINKED to be clear.
+*/
 
-   while (evtchn_fifo_is_linked(port))
-   cpu_relax();
-   } else {
+   if (!cpu_online(cpu))
__evtchn_fifo_handle_events(cpu, true);
-   }
 
-out:
-   put_online_cpus();
+   cw = kzalloc(sizeof(*cw), GFP_ATOMIC);
+   if (!cw)
+   return;
+   INIT_WORK(&cw->work, evtchn_fifo_close_work);
+   cw->port = port;
+
+   schedule_work_on(cpu, &cw->work);
 }
 
 static const struct evtchn_ops evtchn_ops_fifo = {
diff --git a/drivers/xen/events/events_internal.h 
b/drivers/xen/events/events_internal.h
index d18e123..017cc22 100644
--- a/drivers/xen/events/events_internal.h
+++ b/drivers/xen/events/events_internal.h
@@ -146,10 +146,9 @@ sta

[Xen-devel] [PATCH] x86: Allow PV guest set X86_CR4_PCE flag

2015-08-10 Thread Boris Ostrovsky
With added PV support for VPMU, guests may legitimately decide to set
CR4's PCE flag. We should allow this when VPMU is enabled.

Signed-off-by: Boris Ostrovsky 
---
 xen/arch/x86/cpu/vpmu.c  | 19 +++
 xen/arch/x86/domain.c| 13 -
 xen/include/asm-x86/domain.h |  2 ++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/vpmu.c b/xen/arch/x86/cpu/vpmu.c
index 8af3df1..8cac04e 100644
--- a/xen/arch/x86/cpu/vpmu.c
+++ b/xen/arch/x86/cpu/vpmu.c
@@ -81,6 +81,12 @@ static void __init parse_vpmu_param(char *s)
 }
 }
 
+static void update_cr4_mask_pce(bool_t allow)
+{
+pv_guest_update_cr4_mask(X86_CR4_PCE, 0, allow);
+pv_guest_update_cr4_mask(X86_CR4_PCE, 1, allow);
+}
+
 void vpmu_lvtpc_update(uint32_t val)
 {
 struct vpmu_struct *vpmu;
@@ -475,6 +481,7 @@ void vpmu_initialise(struct vcpu *v)
 printk(XENLOG_G_WARNING "VPMU: Unknown CPU vendor %d. "
"Disabling VPMU\n", vendor);
 opt_vpmu_enabled = 0;
+update_cr4_mask_pce(0);
 vpmu_mode = XENPMU_MODE_OFF;
 }
 return; /* Don't bother restoring vpmu_count, VPMU is off forever */
@@ -679,7 +686,16 @@ long do_xenpmu_op(unsigned int op, 
XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
 if ( (vpmu_count == 0) ||
  ((vpmu_mode ^ pmu_params.val) ==
   (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
+{
+if ( (vpmu_mode != XENPMU_MODE_OFF) &&
+ (pmu_params.val == XENPMU_MODE_OFF) )
+update_cr4_mask_pce(0);
+else if ( (vpmu_mode == XENPMU_MODE_OFF) &&
+  (pmu_params.val != XENPMU_MODE_OFF) )
+update_cr4_mask_pce(1);
+
 vpmu_mode = pmu_params.val;
+}
 else if ( vpmu_mode != pmu_params.val )
 {
 printk(XENLOG_WARNING
@@ -807,8 +823,11 @@ static int __init vpmu_init(void)
 }
 
 if ( vpmu_mode != XENPMU_MODE_OFF )
+{
+update_cr4_mask_pce(1);
 printk(XENLOG_INFO "VPMU: version " __stringify(XENPMU_VER_MAJ) "."
__stringify(XENPMU_VER_MIN) "\n");
+}
 else
 opt_vpmu_enabled = 0;
 
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 045f6ff..71a2bb3 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -738,7 +738,18 @@ static int __init init_pv_cr4_masks(void)
 
 return 0;
 }
-__initcall(init_pv_cr4_masks);
+presmp_initcall(init_pv_cr4_masks);
+
+void pv_guest_update_cr4_mask(unsigned long mask, bool_t is_compat,
+  bool_t allow)
+{
+unsigned long *curr_mask = is_compat ? &compat_pv_cr4_mask : &pv_cr4_mask;
+
+if ( !allow )
+*curr_mask |= mask;
+else
+*curr_mask &= ~mask;
+}
 
 unsigned long pv_guest_cr4_fixup(const struct vcpu *v, unsigned long guest_cr4)
 {
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 0fce09e..9758b0a 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -561,6 +561,8 @@ void vcpu_show_registers(const struct vcpu *);
 
 /* Clean up CR4 bits that are not under guest control. */
 unsigned long pv_guest_cr4_fixup(const struct vcpu *, unsigned long guest_cr4);
+void pv_guest_update_cr4_mask(unsigned long mask, bool_t is_compat,
+  bool_t allow);
 
 /* Convert between guest-visible and real CR4 values. */
 #define pv_guest_cr4_to_real_cr4(v) \
-- 
1.8.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: Allow PV guest set X86_CR4_PCE flag

2015-08-10 Thread Andrew Cooper
On 10/08/15 15:27, Boris Ostrovsky wrote:
> With added PV support for VPMU, guests may legitimately decide to set
> CR4's PCE flag. We should allow this when VPMU is enabled.
>
> Signed-off-by: Boris Ostrovsky 

Why?  Even a PV guest using VPMU should know that it doesn't actually
control CR4.PCE

All this (appears to) end up doing is putting PCE into the "allow but
ignore" mask.

How about this (not even compile tested) which is a rather shorter way
of doing the same thing:

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 045f6ff..834ce0f 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -721,10 +721,12 @@ static int __init init_pv_cr4_masks(void)
 unsigned long common_mask = ~X86_CR4_TSD;
 
 /*
- * All PV guests may attempt to modify TSD, DE and OSXSAVE.
+ * All PV guests may attempt to modify TSD, DE, PCE and OSXSAVE.
  */
 if ( cpu_has_de )
 common_mask &= ~X86_CR4_DE;
+if ( cpu_has_pce )
+common_mask &= ~X86_CR4_PCE;
 if ( cpu_has_xsave )
 common_mask &= ~X86_CR4_OSXSAVE;
 
~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] About Xen bridged pci devices and suspend/resume for the X10SAE motherboard

2015-08-10 Thread Konrad Rzeszutek Wilk
On Mon, Aug 10, 2015 at 05:14:28PM +0300, M. Ivanov wrote:
> On Mon, 2015-08-10 at 09:58 -0400, Konrad Rzeszutek Wilk wrote:
> > On Mon, Aug 10, 2015 at 02:11:38AM +0300, M. Ivanov wrote:
> > > Hello,
> > > 
> > > excuse me for bothering you, but I've read an old thread on a mailing
> > > list about X10SAE compatibility. 
> > > http://lists.xen.org/archives/html/xen-devel/2014-02/msg02111.html
> > 
> > CC-ing Xen devel.
> > > 
> > > Currently I own this board and am trying to use it with Xen and be able
> > > to suspend and resume.
> > > 
> > > But I am getting errors from the USB 3 Renesas controller about parity
> > > in my bios event log, and my system hangs on resume,
> > > so I was wondering if that is connected to the bridge(tundra) you've
> > > mentioned.
> > 
> > Did you update the BIOS to the latest version?
> Will updating to version 3 solve my issue?
> Can you do a suspend/resume on your X10SAE?

It did work at some point. I will find out when I am at home later today.

> > > 
> > > I will be very glad if you could share any information regarding this
> > > matter. 
> > > 
> > > Best regards,
> > > M. Ivanov
> > 
> > 
> 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: Allow PV guest set X86_CR4_PCE flag

2015-08-10 Thread Boris Ostrovsky



On 08/10/2015 10:37 AM, Andrew Cooper wrote:

On 10/08/15 15:27, Boris Ostrovsky wrote:

With added PV support for VPMU, guests may legitimately decide to set
CR4's PCE flag. We should allow this when VPMU is enabled.

Signed-off-by: Boris Ostrovsky 

Why?  Even a PV guest using VPMU should know that it doesn't actually
control CR4.PCE

All this (appears to) end up doing is putting PCE into the "allow but
ignore" mask.


Yes, that's what I wanted to do.



How about this (not even compile tested) which is a rather shorter way
of doing the same thing:


We could do this too but I thought that if we have VPMU off there is no 
reason to allow this bit to be set (quietly).


(There is no cpu_has_pce, we'd use cpu_has_arch_perfmon on Intel and do 
this unconditionally on AMD)


-boris



diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 045f6ff..834ce0f 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -721,10 +721,12 @@ static int __init init_pv_cr4_masks(void)
  unsigned long common_mask = ~X86_CR4_TSD;
  
  /*

- * All PV guests may attempt to modify TSD, DE and OSXSAVE.
+ * All PV guests may attempt to modify TSD, DE, PCE and OSXSAVE.
   */
  if ( cpu_has_de )
  common_mask &= ~X86_CR4_DE;
+if ( cpu_has_pce )
+common_mask &= ~X86_CR4_PCE;
  if ( cpu_has_xsave )
  common_mask &= ~X86_CR4_OSXSAVE;
  
~Andrew



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: Allow PV guest set X86_CR4_PCE flag

2015-08-10 Thread Andrew Cooper
On 10/08/15 15:49, Boris Ostrovsky wrote:
>
>
> On 08/10/2015 10:37 AM, Andrew Cooper wrote:
>> On 10/08/15 15:27, Boris Ostrovsky wrote:
>>> With added PV support for VPMU, guests may legitimately decide to set
>>> CR4's PCE flag. We should allow this when VPMU is enabled.
>>>
>>> Signed-off-by: Boris Ostrovsky 
>> Why?  Even a PV guest using VPMU should know that it doesn't actually
>> control CR4.PCE
>>
>> All this (appears to) end up doing is putting PCE into the "allow but
>> ignore" mask.
>
> Yes, that's what I wanted to do.
>
>>
>> How about this (not even compile tested) which is a rather shorter way
>> of doing the same thing:
>
> We could do this too but I thought that if we have VPMU off there is
> no reason to allow this bit to be set (quietly).

Adding the bit to this mask doesn't allow the guest to play with it. 
Xen never sets CR4.PCE.

The question is whether warning about the guest attempting to set it is
worthwhile or not.  We have no rdpmc support in emulate_privileged_op()
so any attempt to use it will result in a #UD being injected.

>
> (There is no cpu_has_pce, we'd use cpu_has_arch_perfmon on Intel and
> do this unconditionally on AMD)

It could trivially be added.  The salient piece of information is
whether the hardware would support setting CR4.PCE, not whether any of
the interesting features which come with it are present.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] how can I find hypercall page address?

2015-08-10 Thread Dario Faggioli
On Sat, 2015-08-08 at 08:02 +0800, big strong wrote:
> I think I've stated clearly what I want to do.
>
Well...
> 
> |I want to locate the hypercall page address when creating a new domU,
> so as to locate hypercalls.
>
Ok. What for?

Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86: Allow PV guest set X86_CR4_PCE flag

2015-08-10 Thread Boris Ostrovsky



On 08/10/2015 11:02 AM, Andrew Cooper wrote:

On 10/08/15 15:49, Boris Ostrovsky wrote:


On 08/10/2015 10:37 AM, Andrew Cooper wrote:

On 10/08/15 15:27, Boris Ostrovsky wrote:

With added PV support for VPMU, guests may legitimately decide to set
CR4's PCE flag. We should allow this when VPMU is enabled.

Signed-off-by: Boris Ostrovsky 

Why?  Even a PV guest using VPMU should know that it doesn't actually
control CR4.PCE

All this (appears to) end up doing is putting PCE into the "allow but
ignore" mask.

Yes, that's what I wanted to do.


How about this (not even compile tested) which is a rather shorter way
of doing the same thing:

We could do this too but I thought that if we have VPMU off there is
no reason to allow this bit to be set (quietly).

Adding the bit to this mask doesn't allow the guest to play with it.
Xen never sets CR4.PCE.

The question is whether warning about the guest attempting to set it is
worthwhile or not.  We have no rdpmc support in emulate_privileged_op()
so any attempt to use it will result in a #UD being injected.


That's because (at least on Linux) we turn it into rdmsr.

Actually, let's forget this patch. Given what I just said above, I think 
it's better to fix this on Linux side and just clear PCE bit in 
xen_write_cr4().


-boris




(There is no cpu_has_pce, we'd use cpu_has_arch_perfmon on Intel and
do this unconditionally on AMD)

It could trivially be added.  The salient piece of information is
whether the hardware would support setting CR4.PCE, not whether any of
the interesting features which come with it are present.

~Andrew



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Developing Desktop Window Integration

2015-08-10 Thread Dario Faggioli
On Fri, 2015-08-07 at 12:55 -0400, hanji unit wrote:
> I have started looking at what would be required. It looks like a full
> graphical stack implementation, including userland, kernelmode, and
> possibly even VGA device code. For these reasons, I think it will be a
> large undertaking and want to discuss with the community.
> 
> I think some of what Qubes did here is closed source.
> 
I'm not 100% sure, but I don't think it is.

Actually, I don't think it could possibly be, if only, for licences
issues. In fact, Quebs people modified KDE and XFCE, and are
redistributing those changes in their product/project, available for
download to everyone.

As per my understanding of GPL and of other OS licenses, this, in many
cases, mean calls for the modified sources to be available, and, as a
matter of fact, I think they are. *I think* that some of the work
they've done on Windows is closed, but I'm quite sure their
modifications to Xen, Linux, and to the various bits and pieces of X
server, display managers and desktop environments are all public.

I may be wrong, but I'd at least have a look, if I were you. :-)

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xen 4.6 retrospective] More public/easy to find information about the release schedule

2015-08-10 Thread Dario Faggioli
On Fri, 2015-08-07 at 17:36 +0200, Roger Pau Monné wrote:
> = Issue / Observation =
> 
> The information about the release schedule is not clearly published
> anywhere apart from the mailing lists, which makes it hard for
> non-developers (or even for developers) given that the mailing list
> traffic for xen-devel is high.
> 
> = Possible Solution / Improvement =
> 
> Publish the release schedule in a web page with a concrete schedule,
> like the FreeBSD Release Engineering Team does:
> 
> https://www.freebsd.org/releng/
> 
+1

At Fedora, they do something similar:

 https://fedoraproject.org/wiki/Releases/23/Schedule

 https://fedoraproject.org/wiki/Releases/22/Schedule?rd=Releases/22

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC v2 0/5] Multi-queue support for xen-blkfront and xen-blkback

2015-08-10 Thread Jens Axboe

On 08/10/2015 05:03 AM, Rafal Mielniczuk wrote:

On 01/07/15 04:03, Jens Axboe wrote:

On 06/30/2015 08:21 AM, Marcus Granado wrote:

Hi,

Our measurements for the multiqueue patch indicate a clear improvement
in iops when more queues are used.

The measurements were obtained under the following conditions:

- using blkback as the dom0 backend with the multiqueue patch applied to
a dom0 kernel 4.0 on 8 vcpus.

- using a recent Ubuntu 15.04 kernel 3.19 with multiqueue frontend
applied to be used as a guest on 4 vcpus

- using a micron RealSSD P320h as the underlying local storage on a Dell
PowerEdge R720 with 2 Xeon E5-2643 v2 cpus.

- fio 2.2.7-22-g36870 as the generator of synthetic loads in the guest.
We used direct_io to skip caching in the guest and ran fio for 60s
reading a number of block sizes ranging from 512 bytes to 4MiB. Queue
depth of 32 for each queue was used to saturate individual vcpus in the
guest.

We were interested in observing storage iops for different values of
block sizes. Our expectation was that iops would improve when increasing
the number of queues, because both the guest and dom0 would be able to
make use of more vcpus to handle these requests.

These are the results (as aggregate iops for all the fio threads) that
we got for the conditions above with sequential reads:

fio_threads  io_depth  block_size   1-queue_iops  8-queue_iops
  8   32   512   158K 264K
  8   321K   157K 260K
  8   322K   157K 258K
  8   324K   148K 257K
  8   328K   124K 207K
  8   32   16K84K 105K
  8   32   32K50K  54K
  8   32   64K24K  27K
  8   32  128K11K  13K

8-queue iops was better than single queue iops for all the block sizes.
There were very good improvements as well for sequential writes with
block size 4K (from 80K iops with single queue to 230K iops with 8
queues), and no regressions were visible in any measurement performed.

Great results! And I don't know why this code has lingered for so long,
so thanks for helping get some attention to this again.

Personally I'd be really interested in the results for the same set of
tests, but without the blk-mq patches. Do you have them, or could you
potentially run them?


Hello,

We rerun the tests for sequential reads with the identical settings but with 
Bob Liu's multiqueue patches reverted from dom0 and guest kernels.
The results we obtained were *better* than the results we got with multiqueue 
patches applied:

fio_threads  io_depth  block_size   1-queue_iops  8-queue_iops  
*no-mq-patches_iops*
  8   32   512   158K 264K 321K
  8   321K   157K 260K 328K
  8   322K   157K 258K 336K
  8   324K   148K 257K 308K
  8   328K   124K 207K 188K
  8   32   16K84K 105K 82K
  8   32   32K50K  54K 36K
  8   32   64K24K  27K 16K
  8   32  128K11K  13K 11K

We noticed that the requests are not merged by the guest when the multiqueue 
patches are applied,
which results in a regression for small block sizes (RealSSD P320h's optimal 
block size is around 32-64KB).

We observed similar regression for the Dell MZ-5EA1000-0D3 100 GB 2.5" Internal 
SSD

As I understand blk-mq layer bypasses I/O scheduler which also effectively 
disables merges.
Could you explain why it is difficult to enable merging in the blk-mq layer?
That could help closing the performance gap we observed.

Otherwise, the tests shows that the multiqueue patches does not improve the 
performance,
at least when it comes to sequential read/writes operations.


blk-mq still provides merging, there should be no difference there. Does 
the xen patches set BLK_MQ_F_SHOULD_MERGE?


--
Jens Axboe


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 01/23] x86/boot: remove unneeded instruction

2015-08-10 Thread Konrad Rzeszutek Wilk
On Mon, Jul 27, 2015 at 09:46:08PM +0200, Daniel Kiper wrote:
> On Fri, Jul 24, 2015 at 12:22:57PM -0400, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jul 20, 2015 at 04:28:56PM +0200, Daniel Kiper wrote:
> > > Signed-off-by: Daniel Kiper 
> >
> > Don't you use it in:
> >
> > /* Switch to low-memory stack.  */
> > 193 mov sym_phys(trampoline_phys),%edi
> > 194 lea 0x1(%edi),%esp
> > 195 lea trampoline_boot_cpu_entry-trampoline_start(%edi),%eax
> > ?
> 
> Yep, but...
> 
> > > ---
> > >  xen/arch/x86/boot/head.S |1 -
> > >  1 file changed, 1 deletion(-)
> > >
> > > diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
> > > index cfd59dc..f63b349 100644
> > > --- a/xen/arch/x86/boot/head.S
> > > +++ b/xen/arch/x86/boot/head.S
> > > @@ -169,7 +169,6 @@ __start:
> > >  /* Apply relocations to bootstrap trampoline. */
> > >  mov sym_phys(trampoline_phys),%edx
> 
> ...relevant value is stored in sym_phys(trampoline_phys) earlier then it is
> read into %edx here and...
> 
> > >  mov $sym_phys(__trampoline_rel_start),%edi
> > > -mov %edx,sym_phys(trampoline_phys)
> 
> ...it is put back to sym_phys(trampoline_phys) without any change here :-))).
> So, I suppose this is remnant from something which was removed once but
> somebody forgot to remove this instruction too... This patch fixes it.

Reviewed-by: Konrad Rzeszutek Wilk 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv1] xen/events/fifo: Handle linked events when closing a PIRQ port

2015-08-10 Thread Boris Ostrovsky

On 08/10/2015 10:24 AM, David Vrabel wrote:

Commit fcdf31a7c162de0c93a2bee51df4688ab0a348f8 (xen/events/fifo:
Handle linked events when closing a port) did not handle closing a
port bound to a PIRQ because these are closed from shutdown_pirq()
which is called with interrupts disabled.

Defer the close to a work queue where we can safely spin waiting for
the LINKED bit to clear.  For simplicity, the close is always deferred
even if it is not required (i.e., we're already in process context).

Signed-off-by: David Vrabel 
Cc: Ross Lagerwall 


Reviewed-by: Boris Ostrovsky 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] OVMF BoF @ KVM Forum 2015

2015-08-10 Thread Laszlo Ersek
Hi.

Let's do an OVMF BoF at this year's KVM Forum too.

Paolo will present

  Securing secure boot: system management mode in KVM and Tiano Core

on Thursday, August 20, in the 5:00pm - 5:30pm time slot.

Right after that, the BoF section starts at 5:30pm:

  http://events.linuxfoundation.org/events/kvm-forum/program/schedule

We should convene and discuss stuff. I don't have an agenda, so people
should bring their ideas and questions (famous last words).

As food for thought, I tried to collect the feature-looking patches from
the git history that have been committed since last year's KVM Forum,
and to match them against patch sets on the mailing list:

  git log --reverse --oneline --since=2014-10-14 -- \
  OvmfPkg/ \
  ArmVirtPkg/ \
  ArmPlatformPkg/ArmVirtualizationPkg/

I attempted to sort them into categories. You can see the list below.
The ordering is totally random, it's just what I ended up with.
Corrections / additions welcome.

Personally, one (missing) feature I'd like to see discussed is
"SataControllerDxe in OVMF". SMM will require Q35, and the only "IDE"
that Q35 speaks is SATA / AHCI. (And you can't disable that controller
on Q35.)

Anyway, here goes.

Features completed
--

(... unless marked [pending])

- Xen guest:

  - PV block driver:
[PATCH v4 00/19] Introducing Xen PV block driver to OVMF

  - Xen for ARM:
[PATCH v5 00/29] Xen/ARM guest support

- PCI / hw related:

  - PCI on ARM; detect VGA and USB keyboard:
[PATCH v3 00/28] ArmVirtualizationPkg/ArmVirtualizationQemu: enable PCI
[PATCH 0/4] ArmVirtualizationPkg: PlatformIntelBdsLib: dynamic console setup

  - support for Q35:
[PATCH v6 0/9] OVMF: Add support for Qemu Q35 machine type
[PATCH 1/1] OvmfPkg: QemuBootOrderLib: parse OFW device path nodes of PCI 
bridges

  - USB3 (ARM and x86):
[PATCH v2 2/4] ArmVirtualizationPkg/ArmVirtualizationQemu: include XHCI 
driver
[PATCH v2 4/4] OvmfPkg: include XHCI driver

  - support TCO watchdog emulation features:
[PATCH v5 2/2] OvmfPkg/PlatformPei: Initialise RCBA (B0:D31:F0 0xf0) 
register

  - virtio-vga:
[PATCH] Add virtio-vga support

  - support extra PCI root buses for NUMA-locality with assigned
devices:
[PATCH v3 00/23] OvmfPkg: support extra PCI root buses

- QEMU config integration:

  - fw_cfg, boot order, and -kernel booting on ARM:
[PATCH v4 00/13] ArmVirtualizationQemu: support fw_cfg, bootorder, '-kernel'
[PATCH 0/3] ArmVirtPkg: drop support for the ARM BDS

  - support for "-boot menu=on[,splash-time=N]":
[PATCH v2 0/3] OVMF, ArmVirt: consume QEMU's "-boot menu=on[,splash-time=N]"

  - ACPI tables for ARM:
[PATCH v2 0/3] ACPI over fw_cfg for ARM/AARCH64 qemu guests

  - SMBIOS features: Type 0 default, and SMBIOS 3.0 support on ARM and
x86:
[PATCH] OvmfPkg/SMBIOS: Provide default Type 0 (BIOS Information) structure
[PATCH v2 0/6] ArmVirtPkg/ArmVirtQemu: support SMBIOS
[PATCH 0/9] OvmfPkg, ArmVirtPkg: SMBIOS 3.0, round 2

- ARM specific:

  - "fun" with the caches:
[PATCH v4 0/5] ArmVirtualizationPkg: explicit cache maintenance

  - secure boot:
[PATCH v3 0/3] ArmVirtualizationQemu: enable support for UEFI Secure Boot

  - performance optimization:
[PATCH v2 0/6] ArmPkg/ArmVirtPkg: GIC revision detection

  - better handling for the typical Linux terminal (generic driver code,
hooked up to ArmVirt):
[PATCH V4 0/5] Add TtyTerm terminal type

- SMM for OVMF (in progress):
[PATCH 00/11] Bits and pieces
[PATCH 00/58] OvmfPkg: support SMM for better security (single VCPU, IA32) 
[pending]

- Build system:

  - moving to NASM:
[PATCH 0/7] Convert OVMF assembly to NASM
[PATCH v2 0/6] OvmfPkg/XenBusDxe: Convert *.asm to NASM.

  - accept UTF-8 in .uni files:
[PATCH v4 00/10] Support UTF-8 in .uni string files

  - LLVM/clang support for AARCH64 (in progress):
[PATCH v4 00/13] BaseTools: unify all GCC linker scripts
[PATCH v4 0/7] small model and clang support for AARCH64 [pending]

- UEFI compliance:

  - support for OsIndications:
[PATCH v2 0/9] OvmfPkg: PlatformBdsLib cleanups and improvements

  - signal ReadyToBoot:
[PATCH 1/8] OvmfPkg/PlatformBdsLib: Signal ReadyToBoot before booting QEMU 
kernel

  - signal EndOfDxe:
[PATCH v2] ArmVirtPkg: signal EndOxDxe event in PlatformBsdInit
[PATCH v2 0/6] OvmfPkg: save S3 state at EndOfDxe

  - fix Serial IO Protocol issues flagged by SCT
[PATCH V4 0/5] Some improvements on serial terminal

- other

  - big OVMF guests:
[PATCH v2 0/4] OvmfPkg: enable >= 64 GB guests

  - IPv6 (conditionally enabled):
[PATCH v2] OvmfPkg: enable the IPv6 support

  - many fixes for toolchain warnings and C language misuse

Thanks
Laszlo

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 04/23] x86/boot: call reloc() using cdecl calling convention

2015-08-10 Thread Konrad Rzeszutek Wilk
On Mon, Jul 20, 2015 at 04:28:59PM +0200, Daniel Kiper wrote:
> Suggested-by: Jan Beulich 
> Signed-off-by: Daniel Kiper 

Reviewed-by: Konrad Rzeszutek Wilk 

> ---
>  xen/arch/x86/boot/head.S  |4 +++-
>  xen/arch/x86/boot/reloc.c |   20 
>  2 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
> index ed42782..3cbb2e6 100644
> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -119,8 +119,10 @@ __start:
>  
>  /* Save the Multiboot info struct (after relocation) for later use. 
> */
>  mov $sym_phys(cpu0_stack)+1024,%esp
> -push%ebx
> +push%ebx/* Multiboot information address. */
> +push%eax/* Boot trampoline address. */
>  callreloc
> +add $8,%esp /* Remove reloc() args from stack. */
>  mov %eax,sym_phys(multiboot_ptr)
>  
>  /* Initialize BSS (no nasty surprises!). */
> diff --git a/xen/arch/x86/boot/reloc.c b/xen/arch/x86/boot/reloc.c
> index 63045c0..708898f 100644
> --- a/xen/arch/x86/boot/reloc.c
> +++ b/xen/arch/x86/boot/reloc.c
> @@ -10,15 +10,27 @@
>   *Keir Fraser 
>   */
>  
> -/* entered with %eax = BOOT_TRAMPOLINE */
> +/*
> + * This entry point is entered from xen/arch/x86/boot/head.S with:
> + *   - 0x4(%esp) = BOOT_TRAMPOLINE_ADDRESS,
> + *   - 0x8(%esp) = MULTIBOOT_INFORMATION_ADDRESS.
> + */
>  asm (
>  ".text \n"
>  ".globl _start \n"
>  "_start:   \n"
> +"push %ebp \n"
> +"mov  %esp,%ebp\n"
>  "call 1f   \n"
> -"1:  pop  %ebx \n"
> -"mov  %eax,alloc-1b(%ebx)  \n"
> -"jmp  reloc\n"
> +"1:  pop  %ecx \n"
> +"mov  0x8(%ebp),%eax   \n"
> +"mov  %eax,alloc-1b(%ecx)  \n"
> +"mov  0xc(%ebp),%eax   \n"
> +"push %eax \n"
> +"call reloc\n"
> +"add  $4,%esp  \n"
> +"pop  %ebp \n"
> +"ret   \n"
>  );
>  
>  /*
> -- 
> 1.7.10.4
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 06/23] x86/boot: use %ecx instead of %eax

2015-08-10 Thread Konrad Rzeszutek Wilk
On Mon, Jul 20, 2015 at 04:29:01PM +0200, Daniel Kiper wrote:
> Use %ecx instead of %eax to store low memory upper limit from EBDA.
> This way we do not wipe multiboot protocol identifier. It is needed
> in reloc() to differentiate between multiboot (v1) and
> multiboot2 protocol.
> 
> Signed-off-by: Daniel Kiper 
> Reviewed-by: Andrew Cooper 

Reviewed-by: Konrad Rzeszutek Wilk 
> ---
>  xen/arch/x86/boot/head.S |   24 
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
> index 3cbb2e6..77e7da9 100644
> --- a/xen/arch/x86/boot/head.S
> +++ b/xen/arch/x86/boot/head.S
> @@ -87,14 +87,14 @@ __start:
>  jne not_multiboot
>  
>  /* Set up trampoline segment 64k below EBDA */
> -movzwl  0x40e,%eax  /* EBDA segment */
> -cmp $0xa000,%eax/* sanity check (high) */
> +movzwl  0x40e,%ecx  /* EBDA segment */
> +cmp $0xa000,%ecx/* sanity check (high) */
>  jae 0f
> -cmp $0x4000,%eax/* sanity check (low) */
> +cmp $0x4000,%ecx/* sanity check (low) */
>  jae 1f
>  0:
> -movzwl  0x413,%eax  /* use base memory size on failure */
> -shl $10-4,%eax
> +movzwl  0x413,%ecx  /* use base memory size on failure */
> +shl $10-4,%ecx
>  1:
>  /*
>   * Compare the value in the BDA with the information from the
> @@ -106,21 +106,21 @@ __start:
>  cmp $0x100,%edx /* is the multiboot value too small? */
>  jb  2f  /* if so, do not use it */
>  shl $10-4,%edx
> -cmp %eax,%edx   /* compare with BDA value */
> -cmovb   %edx,%eax   /* and use the smaller */
> +cmp %ecx,%edx   /* compare with BDA value */
> +cmovb   %edx,%ecx   /* and use the smaller */
>  
>  2:  /* Reserve 64kb for the trampoline */
> -sub $0x1000,%eax
> +sub $0x1000,%ecx
>  
>  /* From arch/x86/smpboot.c: start_eip had better be page-aligned! */
> -xor %al, %al
> -shl $4, %eax
> -mov %eax,sym_phys(trampoline_phys)
> +xor %cl, %cl
> +shl $4, %ecx
> +mov %ecx,sym_phys(trampoline_phys)
>  
>  /* Save the Multiboot info struct (after relocation) for later use. 
> */
>  mov $sym_phys(cpu0_stack)+1024,%esp
>  push%ebx/* Multiboot information address. */
> -push%eax/* Boot trampoline address. */
> +push%ecx/* Boot trampoline address. */
>  callreloc
>  add $8,%esp /* Remove reloc() args from stack. */
>  mov %eax,sym_phys(multiboot_ptr)
> -- 
> 1.7.10.4
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 07/23] x86/boot/reloc: Rename some variables and rearrange code a bit

2015-08-10 Thread Konrad Rzeszutek Wilk
On Mon, Jul 20, 2015 at 04:29:02PM +0200, Daniel Kiper wrote:
> Rename mbi and mbi_old variables and rearrange code a bit to make

s/mbi_old/mbi_in/

Perhaps you want to say: rename mbi_old with mbi_in, and mbi with mbi_out

or better:

Replace mbi with mbi_out and mbi_old with mbi_in and ...


> it more readable. Additionally, this way multiboot (v1) protocol
> implementation and future multiboot2 protocol implementation will
> use the same variable naming convention.
> 
> Signed-off-by: Daniel Kiper 

Reviewed-by: Konrad Rzeszutek Wilk 
> ---
> v2 - suggestions/fixes:
>- extract this change from main mutliboot2
>  protocol implementation
>  (suggested by Jan Beulich).
> ---
>  xen/arch/x86/boot/reloc.c |   39 +--
>  1 file changed, 21 insertions(+), 18 deletions(-)
> 
> diff --git a/xen/arch/x86/boot/reloc.c b/xen/arch/x86/boot/reloc.c
> index 09fd540..feb1d72 100644
> --- a/xen/arch/x86/boot/reloc.c
> +++ b/xen/arch/x86/boot/reloc.c
> @@ -86,41 +86,44 @@ static u32 copy_string(u32 src)
>  return copy_mem(src, p - src + 1);
>  }
>  
> -multiboot_info_t *reloc(u32 mbi_old)
> +multiboot_info_t *reloc(u32 mbi_in)
>  {
> -multiboot_info_t *mbi = (multiboot_info_t *)copy_mem(mbi_old, 
> sizeof(*mbi));
>  int i;
> +multiboot_info_t *mbi_out;
>  
> -if ( mbi->flags & MBI_CMDLINE )
> -mbi->cmdline = copy_string(mbi->cmdline);
> +mbi_out = (multiboot_info_t *)copy_mem(mbi_in, sizeof(*mbi_out));
>  
> -if ( mbi->flags & MBI_MODULES )
> +if ( mbi_out->flags & MBI_CMDLINE )
> +mbi_out->cmdline = copy_string(mbi_out->cmdline);
> +
> +if ( mbi_out->flags & MBI_MODULES )
>  {
>  module_t *mods;
>  
> -mbi->mods_addr = copy_mem(mbi->mods_addr, mbi->mods_count * 
> sizeof(module_t));
> +mbi_out->mods_addr = copy_mem(mbi_out->mods_addr,
> +  mbi_out->mods_count * 
> sizeof(module_t));
>  
> -mods = (module_t *)mbi->mods_addr;
> +mods = (module_t *)mbi_out->mods_addr;
>  
> -for ( i = 0; i < mbi->mods_count; i++ )
> +for ( i = 0; i < mbi_out->mods_count; i++ )
>  {
>  if ( mods[i].string )
>  mods[i].string = copy_string(mods[i].string);
>  }
>  }
>  
> -if ( mbi->flags & MBI_MEMMAP )
> -mbi->mmap_addr = copy_mem(mbi->mmap_addr, mbi->mmap_length);
> +if ( mbi_out->flags & MBI_MEMMAP )
> +mbi_out->mmap_addr = copy_mem(mbi_out->mmap_addr, 
> mbi_out->mmap_length);
>  
> -if ( mbi->flags & MBI_LOADERNAME )
> -mbi->boot_loader_name = copy_string(mbi->boot_loader_name);
> +if ( mbi_out->flags & MBI_LOADERNAME )
> +mbi_out->boot_loader_name = copy_string(mbi_out->boot_loader_name);
>  
>  /* Mask features we don't understand or don't relocate. */
> -mbi->flags &= (MBI_MEMLIMITS |
> -   MBI_CMDLINE |
> -   MBI_MODULES |
> -   MBI_MEMMAP |
> -   MBI_LOADERNAME);
> +mbi_out->flags &= (MBI_MEMLIMITS |
> +   MBI_CMDLINE |
> +   MBI_MODULES |
> +   MBI_MEMMAP |
> +   MBI_LOADERNAME);
>  
> -return mbi;
> +return mbi_out;
>  }
> -- 
> 1.7.10.4
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [linux-3.18 test] 60642: regressions - FAIL

2015-08-10 Thread osstest service owner
flight 60642 linux-3.18 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/60642/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail REGR. vs. 58581

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-libvirt  6 xen-boot  fail REGR. vs. 58581
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail 
baseline untested
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 13 guest-localmigrate 
fail baseline untested
 test-armhf-armhf-xl-rtds 11 guest-start fail baseline untested
 test-armhf-armhf-xl-credit2   6 xen-boot fail   like 58581
 test-armhf-armhf-xl-multivcpu  6 xen-boot fail  like 58581
 test-armhf-armhf-xl   6 xen-boot fail   like 58581
 test-armhf-armhf-libvirt-xsm  6 xen-boot fail   like 58581
 test-armhf-armhf-xl-xsm   6 xen-boot fail   like 58581
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail like 58581
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail like 58581
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail like 58581

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-armhf-armhf-xl-vhd   9 debian-di-installfail   never pass
 test-armhf-armhf-libvirt-qcow2  9 debian-di-installfail never pass
 test-armhf-armhf-libvirt-raw  9 debian-di-installfail   never pass
 test-amd64-amd64-libvirt-pair 21 guest-migrate/src_host/dst_host fail never 
pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck  6 xen-boot fail never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-pair 21 guest-migrate/src_host/dst_host fail never pass
 test-armhf-armhf-libvirt-vhd  9 debian-di-installfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 11 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 11 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-vhd  11 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow2 9 debian-di-installfail   never pass
 test-amd64-i386-libvirt-qcow2 11 migrate-support-checkfail  never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-raw   9 debian-di-installfail   never pass

version targeted for testing:
 linuxe9fd6ddcabf8695329f2462d3ece5a8442f2a8cf
baseline version:
 linuxd048c068d00da7d4cfa5ea7651933b99026958cf

Last test of basis58581  2015-06-15 09:42:22 Z   56 days
Failing since 58976  2015-06-29 19:43:23 Z   41 days   42 attempts
Testing same since60642  2015-08-09 13:18:01 Z1 days1 attempts


383 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl  pass 

Re: [Xen-devel] [PATCH 3.2 110/110] x86/ldt: Make modify_ldt synchronous

2015-08-10 Thread Andy Lutomirski
On Mon, Aug 10, 2015 at 3:12 AM, Ben Hutchings  wrote:
> 3.2.71-rc1 review patch.  If anyone has any objections, please let me know.
>
> --
>
> From: Andy Lutomirski 
>
> commit 37868fe113ff2ba814b3b4eb12df214df555f8dc upstream.

Unfortunately, this patch was slightly buggy.  The fixes are:

https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=4809146b86c3d41ce588fdb767d021e2a80600dd

https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/urgent&id=136d9d83c07c5e30ac49fc83b27e8c4842f108fc

Grr, making major changes like this in the middle of a release cycle
isn't the best.

--Andy

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv1] xen/events/fifo: Handle linked events when closing a PIRQ port

2015-08-10 Thread linux

On 2015-08-10 16:24, David Vrabel wrote:

Commit fcdf31a7c162de0c93a2bee51df4688ab0a348f8 (xen/events/fifo:
Handle linked events when closing a port) did not handle closing a
port bound to a PIRQ because these are closed from shutdown_pirq()
which is called with interrupts disabled.

Defer the close to a work queue where we can safely spin waiting for
the LINKED bit to clear.  For simplicity, the close is always deferred
even if it is not required (i.e., we're already in process context).

Signed-off-by: David Vrabel 
Cc: Ross Lagerwall 
---
Cc: Sander Eikelenboom 


Hi David,

Tested your patch, don't know for sure but this doesn't seem to work 
out.

I end up with this event channel error on dom0 boot.

Which ends in state:
NameID   Mem VCPUs  State   Time(s)
(null)   0  1536 6 r-
 183.8


--
Sander

(XEN) [2015-08-10 16:35:34.584] PCI add device :0d:00.0
(XEN) [2015-08-10 16:35:34.891] PCI add device :0c:00.0
(XEN) [2015-08-10 16:35:35.123] PCI add device :0b:00.0
(XEN) [2015-08-10 16:35:35.325] PCI add device :0a:00.0
(XEN) [2015-08-10 16:35:35.574] PCI add device :09:00.0
(XEN) [2015-08-10 16:35:35.642] PCI add device :09:00.1
(XEN) [2015-08-10 16:35:35.872] PCI add device :05:00.0
(XEN) [2015-08-10 16:35:36.044] PCI add device :06:01.0
(XEN) [2015-08-10 16:35:36.109] PCI add device :06:02.0
(XEN) [2015-08-10 16:35:36.293] PCI add device :08:00.0
(XEN) [2015-08-10 16:35:36.603] PCI add device :07:00.0
(XEN) [2015-08-10 16:35:36.906] PCI add device :04:00.0
(XEN) [2015-08-10 16:35:37.074] PCI add device :03:06.0
(XEN) [2015-08-10 16:35:39.456] PCI: Using MCFG for segment  bus 
00-ff
(XEN) [2015-08-10 16:35:49.623] d0: Forcing read-only access to MFN 
fed00
(XEN) [2015-08-10 16:35:51.374] event_channel.c:472:d0v0 EVTCHNOP 
failure: error -17





---
 drivers/xen/events/events_2l.c   | 10 +++
 drivers/xen/events/events_base.c | 13 +
 drivers/xen/events/events_fifo.c | 52 
+++-

 drivers/xen/events/events_internal.h |  5 ++--
 4 files changed, 53 insertions(+), 27 deletions(-)

diff --git a/drivers/xen/events/events_2l.c 
b/drivers/xen/events/events_2l.c

index 7dd4631..82c90de 100644
--- a/drivers/xen/events/events_2l.c
+++ b/drivers/xen/events/events_2l.c
@@ -354,6 +354,15 @@ static void evtchn_2l_resume(void)
EVTCHN_2L_NR_CHANNELS/BITS_PER_EVTCHN_WORD);
 }

+static void evtchn_2l_close(unsigned int port, unsigned int cpu)
+{
+   struct evtchn_close close;
+
+   close.port = port;
+   if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
+   BUG();
+}
+
 static const struct evtchn_ops evtchn_ops_2l = {
.max_channels  = evtchn_2l_max_channels,
.nr_channels   = evtchn_2l_max_channels,
@@ -366,6 +375,7 @@ static const struct evtchn_ops evtchn_ops_2l = {
.unmask= evtchn_2l_unmask,
.handle_events = evtchn_2l_handle_events,
.resume= evtchn_2l_resume,
+   .close = evtchn_2l_close,
 };

 void __init xen_evtchn_2l_init(void)
diff --git a/drivers/xen/events/events_base.c 
b/drivers/xen/events/events_base.c

index 1495ecc..e3f0049 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -452,17 +452,6 @@ static void xen_free_irq(unsigned irq)
irq_free_desc(irq);
 }

-static void xen_evtchn_close(unsigned int port, unsigned int cpu)
-{
-   struct evtchn_close close;
-
-   xen_evtchn_op_close(port, cpu);
-
-   close.port = port;
-   if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
-   BUG();
-}
-
 static void pirq_query_unmask(int irq)
 {
struct physdev_irq_status_query irq_status;
@@ -546,7 +535,7 @@ out:

 err:
pr_err("irq%d: Failed to set port to irq mapping (%d)\n", irq, rc);
-   xen_evtchn_close(evtchn, NR_CPUS);
+   xen_evtchn_close(evtchn, 0);
return 0;
 }

diff --git a/drivers/xen/events/events_fifo.c 
b/drivers/xen/events/events_fifo.c

index 6df8aac..149e1e9 100644
--- a/drivers/xen/events/events_fifo.c
+++ b/drivers/xen/events/events_fifo.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include 
@@ -385,24 +386,51 @@ static void evtchn_fifo_resume(void)
event_array_pages = 0;
 }

+struct close_work {
+   struct work_struct work;
+   unsigned int port;
+};
+
+static void evtchn_fifo_close_work(struct work_struct *work)
+{
+   struct close_work *cw = container_of(work, struct close_work, work);
+   struct evtchn_close close;
+
+   while (evtchn_fifo_is_linked(cw->port))
+   cpu_relax();
+
+   close.port = cw->port;
+   if (HYPERVISOR_event_channel_op(EVTCHNOP_close, &close) != 0)
+   BUG();
+
+   kfree(cw);
+}
+
 static void evtchn_fifo_close(unsi

Re: [Xen-devel] [PATCHv1] xen/events/fifo: Handle linked events when closing a PIRQ port

2015-08-10 Thread David Vrabel
On 10/08/15 17:47, li...@eikelenboom.it wrote:
> On 2015-08-10 16:24, David Vrabel wrote:
>> Commit fcdf31a7c162de0c93a2bee51df4688ab0a348f8 (xen/events/fifo:
>> Handle linked events when closing a port) did not handle closing a
>> port bound to a PIRQ because these are closed from shutdown_pirq()
>> which is called with interrupts disabled.
>>
>> Defer the close to a work queue where we can safely spin waiting for
>> the LINKED bit to clear.  For simplicity, the close is always deferred
>> even if it is not required (i.e., we're already in process context).
>>
>> Signed-off-by: David Vrabel 
>> Cc: Ross Lagerwall 
>> ---
>> Cc: Sander Eikelenboom 
> 
> Hi David,
> 
> Tested your patch, don't know for sure but this doesn't seem to work out.
> I end up with this event channel error on dom0 boot.
> 
> Which ends in state:
> NameID   Mem VCPUsState   
> Time(s)
> (null)   0  1536 6 r-   
>  183.8
> 
> -- 
> Sander
> 
> (XEN) [2015-08-10 16:35:34.584] PCI add device :0d:00.0
> (XEN) [2015-08-10 16:35:34.891] PCI add device :0c:00.0
> (XEN) [2015-08-10 16:35:35.123] PCI add device :0b:00.0
> (XEN) [2015-08-10 16:35:35.325] PCI add device :0a:00.0
> (XEN) [2015-08-10 16:35:35.574] PCI add device :09:00.0
> (XEN) [2015-08-10 16:35:35.642] PCI add device :09:00.1
> (XEN) [2015-08-10 16:35:35.872] PCI add device :05:00.0
> (XEN) [2015-08-10 16:35:36.044] PCI add device :06:01.0
> (XEN) [2015-08-10 16:35:36.109] PCI add device :06:02.0
> (XEN) [2015-08-10 16:35:36.293] PCI add device :08:00.0
> (XEN) [2015-08-10 16:35:36.603] PCI add device :07:00.0
> (XEN) [2015-08-10 16:35:36.906] PCI add device :04:00.0
> (XEN) [2015-08-10 16:35:37.074] PCI add device :03:06.0
> (XEN) [2015-08-10 16:35:39.456] PCI: Using MCFG for segment  bus 00-ff
> (XEN) [2015-08-10 16:35:49.623] d0: Forcing read-only access to MFN fed00
> (XEN) [2015-08-10 16:35:51.374] event_channel.c:472:d0v0 EVTCHNOP
> failure: error -17

This didn't happen on the test box I used but I can see it is possible
to rebind a PIRQ whose close is still deferred.

I'm going to revert fcdf31a7c162de0c93a2bee51df4688ab0a348f8
(xen/events/fifo: Handle linked events when closing a port) for now.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xen/xenbus: Don't leak memory when unmapping the ring on HVM backend

2015-08-10 Thread Julien Grall
The commit ccc9d90a9a8b5c4ad7e9708ec41f75ff9e98d61d "xenbus_client:
Extend interface to support multi-page ring" removes the call to
free_xenballooned_pages in xenbus_unmap_ring_vfree_hvm.

This will result to not give back the pages to Linux and loose them
forever. It only happens when the backends are running in HVM domains.

Signed-off-by: Julien Grall 

---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
Cc: Wei Liu 

Appeared in Linux 4.1. HVM backend, which is always the case on ARM, will
leak every mapped ring (i.e ~12KB per domain with 1 disk and 1 vif).
---
 drivers/xen/xenbus/xenbus_client.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c 
b/drivers/xen/xenbus/xenbus_client.c
index 9ad3272..e303535 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -814,8 +814,10 @@ static int xenbus_unmap_ring_vfree_hvm(struct 
xenbus_device *dev, void *vaddr)
 
rv = xenbus_unmap_ring(dev, node->handles, node->nr_handles,
   addrs);
-   if (!rv)
+   if (!rv) {
vunmap(vaddr);
+   free_xenballooned_pages(node->nr_handles, node->hvm.pages);
+   }
else
WARN(1, "Leaking %p, size %u page(s)\n", vaddr,
 node->nr_handles);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/xenbus: Don't leak memory when unmapping the ring on HVM backend

2015-08-10 Thread Boris Ostrovsky



On 08/10/2015 02:10 PM, Julien Grall wrote:

The commit ccc9d90a9a8b5c4ad7e9708ec41f75ff9e98d61d "xenbus_client:
Extend interface to support multi-page ring" removes the call to
free_xenballooned_pages in xenbus_unmap_ring_vfree_hvm.

This will result to not give back the pages to Linux and loose them
forever. It only happens when the backends are running in HVM domains.

Signed-off-by: Julien Grall 


Reviewed-by: Boris Ostrovsky 



---
Cc: Konrad Rzeszutek Wilk 
Cc: Boris Ostrovsky 
Cc: David Vrabel 
Cc: Wei Liu 

Appeared in Linux 4.1. HVM backend, which is always the case on ARM, will
leak every mapped ring (i.e ~12KB per domain with 1 disk and 1 vif).
---
  drivers/xen/xenbus/xenbus_client.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/xenbus/xenbus_client.c 
b/drivers/xen/xenbus/xenbus_client.c
index 9ad3272..e303535 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -814,8 +814,10 @@ static int xenbus_unmap_ring_vfree_hvm(struct 
xenbus_device *dev, void *vaddr)
  
  	rv = xenbus_unmap_ring(dev, node->handles, node->nr_handles,

   addrs);
-   if (!rv)
+   if (!rv) {
vunmap(vaddr);
+   free_xenballooned_pages(node->nr_handles, node->hvm.pages);
+   }
else
WARN(1, "Leaking %p, size %u page(s)\n", vaddr,
 node->nr_handles);



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


  1   2   >