date:20181213

Re: [Xen-devel] [PATCH v3 1/1] cameraif: add ABI for para-virtual camera

2018-12-13 Thread Oleksandr Andrushchenko


PFA the diff between v2 and v3 for your convenience

On 12/12/18 11:49 AM, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

This is the ABI for the two halves of a para-virtualized
camera driver which extends Xen's reach multimedia capabilities even
farther enabling it for video conferencing, In-Vehicle Infotainment,
high definition maps etc.

The initial goal is to support most needed functionality with the
final idea to make it possible to extend the protocol if need be:

1. Provide means for base virtual device configuration:
  - pixel formats
  - resolutions
  - frame rates
2. Support basic camera controls:
  - contrast
  - brightness
  - hue
  - saturation
3. Support streaming control

Signed-off-by: Oleksandr Andrushchenko 
---
  xen/include/public/io/cameraif.h | 1374 ++
  1 file changed, 1374 insertions(+)
  create mode 100644 xen/include/public/io/cameraif.h

diff --git a/xen/include/public/io/cameraif.h b/xen/include/public/io/cameraif.h
new file mode 100644
index ..9aae0f47743b
--- /dev/null
+++ b/xen/include/public/io/cameraif.h
@@ -0,0 +1,1374 @@
+/**
+ * cameraif.h
+ *
+ * Unified camera device I/O interface for Xen guest OSes.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (C) 2018 EPAM Systems Inc.
+ *
+ * Author: Oleksandr Andrushchenko 
+ */
+
+#ifndef __XEN_PUBLIC_IO_CAMERAIF_H__
+#define __XEN_PUBLIC_IO_CAMERAIF_H__
+
+#include "ring.h"
+#include "../grant_table.h"
+
+/*
+ **
+ *   Protocol version
+ **
+ */
+#define XENCAMERA_PROTOCOL_VERSION "1"
+
+/*
+ **
+ *  Feature and Parameter Negotiation
+ **
+ *
+ * Front->back notifications: when enqueuing a new request, sending a
+ * notification can be made conditional on xencamera_req (i.e., the generic
+ * hold-off mechanism provided by the ring macros). Backends must set
+ * xencamera_req appropriately (e.g., using RING_FINAL_CHECK_FOR_REQUESTS()).
+ *
+ * Back->front notifications: when enqueuing a new response, sending a
+ * notification can be made conditional on xencamera_resp (i.e., the generic
+ * hold-off mechanism provided by the ring macros). Frontends must set
+ * xencamera_resp appropriately (e.g., using RING_FINAL_CHECK_FOR_RESPONSES()).
+ *
+ * The two halves of a para-virtual camera driver utilize nodes within
+ * XenStore to communicate capabilities and to negotiate operating parameters.
+ * This section enumerates these nodes which reside in the respective front and
+ * backend portions of XenStore, following the XenBus convention.
+ *
+ * All data in XenStore is stored as strings. Nodes specifying numeric
+ * values are encoded in decimal. Integer value ranges listed below are
+ * expressed as fixed sized integer types capable of storing the conversion
+ * of a properly formatted node string, without loss of information.
+ *
+ **
+ *Example configuration
+ **
+ *
+ * This is an example of backend and frontend configuration:
+ *
+ *- Backend ---
+ *
+ * /local/domain/0/backend/vcamera/1/0/frontend-id = "1"
+ * /local/domain/0/backend/vcamera/1/0/frontend = 
"/local/domain/1/device/vcamera/0"
+ * /local/domain/0/backend/vcamera/1/0/state = "4"
+ * /local/domain/0/backend/vcamera/1/0/versions = "1,2"
+ *
+ *- Frontend

Re: [Xen-devel] [PATCH] drm/xen-front: Make shmem backed display buffer coherent

2018-12-13 Thread Oleksandr Andrushchenko


On 12/13/18 5:48 PM, Daniel Vetter wrote:

On Thu, Dec 13, 2018 at 12:17:54PM +0200, Oleksandr Andrushchenko wrote:

Daniel, could you please comment?

Cross-revieweing someone else's stuff would scale better,

fair enough

  I don't think
I'll get around to anything before next year.


I put you on CC explicitly because you had comments on other patch [1]

and this one tries to solve the issue raised (I tried to figure out

at [2] if this is the way to go, but it seems I have no alternative here).

While at it [3] (I hope) addresses your comments and the series just

needs your single ack/nack to get in: all the rest ack/r-b are already

there. Do you mind looking at it?


-Daniel


Thank you very much for your time,

Oleksandr


Thank you

On 11/27/18 12:32 PM, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

When GEM backing storage is allocated with drm_gem_get_pages
the backing pages may be cached, thus making it possible that
the backend sees only partial content of the buffer which may
lead to screen artifacts. Make sure that the frontend's
memory is coherent and the backend always sees correct display
buffer content.

Fixes: c575b7eeb89f ("drm/xen-front: Add support for Xen PV display frontend")

Signed-off-by: Oleksandr Andrushchenko 
---
   drivers/gpu/drm/xen/xen_drm_front_gem.c | 62 +++--
   1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c 
b/drivers/gpu/drm/xen/xen_drm_front_gem.c
index 47ff019d3aef..c592735e49d2 100644
--- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
+++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
@@ -33,8 +33,11 @@ struct xen_gem_object {
/* set for buffers allocated by the backend */
bool be_alloc;
-   /* this is for imported PRIME buffer */
-   struct sg_table *sgt_imported;
+   /*
+* this is for imported PRIME buffer or the one allocated via
+* drm_gem_get_pages.
+*/
+   struct sg_table *sgt;
   };
   static inline struct xen_gem_object *
@@ -77,10 +80,21 @@ static struct xen_gem_object *gem_create_obj(struct 
drm_device *dev,
return xen_obj;
   }
+struct sg_table *xen_drm_front_gem_get_sg_table(struct drm_gem_object *gem_obj)
+{
+   struct xen_gem_object *xen_obj = to_xen_gem_obj(gem_obj);
+
+   if (!xen_obj->pages)
+   return ERR_PTR(-ENOMEM);
+
+   return drm_prime_pages_to_sg(xen_obj->pages, xen_obj->num_pages);
+}
+
   static struct xen_gem_object *gem_create(struct drm_device *dev, size_t size)
   {
struct xen_drm_front_drm_info *drm_info = dev->dev_private;
struct xen_gem_object *xen_obj;
+   struct address_space *mapping;
int ret;
size = round_up(size, PAGE_SIZE);
@@ -113,10 +127,14 @@ static struct xen_gem_object *gem_create(struct 
drm_device *dev, size_t size)
xen_obj->be_alloc = true;
return xen_obj;
}
+
/*
 * need to allocate backing pages now, so we can share those
 * with the backend
 */
+   mapping = xen_obj->base.filp->f_mapping;
+   mapping_set_gfp_mask(mapping, GFP_USER | __GFP_DMA32);
+
xen_obj->num_pages = DIV_ROUND_UP(size, PAGE_SIZE);
xen_obj->pages = drm_gem_get_pages(_obj->base);
if (IS_ERR_OR_NULL(xen_obj->pages)) {
@@ -125,8 +143,27 @@ static struct xen_gem_object *gem_create(struct drm_device 
*dev, size_t size)
goto fail;
}
+   xen_obj->sgt = xen_drm_front_gem_get_sg_table(_obj->base);
+   if (IS_ERR_OR_NULL(xen_obj->sgt)){
+   ret = PTR_ERR(xen_obj->sgt);
+   xen_obj->sgt = NULL;
+   goto fail_put_pages;
+   }
+
+   if (!dma_map_sg(dev->dev, xen_obj->sgt->sgl, xen_obj->sgt->nents,
+   DMA_BIDIRECTIONAL)) {
+   ret = -EFAULT;
+   goto fail_free_sgt;
+   }
+
return xen_obj;
+fail_free_sgt:
+   sg_free_table(xen_obj->sgt);
+   xen_obj->sgt = NULL;
+fail_put_pages:
+   drm_gem_put_pages(_obj->base, xen_obj->pages, true, false);
+   xen_obj->pages = NULL;
   fail:
DRM_ERROR("Failed to allocate buffer with size %zu\n", size);
return ERR_PTR(ret);
@@ -149,7 +186,7 @@ void xen_drm_front_gem_free_object_unlocked(struct 
drm_gem_object *gem_obj)
struct xen_gem_object *xen_obj = to_xen_gem_obj(gem_obj);
if (xen_obj->base.import_attach) {
-   drm_prime_gem_destroy(_obj->base, xen_obj->sgt_imported);
+   drm_prime_gem_destroy(_obj->base, xen_obj->sgt);
gem_free_pages_array(xen_obj);
} else {
if (xen_obj->pages) {
@@ -158,6 +195,13 @@ void xen_drm_front_gem_free_object_unlocked(struct 
drm_gem_object *gem_obj)
xen_obj->pages);
gem_free_pages_array(xen_obj);
} else {
+

[Xen-devel] [qemu-mainline test] 131285: regressions - FAIL

2018-12-13 Thread osstest service owner

flight 131285 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131285/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 131172
 test-armhf-armhf-libvirt-raw  7 xen-boot fail REGR. vs. 131172

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 131172
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 131172
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 131172
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 131172
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass

version targeted for testing:
 qemuu4b3aab204204ca742836219b97b538d90584f4f2
baseline version:
 qemuu4f818e7b7f8ecb5c166d093b8859fec2ddeca2ef

Last test of basis   131172  2018-12-09 12:08:42 Z4 days
Failing since131240  2018-12-11 17:37:01 Z2 days2 attempts
Testing same since   131285  2018-12-13 00:27:23 Z1 days1 attempts


People who touched revisions under test:
  Aleksandar Markovic 
  Alex Williamson 
  Corey Minyard 
  David Gibson 
  David Hildenbrand 
  Dongli Zhang 
  Eduardo Habkost 
  Eric Blake 
  Fabrice Desclaux 
  fabrice.descl...@cea.fr 
  Fam Zheng 
  Fam Zheng 
  Gerd Hoffmann 
  Kashyap Chamarthy 
  Laurent Vivier 
  Li Qiang 
  Li Qiang 
  Marc-André Lureau 
  Peter Maydell 
  Philippe Mathieu-Daudé 
  Thomas Huth 
  Yuval Shaia 
  Zhang Yi 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  pass
 test-amd64-i386-xl   pass

Re: [Xen-devel] [PATCH 2/2] avoid TABs in files that only contain a few

2018-12-13 Thread Michael S. Tsirkin

On Thu, Dec 13, 2018 at 11:37:37PM +0100, Paolo Bonzini wrote:
> Most files that have TABs only contain a handful of them.  Change
> them to spaces so that we don't confuse people.
> 
> disas, standard-headers, linux-headers and libdecnumber are imported
> from other projects and probably should be exempted from the check.

For sure for standard-headers, linux-headers since if someone
does contribute a patch we want them to contribue upstream.

> Outside those, after this patch the following files still contain both
> 8-space and TAB sequences at the beginning of the line.  Many of them
> have a majority of TABs, or were initially committed with all tabs.
> 
> bsd-user/i386/target_syscall.h
> bsd-user/x86_64/target_syscall.h
> crypto/aes.c
> hw/audio/fmopl.c
> hw/audio/fmopl.h
> hw/block/tc58128.c
> hw/display/cirrus_vga.c
> hw/display/xenfb.c
> hw/dma/etraxfs_dma.c
> hw/intc/sh_intc.c
> hw/misc/mst_fpga.c
> hw/net/pcnet.c
> hw/sh4/sh7750.c
> hw/timer/m48t59.c
> hw/timer/sh_timer.c
> include/crypto/aes.h
> include/disas/bfd.h
> include/hw/sh4/sh.h
> libdecnumber/decNumber.c
> linux-headers/asm-generic/unistd.h
> linux-headers/linux/kvm.h
> linux-user/alpha/target_syscall.h
> linux-user/arm/nwfpe/double_cpdo.c
> linux-user/arm/nwfpe/fpa11_cpdt.c
> linux-user/arm/nwfpe/fpa11_cprt.c
> linux-user/arm/nwfpe/fpa11.h
> linux-user/flat.h
> linux-user/flatload.c
> linux-user/i386/target_syscall.h
> linux-user/ppc/target_syscall.h
> linux-user/sparc/target_syscall.h
> linux-user/syscall.c
> linux-user/syscall_defs.h
> linux-user/x86_64/target_syscall.h
> slirp/cksum.c
> slirp/if.c
> slirp/ip.h
> slirp/ip_icmp.c
> slirp/ip_icmp.h
> slirp/ip_input.c
> slirp/ip_output.c
> slirp/mbuf.c
> slirp/misc.c
> slirp/sbuf.c
> slirp/socket.c
> slirp/socket.h
> slirp/tcp_input.c
> slirp/tcpip.h
> slirp/tcp_output.c
> slirp/tcp_subr.c
> slirp/tcp_timer.c
> slirp/tftp.c
> slirp/udp.c
> slirp/udp.h
> target/cris/cpu.h
> target/cris/mmu.c
> target/cris/op_helper.c
> target/sh4/helper.c
> target/sh4/op_helper.c
> target/sh4/translate.c
> tcg/sparc/tcg-target.inc.c
> tests/tcg/cris/check_addo.c
> tests/tcg/cris/check_moveq.c
> tests/tcg/cris/check_swap.c
> tests/tcg/multiarch/test-mmap.c
> ui/vnc-enc-hextile-template.h
> ui/vnc-enc-zywrle.h
> util/envlist.c
> util/readline.c
> 
> The following have only TABs:
> 
> bsd-user/i386/target_signal.h
> bsd-user/sparc64/target_signal.h
> bsd-user/sparc64/target_syscall.h
> bsd-user/sparc/target_signal.h
> bsd-user/sparc/target_syscall.h
> bsd-user/x86_64/target_signal.h
> crypto/desrfb.c
> hw/audio/intel-hda-defs.h
> hw/core/uboot_image.h
> hw/sh4/sh7750_regnames.c
> hw/sh4/sh7750_regs.h
> include/hw/cris/etraxfs_dma.h
> linux-user/alpha/termbits.h
> linux-user/arm/nwfpe/fpopcode.h
> linux-user/arm/nwfpe/fpsr.h
> linux-user/arm/syscall_nr.h
> linux-user/arm/target_signal.h
> linux-user/cris/target_signal.h
> linux-user/i386/target_signal.h
> linux-user/linux_loop.h
> linux-user/m68k/target_signal.h
> linux-user/microblaze/target_signal.h
> linux-user/mips64/target_signal.h
> linux-user/mips/target_signal.h
> linux-user/mips/target_syscall.h
> linux-user/mips/termbits.h
> linux-user/ppc/target_signal.h
> linux-user/sh4/target_signal.h
> linux-user/sh4/termbits.h
> linux-user/sparc64/target_syscall.h
> linux-user/sparc/target_signal.h
> linux-user/x86_64/target_signal.h
> linux-user/x86_64/termbits.h
> pc-bios/optionrom/optionrom.h
> slirp/mbuf.h
> slirp/misc.h
> slirp/sbuf.h
> slirp/tcp.h
> slirp/tcp_timer.h
> slirp/tcp_var.h
> target/i386/svm.h
> target/sparc/asi.h
> target/xtensa/core-dc232b/xtensa-modules.inc.c
> target/xtensa/core-dc233c/xtensa-modules.inc.c
> target/xtensa/core-de212/core-isa.h
> target/xtensa/core-de212/xtensa-modules.inc.c
> target/xtensa/core-fsf/xtensa-modules.inc.c
> target/xtensa/core-sample_controller/core-isa.h
> target/xtensa/core-sample_controller/xtensa-modules.inc.c
> target/xtensa/core-test_kc705_be/core-isa.h
> target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
> tests/tcg/cris/check_abs.c
> tests/tcg/cris/check_addc.c
> tests/tcg/cris/check_addcm.c
> tests/tcg/cris/check_addoq.c
> tests/tcg/cris/check_bound.c
> tests/tcg/cris/check_ftag.c
> tests/tcg/cris/check_int64.c
> tests/tcg/cris/check_lz.c
> tests/tcg/cris/check_openpf5.c
> tests/tcg/cris/check_sigalrm.c
> tests/tcg/cris/crisutils.h
> tests/tcg/cris/sys.c
> tests/tcg/i386/test-i386-ssse3.c
> ui/vgafont.h
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/bochs.c

[Xen-devel] [PATCH v2 6/5] tools/docs: Remove PVRDTSCP remnants

2018-12-13 Thread Andrew Cooper

PVRDTSCP is believed-unused, and its implementation has adverse consequences
on unrelated functionality in the hypervisor.  As a result, support has been
removed.

Modify libxl to provide a slightly more helpful error message if it encounters
PVRDTSCP being selected.  While adjusting TSC handling, make libxl check for
errors from the set_tsc hypercall.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
---
 docs/man/xen-tscmode.pod.7|  94 +---
 docs/man/xl.cfg.pod.5.in  |   9 +-
 docs/misc/pvrdtscp.c  | 307 --
 tools/libxl/libxl_x86.c   |  13 +-
 tools/python/xen/lowlevel/xc/xc.c |   2 +-
 5 files changed, 19 insertions(+), 406 deletions(-)
 delete mode 100644 docs/misc/pvrdtscp.c

diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
index 819c61d..1d81a3f 100644
--- a/docs/man/xen-tscmode.pod.7
+++ b/docs/man/xen-tscmode.pod.7
@@ -77,9 +77,7 @@ highest performance is required.
 
 =item * B (PVRDTSCP).
 
-High-TSC-frequency apps may be paravirtualized (modified) to
-obtain both correctness and highest performance; any unmodified
-apps must be TSC-resilient.
+This mode has been removed.
 
 =back
 
@@ -215,30 +213,6 @@ is emulated.  Note that, though emulated, the "apparent" 
TSC frequency
 will be the TSC frequency of the initial physical machine, even after
 migration.
 
-For environments where both TSC-safeness AND highest performance
-even across migration is a requirement, application code can be specially
-modified to use an algorithm explicitly designed into Xen for this purpose.
-This mode (tsc_mode==3) is called PVRDTSCP, because it requires
-app paravirtualization (awareness by the app that it may be running
-on top of Xen), and utilizes a variation of the rdtsc instruction
-called rdtscp that is available on most recent generation processors.
-(The rdtscp instruction differs from the rdtsc instruction in that it
-reads not only the TSC but an additional register set by system software.)
-When a pvrdtscp-modified app is running on a processor that is both TSC-safe
-and supports the rdtscp instruction, information can be obtained
-about migration and TSC frequency/offset adjustment to allow the
-vast majority of timestamps to be obtained at top performance; when
-running on a TSC-unsafe processor or a processor that doesn't support
-the rdtscp instruction, rdtscp is emulated.
-
-PVRDTSCP (tsc_mode==3) has two limitations.  First, it applies to
-all apps running in this virtual machine.  This means that all
-apps must either be TSC-resilient or pvrdtscp-modified.  Second,
-highest performance is only obtained on TSC-safe machines that
-support the rdtscp instruction; when running on older machines,
-rdtscp is emulated and thus slower.  For more information on PVRDTSCP,
-see below.
-
 Finally, tsc_mode==1 always enables TSC emulation, regardless of
 the underlying physical hardware. The "apparent" TSC frequency will
 be the TSC frequency of the initial physical machine, even after migration.
@@ -287,56 +261,7 @@ have been replaced by a paravirtualized equivalent of the 
cpuid instruction
 ("pvcpuid") and also trap to Xen.  But apps in a PV guest that use a
 cpuid instruction execute it directly, without a trap to Xen.  As a result,
 an app may directly examine the physical TSC Invariant cpuid bit and make
-decisions based on that bit.  This is still an unsolved problem, though
-a workaround exists as part of the PVRDTSCP tsc_mode for apps that
-can be modified.
-
-=head1 MORE ON PVRDTSCP
-
-Paravirtualized OS's use the "pvclock" algorithm to manage the passing
-of time.  This sophisticated algorithm obtains information from a memory
-page shared between Xen and the OS and selects information from this
-page based on the current virtual CPU (vcpu) in order to properly adapt to
-TSC-unsafe systems and changes that occur across migration.  Neither
-this shared page nor the vcpu information is available to a userland
-app so the pvclock algorithm cannot be directly used by an app, at least
-without performance degradation roughly equal to the cost of just
-emulating an rdtsc.
-
-As a result, as of 4.0, Xen provides capabilities for a userland app
-to obtain key time values similar to the information accessible
-to the PV OS pvclock algorithm.  The app uses the rdtscp instruction
-which is defined in recent processors to obtain both the TSC and an
-auxiliary value called TSC_AUX.  Xen is responsible for setting TSC_AUX
-to the same value on all vcpus running any domain with tsc_mode==3;
-further, Xen tools are responsible for monotonically incrementing TSC_AUX
-anytime the domain is restored/migrated (thus changing key time values);
-and, when the domain is running on a physical machine that either
-is not TSC-safe or does not support the rdtscp instruction, Xen
-is responsible for emulating the rdtscp instruction and for setting
-TSC_AUX to zero on all processors.
-
-Xen also

Re: [Xen-devel] [PATCH 2/2] avoid TABs in files that only contain a few

2018-12-13 Thread David Gibson

On Thu, Dec 13, 2018 at 11:37:37PM +0100, Paolo Bonzini wrote:
> Most files that have TABs only contain a handful of them.  Change
> them to spaces so that we don't confuse people.
> 
> disas, standard-headers, linux-headers and libdecnumber are imported
> from other projects and probably should be exempted from the check.
> Outside those, after this patch the following files still contain both
> 8-space and TAB sequences at the beginning of the line.  Many of them
> have a majority of TABs, or were initially committed with all tabs.
> 
> bsd-user/i386/target_syscall.h
> bsd-user/x86_64/target_syscall.h
> crypto/aes.c
> hw/audio/fmopl.c
> hw/audio/fmopl.h
> hw/block/tc58128.c
> hw/display/cirrus_vga.c
> hw/display/xenfb.c
> hw/dma/etraxfs_dma.c
> hw/intc/sh_intc.c
> hw/misc/mst_fpga.c
> hw/net/pcnet.c
> hw/sh4/sh7750.c
> hw/timer/m48t59.c
> hw/timer/sh_timer.c
> include/crypto/aes.h
> include/disas/bfd.h
> include/hw/sh4/sh.h
> libdecnumber/decNumber.c
> linux-headers/asm-generic/unistd.h
> linux-headers/linux/kvm.h
> linux-user/alpha/target_syscall.h
> linux-user/arm/nwfpe/double_cpdo.c
> linux-user/arm/nwfpe/fpa11_cpdt.c
> linux-user/arm/nwfpe/fpa11_cprt.c
> linux-user/arm/nwfpe/fpa11.h
> linux-user/flat.h
> linux-user/flatload.c
> linux-user/i386/target_syscall.h
> linux-user/ppc/target_syscall.h
> linux-user/sparc/target_syscall.h
> linux-user/syscall.c
> linux-user/syscall_defs.h
> linux-user/x86_64/target_syscall.h
> slirp/cksum.c
> slirp/if.c
> slirp/ip.h
> slirp/ip_icmp.c
> slirp/ip_icmp.h
> slirp/ip_input.c
> slirp/ip_output.c
> slirp/mbuf.c
> slirp/misc.c
> slirp/sbuf.c
> slirp/socket.c
> slirp/socket.h
> slirp/tcp_input.c
> slirp/tcpip.h
> slirp/tcp_output.c
> slirp/tcp_subr.c
> slirp/tcp_timer.c
> slirp/tftp.c
> slirp/udp.c
> slirp/udp.h
> target/cris/cpu.h
> target/cris/mmu.c
> target/cris/op_helper.c
> target/sh4/helper.c
> target/sh4/op_helper.c
> target/sh4/translate.c
> tcg/sparc/tcg-target.inc.c
> tests/tcg/cris/check_addo.c
> tests/tcg/cris/check_moveq.c
> tests/tcg/cris/check_swap.c
> tests/tcg/multiarch/test-mmap.c
> ui/vnc-enc-hextile-template.h
> ui/vnc-enc-zywrle.h
> util/envlist.c
> util/readline.c
> 
> The following have only TABs:
> 
> bsd-user/i386/target_signal.h
> bsd-user/sparc64/target_signal.h
> bsd-user/sparc64/target_syscall.h
> bsd-user/sparc/target_signal.h
> bsd-user/sparc/target_syscall.h
> bsd-user/x86_64/target_signal.h
> crypto/desrfb.c
> hw/audio/intel-hda-defs.h
> hw/core/uboot_image.h
> hw/sh4/sh7750_regnames.c
> hw/sh4/sh7750_regs.h
> include/hw/cris/etraxfs_dma.h
> linux-user/alpha/termbits.h
> linux-user/arm/nwfpe/fpopcode.h
> linux-user/arm/nwfpe/fpsr.h
> linux-user/arm/syscall_nr.h
> linux-user/arm/target_signal.h
> linux-user/cris/target_signal.h
> linux-user/i386/target_signal.h
> linux-user/linux_loop.h
> linux-user/m68k/target_signal.h
> linux-user/microblaze/target_signal.h
> linux-user/mips64/target_signal.h
> linux-user/mips/target_signal.h
> linux-user/mips/target_syscall.h
> linux-user/mips/termbits.h
> linux-user/ppc/target_signal.h
> linux-user/sh4/target_signal.h
> linux-user/sh4/termbits.h
> linux-user/sparc64/target_syscall.h
> linux-user/sparc/target_signal.h
> linux-user/x86_64/target_signal.h
> linux-user/x86_64/termbits.h
> pc-bios/optionrom/optionrom.h
> slirp/mbuf.h
> slirp/misc.h
> slirp/sbuf.h
> slirp/tcp.h
> slirp/tcp_timer.h
> slirp/tcp_var.h
> target/i386/svm.h
> target/sparc/asi.h
> target/xtensa/core-dc232b/xtensa-modules.inc.c
> target/xtensa/core-dc233c/xtensa-modules.inc.c
> target/xtensa/core-de212/core-isa.h
> target/xtensa/core-de212/xtensa-modules.inc.c
> target/xtensa/core-fsf/xtensa-modules.inc.c
> target/xtensa/core-sample_controller/core-isa.h
> target/xtensa/core-sample_controller/xtensa-modules.inc.c
> target/xtensa/core-test_kc705_be/core-isa.h
> target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
> tests/tcg/cris/check_abs.c
> tests/tcg/cris/check_addc.c
> tests/tcg/cris/check_addcm.c
> tests/tcg/cris/check_addoq.c
> tests/tcg/cris/check_bound.c
> tests/tcg/cris/check_ftag.c
> tests/tcg/cris/check_int64.c
> tests/tcg/cris/check_lz.c
> tests/tcg/cris/check_openpf5.c
> tests/tcg/cris/check_sigalrm.c
> tests/tcg/cris/crisutils.h
> tests/tcg/cris/sys.c
> tests/tcg/i386/test-i386-ssse3.c
> ui/vgafont.h
> 
> Signed-off-by: Paolo Bonzini 

ppc parts

Acked-by: David Gibson 

> ---
>  block/bochs.c  | 22 ++---
>  block/file-posix.c |  2 +-
>

Re: [Xen-devel] [PATCH 2/2] avoid TABs in files that only contain a few

2018-12-13 Thread Richard Henderson

On 12/13/18 4:37 PM, Paolo Bonzini wrote:
> Most files that have TABs only contain a handful of them.  Change
> them to spaces so that we don't confuse people.


Acked-by: Richard Henderson 


r~


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH 2/2] avoid TABs in files that only contain a few

2018-12-13 Thread Paolo Bonzini

Most files that have TABs only contain a handful of them.  Change
them to spaces so that we don't confuse people.

disas, standard-headers, linux-headers and libdecnumber are imported
from other projects and probably should be exempted from the check.
Outside those, after this patch the following files still contain both
8-space and TAB sequences at the beginning of the line.  Many of them
have a majority of TABs, or were initially committed with all tabs.

bsd-user/i386/target_syscall.h
bsd-user/x86_64/target_syscall.h
crypto/aes.c
hw/audio/fmopl.c
hw/audio/fmopl.h
hw/block/tc58128.c
hw/display/cirrus_vga.c
hw/display/xenfb.c
hw/dma/etraxfs_dma.c
hw/intc/sh_intc.c
hw/misc/mst_fpga.c
hw/net/pcnet.c
hw/sh4/sh7750.c
hw/timer/m48t59.c
hw/timer/sh_timer.c
include/crypto/aes.h
include/disas/bfd.h
include/hw/sh4/sh.h
libdecnumber/decNumber.c
linux-headers/asm-generic/unistd.h
linux-headers/linux/kvm.h
linux-user/alpha/target_syscall.h
linux-user/arm/nwfpe/double_cpdo.c
linux-user/arm/nwfpe/fpa11_cpdt.c
linux-user/arm/nwfpe/fpa11_cprt.c
linux-user/arm/nwfpe/fpa11.h
linux-user/flat.h
linux-user/flatload.c
linux-user/i386/target_syscall.h
linux-user/ppc/target_syscall.h
linux-user/sparc/target_syscall.h
linux-user/syscall.c
linux-user/syscall_defs.h
linux-user/x86_64/target_syscall.h
slirp/cksum.c
slirp/if.c
slirp/ip.h
slirp/ip_icmp.c
slirp/ip_icmp.h
slirp/ip_input.c
slirp/ip_output.c
slirp/mbuf.c
slirp/misc.c
slirp/sbuf.c
slirp/socket.c
slirp/socket.h
slirp/tcp_input.c
slirp/tcpip.h
slirp/tcp_output.c
slirp/tcp_subr.c
slirp/tcp_timer.c
slirp/tftp.c
slirp/udp.c
slirp/udp.h
target/cris/cpu.h
target/cris/mmu.c
target/cris/op_helper.c
target/sh4/helper.c
target/sh4/op_helper.c
target/sh4/translate.c
tcg/sparc/tcg-target.inc.c
tests/tcg/cris/check_addo.c
tests/tcg/cris/check_moveq.c
tests/tcg/cris/check_swap.c
tests/tcg/multiarch/test-mmap.c
ui/vnc-enc-hextile-template.h
ui/vnc-enc-zywrle.h
util/envlist.c
util/readline.c

The following have only TABs:

bsd-user/i386/target_signal.h
bsd-user/sparc64/target_signal.h
bsd-user/sparc64/target_syscall.h
bsd-user/sparc/target_signal.h
bsd-user/sparc/target_syscall.h
bsd-user/x86_64/target_signal.h
crypto/desrfb.c
hw/audio/intel-hda-defs.h
hw/core/uboot_image.h
hw/sh4/sh7750_regnames.c
hw/sh4/sh7750_regs.h
include/hw/cris/etraxfs_dma.h
linux-user/alpha/termbits.h
linux-user/arm/nwfpe/fpopcode.h
linux-user/arm/nwfpe/fpsr.h
linux-user/arm/syscall_nr.h
linux-user/arm/target_signal.h
linux-user/cris/target_signal.h
linux-user/i386/target_signal.h
linux-user/linux_loop.h
linux-user/m68k/target_signal.h
linux-user/microblaze/target_signal.h
linux-user/mips64/target_signal.h
linux-user/mips/target_signal.h
linux-user/mips/target_syscall.h
linux-user/mips/termbits.h
linux-user/ppc/target_signal.h
linux-user/sh4/target_signal.h
linux-user/sh4/termbits.h
linux-user/sparc64/target_syscall.h
linux-user/sparc/target_signal.h
linux-user/x86_64/target_signal.h
linux-user/x86_64/termbits.h
pc-bios/optionrom/optionrom.h
slirp/mbuf.h
slirp/misc.h
slirp/sbuf.h
slirp/tcp.h
slirp/tcp_timer.h
slirp/tcp_var.h
target/i386/svm.h
target/sparc/asi.h
target/xtensa/core-dc232b/xtensa-modules.inc.c
target/xtensa/core-dc233c/xtensa-modules.inc.c
target/xtensa/core-de212/core-isa.h
target/xtensa/core-de212/xtensa-modules.inc.c
target/xtensa/core-fsf/xtensa-modules.inc.c
target/xtensa/core-sample_controller/core-isa.h
target/xtensa/core-sample_controller/xtensa-modules.inc.c
target/xtensa/core-test_kc705_be/core-isa.h
target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
tests/tcg/cris/check_abs.c
tests/tcg/cris/check_addc.c
tests/tcg/cris/check_addcm.c
tests/tcg/cris/check_addoq.c
tests/tcg/cris/check_bound.c
tests/tcg/cris/check_ftag.c
tests/tcg/cris/check_int64.c
tests/tcg/cris/check_lz.c
tests/tcg/cris/check_openpf5.c
tests/tcg/cris/check_sigalrm.c
tests/tcg/cris/crisutils.h
tests/tcg/cris/sys.c
tests/tcg/i386/test-i386-ssse3.c
ui/vgafont.h

Signed-off-by: Paolo Bonzini 
---
 block/bochs.c  | 22 ++---
 block/file-posix.c |  2 +-
 block/file-win32.c |  8 +-
 block/linux-aio.c  |  4 +-
 block/qcow2-cluster.c  |  2 +-
 block/vpc.c|  2 +-
 bsd-user/elfload.c |  2 +-
 contrib/elf2dmp/main.c |  2 +-
 hw/alpha/typhoon.c | 12 +--
 hw/arm/stellaris.c

Re: [Xen-devel] [PATCH for-4.12] dt: fix integer check in dt_device_is_compatible

2018-12-13 Thread Stefano Stabellini

On Thu, 13 Dec 2018, Julien Grall wrote:
> Hi Stefano,
> 
> On 12/12/18 11:53 PM, Stefano Stabellini wrote:
> > On Wed, 12 Dec 2018, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 11/12/2018 18:46, Stefano Stabellini wrote:
> > > > cplen is unsigned, thus, it can never be < 0. Use a signed integer local
> > > > variable instead.
> > > 
> > > The current code checks > 0. Looking at the code I don't think it can ever
> > > be
> > > negative. So can you details what is the problem you are trying to
> > > resolve?
> > 
> > Yes, it can be "negative" (not actually negative because it is defined
> > as unsigned), in fact it happens with the nodes added dynamically by grub
> > at boot. This patches fixes booting from grub.
> > 
> > Specifically ilen is initialed to 16, but strlen+1 is 17, so length
> > becomes -1. The length of the property generated by grub seems to be
> > short by 1.
> Such explanation should have been in the commit message, it helps to
> diagnostics where the problem is coming from.
> 
> Looking at the specification [1] section 2.3.1, a compatible property is a
> concatenated list of null terminated strings. So the current code is actually
> correct and there is a bug in GRUB.
> 
> This bug was actually fixed by commit ae5817f1d "arm64/xen_boot: Fix Xen boot
> using GRUB2 on AARCH64".
> 
> I assume you are using Grub 2.02 which does not contain the full support for
> Xen. So I would not even consider to work-around it in Xen.
> 
> Instead, I would recommend you to upgrade to a more recent GRUB. It would be
> more likely staging as there are no new GRUB version.
> 
> Another solution is to use chainloading.

it is true that it was a "temporary" bug in grub

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 1/3] x86/svm: Simplify svm_get_insn_len()

2018-12-13 Thread Andrew Cooper

The existing __get_instruction_length_from_list() has a single user
which uses the list functionality.  That user however should be looking
specifically for INVD or WBINVD, as reported by the vmexit exit reason.

Modify svm_vmexit_do_invalidate_cache() to ask for the correct
instruction, and drop all list functionality from the helper.

Take the opportunity to rename it to svm_get_insn_len(), and drop the
IOIO length handling whch has never been used.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Boris Ostrovsky 
CC: Suravee Suthikulpanit 
CC: Brian Woods 

v2:
 * New
---
 xen/arch/x86/hvm/svm/emulate.c| 65 ---
 xen/arch/x86/hvm/svm/nestedsvm.c  |  9 ++---
 xen/arch/x86/hvm/svm/svm.c| 34 +-
 xen/include/asm-x86/hvm/svm/emulate.h |  9 +
 4 files changed, 51 insertions(+), 66 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
index 3d04af0..3f695b9 100644
--- a/xen/arch/x86/hvm/svm/emulate.c
+++ b/xen/arch/x86/hvm/svm/emulate.c
@@ -83,13 +83,12 @@ static const struct {
 [INSTR_CPUID]   = { X86EMUL_OPC(0x0f, 0xa2) },
 };
 
-int __get_instruction_length_from_list(struct vcpu *v,
-const enum instruction_index *list, unsigned int list_count)
+int svm_get_insn_len(struct vcpu *v, enum instruction_index insn)
 {
 struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb;
 struct hvm_emulate_ctxt ctxt;
 struct x86_emulate_state *state;
-unsigned long inst_len, j;
+unsigned long nrip_len, emul_len;
 unsigned int modrm_rm, modrm_reg;
 int modrm_mod;
 
@@ -98,13 +97,10 @@ int __get_instruction_length_from_list(struct vcpu *v,
  * hardware.
  */
 #ifdef NDEBUG
-if ( (inst_len = svm_nextrip_insn_length(v)) > MAX_INST_LEN )
-gprintk(XENLOG_WARNING, "NRip reported inst_len %lu\n", inst_len);
-else if ( inst_len != 0 )
-return inst_len;
-
-if ( vmcb->exitcode == VMEXIT_IOIO )
-return vmcb->exitinfo2 - vmcb->rip;
+if ( (nrip_len = svm_nextrip_insn_length(v)) > MAX_INST_LEN )
+gprintk(XENLOG_WARNING, "NRip reported inst_len %lu\n", nrip_len);
+else if ( nrip_len != 0 )
+return nrip_len;
 #endif
 
 ASSERT(v == current);
@@ -114,46 +110,43 @@ int __get_instruction_length_from_list(struct vcpu *v,
 if ( IS_ERR_OR_NULL(state) )
 return 0;
 
-inst_len = x86_insn_length(state, );
+emul_len = x86_insn_length(state, );
 modrm_mod = x86_insn_modrm(state, _rm, _reg);
 x86_emulate_free_state(state);
+
 #ifndef NDEBUG
-if ( vmcb->exitcode == VMEXIT_IOIO )
-j = vmcb->exitinfo2 - vmcb->rip;
-else
-j = svm_nextrip_insn_length(v);
-if ( j && j != inst_len )
+nrip_len = svm_nextrip_insn_length(v);
+if ( nrip_len && nrip_len != emul_len )
 {
 gprintk(XENLOG_WARNING, "insn-len[%02x]=%lu (exp %lu)\n",
-ctxt.ctxt.opcode, inst_len, j);
-return j;
+ctxt.ctxt.opcode, nrip_len, emul_len);
+return nrip_len;
 }
 #endif
 
-for ( j = 0; j < list_count; j++ )
+if ( (unsigned int)insn >= ARRAY_SIZE(opc_tab) )
 {
-unsigned int instr = list[j];
-
-if ( instr >= ARRAY_SIZE(opc_tab) )
-{
-ASSERT_UNREACHABLE();
-break;
-}
-if ( opc_tab[instr].opcode == ctxt.ctxt.opcode )
-{
-if ( !opc_tab[instr].modrm.mod )
-return inst_len;
-
-if ( modrm_mod == opc_tab[instr].modrm.mod &&
- (modrm_rm & 7) == opc_tab[instr].modrm.rm &&
- (modrm_reg & 7) == opc_tab[instr].modrm.reg )
-return inst_len;
-}
+gdprintk(XENLOG_ERR, "insn %d out of range\n", insn);
+ASSERT_UNREACHABLE();
+goto out;
+}
+
+if ( opc_tab[insn].opcode == ctxt.ctxt.opcode )
+{
+if ( !opc_tab[insn].modrm.mod )
+return emul_len;
+
+if ( modrm_mod == opc_tab[insn].modrm.mod &&
+ (modrm_rm & 7) == opc_tab[insn].modrm.rm &&
+ (modrm_reg & 7) == opc_tab[insn].modrm.reg )
+return emul_len;
 }
 
 gdprintk(XENLOG_WARNING,
  "%s: Mismatch between expected and actual instruction: "
  "eip = %lx\n",  __func__, (unsigned long)vmcb->rip);
+
+ out:
 hvm_inject_hw_exception(TRAP_gp_fault, 0);
 return 0;
 }
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 9660202..35c1a04 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -743,8 +743,9 @@ nsvm_vcpu_vmrun(struct vcpu *v, struct cpu_user_regs *regs)
 struct nestedvcpu *nv = _nestedhvm(v);
 struct nestedsvm *svm = _nestedsvm(v);
 
-inst_len = __get_instruction_length(v, INSTR_VMRUN);
-if (inst_len == 0) {
+inst_len = svm_get_insn_len(v, INSTR_VMRUN);
+if ( inst_len == 0 )
+{

[Xen-devel] [PATCH v2 3/3] x86/hvm: Corrections to RDTSCP intercept handling

2018-12-13 Thread Andrew Cooper

For both VT-x and SVM, the RDTSCP intercept will trigger if the pipeline
supports the instruction, but the guest may have not have rdtscp in its
featureset.  Bring the vmexit handlers in line with the main emulator
behaviour by optionally handing back #UD.

Next on the AMD side, if RDTSCP actually ends up being intercepted on a debug
build, we first update regs->rcx, then call __get_instruction_length() asking
for RDTSC.  As the two instructions are different (and indeed, different
lengths!), __get_instruction_length_from_list() fails and hands back a #GP
fault.

This can demonstrated by putting a guest into tsc_mode="always emulate" and
executing an rdtscp instruction:

  (d1) --- Xen Test Framework ---
  (d1) Environment: HVM 64bit (Long mode 4 levels)
  (d1) Test rdtscp
  (d1) TSC mode 1
  (XEN) emulate.c:147:d1v0 svm_get_insn_len: Mismatch between expected and 
actual instruction:
  (XEN) emulate.c:152:d1v0   insn_index 8, opcode 0xf0031 modrm 0
  (XEN) emulate.c:154:d1v0   rip 0x10475f, nextrip 0x104762, len 3
  (XEN) SVM insn len emulation failed (1): d1v0 64bit @ 0008:0010475f -> 0f 01 
f9 0f 31 5b 31 ff 31 c0 e9 c2 db ff ff 00
  (d1) **
  (d1) PANIC: Unhandled exception at 0008:0010475f
  (d1) Vec 13 #GP[]
  (d1) **

First, teach svm_get_insn_len() to cope with RDTSCP, and improve
svm_vmexit_do_rdtsc() to ask for the correct instruction.  Move the regs->rcx
adjustment into this function to ensure it gets done after we are done
potentially raising faults.

Reported-by: Paul Durrant 
Signed-off-by: Andrew Cooper 
Reviewed-by: Brian Woods 
Reviewed-by: Jan Beulich 
Reviewed-by: Kevin Tian 
---
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Paul Durrant 

v2:
 * Rebase
---
 xen/arch/x86/hvm/svm/emulate.c|  1 +
 xen/arch/x86/hvm/svm/svm.c| 22 +-
 xen/arch/x86/hvm/vmx/vmx.c|  8 
 xen/include/asm-x86/hvm/svm/emulate.h |  1 +
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
index 73cef5b..0778dc7 100644
--- a/xen/arch/x86/hvm/svm/emulate.c
+++ b/xen/arch/x86/hvm/svm/emulate.c
@@ -75,6 +75,7 @@ static const struct {
 [INSTR_STGI]= { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 4) },
 [INSTR_CLGI]= { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 5) },
 [INSTR_INVLPGA] = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 7) },
+[INSTR_RDTSCP]  = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 7, 1) },
 [INSTR_INVD]= { X86EMUL_OPC(0x0f, 0x08) },
 [INSTR_WBINVD]  = { X86EMUL_OPC(0x0f, 0x09) },
 [INSTR_WRMSR]   = { X86EMUL_OPC(0x0f, 0x30) },
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index f8b7e8b..d2c0d64 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2279,14 +2279,28 @@ static void svm_vmexit_do_hlt(struct vmcb_struct *vmcb,
 hvm_hlt(regs->eflags);
 }
 
-static void svm_vmexit_do_rdtsc(struct cpu_user_regs *regs)
+static void svm_vmexit_do_rdtsc(struct cpu_user_regs *regs, bool rdtscp)
 {
+struct vcpu *curr = current;
+const struct domain *currd = curr->domain;
+enum instruction_index insn = rdtscp ? INSTR_RDTSCP : INSTR_RDTSC;
 unsigned int inst_len;
 
-if ( (inst_len = svm_get_insn_len(current, INSTR_RDTSC)) == 0 )
+if ( rdtscp && !currd->arch.cpuid->extd.rdtscp &&
+ currd->arch.tsc_mode != TSC_MODE_PVRDTSCP )
+{
+hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
 return;
+}
+
+if ( (inst_len = svm_get_insn_len(curr, insn)) == 0 )
+return;
+
 __update_guest_eip(regs, inst_len);
 
+if ( rdtscp )
+regs->rcx = hvm_msr_tsc_aux(curr);
+
 hvm_rdtsc_intercept(regs);
 }
 
@@ -2966,10 +2980,8 @@ void svm_vmexit_handler(struct cpu_user_regs *regs)
 break;
 
 case VMEXIT_RDTSCP:
-regs->rcx = hvm_msr_tsc_aux(v);
-/* fall through */
 case VMEXIT_RDTSC:
-svm_vmexit_do_rdtsc(regs);
+svm_vmexit_do_rdtsc(regs, exit_reason == VMEXIT_RDTSCP);
 break;
 
 case VMEXIT_MONITOR:
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 7f77d1f..2166b0d 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3589,6 +3589,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0;
 unsigned int vector = 0, mode;
 struct vcpu *v = current;
+struct domain *currd = v->domain;
 
 __vmread(GUEST_RIP,>rip);
 __vmread(GUEST_RSP,>rsp);
@@ -3956,6 +3957,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 vmx_invlpg_intercept(exit_qualification);
 break;
 case EXIT_REASON_RDTSCP:
+if ( !currd->arch.cpuid->extd.rdtscp &&
+ currd->arch.tsc_mode != TSC_MODE_PVRDTSCP )
+{
+hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC);
+

[Xen-devel] [PATCH v2 2/3] x86/svm: Improve diagnostics when svm_get_insn_len() fails

2018-12-13 Thread Andrew Cooper

Sadly, a lone:

  (XEN) emulate.c:156:d2v0 svm_get_insn_len: Mismatch between expected and 
actual instruction: eip = f804564139c0

on the console is of no use trying to identify what went wrong.  Dump as much
state as we can to help identify what went wrong.

Reported-by: Paul Durrant 
Signed-off-by: Andrew Cooper 
Acked-by: Brian Woods 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
CC: Paul Durrant 
CC: Boris Ostrovsky 
CC: Suravee Suthikulpanit 
CC: Brian Woods 

v2:
 * Drop anonymous union
 * Rebase
---
 xen/arch/x86/hvm/svm/emulate.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c
index 3f695b9..73cef5b 100644
--- a/xen/arch/x86/hvm/svm/emulate.c
+++ b/xen/arch/x86/hvm/svm/emulate.c
@@ -143,8 +143,17 @@ int svm_get_insn_len(struct vcpu *v, enum 
instruction_index insn)
 }
 
 gdprintk(XENLOG_WARNING,
- "%s: Mismatch between expected and actual instruction: "
- "eip = %lx\n",  __func__, (unsigned long)vmcb->rip);
+ "%s: Mismatch between expected and actual instruction:\n",
+ __func__);
+gdprintk(XENLOG_WARNING,
+ "  insn_index %d, opcode %#x modrm %#x\n",
+ insn, opc_tab[insn].opcode, ((opc_tab[insn].modrm.rm  << 6) |
+  (opc_tab[insn].modrm.reg << 3) |
+  (opc_tab[insn].modrm.mod)));
+gdprintk(XENLOG_WARNING, "  rip %#lx, nextrip %#lx, len %lu\n",
+ vmcb->rip, vmcb->nextrip, vmcb->nextrip - vmcb->rip);
+hvm_dump_emulation_state(XENLOG_G_WARNING, "SVM Insn len",
+ , X86EMUL_UNHANDLEABLE);
 
  out:
 hvm_inject_hw_exception(TRAP_gp_fault, 0);
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 0/3] Fixes to RDTSCP interception

2018-12-13 Thread Andrew Cooper

Updates in v2:
  * Rework svm_get_insn_len() to be more simple.
  * Drop anonymous union which will surely fail to compile on RHEL 6.x vintage
compilers.
  * Rebase other changes

Andrew Cooper (3):
  x86/svm: Simplify svm_get_insn_len()
  x86/svm: Improve diagnostics when svm_get_insn_len() fails
  x86/hvm: Corrections to RDTSCP intercept handling

 xen/arch/x86/hvm/svm/emulate.c| 79 ++-
 xen/arch/x86/hvm/svm/nestedsvm.c  |  9 ++--
 xen/arch/x86/hvm/svm/svm.c| 54 ++--
 xen/arch/x86/hvm/vmx/vmx.c|  8 
 xen/include/asm-x86/hvm/svm/emulate.h | 10 +
 5 files changed, 88 insertions(+), 72 deletions(-)

-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [linux-next test] 131263: regressions - FAIL

2018-12-13 Thread osstest service owner

flight 131263 linux-next real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131263/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 
131190
 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  7 xen-bootfail REGR. vs. 131190
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 
131190
 test-amd64-i386-xl-qemuu-win10-i386  7 xen-boot  fail REGR. vs. 131190
 test-amd64-amd64-xl-credit2   7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-xl-qemut-ws16-amd64  7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-xl-qemuu-ws16-amd64  7 xen-boot fail REGR. vs. 131190
 test-amd64-i386-freebsd10-amd64  7 xen-boot  fail REGR. vs. 131190
 test-amd64-amd64-qemuu-nested-intel  7 xen-boot  fail REGR. vs. 131190
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-amd64-pvgrub  7 xen-bootfail REGR. vs. 131190
 test-amd64-amd64-xl-shadow7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-i386-pvgrub  7 xen-boot fail REGR. vs. 131190
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. vs. 
131190
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. 
vs. 131190
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 
131190
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-xl-qemut-win7-amd64  7 xen-boot fail REGR. vs. 131190
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-boot fail REGR. vs. 131190
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-boot  fail REGR. vs. 131190
 test-amd64-i386-freebsd10-i386  7 xen-boot   fail REGR. vs. 131190
 test-amd64-amd64-qemuu-nested-amd  7 xen-bootfail REGR. vs. 131190
 test-amd64-amd64-xl-qemut-win10-i386  7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-xl-qemut-debianhvm-amd64  7 xen-bootfail REGR. vs. 131190
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 131190
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-boot  fail REGR. vs. 131190
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 131190
 test-amd64-amd64-xl-xsm   7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 
131190
 test-amd64-amd64-xl-pvshim7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-libvirt  7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. 
vs. 131190
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-boot  fail REGR. vs. 131190
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 
131190
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-xl-qemuu-win10-i386  7 xen-boot fail REGR. vs. 131190
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 xen-boot fail REGR. vs. 131190
 test-amd64-i386-xl-pvshim 7 xen-boot fail REGR. vs. 131190
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-boot  fail REGR. vs. 131190
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-boot  fail REGR. vs. 131190
 test-amd64-amd64-xl-pvhv2-amd  7 xen-bootfail REGR. vs. 131190
 test-amd64-amd64-xl   7 xen-boot fail REGR. vs. 131190
 test-armhf-armhf-libvirt-raw  7 xen-boot fail REGR. vs. 131190
 test-armhf-armhf-xl-vhd   7 xen-boot fail REGR. vs. 131190
 test-armhf-armhf-examine  8 reboot   fail REGR. vs. 131190
 test-armhf-armhf-xl-credit1   7 xen-boot fail REGR. vs. 131190
 test-armhf-armhf-xl-cubietruck  7 xen-boot   fail REGR. vs. 131190
 test-armhf-armhf-xl-multivcpu  7 xen-bootfail REGR. vs. 131190
 test-armhf-armhf-libvirt  7 xen-boot fail REGR. vs. 131190
 test-armhf-armhf-xl-credit2   7 xen-boot fail REGR. vs. 131190
 test-armhf-armhf-xl   7 xen-boot fail REGR. vs. 131190
 test-armhf-armhf-xl-arndale   7 xen-boot fail REGR. vs. 131190

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds  7 xen-boot fail REGR. vs. 131190

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvhv2-intel  7 xen-boot   fail like 131190
 test-amd64-amd64-libvirt-vhd  7 xen-boot fail  like 131190
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm  7 xen-boot  fail like 131190
 test-amd64-i386-rumprun-i386  7 xen-boot fail  like

Re: [Xen-devel] arm: xl vcpu-pin leads to oom-killer slashing processes

2018-12-13 Thread Juergen Gross


On 12/13/18 3:13 PM, Andrii Anisov wrote:

Hello All,

OK, I've discovered a mechanism of the issue.
It is  because of `d->max_pages = ~0U;` in a `construct_dom0()`.
When I do vcpu-pin, libxl updates memory nodes in xenstore for Dom0. 
Then kernel watch sees those changes and trying to set new target for 
ballon, but the target becomes extremely high, and baloon sucks all the 
pages.


In my kernel (4.14) in `watch_target()` function there is a code:

     target_diff = xen_pv_domain() ? 0
     : static_max - balloon_stats.target_pages;

Here we have `xen_pv_domain()` equal to zero, so `target_diff` big. 
Then, few lines below:


 balloon_set_new_target(new_target - target_diff);

`balloon_set_new_target()` receives a value wrapped over 64bit what 
kills the system.


Now I'm looking for an appropriate kernel patch for the kernel, to fix 
that. Any suggestions?




You should use linux kernel commit 3596924a233e45aa918.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 6/7] xen/arm: zynqmp: implement zynqmp_eemi

2018-12-13 Thread Julien Grall


Hi,

On 12/12/18 11:56 PM, Stefano Stabellini wrote:

On Wed, 12 Dec 2018, Julien Grall wrote:

On 11/12/2018 22:23, Stefano Stabellini wrote:

On Tue, 11 Dec 2018, Julien Grall wrote:

Furthermore, you are forwarding unsanitized values to the firmware. For
instance, what would happen if the number of parameters of the call are
increased? How are you sure this will not open a hole?


EEMI is backward compatible and the implementation is tested with Xen
regularly. A change like the one you describe should be considered a
backward compatibility breakage.


I disagree, you can still make backward compatible. For instance, the new
parameters could be gated by a flag in an existing parameter. Another way is
Xilinx promise that non-existing arguments should always be 0 and then
re-purpose the value for non-zero case.

While it is backward compatible, you may end up passing unsanitized value to
the
firmware. Not sure this is what we want...


Backward compatibility is not only about avoiding breaking existing call
parameters and return values. It is also about not breaking the
semantics of the calls.

If a call is deemed guest safe (when a guest has a device assigned) so
Xen forwards the call to firmware, but then firmware introduces a new
parameter (before it was ignored) and with it, allows more extensive
operations that go beyong a single device, I think that is semantics
breakage.

In any case, this is almost impossible to do with EEMI because calls
take a node_id parameter or similar that identifies the area of the
effect of the operation, and we already check on the node_id to see if
the guest is allowed to make the call.


Please write it down in the code so we know why we don't need to worry 
on parameters.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 5/7] xen/arm: zynqmp: eemi access control

2018-12-13 Thread Julien Grall


Hi Stefano,

On 12/12/18 11:57 PM, Stefano Stabellini wrote:

On Tue, 11 Dec 2018, Julien Grall wrote:

Hi,

On 03/12/2018 21:03, Stefano Stabellini wrote:

From: "Edgar E. Iglesias" 

From: Edgar E. Iglesias 

Introduce data structs to implement basic access controls.
Introduce the following three functions:

domain_has_node_access: check access to the node
domain_has_reset_access: check access to the reset line
domain_has_mmio_access: check access to the register

Signed-off-by: Edgar E. Iglesias 
Signed-off-by: Stefano Stabellini 

---
Statically defines:

- pm_node_access
It encodes the relationship between a node id and the start of the MMIO
region of a device in the corresponding power domain. It is used for
permission checking. Although the MMIO region start address is available
on device tree and could be derived from there (we plan to improve that
in the future), the relationship between a node id and corresponding
devices is not described and needs to be hardcoded.

- pm_reset_access
Same as pm_node_access for reset lines.

---
Changes in v5:
- improve in-code comments
- use mfn_t in struct pm_access
- remove mmio_access table

Changes in v4:
- add #include as needed
- add #if 0 for bisectability
- use mfn_t in pm_check_access
- add wrap-around ASSERT in domain_has_mmio_access
- use GENMASK in domain_has_mmio_access
- proper bound checks (== ARRAY_SIZE is out of bound)
---
   xen/arch/arm/platforms/xilinx-zynqmp-eemi.c | 348

   1 file changed, 348 insertions(+)

diff --git a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
index 369bb3f..92a02df 100644
--- a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
+++ b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c
@@ -16,9 +16,357 @@
* GNU General Public License for more details.
*/
   +/*
+ *  EEMI Power Management API access
+ *
+ * Refs:
+ *
https://www.xilinx.com/support/documentation/user_guides/ug1200-eemi-api.pdf
+ *
+ * Background:
+ * The ZynqMP has a subsystem named the PMU with a CPU and special devices
+ * dedicated to running Power Management Firmware. Other masters in the
+ * system need to send requests to the PMU in order to for example:
+ * * Manage power state
+ * * Configure clocks
+ * * Program bitstreams for the programmable logic
+ * * etc
+ *
+ * Although the details of the setup are configurable, in the common case
+ * the PMU lives in the Secure world. NS World cannot directly communicate
+ * with it and must use proxy services from ARM Trusted Firmware to reach
+ * the PMU.
+ *
+ * Power Management on the ZynqMP is implemented in a layered manner.
+ * The PMU knows about various masters and will enforce access controls
+ * based on a pre-configured partitioning. This configuration dictates
+ * which devices are owned by the various masters and the PMU FW makes sure
+ * that a given master cannot turn off a device that it does not own or
that
+ * is in use by other masters.
+ *
+ * The PMU is not aware of multiple execution states in masters.
+ * For example, it treats the ARMv8 cores as single units and does not
+ * distinguish between Secure vs NS OS's nor does it know about Hypervisors
+ * and multiple guests. It is up to software on the ARMv8 cores to present
+ * a unified view of its power requirements.
+ *
+ * To implement this unified view, ARM Trusted Firmware at EL3 provides
+ * access to the PM API via SMC calls. ARM Trusted Firmware is responsible
+ * for mediating between the Secure and the NS world, rejecting SMC calls
+ * that request changes that are not allowed.
+ *
+ * Xen running above ATF owns the NS world and is responsible for
presenting
+ * unified PM requests taking all guests and the hypervisor into account.
+ *
+ * Implementation:
+ * The PM API contains different classes of calls.
+ * Certain calls are harmless to expose to any guest.
+ * These include calls to get the PM API Version, or to read out the
version
+ * of the chip we're running on.
+ *
+ * In order to correctly virtualize these calls, we need to know if
+ * guests issuing these calls have ownership of the given device.
+ * The approach taken here is to map PM API Nodes identifying
+ * a device into base addresses for registers that belong to that
+ * same device.
+ *
+ * If the guest has access to devices registers, we give the guest
+ * access to PM API calls that affect that device. This is implemented
+ * by pm_node_access and domain_has_node_access().
+ */
+
+#include 
+#include 
   #include 
   #include 
   +#if 0
+struct pm_access
+{
+mfn_t mfn;
+bool hwdom_access;/* HW domain gets access regardless.  */
+};
+
+/*
+ * This table maps a node into a memory address.


Some of the nodes below don't have memory address. So this comment has to be
updated.


Yes, I'll update and improve the comment



+ * If a guest has access to the address, it has enough control
+ * over the node to grant it access to EEMI calls for that node.
+ */
+static const struct

Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Jan Beulich

>>> On 13.12.18 at 16:34,  wrote:
> On Thu, Dec 13, 2018 at 07:52:16AM -0700, Jan Beulich wrote:
>> >>> On 13.12.18 at 15:14,  wrote:
>> > On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote:
>> >> >>> On 13.12.18 at 12:39,  wrote:
>> >> > Well, Just keeping correct order between each domain locks should be
>> >> > enough?
>> >> > 
>> >> > Ie: exactly the same that Xen currently does but on a per-domain
>> >> > basis. This is feasible, but each CPU would need to store the lock
>> >> > order of each possible domain:
>> >> > 
>> >> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);
>> >> > 
>> >> > This would consume ~32KB per CPU, which is not that much but seems a
>> >> > waste when most of the time a single entry will be used.
>> >> 
>> >> Well, tracking by domain ID wouldn't help you - the controlling
>> >> domain may well have a higher ID than the being controlled one,
>> >> i.e. the nesting you want needs to be independent of domain ID.
>> > 
>> > It's not tracking the domain ID, but rather tracking the lock level of
>> > each different domain, hence the need for the array in the pcpu
>> > structure. The lock checker would take a domain id and a level, and
>> > perform the check as:
>> > 
>> > if ( mm_lock_level[domid] > level )
>> > panic
>> 
>> But this would open things up for deadlocks because of intermixed
>> lock usage between the calling domain's and the subject one's.
>> There needs to be a linear sequence of locks (of all involved
>> domains) describing the one and only order in which they may be
>> acquired.
> 
> Well, my plan was to only check for deadlocks between the locks of the
> same domain, without taking into account intermixed domain locking.
> 
> I guess at this point I will need some input from Tim and George about
> how to proceed, because I'm not sure how to weight locks when using
> intermixed domain locks, neither what is the correct order. The order
> in paging_log_dirty_op looks like a valid order that we want to
> support, but are there any others?
> 
> Is it possible to have multiple valid interdomain lock orders that
> cannot be expressed using the current weighted lock ordering?

Well, first of all I'm afraid I didn't look closely enough at you
original mail: We're not talking about the paging lock of two
domains here, but about he paging lock of the subject domain
and dom0's p2m lock.

Second I then notice that

(XEN) mm locking order violation: 64 > 16

indicates that it might not have complained when two similar
locks of different domains were acquired in a nested fashion,
which I'd call a shortcoming that would be nice to eliminate at
this same occasion.

And third, to answer your question, I can't see anything
conceptually wrong with an arbitrary intermix of locks from
two different domains, as long their inner-domain ordering is
correct. E.g. a Dom0 hypercall may find a need to acquire
- the subject domain's p2m lock
- its own p2m lock
- the subject domain's PoD lock
- its own paging lock
Of course it may be possible to determine that "own" locks
are not supposed to be acquired outside of any "subject
domain" ones, in which case we'd have a workable hierarchy
(along the lines of what you had earlier suggested). But I'm
not sure how expensive (in terms of code auditing) such
determination is going be, which is why for the moment I'm
trying to think of a solution (ordering criteria) for the general
case.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 3/7] xen/arm: zynqmp: introduce zynqmp specific defines

2018-12-13 Thread Julien Grall


Hi Stefano,

On 12/12/18 11:55 PM, Stefano Stabellini wrote:

On Wed, 12 Dec 2018, Julien Grall wrote:

Hi Stefano,

On 11/12/2018 19:22, Stefano Stabellini wrote:

On Tue, 11 Dec 2018, Julien Grall wrote:

Hi,

On 03/12/2018 21:03, Stefano Stabellini wrote:
What is the plan there?


The plan is to extract the node_id from a power-domain node on device
tree. Each device would have a phandler to link to the right
power-domain node which contains a power-domain-id attribute. The
power-domain-id attribute is the node_id here.

The power-domain-id changes to the Xilinx MPSoC device tree are under
discussion with the device tree community.


If I understand correctly, we will never be able to remove the hardcoded
values in Xen. This is because some device-tree may not have the bindings. Am
I correct?


If we want to support running on existing hardware and firmware releases,
then you are correct. We won't be able to remove the hardcoded values.
That ship has sailed, not much we can do about it.

If in the future we decide to drop support for older firmware releases
and ask users to update their firmare/devicetrees, then we'll be able to
remove the hardcoded values. I think it is something we can consider.
In fact, I would rather break compatibility than not providing support
for basic power management functionalities at all.


With your suggestion this means that Xen and the firmware are tied 
together. So you can't update Xen without updating the firmware.


We already ruled out that behavior for x-gene. See the partial revert 
you did 420596c868 to revert back the quirk.


I don't think this is very different here. We are introducing a feature, 
that we know will be broken afterwards unless you update your firmware. 
It is not like it was not planned... We should aim to support most of 
the official firmware unless there are a strong argument not to do.


Basic power management for other than Dom0 is not a valid enough reason 
for me to break compatibility.




I think it is better to introduce basic EEMI support now, even if in a
couple of Xen releases from now we'll make the decision to drop the
hardcoded values and require new Xilinx device trees. Xilinx users are
used to updating firmware on these boards every 6 months, and it would
beneficial for them to be able to take a Xen Project release rather than
be forced to use a Xilinx Xen release to have support for power
management. When the new bindings become available, as we introduce
support for them in Xen, we can decide whether to keep the hardcoded
values or whether we should take the opportunity to drop them.
This is the right moment to discuss about it. It will be too late once 
we get merged. I don't want to see 2 solutions in Xen if we know that 
bindings are work-in-progress.


I am happy to consider EEMI for Dom0. For the guests, it will have to 
wait the device-tree bindings.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] drm/xen-front: Make shmem backed display buffer coherent

2018-12-13 Thread Daniel Vetter

On Thu, Dec 13, 2018 at 12:17:54PM +0200, Oleksandr Andrushchenko wrote:
> Daniel, could you please comment?

Cross-revieweing someone else's stuff would scale better, I don't think
I'll get around to anything before next year.
-Daniel

> 
> Thank you
> 
> On 11/27/18 12:32 PM, Oleksandr Andrushchenko wrote:
> > From: Oleksandr Andrushchenko 
> > 
> > When GEM backing storage is allocated with drm_gem_get_pages
> > the backing pages may be cached, thus making it possible that
> > the backend sees only partial content of the buffer which may
> > lead to screen artifacts. Make sure that the frontend's
> > memory is coherent and the backend always sees correct display
> > buffer content.
> > 
> > Fixes: c575b7eeb89f ("drm/xen-front: Add support for Xen PV display 
> > frontend")
> > 
> > Signed-off-by: Oleksandr Andrushchenko 
> > ---
> >   drivers/gpu/drm/xen/xen_drm_front_gem.c | 62 +++--
> >   1 file changed, 48 insertions(+), 14 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c 
> > b/drivers/gpu/drm/xen/xen_drm_front_gem.c
> > index 47ff019d3aef..c592735e49d2 100644
> > --- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
> > +++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
> > @@ -33,8 +33,11 @@ struct xen_gem_object {
> > /* set for buffers allocated by the backend */
> > bool be_alloc;
> > -   /* this is for imported PRIME buffer */
> > -   struct sg_table *sgt_imported;
> > +   /*
> > +* this is for imported PRIME buffer or the one allocated via
> > +* drm_gem_get_pages.
> > +*/
> > +   struct sg_table *sgt;
> >   };
> >   static inline struct xen_gem_object *
> > @@ -77,10 +80,21 @@ static struct xen_gem_object *gem_create_obj(struct 
> > drm_device *dev,
> > return xen_obj;
> >   }
> > +struct sg_table *xen_drm_front_gem_get_sg_table(struct drm_gem_object 
> > *gem_obj)
> > +{
> > +   struct xen_gem_object *xen_obj = to_xen_gem_obj(gem_obj);
> > +
> > +   if (!xen_obj->pages)
> > +   return ERR_PTR(-ENOMEM);
> > +
> > +   return drm_prime_pages_to_sg(xen_obj->pages, xen_obj->num_pages);
> > +}
> > +
> >   static struct xen_gem_object *gem_create(struct drm_device *dev, size_t 
> > size)
> >   {
> > struct xen_drm_front_drm_info *drm_info = dev->dev_private;
> > struct xen_gem_object *xen_obj;
> > +   struct address_space *mapping;
> > int ret;
> > size = round_up(size, PAGE_SIZE);
> > @@ -113,10 +127,14 @@ static struct xen_gem_object *gem_create(struct 
> > drm_device *dev, size_t size)
> > xen_obj->be_alloc = true;
> > return xen_obj;
> > }
> > +
> > /*
> >  * need to allocate backing pages now, so we can share those
> >  * with the backend
> >  */
> > +   mapping = xen_obj->base.filp->f_mapping;
> > +   mapping_set_gfp_mask(mapping, GFP_USER | __GFP_DMA32);
> > +
> > xen_obj->num_pages = DIV_ROUND_UP(size, PAGE_SIZE);
> > xen_obj->pages = drm_gem_get_pages(_obj->base);
> > if (IS_ERR_OR_NULL(xen_obj->pages)) {
> > @@ -125,8 +143,27 @@ static struct xen_gem_object *gem_create(struct 
> > drm_device *dev, size_t size)
> > goto fail;
> > }
> > +   xen_obj->sgt = xen_drm_front_gem_get_sg_table(_obj->base);
> > +   if (IS_ERR_OR_NULL(xen_obj->sgt)){
> > +   ret = PTR_ERR(xen_obj->sgt);
> > +   xen_obj->sgt = NULL;
> > +   goto fail_put_pages;
> > +   }
> > +
> > +   if (!dma_map_sg(dev->dev, xen_obj->sgt->sgl, xen_obj->sgt->nents,
> > +   DMA_BIDIRECTIONAL)) {
> > +   ret = -EFAULT;
> > +   goto fail_free_sgt;
> > +   }
> > +
> > return xen_obj;
> > +fail_free_sgt:
> > +   sg_free_table(xen_obj->sgt);
> > +   xen_obj->sgt = NULL;
> > +fail_put_pages:
> > +   drm_gem_put_pages(_obj->base, xen_obj->pages, true, false);
> > +   xen_obj->pages = NULL;
> >   fail:
> > DRM_ERROR("Failed to allocate buffer with size %zu\n", size);
> > return ERR_PTR(ret);
> > @@ -149,7 +186,7 @@ void xen_drm_front_gem_free_object_unlocked(struct 
> > drm_gem_object *gem_obj)
> > struct xen_gem_object *xen_obj = to_xen_gem_obj(gem_obj);
> > if (xen_obj->base.import_attach) {
> > -   drm_prime_gem_destroy(_obj->base, xen_obj->sgt_imported);
> > +   drm_prime_gem_destroy(_obj->base, xen_obj->sgt);
> > gem_free_pages_array(xen_obj);
> > } else {
> > if (xen_obj->pages) {
> > @@ -158,6 +195,13 @@ void xen_drm_front_gem_free_object_unlocked(struct 
> > drm_gem_object *gem_obj)
> > xen_obj->pages);
> > gem_free_pages_array(xen_obj);
> > } else {
> > +   if (xen_obj->sgt) {
> > +   dma_unmap_sg(xen_obj->base.dev->dev,
> > +xen_obj->sgt->sgl,
> > +xen_obj->sgt->nents,
> > +

Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Roger Pau Monné

On Thu, Dec 13, 2018 at 07:52:16AM -0700, Jan Beulich wrote:
> >>> On 13.12.18 at 15:14,  wrote:
> > On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote:
> >> >>> On 13.12.18 at 12:39,  wrote:
> >> > Well, Just keeping correct order between each domain locks should be
> >> > enough?
> >> > 
> >> > Ie: exactly the same that Xen currently does but on a per-domain
> >> > basis. This is feasible, but each CPU would need to store the lock
> >> > order of each possible domain:
> >> > 
> >> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);
> >> > 
> >> > This would consume ~32KB per CPU, which is not that much but seems a
> >> > waste when most of the time a single entry will be used.
> >> 
> >> Well, tracking by domain ID wouldn't help you - the controlling
> >> domain may well have a higher ID than the being controlled one,
> >> i.e. the nesting you want needs to be independent of domain ID.
> > 
> > It's not tracking the domain ID, but rather tracking the lock level of
> > each different domain, hence the need for the array in the pcpu
> > structure. The lock checker would take a domain id and a level, and
> > perform the check as:
> > 
> > if ( mm_lock_level[domid] > level )
> > panic
> 
> But this would open things up for deadlocks because of intermixed
> lock usage between the calling domain's and the subject one's.
> There needs to be a linear sequence of locks (of all involved
> domains) describing the one and only order in which they may be
> acquired.

Well, my plan was to only check for deadlocks between the locks of the
same domain, without taking into account intermixed domain locking.

I guess at this point I will need some input from Tim and George about
how to proceed, because I'm not sure how to weight locks when using
intermixed domain locks, neither what is the correct order. The order
in paging_log_dirty_op looks like a valid order that we want to
support, but are there any others?

Is it possible to have multiple valid interdomain lock orders that
cannot be expressed using the current weighted lock ordering?

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] HVM driver domains do not appear to be usable with stubdomains

2018-12-13 Thread Chris Brannon

Jason Andryuk  writes:

>> So if I understand correctly, the problem is that PCI passthrough
>> doesn't work with stubdomains, unless qemu is patched?
>
> Hi, Chris.
>
> I pulled in the QEMU patch because I found that my Intel wired
> ethernet device didn't work without it.  I believe my Intel wireless
> NIC did though.  Maybe it was the opposite... I should have documented
> it more.  However the device was passed-through - it just wasn't
> operational.
>
> What device is :05:00.0?  Is it listed by `xl pci-assignable-list`?

Hi Jason,
:05:00.0 was an Intel wired ethernet device.  Yes it showed up
in xl pci-assignable-list.  In fact, my driver domain worked fine without
a stubdomain; the card was passed through and I had it passing traffic
with no problem at all.

-- Chris

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events

2018-12-13 Thread Razvan Cojocaru

On 12/13/18 4:58 PM, Jan Beulich wrote:
 On 13.12.18 at 14:18,  wrote:
>> So, long story short, on VMX we first send out the vm_event, while
>> processing it an interrupt / exception may become pending, before
>> resuming the VCPU that has sent out the vm_event there's a Xen function
>> that picks up the pending interrupt and schedules it (writes it in the
>> VMCS), and only then we attempt the emulation, which may overwrite it
>> (because there's only one place we can write to schedule interrupts /
>> exceptions).
> 
> So perhaps the solution is indeed to change the order of how things
> get done, instead of blocking interrupts? You seem to think this way
> too, as per ...
> 
>> 2. Interrupts are not blocked indefinitely - only until the emulation is
>> done. It could be argued that that's really the proper place for them to
>> be processed anyway - on an instruction boundary, _after_ the
>> in-progress instruction has finished executing. It's just that with the
>> vm_event introspection thing you could say that executing the current
>> instruction may take a bit longer.
> 
> ... this.

Quite possibly, following the lead of the singlestepping code just
seemed like the most straightforward way out of the problem. Of course
an alternative way of handling interrupts would probably be preferrable,
however we'd need a bit of guidance on how to go about it and in the
meantime I don't see how this small fix would hurt.

I remember Andrew suggesting taking something like this on at the Xen
Developer Summit in Budapest but then of course much more important
things happened with Meltdown & al.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 16/18] xen: automatically create XenBlockDevice-s

2018-12-13 Thread Kevin Wolf

Am 13.12.2018 um 13:44 hat Paul Durrant geschrieben:
> > Essentially, what I'm wondering is whether we have anything that could
> > be treated more or less like another monitor besides QMP and HMP, which
> > would internally work similar to HMP, i.e. map (almost) everything to
> > QMP commands.
> 
> Yes, it would be possible to have a separate 'compatibility' daemon to
> watch xenstore and then formulate the correct sequence of QMP commands
> to instantiate the backend, but that is more complicated and the right
> answer of course is to have the toolstack send the QMP commands in the
> first place.

Okay, if someone is working on actually using QMP instead of xenstore,
that's even better, of course. Disregard this point then.

> > > +drive_new(drive_opts, IF_NONE, _err);
> > > +if (local_err) {
> > > +error_propagate_prepend(errp, local_err,
> > > +"failed to create drive: ");
> > > +goto done;
> > > +}
> > 
> > The other major point is that you're using the legacy drive_*()
> > infrastructure, which should not only go away as soon as we can, but
> > which is also full of magic and nasty surprises.
> > 
> > I think the best way would be to create only a block node
> > (BlockDriverState) here, and get an automatically created anonymous
> > BlockBackend from the qdev drive property.
> > 
> > There are two ways to achieve this: qmp_blockdev_add() would be optimal
> > because that's a stable external interface. It would require you to
> > specify a node-name (you already have the id parameter), and you'd use
> > this node-name for the qdev drive property.
> > 
> > qmp_blockdev_add() requires a BlockdevOptions object, which you can
> > either construct manually in C or use a visitor to convert from an
> > options QDict. Maybe in this case, converting from a QDict is better
> > because otherwise you need special code for each block driver.
> > 
> 
> I was using the legacy interfaces because this code is, as I said
> above, supposed to be a mechanism only required for compatibility with
> the way toolstacks currently operate (and so is essentially 'legacy')
> but using the top-level QMP entry point to construct the does sound
> do-able as long as the underlying file locking can still be avoided
> with that mechanism. Since BlockdevOptions seems to be an
> auto-generated structure, figuring out how to fill it in manually is
> somewhat tricky so the QDict approach is preferable but I'll have to
> figure out how to use a visitor to do the translation.

You can grep for qobject_input_visitor_new() for examples. Some block
drivers use this internally to convert .bdrv_create options to a QAPI
object, it looks like this:

/* Now get the QAPI type BlockdevCreateOptions */
v = qobject_input_visitor_new_flat_confused(qdict, errp);
if (!v) {
ret = -EINVAL;
goto finish;
}

visit_type_BlockdevCreateOptions(v, NULL, _options, _err);
visit_free(v);

if (local_err) {
error_propagate(errp, local_err);
ret = -EINVAL;
goto finish;
}

Obviously, you'd use visit_type_BlockdevOptions instead. You also might
not need the _flat_confused variant, which is about QDicts where we
don't know whether values are stored as their actual type or as strings.
In your code, you have control over the types in the QDict, so this
shouldn't be a problem.

> > The other way would be calling bdrv_open() directly, which gives you a
> > BlockDriverState, but it risks using legacy functionality that will be
> > deprecated soon. Again, you'd take the node-name and pass it to the qdev
> > drive option below.
> 
> Yes, xen_disk does things this way but then we end up with legacy
> block device and still fall foul of the assertions buried in the code.

Yes and no. xen_disk is better in that it avoids the drive_* things
(which internally call bdrv_open() anyway), but it's worse in that it
directly assigns blkdev->blk instead of using the qdev property.

What I meant here is that you create the BDS with bdrv_open(), but then
you wouldn't assign it directly to some field in the device state, but
just put the node-name of the BDS into the qdev property 'drive'. This
would be almost like qmp_blockdev_add(), except without validation
against the QAPI schema.

But if using qmp_blockdev_add() is easy enough, that's preferable.

> > 
> > > +
> > > +done:
> > > +g_free(drive_optstr);
> > > +g_free(format);
> > > +g_free(file);
> > > +}
> > > +
> > > +static void xen_block_device_create(BusState *bus, const char *name,
> > > +QDict *opts, Error **errp)
> > > +{
> > > +unsigned long number;
> > > +const char *vdev, *device_type;
> > > +BlockBackend *blk = NULL;
> > > +IOThread *iothread = NULL;
> > > +DeviceState *dev = NULL;
> > > +Error *local_err = NULL;
> > > +const char *type;
> > > +XenBlockDevice *blockdev;
> > > +
> > > +

Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events

2018-12-13 Thread Jan Beulich

>>> On 13.12.18 at 14:18,  wrote:
> So, long story short, on VMX we first send out the vm_event, while
> processing it an interrupt / exception may become pending, before
> resuming the VCPU that has sent out the vm_event there's a Xen function
> that picks up the pending interrupt and schedules it (writes it in the
> VMCS), and only then we attempt the emulation, which may overwrite it
> (because there's only one place we can write to schedule interrupts /
> exceptions).

So perhaps the solution is indeed to change the order of how things
get done, instead of blocking interrupts? You seem to think this way
too, as per ...

> 2. Interrupts are not blocked indefinitely - only until the emulation is
> done. It could be argued that that's really the proper place for them to
> be processed anyway - on an instruction boundary, _after_ the
> in-progress instruction has finished executing. It's just that with the
> vm_event introspection thing you could say that executing the current
> instruction may take a bit longer.

... this.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Jan Beulich

>>> On 13.12.18 at 15:14,  wrote:
> On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote:
>> >>> On 13.12.18 at 12:39,  wrote:
>> > Well, Just keeping correct order between each domain locks should be
>> > enough?
>> > 
>> > Ie: exactly the same that Xen currently does but on a per-domain
>> > basis. This is feasible, but each CPU would need to store the lock
>> > order of each possible domain:
>> > 
>> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);
>> > 
>> > This would consume ~32KB per CPU, which is not that much but seems a
>> > waste when most of the time a single entry will be used.
>> 
>> Well, tracking by domain ID wouldn't help you - the controlling
>> domain may well have a higher ID than the being controlled one,
>> i.e. the nesting you want needs to be independent of domain ID.
> 
> It's not tracking the domain ID, but rather tracking the lock level of
> each different domain, hence the need for the array in the pcpu
> structure. The lock checker would take a domain id and a level, and
> perform the check as:
> 
> if ( mm_lock_level[domid] > level )
> panic

But this would open things up for deadlocks because of intermixed
lock usage between the calling domain's and the subject one's.
There needs to be a linear sequence of locks (of all involved
domains) describing the one and only order in which they may be
acquired.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable-smoke test] 131293: tolerable all pass - PUSHED

2018-12-13 Thread osstest service owner

flight 131293 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131293/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  5c08550ff4f3804df471b12c29ae170de981fc13
baseline version:
 xen  00c96d77422a4b84247bec5dadf434363d312cac

Last test of basis   131284  2018-12-13 00:00:41 Z0 days
Testing same since   131293  2018-12-13 12:00:46 Z0 days1 attempts


People who touched revisions under test:
  Brian Woods 
  Paul Durrant 

jobs:
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   00c96d7742..5c08550ff4  5c08550ff4f3804df471b12c29ae170de981fc13 -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function

2018-12-13 Thread Jan Beulich

>>> On 13.12.18 at 15:20,  wrote:
> On Thu, Dec 13, 2018 at 03:17:05AM -0700, Jan Beulich wrote:
>> >>> On 13.12.18 at 10:14,  wrote:
>> > On Thu, Dec 13, 2018 at 12:45:07AM -0700, Jan Beulich wrote:
>> >> >>> On 12.12.18 at 18:05,  wrote:
>> >> > On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote:
>> >> >> The MMIO side of things of course still remains unclear.
>> >> > 
>> >> > Right, for the MMIO and the handling of grant and foreign mappings it's
>> >> > not clear how we want to proceed.
>> >> > 
>> >> > Maybe account for all host RAM (total_pages) plus MMIO BARs?
>> >> 
>> >> Well, I thought we've already settled on it being impossible to
>> >> account for all MMIO BARs at this point.
>> > 
>> > Well, I could iterate over all the registered PCI devices and size
>> > the BARs (without VF BARs at least initially). This is quite
>> > cumbersome, my other option would be using max_page and hope that
>> > there are enough holes to make up for BAR MMIO regions.
>> 
>> Well, maybe we could live with this for now. I certainly would
>> prefer to have a 3rd opinion though, as I continue to feel uneasy
>> with this rather imprecise estimation (i.e. I'd much prefer a more
>> dynamic / on-demand approach).
> 
> I agree it's not a perfect solution, but I think what's currently done
> is even worse, and we already had bug reports of users seeing Xen
> panic at PVH Dom0 build time if no dom0_mem parameter is specified.
> 
> Would you be OK with using max_page then?

I'm not going to say yes or no here without having seen a (qualified)
3rd opinion.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation

2018-12-13 Thread Jan Beulich

>>> On 13.12.18 at 14:25,  wrote:
> The question is, how much drift can be tolerated even without my patch.

This depends on what a system is used for. A few seconds off may
be fine for one purpose, but a significant problem for another.
Similarly a sudden (however small) change to the TSC tick rate
may be a problem for one purpose, but not for another.

If otoh (and just to give an example) it was determined that the
TSC tick rate on a physical system varies over time within
certain boundaries, then leveraging this to change (slowly, i.e.
with no steeper a gradient than might be observable on bare
hardware) the rate to match the new host's might be an option.
Emulation then could be turned off after a certain period of time.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 25/25] argo: implement the get_config op to query notification config

2018-12-13 Thread Jan Beulich

>>> On 01.12.18 at 02:33,  wrote:
> --- a/xen/common/argo.c
> +++ b/xen/common/argo.c
> @@ -1656,6 +1656,46 @@ argo_sendv(struct domain *src_d, const argo_addr_t 
> *src_addr,
>  return ( ret < 0 ) ? ret : len;
>  }
>  
> +static void
> +argo_get_config(struct domain *d, argo_get_config_t *get_config)
> +{
> +unsigned int method = argo_signal_method(d);
> +
> +get_config->signal_method = method;
> +
> +switch ( method )
> +{
> +case ARGO_SIGNAL_METHOD_EVTCHN:
> +{
> +read_lock(_lock);
> +read_lock(>argo->lock);
> +
> +get_config->signal.evtchn = d->argo->evtchn_port;
> +
> +read_unlock(>argo->lock);
> +read_unlock(_lock);
> +
> +argo_dprintk("signal for dom:%d evtchn %u\n", d->domain_id,
> + get_config->signal.evtchn);
> +
> +break;
> +}
> +case ARGO_SIGNAL_METHOD_VIRQ:
> +{
> +get_config->signal.virq = VIRQ_ARGO;
> +
> +argo_dprintk("signal for dom:%d virq %u\n", d->domain_id,
> + get_config->signal.virq);
> +break;
> +}
> +default:
> +{
> +BUG();
> +break;
> +}

There are quite a few stray braces here.

> +typedef struct argo_get_config
> +{
> +uint32_t signal_method;
> +union
> +{
> +evtchn_port_t evtchn;
> +uint32_t virq;
> +} signal;
> +uint32_t reserved;

Judging from the description, did you perhaps mean to put
uint32_t reserved[2] inside the union?

Then again "get_config" sounds much more generic than just
obtaining the notification method.

> @@ -244,6 +257,21 @@ struct argo_ring_message_header
>   */
>  #define ARGO_MESSAGE_OP_notify  4
>  
> +/*
> + * ARGO_MESSAGE_OP_get_config
> + *
> + * Queries Xen for argo configuration values.
> + *
> + * Used by a guest to obtain the signal method in use for Argo notifications
> + * and the event channel port or isa irq in use.

ISA IRQ? It's a vIRQ that you have as alternative to bare
event-channel signaling.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [ovmf test] 131283: regressions - FAIL

2018-12-13 Thread osstest service owner

flight 131283 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131283/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i3866 xen-buildfail REGR. vs. 129475
 build-i386-xsm6 xen-buildfail REGR. vs. 129475
 build-amd64   6 xen-buildfail REGR. vs. 129475
 build-amd64-xsm   6 xen-buildfail REGR. vs. 129475

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a

version targeted for testing:
 ovmf 0d68ce514b922f887da28c2a12b8d37cf23903f0
baseline version:
 ovmf 5ae3184d8c59f7bbb84bad482df6b8020ba58188

Last test of basis   129475  2018-11-05 21:13:11 Z   37 days
Failing since129526  2018-11-06 20:49:26 Z   36 days  153 attempts
Testing same since   131283  2018-12-12 23:03:20 Z0 days1 attempts


People who touched revisions under test:
  Achin Gupta 
  Ard Biesheuvel 
  Bob Feng 
  bob.c.f...@intel.com 
  BobCF 
  Chasel Chiu 
  Chasel, Chiu 
  Chen A Chen 
  Dandan Bi 
  David Wei 
  Eric Dong 
  Feng, Bob C 
  Fu Siyuan 
  Gary Lin 
  Hao Wu 
  Jaben Carsey 
  Jeff Brasen 
  Jian J Wang 
  Jiaxin Wu 
  Jiewen Yao 
  Laszlo Ersek 
  Leif Lindholm 
  Liming Gao 
  Liu Yu 
  Marc Zyngier 
  Marcin Wojtas 
  Ming Huang 
  Pedroa Liu 
  Ruiyu Ni 
  shenglei 
  Shenglei Zhang 
  Star Zeng 
  Sughosh Ganu 
  Sumit Garg 
  Sun, Zailiang 
  Thomas Abraham 
  Tomasz Michalec 
  Vijayenthiran Subramaniam 
  Wang BinX A 
  Wu Jiaxin 
  Yonghong Zhu 
  yuchenlin 
  Zailiang Sun 
  Zhang, Chao B 
  Zhao, ZhiqiangX 
  ZhiqiangX Zhao 
  zwei4 

jobs:
 build-amd64-xsm  fail
 build-i386-xsm   fail
 build-amd64  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked 
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 3421 lines long.)

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 24/25] argo: unmap rings on suspend and send signal to ring-owners on resume

2018-12-13 Thread Jan Beulich

>>> On 01.12.18 at 02:33,  wrote:
> so that the guest may re-register the rings on resume with current mappings.

Is this something guests really need help with, rather than managing
it on their own? What does "current mappings" here mean, i.e. why
do rings need re-registration in the first place?

> +void
> +argo_resume(struct domain *d)
> +{
> +bool send_wakeup;
> +
> +if ( !d )
> +return;
> +
> +if ( !get_domain(d) )
> +return;
> +
> +read_lock(_lock);
> +
> +read_lock(>argo->lock);
> +send_wakeup = ( d->argo->ring_count > 0 );
> +read_unlock(>argo->lock);
> +
> +if ( send_wakeup )
> +argo_signal_domain(d);
> +
> +read_unlock(_lock);
> +
> +put_domain(d);
> +}

domain_resume() also gets called from domain_soft_reset(). Do
you really want such handling in that case as well, when after a
soft-reset the domain is supposed to be "blank"?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function

2018-12-13 Thread Roger Pau Monné

On Thu, Dec 13, 2018 at 03:17:05AM -0700, Jan Beulich wrote:
> >>> On 13.12.18 at 10:14,  wrote:
> > On Thu, Dec 13, 2018 at 12:45:07AM -0700, Jan Beulich wrote:
> >> >>> On 12.12.18 at 18:05,  wrote:
> >> > On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote:
> >> >> The MMIO side of things of course still remains unclear.
> >> > 
> >> > Right, for the MMIO and the handling of grant and foreign mappings it's
> >> > not clear how we want to proceed.
> >> > 
> >> > Maybe account for all host RAM (total_pages) plus MMIO BARs?
> >> 
> >> Well, I thought we've already settled on it being impossible to
> >> account for all MMIO BARs at this point.
> > 
> > Well, I could iterate over all the registered PCI devices and size
> > the BARs (without VF BARs at least initially). This is quite
> > cumbersome, my other option would be using max_page and hope that
> > there are enough holes to make up for BAR MMIO regions.
> 
> Well, maybe we could live with this for now. I certainly would
> prefer to have a 3rd opinion though, as I continue to feel uneasy
> with this rather imprecise estimation (i.e. I'd much prefer a more
> dynamic / on-demand approach).

I agree it's not a perfect solution, but I think what's currently done
is even worse, and we already had bug reports of users seeing Xen
panic at PVH Dom0 build time if no dom0_mem parameter is specified.

Would you be OK with using max_page then? This is the less complex
option to implement ATM, and BAR sizing can be added later together
with a more dynamic p2m memory management.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ

2018-12-13 Thread Jan Beulich

>>> On 01.12.18 at 02:33,  wrote:
> * x86 PV domains are notified via event channel.
> 
> PV guests are known to have the event channel software present in the guest
> kernel, so it is fine to depend on and use it.
> 
> * x86 HVM domains and all ARM domains are notified via VIRQ.
> 
> The intent is to remove the requirement for event channel software to be
> installed within these guests in order to use Argo. VIRQ signalling is also
> the method that has been in use for the longest period with this hypercall
> in both XenClient and OpenXT.

I'm afraid I don't follow: send_guest_global_virq() uses, well,
evtchn_port_set_pending(), just like evtchn_send() does.
Therefore how does sending a vIRQ help with a guest without
event channel awareness?

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Roger Pau Monné

On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote:
> >>> On 13.12.18 at 12:39,  wrote:
> > On Wed, Dec 12, 2018 at 09:07:14AM -0700, Jan Beulich wrote:
> >> >>> On 12.12.18 at 15:54,  wrote:
> >> > @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d,
> >> >  bytes = (unsigned int)((sc->pages - pages + 7) >> 
> >> > 3);
> >> >  if ( likely(peek) )
> >> >  {
> >> > +if ( paging_mode_enabled(current->domain) )
> >> > +/*
> >> > + * Drop the target p2m lock, or else Xen will 
> >> > panic
> >> > + * when trying to acquire the p2m lock of the 
> >> > caller
> >> > + * due to invalid lock order. Note that there 
> >> > are no
> >> > + * lock ordering issues here, and the panic is 
> >> > due to
> >> > + * the fact that the lock level tracking 
> >> > doesn't record
> >> > + * the domain the lock belongs to.
> >> > + */
> >> > +paging_unlock(d);
> >> 
> >> This makes it sound as if tracking the domain would help. It doesn't,
> >> at least not as long as there is not also some sort of ordering or
> >> hierarchy between domains. IOW I'd prefer if the "doesn't" became
> >> "can't".
> > 
> > Well, Just keeping correct order between each domain locks should be
> > enough?
> > 
> > Ie: exactly the same that Xen currently does but on a per-domain
> > basis. This is feasible, but each CPU would need to store the lock
> > order of each possible domain:
> > 
> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);
> > 
> > This would consume ~32KB per CPU, which is not that much but seems a
> > waste when most of the time a single entry will be used.
> 
> Well, tracking by domain ID wouldn't help you - the controlling
> domain may well have a higher ID than the being controlled one,
> i.e. the nesting you want needs to be independent of domain ID.

It's not tracking the domain ID, but rather tracking the lock level of
each different domain, hence the need for the array in the pcpu
structure. The lock checker would take a domain id and a level, and
perform the check as:

if ( mm_lock_level[domid] > level )
panic

> >> Now before we go this route I think we need to consider whether
> >> this really is the only place where the two locks get into one
> >> another's way. That's because I don't think we want to introduce
> >> more of such, well, hackery, and hence we'd otherwise need a
> >> better solution. For example the locking model could be adjusted
> >> to allow such nesting in the general case: Dom0 and any domain
> >> whose ->target matches the subject domain here could be given
> >> a slightly different "weight" in the lock order violation check logic.
> > 
> > So locks from domains != current would be given a lower order, let's
> > say:
> > 
> > #define MM_LOCK_ORDER_nestedp2m   8
> > #define MM_LOCK_ORDER_p2m16
> > #define MM_LOCK_ORDER_per_page_sharing   24
> > #define MM_LOCK_ORDER_altp2mlist 32
> > #define MM_LOCK_ORDER_altp2m 40
> > #define MM_LOCK_ORDER_pod48
> > #define MM_LOCK_ORDER_page_alloc 56
> > #define MM_LOCK_ORDER_paging 64
> > #define MM_LOCK_ORDER_MAXMM_LOCK_ORDER_paging
> > 
> > If domain != current, the above values are used. If domain == current
> > the values above are used + MM_LOCK_ORDER_MAX? So in that case the
> > order of MM_LOCK_ORDER_p2m against the current domain would be 16 + 64
> > = 80?
> > 
> > This has the slight inconvenience that not all mm lock call sites have
> > the target domain available, but can be solved.
> 
> No, I wasn't envisioning a really generic approach, i.e.
> permitting this for all of the locks. Do we need this? Till now
> I was rather considering to do this for just the paging lock,
> with Dom0 and the controlling domains using just a small
> (+/- 2 and +/- 1) offset from the normal paging one.
> However, I now realize that indeed there may be more of
> such nesting, and you've merely hit a specific case of it.

So far that's the only case I've hit, but I'm quite sure there are a
non-trivial amount of hypercalls that where only used by a PV Dom0, so
I don't discard finding other such instances.

> In any event it seems to me that you've got the ordering
> inversed, i.e. domain == current (or really: the general case,
> i.e. after excluding the two special cases) would use base
> values, domain == currd->target would use some offset, and
> Dom0 would use double the offset.
> 
> With it extended to all locks I'm no longer sure though that
> the end result would be safe. I think it would be helpful to
> rope in whoever did originally design this locking model.

I'm adding Tim which I think is the original author of

Re: [Xen-devel] [PATCH 22/25] xen/evtchn: expose send_guest_global_virq for use within Xen

2018-12-13 Thread Jan Beulich

>>> On 01.12.18 at 02:33,  wrote:
> To be used by Argo for delivery of notifications to some guests.

Better not to make this a separate patch: By folding it into where it's
needed it is easier for everyone to judge whether the exposure is
indeed necessary, and it also eliminates the risk of the series getting
committed up to here, and the function then being pointlessly non-
static for an extended period of time.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] arm: xl vcpu-pin leads to oom-killer slashing processes

2018-12-13 Thread Andrii Anisov


Hello All,

OK, I've discovered a mechanism of the issue.
It is  because of `d->max_pages = ~0U;` in a `construct_dom0()`.
When I do vcpu-pin, libxl updates memory nodes in xenstore for Dom0. Then 
kernel watch sees those changes and trying to set new target for ballon, but 
the target becomes extremely high, and baloon sucks all the pages.

In my kernel (4.14) in `watch_target()` function there is a code:

target_diff = xen_pv_domain() ? 0
: static_max - balloon_stats.target_pages;

Here we have `xen_pv_domain()` equal to zero, so `target_diff` big. Then, few 
lines below:

balloon_set_new_target(new_target - target_diff);

`balloon_set_new_target()` receives a value wrapped over 64bit what kills the 
system.

Now I'm looking for an appropriate kernel patch for the kernel, to fix that. 
Any suggestions?

--
Sincerely,
Andrii Anisov.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 21/25] argo: add array_index_nospec to guard the result of the hash func

2018-12-13 Thread Jan Beulich

>>> On 01.12.18 at 02:33,  wrote:
> This is out of an abundance of caution, since this is a very basic hash
> function, chosen more for its bucket distribution properties to cluster 
> related
> rings rather than for cryptographic strength or any uniformness of output,
> and it operates upon values supplied by the guest just before being used as an
> array index.

Same here: Better to put this in place right away for new code
than to incrementally add it later.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 18/25] argo: limit the max number of rings that a domain may register.

2018-12-13 Thread Jan Beulich

>>> On 01.12.18 at 02:32,  wrote:
> Very basic implementation: a fixed limit of 128.

Such restrictions to limit resource use would better be implemented
right away for code that can be used (in a limited way) already with
just the initial parts of the series applied.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 16/25] argo: implement the notify op

2018-12-13 Thread Jan Beulich

>>> On 01.12.18 at 02:32,  wrote:
> +static uint32_t
> +argo_ringbuf_payload_space(struct domain *d, struct argo_ring_info 
> *ring_info)
> +{
> +argo_ring_t ring;
> +int32_t ret;
> +
> +ASSERT(spin_is_locked(_info->lock));
> +
> +ring.len = ring_info->len;
> +if ( !ring.len )
> +return 0;
> +
> +ring.tx_ptr = ring_info->tx_ptr;
> +
> +if ( argo_ringbuf_get_rx_ptr(ring_info, _ptr) )
> +return 0;
> +
> +argo_dprintk("argo_ringbuf_payload_space: tx_ptr=%d rx_ptr=%d\n",
> + ring.tx_ptr, ring.rx_ptr);
> +
> +if ( ring.rx_ptr == ring.tx_ptr )
> +return ring.len - sizeof(struct argo_ring_message_header);
> +
> +ret = ring.rx_ptr - ring.tx_ptr;
> +if ( ret < 0 )
> +ret += ring.len;

Seeing these two if()-s - how is an empty ring distinguished from
a completely full one? I'm getting the impression that
ring.rx_ptr == ring.tx_ptr in both cases.

> +ret -= sizeof(struct argo_ring_message_header);
> +ret -= ARGO_ROUNDUP(1);

Wouldn't you instead better round ret to a suitable multiple of
whatever granularity you try to arrange for here? Otherwise
what is this extra subtraction supposed to do?

> @@ -627,6 +679,43 @@ argo_pending_remove_all(struct argo_ring_info *ring_info)
>  }
>  }
>  
> +static void
> +argo_pending_notify(struct hlist_head *to_notify)
> +{
> +struct hlist_node *node, *next;
> +struct argo_pending_ent *pending_ent;
> +
> +ASSERT(rw_is_locked(_lock));
> +
> +hlist_for_each_entry_safe(pending_ent, node, next, to_notify, node)
> +{
> +hlist_del(_ent->node);
> +argo_signal_domid(pending_ent->id);
> +xfree(pending_ent);
> +}
> +}
> +
> +static void
> +argo_pending_find(const struct domain *d, struct argo_ring_info *ring_info,
> +  uint32_t payload_space, struct hlist_head *to_notify)
> +{
> +struct hlist_node *node, *next;
> +struct argo_pending_ent *ent;
> +
> +ASSERT(rw_is_locked(>argo->lock));
> +
> +spin_lock(_info->lock);
> +hlist_for_each_entry_safe(ent, node, next, _info->pending, node)
> +{
> +if ( payload_space >= ent->len )
> +{
> +hlist_del(>node);
> +hlist_add_head(>node, to_notify);
> +}
> +}

So if there's space available to fit e.g. just the first pending entry,
you'd continue the loop and also signal all others, provided their
lengths aren't too big? What good does producing such a burst of
notifications do, when only one of the interested parties is
actually going to be able to put something on the ring?

> @@ -705,6 +812,107 @@ argo_ring_remove_info(struct domain *d, struct 
> argo_ring_info *ring_info)
>  xfree(ring_info);
>  }
>  
> +/*ring data*/

Now this comment is malformed in any event.

> +static int
> +argo_fill_ring_data(struct domain *src_d,

const?

> +static long
> +argo_notify(struct domain *d,
> +XEN_GUEST_HANDLE_PARAM(argo_ring_data_t) ring_data_hnd)
> +{
> +argo_ring_data_t ring_data;
> +int ret = 0;
> +
> +read_lock(_lock);
> +
> +if ( !d->argo )
> +{
> +read_unlock(_lock);
> +argo_dprintk("!d->argo, ENODEV\n");
> +return -ENODEV;
> +}
> +
> +argo_notify_check_pending(d);
> +
> +do {
> +if ( !guest_handle_is_null(ring_data_hnd) )
> +{
> +/* Quick sanity check on ring_data_hnd */
> +ret = copy_field_from_guest_errno(_data, ring_data_hnd, 
> magic);
> +if ( ret )
> +break;
> +
> +if ( ring_data.magic != ARGO_RING_DATA_MAGIC )
> +{
> +argo_dprintk(
> +"ring.magic(%"PRIx64") != ARGO_RING_MAGIC(%llx), 
> EINVAL\n",
> +ring_data.magic, ARGO_RING_MAGIC);
> +ret = -EINVAL;
> +break;
> +}
> +
> +ret = copy_from_guest_errno(_data, ring_data_hnd, 1);
> +if ( ret )
> +break;
> +
> +{
> +/*
> + * This is a guest pointer passed as a field in a struct
> + * so XEN_GUEST_HANDLE is used.
> + */
> +XEN_GUEST_HANDLE(argo_ring_data_ent_t) ring_data_ent_hnd;
> +ring_data_ent_hnd = guest_handle_for_field(ring_data_hnd,
> +   
> argo_ring_data_ent_t,
> +   data[0]);
> +ret = argo_fill_ring_data_array(d, ring_data.nent,
> +ring_data_ent_hnd);
> +}

Stray braces and bogus indentation: The declaration can move up
and then you don't need the braces and the extra level of indentation.

> @@ -103,6 +104,40 @@ typedef struct argo_ring
>   */
>  #define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf)
>  
> +/*
> + * Notify flags
> + */
> +/* Ring is empty */
> +#define

Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation

2018-12-13 Thread Olaf Hering

Am Thu, 13 Dec 2018 03:41:44 -0700
schrieb "Jan Beulich" :

> And further argumentation that everyone is using NTP anyway
> doesn't make it any better, when it's no-where written down that
> Xen is unusable with NTP running in all guests (I'm exaggerating
> here just to get the point over). Don't forget that e.g. with
> XenoLinux'es independent-wallclock setting defaulting to false, we've
> been suggesting that people _don't_ need to use NTP inside their
> (admittedly PV) guests. IOW - your change may not break people
> not using NTP. Hence I don't think the mode you introduce can be
> a default-on one, which in turn means a per-domain or at least
> global enable control is needed (as over previous iterations we
> seem to have been agreeing).

Regarding the possible unexpected drift if NTP is not used, and the domU
uses TSC as clocksource anyway: If one host is on one edge of the assumed
cpu_khz value and the other host is on the opposite edge, and the range
will be 200 as it is in my patch, the daily drift on a 2.3GHz host would
be 7.4seconds per day. This number is based on the fact that 200kz happen
within a time span of 86us, which I think is correct.

The question is, how much drift can be tolerated even without my patch.
Looking through 5 different hosts I use, the range is between -11 and +22,
and it happens to be +56 on my Laptop. So even that host with a calculated
drift of +22 would be off by 2 seconds per day if ntpd would not run.


Olaf


pgpizy58joV2T.pgp
Description: Digitale Signatur von OpenPGP
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events

2018-12-13 Thread Razvan Cojocaru

On 12/13/18 2:39 PM, Julien Grall wrote:
> Hi,
> 
> On 12/13/18 12:15 PM, Razvan Cojocaru wrote:
>> On 12/13/18 2:04 PM, Julien Grall wrote:
>>> Hi,
>>>
>>> On 12/13/18 8:03 AM, Razvan Cojocaru wrote:
 On 12/13/18 8:54 AM, Tian, Kevin wrote:
>> From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com]
>> Sent: Tuesday, December 11, 2018 8:33 PM
>>
>>> In any case, I think you want to rename the function and/or document
>>> more that expected behavior.
>>
>> You're right, I should probably rename that function / variable to
>> better reflect what it signifies - that sync vm_event processing
>> is in
>> progress. For VMX and SVM, that simply means that interrupts will be
>> blocked, and the value of the variable will be correct and possibly
>> useful for ARM as well.
>>
>
> what about vm_event_block_interrupt_injection? in that case
> it's injection instead of interrupt itself being blocked. blocking
> injection should mean same thing cross archs?
>>>
>>> Why would you want to block all interrupts injections? When I looked at
>>> the details, it feels more you want to block exceptions.
>>>
>>> I can see use for blocking exception on Arm, blocking all the interrupts
>>> is likely going to bring more issues than solving anything.
>>>
>>> So a better name would be vm_event_block_exception_injection.
>>
>> I'd like to block the writing of anything, by vmx_intr_assist(), into
>> VM_ENTRY_INTR_INFO, because an emulation attempt that happens
>> post-vmx_intr_assist() (because the vm_event client application has
>> requested it) may write an exception of its own there.
>>
>> Since vmx_intr_assist() is called on VMX between the time of sending out
>> the vm_event and the emulation (which happens in
>> hvm_vm_event_do_resume()), we want to block everything that it may write
>> in the VMCS until the emulation is done. I think that's more than just
>> exceptions.
> 
> I don't know in details how x86 virtualization works, so it is a bit
> hard to comment on that. However, it feels to me that you are
> introducing in common code a function that will workaround an
> architecture specific problem.
> 
> Can you try to explain it in agnostic word?

I'll certainly do my best. :)

Assume the following scenario:

1. A guest instruction tries to write into read-only memory (as set in
the EPT), with monitoring active for the domain.

2. An EPT violation exit occurs, and in the course of it, the VCPU that
was running the code that produced the violation is paused and a
vm_event is sent to an introspection application.

3. The introspection application is processing the vm_event. During this
time, some event may occur (which will remain pending).

4. The introspection agent replies with "please emulate" the current
instruction - since the emulator (currently) does not care about EPT
restrictions, so this is a cheap way of proceeding without lifting them.

5. With x86, we have {vmx,svm}_intr_assist(), which is guaranteed to be
called _before_ the VCPU is woken up again. This is the designated "pick
up pending interrupts" function on x86. Perhaps this (real-life)
backtrace is helpful:

(XEN) Xen call trace:
(XEN)[] vmx.c#__vmx_inject_exception+0xa1/0xda
(XEN)[] vmx_inject_extint+0x94/0x9f
(XEN)[] vmx_intr_assist+0x4ee/0x5ad
(XEN)[] vmx_asm_vmexit_handler+0xff/0x270

On x86, there can (currently) be only one scheduled interrupt /
exception, and that is written into VM_ENTRY_INTR_INFO in the VMCS on Intel.

6. _After_ vmx_intr_assist() has run, we are now trying to emulate the
current instruction, which may cause an exception, which will overwrite
the pending interrupt.

So, long story short, on VMX we first send out the vm_event, while
processing it an interrupt / exception may become pending, before
resuming the VCPU that has sent out the vm_event there's a Xen function
that picks up the pending interrupt and schedules it (writes it in the
VMCS), and only then we attempt the emulation, which may overwrite it
(because there's only one place we can write to schedule interrupts /
exceptions).

> To expand what I said above, I think it is reasonable to request
> blocking exception (e.g page-fault...) because they can be generated by
> an instruction. However, all interrupts generated by the interrupt
> controller (e.g device, IPI..) should not be blocked.
> 
> AFAIU your description, it is the same path to handle the two on x86,
> right?

Pretty much, yes. Technically speaking there are two that I am aware of,
the second of them being the IDT vectoring case just before the EPT
fault exit - but that's outside the scope of this patch. Just mentioning
it for completeness.

Also, it is worth mentioning that:

1. This is the exact same strategy employed by the single-stepping
functionality on VMX / Intel. In fact, if you look at
xen/arch/x86/hvm/vmx/intr.c, in vmx_intr_assist(), you'll see an early
return blocking injection of interrupts for the duration of

Re: [Xen-devel] [PATCH 05/25] argo: Add initial argo_init and argo_destroy

2018-12-13 Thread Jan Beulich

>>> On 01.12.18 at 02:32,  wrote:
> --- /dev/null
> +++ b/xen/include/public/argo.h
> @@ -0,0 +1,55 @@
> +/**
> + * Argo : Hypervisor-Mediated data eXchange
> + *
> + * Derived from v4v, the version 2 of v2v.
> + *
> + * Copyright (c) 2010, Citrix Systems
> + * Copyright (c) 2018, BAE Systems
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + */
> +
> +#ifndef __XEN_PUBLIC_ARGO_H__
> +#define __XEN_PUBLIC_ARGO_H__
> +
> +#include "xen.h"
> +
> +typedef struct argo_addr
> +{
> +uint32_t port;
> +domid_t domain_id;
> +uint16_t pad;
> +} argo_addr_t;
> +
> +typedef struct argo_ring_id
> +{
> +struct argo_addr addr;
> +domid_t partner;
> +uint16_t pad;
> +} argo_ring_id_t;
> +
> +typedef struct argo_ring
> +{
> +uint64_t magic;
> +argo_ring_id_t id;
> +uint32_t len;
> +/* Guests should use atomic operations to access rx_ptr */
> +uint32_t rx_ptr;
> +/* Guests should use atomic operations to access tx_ptr */
> +uint32_t tx_ptr;
> +uint8_t reserved[32];
> +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
> +uint8_t ring[];
> +#elif defined(__GNUC__)
> +uint8_t ring[0];
> +#endif
> +} argo_ring_t;

Btw, for all structure types you define, and with your desire to avoid
compat mode translation, you should add ?-prefixed entries to
xen/include/xlat.lst and invoke the produced CHECK_* macros from
somewhere. If, for reference, you'd look at existing instances, you'll
then also find another reason why all these would better have xen_
prefixes.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

2018-12-13 Thread Paolo Bonzini

On 12/12/18 21:39, Borislav Petkov wrote:
> On Tue, Dec 11, 2018 at 11:29:21AM -0800, Maran Wilson wrote:
>> Is your question about what options you need to provide to Qemu? Or is your
>> question about the SW implementation choices?
>>
>> Assuming the former...
> Yeah, that's what I wanted to know. But looking at it, I'm booting
> bzImage here just as quickly and as flexible so I don't see the
> advantage of this new method for my use case here of booting kernels
> in qemu.

It's not firmware that is slow, decompression is.  Unlike Xen, which is
using PVH with a regular bzImage and decompression in the host, KVM is
using PVH to boot a vmlinux with no decompression at all.

Paolo

> But maybe there's a good use case where firmware is slow and one doesn't
> really wanna noodle through it or when one does start a gazillion VMs
> per second or whatever...


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point

2018-12-13 Thread Paolo Bonzini

On 12/12/18 19:09, Maran Wilson wrote:
> On 12/6/2018 1:21 PM, Paolo Bonzini wrote:
>> On 06/12/18 07:02, Maran Wilson wrote:
>>> For certain applications it is desirable to rapidly boot a KVM virtual
>>> machine. In cases where legacy hardware and software support within the
>>> guest is not needed, Qemu should be able to boot directly into the
>>> uncompressed Linux kernel binary without the need to run firmware.
>>>
>>> There already exists an ABI to allow this for Xen PVH guests and the ABI
>>> is supported by Linux and FreeBSD:
>>>
>>>     https://xenbits.xen.org/docs/unstable/misc/pvh.html
>>>
>>> This patch series would enable Qemu to use that same entry point for
>>> booting KVM guests.
>> Thanks!  I should be able to post a Tested-by next Monday.  Boris, are
>> you going to pick it up for 4.21?
> 
> Hi Paolo,
> 
> Are you still planning on running some testing of your own for these
> patches? Should Boris wait to hear from you before moving forward?

He can go ahead with v9, I was just curious about it.

Paolo


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device

2018-12-13 Thread Chao Gao

On Thu, Dec 13, 2018 at 12:54:52AM -0700, Jan Beulich wrote:
 On 13.12.18 at 04:46,  wrote:
>> On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote:
>> On 12.12.18 at 16:18,  wrote:
 On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote:
 On 12.12.18 at 08:06,  wrote:
>> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote:
>>>On 12/5/18 4:32 AM, Roger Pau Monné wrote:
 On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote:
> I find some pass-thru devices don't work any more across guest reboot.
> Assigning it to another guest also meets the same issue. And the only
> way to make it work again is un-binding and binding it to pciback.
> Someone reported this issue one year ago [1]. More detail also can be
> found in [2].
>
> The root-cause is Xen's internal MSI-X state isn't reset properly
> during reboot or re-assignment. In the above case, Xen set maskall bit
> to mask all MSI interrupts after it detected a potential security
> issue. Even after device reset, Xen didn't reset its internal maskall
> bit. As a result, maskall bit would be set again in next write to
> MSI-X message control register.
>
> Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X
> internal state of a device, we employ it to fix this issue rather than
> introducing another dedicated sub-hypercall.
>
> Note that PHYSDEVOPS_release_msix() will fail if the mapping between
> the device's msix and pirq has been created. This limitation prevents
> us calling this function when detaching a device from a guest during
> guest shutdown. Thus it is called right before calling
> PHYSDEVOPS_prepare_msix().
 s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the
 () at the end of the hypercall name since it's not a function.

 I'm also wondering why the release can't be done when the device is
 detached from the guest (or the guest has been shut down). This makes
 me worry about the raciness of the attach/detach procedure: if there's
 a state where pciback assumes the device has been detached from the
 guest, but there are still pirqs bound, an attempt to attach to
 another guest in such state will fail.
>>>
>>>I wonder whether this additional reset functionality could be done out
>>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset
>>>and then do the extra things that are not properly done there.
>> 
>> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying
>> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal
>> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished
>> without error. But ATM, xen expects that no msi is bound to pirq when
>> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY.
>> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove().
>> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen
>> at last minute, which happens after device reset in 
>> xen_pcibk_xenbus_remove().
>
>But that may need taking care of: I don't think it is a good idea to have
>anything left from the prior owning domain when the device gets reset.
>I.e. left over IRQ bindings should perhaps be forcibly cleared before
>invoking the reset;
 
 Agree. How about pciback to track the established IRQ bindings? Then
 pciback can clear irq binding before invoking the reset.
>>>
>>>How would pciback even know of those mappings, when it's qemu
>>>who establishes (and manages) them?
>> 
>> I meant to expose some interfaces from pciback. And pciback serves
>> as the proxy of IRQ (un)binding APIs.
>
>If at all possible we should avoid having to change more parties (qemu,
>libxc, kernel, hypervisor) than really necessary. Remember that such
>a bug fix may want backporting, and making sure affected people have
>all relevant components updated is increasingly difficult with their
>number growing.
>
>in fact I'd expect this to happen in the course of
>domain destruction, and I'd expect the device reset to come after the
>domain was cleaned up. Perhaps simply an ordering issue in the tool
>stack?
 
 I don't think reversing the sequences of device reset and domain
 destruction would be simple. Furthermore, during device hot-unplug,
 device reset is done when the owner is alive. So if we use domain
 destruction to enforce all irq binding cleared, in theory, it won't be
 applicable to hot-unplug case (if qemu's hot-unplug logic is
 compromised).
>>>
>>>Even in the hot-unplug case the tool stack could issue unbind
>>>requests, behind the back of the possibly compromised qemu,
>>>once

Re: [Xen-devel] [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ

2018-12-13 Thread James

I think there are two issues:

1) VIRQ vs some other sort of event channel

   For PV guests we originally chose a VIRQ in order to have a well known
   number against which the kernel driver could bind, so that it wasn't
   dependent on any of the other interdomain communication systems (such
   as xenstore) to find the correct channel.

   VIRQs and events in general make sense for PV domains since the
   up call mechanism fits well into the way argo expects scheduling to work.

   I dont see any pressing reason to not use a VIRQ for PV or PVH domains,
   perhaps I've missed something.

2) VIRQ vs direct injection of vector in hvm case.

   for HVM guests - you can make the argument for injection via a (potentially
   hardware) emulated LAPIC. In this case the best performance is obtained by
   not having to clear the interrupt (requiring another VMEXIT). The only two
   sorts of interrupts that have that property are 1) MSIs, and 2) edge 
interrupts
   via the IOAPIC:

   Not all operating systems had/have good MSI support, PCI doesn't [really] 
support
   edge interrupts on LNK[ABCD], and even if you go the PCI route almost no 
OSes do
   the right thing anyway. However it is possible to specify a device via ACPI 
with a
   fixed GSI with an edge interrupt. Windows from at least XP, and linux both 
handle
   that well and it's the perfect fit for sending the idempotent "you have work 
to
   do" type interrupt that argo needs.

   One of the design goals was that multiple independent actors in a VM should
   be able to run without knowledge of each other, and using a standard GSI 
allows
   something like an EDK II, grub and linux to all use it without having to hand
   forward state.


J.
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI

2018-12-13 Thread Stefano Garzarella

Hi Liam,
in order to support PVH also with SeaBIOS, I'm going to work on a new
option rom (like linuxboot/multiboot) that can be used in this case.

I'll keep you updated on it!

Cheers,
Stefano
On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick  wrote:
>
> These changes (along with corresponding qboot and Linux kernel changes)
> enable a guest to be booted using the x86/HVM direct boot ABI.
>
> This commit adds a load_elfboot() routine to pass the size and
> location of the kernel entry point to qboot (which will fill in
> the start_info struct information needed to to boot the guest).
> Having loaded the ELF binary, load_linux() will run qboot
> which continues the boot.
>
> The address for the kernel entry point has already been read
> from an ELF Note in the uncompressed kernel binary earlier
> in pc_memory_init().
>
> Signed-off-by: George Kennedy 
> Signed-off-by: Liam Merwick 
> ---
>  hw/i386/pc.c | 72 
> 
>  1 file changed, 72 insertions(+)
>
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 056aa46d99b9..d3012cbd8597 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -54,6 +54,7 @@
>  #include "sysemu/qtest.h"
>  #include "kvm_i386.h"
>  #include "hw/xen/xen.h"
> +#include "hw/xen/start_info.h"
>  #include "ui/qemu-spice.h"
>  #include "exec/memory.h"
>  #include "exec/address-spaces.h"
> @@ -1098,6 +1099,50 @@ done:
>  return pvh_start_addr != 0;
>  }
>
> +static bool load_elfboot(const char *kernel_filename,
> +   int kernel_file_size,
> +   uint8_t *header,
> +   size_t pvh_xen_start_addr,
> +   FWCfgState *fw_cfg)
> +{
> +uint32_t flags = 0;
> +uint32_t mh_load_addr = 0;
> +uint32_t elf_kernel_size = 0;
> +uint64_t elf_entry;
> +uint64_t elf_low, elf_high;
> +int kernel_size;
> +
> +if (ldl_p(header) != 0x464c457f) {
> +return false; /* no elfboot */
> +}
> +
> +bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
> +flags = elf_is64 ?
> +((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
> +
> +if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
> +error_report("elfboot unsupported flags = %x", flags);
> +exit(1);
> +}
> +
> +kernel_size = load_elf(kernel_filename, NULL, NULL, _entry,
> +   _low, _high, 0, I386_ELF_MACHINE,
> +   0, 0);
> +
> +if (kernel_size < 0) {
> +error_report("Error while loading elf kernel");
> +exit(1);
> +}
> +mh_load_addr = elf_low;
> +elf_kernel_size = elf_high - elf_low;
> +
> +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
> +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> +
> +return true;
> +}
> +
>  static void load_linux(PCMachineState *pcms,
> FWCfgState *fw_cfg)
>  {
> @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
>  if (ldl_p(header+0x202) == 0x53726448) {
>  protocol = lduw_p(header+0x206);
>  } else {
> +/* If the kernel address for using the x86/HVM direct boot ABI has
> + * been saved then proceed with booting the uncompressed kernel */
> +if (pvh_start_addr) {
> +if (load_elfboot(kernel_filename, kernel_size,
> + header, pvh_start_addr, fw_cfg)) {
> +struct hvm_modlist_entry ramdisk_mod = { 0 };
> +
> +fclose(f);
> +
> +fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
> +strlen(kernel_cmdline) + 1);
> +fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, 
> kernel_cmdline);
> +
> +assert(machine->device_memory != NULL);
> +ramdisk_mod.paddr = machine->device_memory->base;
> +ramdisk_mod.size =
> +memory_region_size(>device_memory->mr);
> +
> +fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, _mod,
> + sizeof(ramdisk_mod));
> +fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
> +fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
> + header, sizeof(header));
> +
> +return;
> +}
> +}
>  /* This looks like a multiboot kernel. If it is, let's stop
> treating it like a Linux kernel. */
>  if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
> --
> 1.8.3.1
>


-- 
Stefano Garzarella
Red Hat

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI

2018-12-13 Thread Stefano Garzarella

On Tue, Dec 11, 2018 at 7:35 PM Maran Wilson  wrote:
>
> On 12/11/2018 9:11 AM, Stefano Garzarella wrote:
> > Hi Liam,
> > in order to support PVH also with SeaBIOS, I'm going to work on a new
> > option rom (like linuxboot/multiboot) that can be used in this case.
>
> That is awesome. Yes, please keep us posted when you have something working.

Yes, I'll keep you updated!

>
> Just FYI, before switching over to using Qemu+qboot, we had been using a
> Qemu only solution (but not using an option rom) internally that worked
> very well using no FW at all. We had Qemu simply parse the ELF file and
> jump to the PVH entry point if one is found. The only gotcha was that we
> had to include a pair of patches that were originally written by folks
> at Intel as part of the clear containers work. Specifically, in order to
> be able to skip firmware entirely, we had to do 2 additional things: (1)
> ACPI tables generated by Qemu are usually patched up by FW. Since we
> were running no FW, we needed to do that patching up of the ACPI tables
> in Qemu when it was detected that we were going to enter the OS via the
> PVH entry point. (2) We also needed to add a patch to Qemu to enable a
> few PM registers -- something typically done by FW.

I had a look of qemu-lite, are you referring to this?

>
> But if SeaBIOS is involved in the solution you are working on, I guess
> you won't really need those extra patches. Just figured I'd mention it
> so you have the full picture.

Thank you very much to share with me these details!

Cheers,
Stefano

>
> Thanks,
> -Maran
>
> > I'll keep you updated on it!
> >
> > Cheers,
> > Stefano
> > On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick  
> > wrote:
> >> These changes (along with corresponding qboot and Linux kernel changes)
> >> enable a guest to be booted using the x86/HVM direct boot ABI.
> >>
> >> This commit adds a load_elfboot() routine to pass the size and
> >> location of the kernel entry point to qboot (which will fill in
> >> the start_info struct information needed to to boot the guest).
> >> Having loaded the ELF binary, load_linux() will run qboot
> >> which continues the boot.
> >>
> >> The address for the kernel entry point has already been read
> >> from an ELF Note in the uncompressed kernel binary earlier
> >> in pc_memory_init().
> >>
> >> Signed-off-by: George Kennedy 
> >> Signed-off-by: Liam Merwick 
> >> ---
> >>   hw/i386/pc.c | 72 
> >> 
> >>   1 file changed, 72 insertions(+)
> >>
> >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >> index 056aa46d99b9..d3012cbd8597 100644
> >> --- a/hw/i386/pc.c
> >> +++ b/hw/i386/pc.c
> >> @@ -54,6 +54,7 @@
> >>   #include "sysemu/qtest.h"
> >>   #include "kvm_i386.h"
> >>   #include "hw/xen/xen.h"
> >> +#include "hw/xen/start_info.h"
> >>   #include "ui/qemu-spice.h"
> >>   #include "exec/memory.h"
> >>   #include "exec/address-spaces.h"
> >> @@ -1098,6 +1099,50 @@ done:
> >>   return pvh_start_addr != 0;
> >>   }
> >>
> >> +static bool load_elfboot(const char *kernel_filename,
> >> +   int kernel_file_size,
> >> +   uint8_t *header,
> >> +   size_t pvh_xen_start_addr,
> >> +   FWCfgState *fw_cfg)
> >> +{
> >> +uint32_t flags = 0;
> >> +uint32_t mh_load_addr = 0;
> >> +uint32_t elf_kernel_size = 0;
> >> +uint64_t elf_entry;
> >> +uint64_t elf_low, elf_high;
> >> +int kernel_size;
> >> +
> >> +if (ldl_p(header) != 0x464c457f) {
> >> +return false; /* no elfboot */
> >> +}
> >> +
> >> +bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
> >> +flags = elf_is64 ?
> >> +((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
> >> +
> >> +if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
> >> +error_report("elfboot unsupported flags = %x", flags);
> >> +exit(1);
> >> +}
> >> +
> >> +kernel_size = load_elf(kernel_filename, NULL, NULL, _entry,
> >> +   _low, _high, 0, I386_ELF_MACHINE,
> >> +   0, 0);
> >> +
> >> +if (kernel_size < 0) {
> >> +error_report("Error while loading elf kernel");
> >> +exit(1);
> >> +}
> >> +mh_load_addr = elf_low;
> >> +elf_kernel_size = elf_high - elf_low;
> >> +
> >> +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
> >> +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
> >> +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
> >> +
> >> +return true;
> >> +}
> >> +
> >>   static void load_linux(PCMachineState *pcms,
> >>  FWCfgState *fw_cfg)
> >>   {
> >> @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
> >>   if (ldl_p(header+0x202) == 0x53726448) {
> >>   protocol = lduw_p(header+0x206);
> >>   } else {
> >> +/* If the kernel address for using the x86/HVM direct boot ABI has
> >> +

Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Jan Beulich

>>> On 13.12.18 at 12:39,  wrote:
> On Wed, Dec 12, 2018 at 09:07:14AM -0700, Jan Beulich wrote:
>> >>> On 12.12.18 at 15:54,  wrote:
>> > @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d,
>> >  bytes = (unsigned int)((sc->pages - pages + 7) >> 3);
>> >  if ( likely(peek) )
>> >  {
>> > +if ( paging_mode_enabled(current->domain) )
>> > +/*
>> > + * Drop the target p2m lock, or else Xen will 
>> > panic
>> > + * when trying to acquire the p2m lock of the 
>> > caller
>> > + * due to invalid lock order. Note that there are 
>> > no
>> > + * lock ordering issues here, and the panic is 
>> > due to
>> > + * the fact that the lock level tracking doesn't 
>> > record
>> > + * the domain the lock belongs to.
>> > + */
>> > +paging_unlock(d);
>> 
>> This makes it sound as if tracking the domain would help. It doesn't,
>> at least not as long as there is not also some sort of ordering or
>> hierarchy between domains. IOW I'd prefer if the "doesn't" became
>> "can't".
> 
> Well, Just keeping correct order between each domain locks should be
> enough?
> 
> Ie: exactly the same that Xen currently does but on a per-domain
> basis. This is feasible, but each CPU would need to store the lock
> order of each possible domain:
> 
> DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);
> 
> This would consume ~32KB per CPU, which is not that much but seems a
> waste when most of the time a single entry will be used.

Well, tracking by domain ID wouldn't help you - the controlling
domain may well have a higher ID than the being controlled one,
i.e. the nesting you want needs to be independent of domain ID.

>> Now before we go this route I think we need to consider whether
>> this really is the only place where the two locks get into one
>> another's way. That's because I don't think we want to introduce
>> more of such, well, hackery, and hence we'd otherwise need a
>> better solution. For example the locking model could be adjusted
>> to allow such nesting in the general case: Dom0 and any domain
>> whose ->target matches the subject domain here could be given
>> a slightly different "weight" in the lock order violation check logic.
> 
> So locks from domains != current would be given a lower order, let's
> say:
> 
> #define MM_LOCK_ORDER_nestedp2m   8
> #define MM_LOCK_ORDER_p2m16
> #define MM_LOCK_ORDER_per_page_sharing   24
> #define MM_LOCK_ORDER_altp2mlist 32
> #define MM_LOCK_ORDER_altp2m 40
> #define MM_LOCK_ORDER_pod48
> #define MM_LOCK_ORDER_page_alloc 56
> #define MM_LOCK_ORDER_paging 64
> #define MM_LOCK_ORDER_MAXMM_LOCK_ORDER_paging
> 
> If domain != current, the above values are used. If domain == current
> the values above are used + MM_LOCK_ORDER_MAX? So in that case the
> order of MM_LOCK_ORDER_p2m against the current domain would be 16 + 64
> = 80?
> 
> This has the slight inconvenience that not all mm lock call sites have
> the target domain available, but can be solved.

No, I wasn't envisioning a really generic approach, i.e.
permitting this for all of the locks. Do we need this? Till now
I was rather considering to do this for just the paging lock,
with Dom0 and the controlling domains using just a small
(+/- 2 and +/- 1) offset from the normal paging one.
However, I now realize that indeed there may be more of
such nesting, and you've merely hit a specific case of it.

In any event it seems to me that you've got the ordering
inversed, i.e. domain == current (or really: the general case,
i.e. after excluding the two special cases) would use base
values, domain == currd->target would use some offset, and
Dom0 would use double the offset.

With it extended to all locks I'm no longer sure though that
the end result would be safe. I think it would be helpful to
rope in whoever did originally design this locking model.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 16/18] xen: automatically create XenBlockDevice-s

2018-12-13 Thread Paul Durrant

> -Original Message-
> From: Kevin Wolf [mailto:kw...@redhat.com]
> Sent: 13 December 2018 11:52
> To: Paul Durrant 
> Cc: qemu-de...@nongnu.org; qemu-bl...@nongnu.org; xen-
> de...@lists.xenproject.org; Max Reitz ; Stefano
> Stabellini 
> Subject: Re: [PATCH v4 16/18] xen: automatically create XenBlockDevice-s
> 
> Am 11.12.2018 um 16:57 hat Paul Durrant geschrieben:
> > This patch adds a creator function for XenBlockDevice-s so that they can
> > be created automatically when the Xen toolstack instantiates a new
> > PV backend. When the XenBlockDevice is created this way it is also
> > necessary to create a drive which matches the configuration that the Xen
> > toolstack has written into xenstore. This drive is marked 'auto_del' so
> > that it will be removed when the XenBlockDevice is destroyed. Also, for
> > compatibility with the legacy 'xen_disk' implementation, an iothread
> > is automatically created for the new XenBlockDevice. This will also be
> > removed when the XenBlockDevice is destroyed.
> >
> > Correspondingly the legacy backend scan for 'qdisk' is removed.
> >
> > After this patch is applied the legacy 'xen_disk' code is redundant. It
> > will be removed by a subsequent patch.
> >
> > Signed-off-by: Paul Durrant 
> > Reviewed-by: Anthony Perard 
> 
> So I have two points for this patch.
> 
> The first is that devices creating their own backends feels so wrong.

Indeed it does feel wrong, but this is the only way to deal with existing Xen 
toolstacks. Once toolstacks have been ported to using QMP to instantiate 
backends then this code can go away.

> I
> know that the old xen_disk did the same, and fixing it might neither be
> easy nor directly related to the qdevification, so requiring that from
> you would probably be unfair. But I still have to make the note, and
> hopefully we can get to it eventually (or maybe it is even easy enough
> that we can indeed address it in this series).
> 

No, it's not easy but once this series is committed the work can be started.

> My problem here is that I don't really understand the Xen mechanisms.
> Could you give me a very high-level overview of how adding a disk works
> and which component communicates with which other component to get the
> information down to QEMU and eventually the newly added
> xen_block_device_create()?

Xen toolstacks instantiate PV backends by just writing values into xenstore. It 
is up to the entity implementing the backend to set 'watches' so that it gets 
notified when these values appear. Currently that entity may be QEMU, or it may 
be a kernel driver such as Linux xen-blkback.ko.

> 
> Essentially, what I'm wondering is whether we have anything that could
> be treated more or less like another monitor besides QMP and HMP, which
> would internally work similar to HMP, i.e. map (almost) everything to
> QMP commands.

Yes, it would be possible to have a separate 'compatibility' daemon to watch 
xenstore and then formulate the correct sequence of QMP commands to instantiate 
the backend, but that is more complicated and the right answer of course is to 
have the toolstack send the QMP commands in the first place.

> I see that there is this XenWatch infrastructure to get
> notified about changes (which would be treated like monitor commands),
> but I'm not sure if everything would be covered by this mechanism or
> whether some things must be fetched explicitly.
> 
> Anyway, this is probably for later.
> 
> > +static void xen_block_drive_create(const char *id, const char
> *device_type,
> > +   QDict *opts, Error **errp)
> > +{
> > +const char *params = qdict_get_try_str(opts, "params");
> > +const char *mode = qdict_get_try_str(opts, "mode");
> > +const char *direct_io_safe = qdict_get_try_str(opts, "direct-io-
> safe");
> > +const char *discard_enable = qdict_get_try_str(opts, "discard-
> enable");
> > +char *format = NULL;
> > +char *file = NULL;
> > +char *drive_optstr = NULL;
> > +QemuOpts *drive_opts;
> > +Error *local_err = NULL;
> > +
> > +if (params) {
> > +char **v = g_strsplit(params, ":", 2);
> > +
> > +if (v[1] == NULL) {
> > +file = g_strdup(v[0]);
> > +} else {
> > +if (strcmp(v[0], "aio") == 0) {
> > +format = g_strdup("raw");
> > +} else if (strcmp(v[0], "vhd") == 0) {
> > +format = g_strdup("vpc");
> > +} else {
> > +format = g_strdup(v[0]);
> > +}
> > +file = g_strdup(v[1]);
> > +}
> > +
> > +g_strfreev(v);
> > +}
> > +
> > +if (!file) {
> > +error_setg(errp, "no file parameter");
> > +return;
> > +}
> > +
> > +drive_optstr = g_strdup_printf("id=%s", id);
> > +drive_opts = drive_def(drive_optstr);
> > +if (!drive_opts) {
> > +error_setg(errp, "failed to create drive options");
> > +goto done;
> > +}
> > +
> >

Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events

2018-12-13 Thread Julien Grall


Hi,

On 12/13/18 12:15 PM, Razvan Cojocaru wrote:

On 12/13/18 2:04 PM, Julien Grall wrote:

Hi,

On 12/13/18 8:03 AM, Razvan Cojocaru wrote:

On 12/13/18 8:54 AM, Tian, Kevin wrote:

From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com]
Sent: Tuesday, December 11, 2018 8:33 PM


In any case, I think you want to rename the function and/or document
more that expected behavior.


You're right, I should probably rename that function / variable to
better reflect what it signifies - that sync vm_event processing is in
progress. For VMX and SVM, that simply means that interrupts will be
blocked, and the value of the variable will be correct and possibly
useful for ARM as well.



what about vm_event_block_interrupt_injection? in that case
it's injection instead of interrupt itself being blocked. blocking
injection should mean same thing cross archs?


Why would you want to block all interrupts injections? When I looked at
the details, it feels more you want to block exceptions.

I can see use for blocking exception on Arm, blocking all the interrupts
is likely going to bring more issues than solving anything.

So a better name would be vm_event_block_exception_injection.


I'd like to block the writing of anything, by vmx_intr_assist(), into
VM_ENTRY_INTR_INFO, because an emulation attempt that happens
post-vmx_intr_assist() (because the vm_event client application has
requested it) may write an exception of its own there.

Since vmx_intr_assist() is called on VMX between the time of sending out
the vm_event and the emulation (which happens in
hvm_vm_event_do_resume()), we want to block everything that it may write
in the VMCS until the emulation is done. I think that's more than just
exceptions.


I don't know in details how x86 virtualization works, so it is a bit 
hard to comment on that. However, it feels to me that you are 
introducing in common code a function that will workaround an 
architecture specific problem.


Can you try to explain it in agnostic word?

To expand what I said above, I think it is reasonable to request 
blocking exception (e.g page-fault...) because they can be generated by 
an instruction. However, all interrupts generated by the interrupt 
controller (e.g device, IPI..) should not be blocked.


AFAIU your description, it is the same path to handle the two on x86, right?

On Arm, there are 2 distinct paths, interrupt generated by the interrupt 
controller are queued. The exceptions will be generated using multiple 
different paths. Yet exceptions can still override each other.


I can't see any reason for Arm to block interrupt generated by the 
interrupt controller. This would actually be dangerous due to the way we 
handle them in Xen currently. Instead we may want to block the exception 
as they can be generated by an instruction.




I've probably been confusing when I was talking about the exceptions
that emulating the current instruction may trigger - we don't want to
block those.


I understood that bit.




Thanks,
Razvan



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 0/3] Performance improvements for xen_disk^Wxen-block

2018-12-13 Thread Anthony PERARD

On Wed, Dec 12, 2018 at 11:16:23AM +, Paul Durrant wrote:
> This series is a re-base of Tim's v2 series [1] on top of my series [2].
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg00243.html
> [2] https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02271.html

For the series:
Acked-by: Anthony PERARD 

And I've pushed that here:
https://xenbits.xen.org/gitweb/?p=people/aperard/qemu-dm.git;a=shortlog;h=refs/heads/xen-next

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events

2018-12-13 Thread Razvan Cojocaru

On 12/13/18 2:04 PM, Julien Grall wrote:
> Hi,
> 
> On 12/13/18 8:03 AM, Razvan Cojocaru wrote:
>> On 12/13/18 8:54 AM, Tian, Kevin wrote:
 From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com]
 Sent: Tuesday, December 11, 2018 8:33 PM

> In any case, I think you want to rename the function and/or document
> more that expected behavior.

 You're right, I should probably rename that function / variable to
 better reflect what it signifies - that sync vm_event processing is in
 progress. For VMX and SVM, that simply means that interrupts will be
 blocked, and the value of the variable will be correct and possibly
 useful for ARM as well.

>>>
>>> what about vm_event_block_interrupt_injection? in that case
>>> it's injection instead of interrupt itself being blocked. blocking
>>> injection should mean same thing cross archs?
> 
> Why would you want to block all interrupts injections? When I looked at
> the details, it feels more you want to block exceptions.
> 
> I can see use for blocking exception on Arm, blocking all the interrupts
> is likely going to bring more issues than solving anything.
> 
> So a better name would be vm_event_block_exception_injection.

I'd like to block the writing of anything, by vmx_intr_assist(), into
VM_ENTRY_INTR_INFO, because an emulation attempt that happens
post-vmx_intr_assist() (because the vm_event client application has
requested it) may write an exception of its own there.

Since vmx_intr_assist() is called on VMX between the time of sending out
the vm_event and the emulation (which happens in
hvm_vm_event_do_resume()), we want to block everything that it may write
in the VMCS until the emulation is done. I think that's more than just
exceptions.

I've probably been confusing when I was talking about the exceptions
that emulating the current instruction may trigger - we don't want to
block those.

Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [Qemu-devel] [PATCH for-3.2 v5 07/19] hw: apply accel compat properties without touching globals

2018-12-13 Thread Igor Mammedov

On Wed, 12 Dec 2018 16:00:06 +0400
Marc-André Lureau  wrote:

> Hi
> On Mon, Dec 10, 2018 at 8:55 PM Igor Mammedov  wrote:
> >
> > On Mon, 10 Dec 2018 17:45:22 +0100
> > Igor Mammedov  wrote:
> >  
> > > On Tue,  4 Dec 2018 18:20:11 +0400
> > > Marc-André Lureau  wrote:
> > >  
> > > > Instead of registering compat properties as globals, let's keep them
> > > > in their own array, to avoid mixing with user globals.
> > > >
> > > > Introduce object_apply_global_props() function, to apply compatibility
> > > > properties from a GPtrArray.
> > > >
> > > > Signed-off-by: Marc-André Lureau 
> > > > ---
> > > >  include/hw/qdev-core.h | 10 ++
> > > >  include/qom/object.h   |  3 +++
> > > >  include/sysemu/accel.h |  4 +---
> > > >  accel/accel.c  | 12 
> > > >  hw/core/qdev.c |  9 +
> > > >  hw/xen/xen-common.c|  9 ++---
> > > >  qom/object.c   | 25 +
> > > >  vl.c   |  1 -
> > > >  8 files changed, 54 insertions(+), 19 deletions(-)
> > > >  
> > > [...]  
> > > > diff --git a/hw/xen/xen-common.c b/hw/xen/xen-common.c
> > > > index 6ec14c73ca..4532aa8632 100644
> > > > --- a/hw/xen/xen-common.c
> > > > +++ b/hw/xen/xen-common.c
> > > > @@ -174,18 +174,21 @@ static GlobalProperty xen_compat_props[] = {
> > > >  .driver = "migration",
> > > >  .property = "send-section-footer",
> > > >  .value = "off",
> > > > -},
> > > > -{ /* end of list */ },
> > > > +}
> > > >  };
> > > >
> > > >  static void xen_accel_class_init(ObjectClass *oc, void *data)
> > > >  {
> > > >  AccelClass *ac = ACCEL_CLASS(oc);
> > > > +
> > > >  ac->name = "Xen";
> > > >  ac->init_machine = xen_init;
> > > >  ac->setup_post = xen_setup_post;
> > > >  ac->allowed = _allowed;
> > > > -ac->global_props = xen_compat_props;
> > > > +ac->compat_props = g_ptr_array_new();  
> > > where is matching free for that?  
> > can we at least annotate it somehow so that valgrind won't complain about 
> > this leak?  
> 
> If you check my commits on qemu, you should see that I do care (too
> much?) about leaks :)
> 
> In this case though, I don't see valgrind or asan complaining, I guess
> it's still a reachable reference.
> Do you think a FIXME comment would be helpful?
I've looked at other cases were we leak, and well we leak a lot so it's
probably futile exercise. But the comment won't hurt and will work as remainder.


> (/me wish we had a proper object system, GObject, but that ship as
> long sailed..)
> 
> >  
> > >  
> > > > +
> > > > +compat_props_add(ac->compat_props,
> > > > + xen_compat_props, G_N_ELEMENTS(xen_compat_props));
> > > >  }
> > > >
> > > >  #define TYPE_XEN_ACCEL ACCEL_CLASS_NAME("xen")
> > > > diff --git a/qom/object.c b/qom/object.c
> > > > index 17921c0a71..dbdab0aead 100644
> > > > --- a/qom/object.c
> > > > +++ b/qom/object.c
> > > > @@ -370,6 +370,31 @@ static void object_post_init_with_type(Object 
> > > > *obj, TypeImpl *ti)
> > > >  }
> > > >  }
> > > >
> > > > +void object_apply_global_props(Object *obj, const GPtrArray *props, 
> > > > Error **errp)
> > > > +{
> > > > +Error *err = NULL;
> > > > +int i;
> > > > +
> > > > +if (!props) {
> > > > +return;
> > > > +}
> > > > +
> > > > +for (i = 0; i < props->len; i++) {
> > > > +GlobalProperty *p = g_ptr_array_index(props, i);
> > > > +
> > > > +if (object_dynamic_cast(obj, p->driver) == NULL) {
> > > > +continue;
> > > > +}
> > > > +p->used = true;
> > > > +object_property_parse(obj, p->value, p->property, );
> > > > +if (err != NULL) {
> > > > +error_prepend(, "can't apply global %s.%s=%s: ",
> > > > +  p->driver, p->property, p->value);
> > > > +error_propagate(errp, err);
> > > > +}
> > > > +}
> > > > +}
> > > > +
> > > >  static void object_initialize_with_type(void *data, size_t size, 
> > > > TypeImpl *type)
> > > >  {
> > > >  Object *obj = data;
> > > > diff --git a/vl.c b/vl.c
> > > > index a5ae5f23d2..88ba658572 100644
> > > > --- a/vl.c
> > > > +++ b/vl.c
> > > > @@ -2968,7 +2968,6 @@ static void user_register_global_props(void)
> > > >   */
> > > >  static void register_global_properties(MachineState *ms)
> > > >  {
> > > > -accel_register_compat_props(ms->accelerator);
> > > >  machine_register_compat_props(ms);
> > > >  user_register_global_props();
> > > >  }  
> > >
> > >  
> >
> >  
> 
> 


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events

2018-12-13 Thread Julien Grall


Hi,

On 12/13/18 8:03 AM, Razvan Cojocaru wrote:

On 12/13/18 8:54 AM, Tian, Kevin wrote:

From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com]
Sent: Tuesday, December 11, 2018 8:33 PM


In any case, I think you want to rename the function and/or document
more that expected behavior.


You're right, I should probably rename that function / variable to
better reflect what it signifies - that sync vm_event processing is in
progress. For VMX and SVM, that simply means that interrupts will be
blocked, and the value of the variable will be correct and possibly
useful for ARM as well.



what about vm_event_block_interrupt_injection? in that case
it's injection instead of interrupt itself being blocked. blocking
injection should mean same thing cross archs?


Why would you want to block all interrupts injections? When I looked at 
the details, it feels more you want to block exceptions.


I can see use for blocking exception on Arm, blocking all the interrupts 
is likely going to bring more issues than solving anything.


So a better name would be vm_event_block_exception_injection.

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] xen_disk: fix memory leak

2018-12-13 Thread Paul Durrant

> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Anthony PERARD
> Sent: 13 December 2018 11:34
> To: Olaf Hering 
> Cc: Kevin Wolf ; Stefano Stabellini
> ; open list:Block layer core  bl...@nongnu.org>; qemu-de...@nongnu.org; Max Reitz ;
> open list:X86 
> Subject: Re: [Xen-devel] [PATCH v1] xen_disk: fix memory leak
> 
> On Tue, Dec 11, 2018 at 05:02:24PM +0100, Olaf Hering wrote:
> > There are some code paths that clobber ioreq->buf, which leads to a huge
> > memory leak after a few hours of runtime. One code path is
> > qemu_aio_complete, which might be called recursive. Another one is
> 
> I think it's s/recursive/recursively/.
> 
> > ioreq_reset, which might clobber ioreq->buf as well.
> >
> > Add wrappers to free ioreq->buf before reassignment.
> >
> > Signed-off-by: Olaf Hering 
> 
> That patch seems fine, with a few coding style issues, and is going to
> be needed to be forward ported to Paul's reimplementation (not yet
> merged).

I already posted a patch from Tim Smith (re-based to the new xen-block 
datapath) that should fix this issue.

  Paul

> 
> > ---
> >  hw/block/xen_disk.c | 22 +-
> >  1 file changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > index 36eff94f84..e15eefe625 100644
> > --- a/hw/block/xen_disk.c
> > +++ b/hw/block/xen_disk.c
> > @@ -103,12 +103,24 @@ struct XenBlkDev {
> >
> >  /* - */
> >
> > +static void ioreq_buf_alloc(struct ioreq *ioreq, size_t alignment)
> 
> You have the parameter `alignment` but don't actually use it, I don't
> think it's needed.
> 
> > +{
> > +if (ioreq->buf)
> > +qemu_vfree(ioreq->buf);
> 
> You could call ioreq_buf_free here instead of duplicating the code.
> 
> > +ioreq->buf = qemu_memalign(XC_PAGE_SIZE, ioreq->size);
> > +}
> > +static void ioreq_buf_free(struct ioreq *ioreq)
> > +{
> > +if (ioreq->buf)
> > +qemu_vfree(ioreq->buf);
> > +ioreq->buf = NULL;
> > +}
> 
> Thanks,
> 
> --
> Anthony PERARD
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 16/18] xen: automatically create XenBlockDevice-s

2018-12-13 Thread Kevin Wolf

Am 11.12.2018 um 16:57 hat Paul Durrant geschrieben:
> This patch adds a creator function for XenBlockDevice-s so that they can
> be created automatically when the Xen toolstack instantiates a new
> PV backend. When the XenBlockDevice is created this way it is also
> necessary to create a drive which matches the configuration that the Xen
> toolstack has written into xenstore. This drive is marked 'auto_del' so
> that it will be removed when the XenBlockDevice is destroyed. Also, for
> compatibility with the legacy 'xen_disk' implementation, an iothread
> is automatically created for the new XenBlockDevice. This will also be
> removed when the XenBlockDevice is destroyed.
> 
> Correspondingly the legacy backend scan for 'qdisk' is removed.
> 
> After this patch is applied the legacy 'xen_disk' code is redundant. It
> will be removed by a subsequent patch.
> 
> Signed-off-by: Paul Durrant 
> Reviewed-by: Anthony Perard 

So I have two points for this patch.

The first is that devices creating their own backends feels so wrong. I
know that the old xen_disk did the same, and fixing it might neither be
easy nor directly related to the qdevification, so requiring that from
you would probably be unfair. But I still have to make the note, and
hopefully we can get to it eventually (or maybe it is even easy enough
that we can indeed address it in this series).

My problem here is that I don't really understand the Xen mechanisms.
Could you give me a very high-level overview of how adding a disk works
and which component communicates with which other component to get the
information down to QEMU and eventually the newly added
xen_block_device_create()?

Essentially, what I'm wondering is whether we have anything that could
be treated more or less like another monitor besides QMP and HMP, which
would internally work similar to HMP, i.e. map (almost) everything to
QMP commands. I see that there is this XenWatch infrastructure to get
notified about changes (which would be treated like monitor commands),
but I'm not sure if everything would be covered by this mechanism or
whether some things must be fetched explicitly.

Anyway, this is probably for later.

> +static void xen_block_drive_create(const char *id, const char *device_type,
> +   QDict *opts, Error **errp)
> +{
> +const char *params = qdict_get_try_str(opts, "params");
> +const char *mode = qdict_get_try_str(opts, "mode");
> +const char *direct_io_safe = qdict_get_try_str(opts, "direct-io-safe");
> +const char *discard_enable = qdict_get_try_str(opts, "discard-enable");
> +char *format = NULL;
> +char *file = NULL;
> +char *drive_optstr = NULL;
> +QemuOpts *drive_opts;
> +Error *local_err = NULL;
> +
> +if (params) {
> +char **v = g_strsplit(params, ":", 2);
> +
> +if (v[1] == NULL) {
> +file = g_strdup(v[0]);
> +} else {
> +if (strcmp(v[0], "aio") == 0) {
> +format = g_strdup("raw");
> +} else if (strcmp(v[0], "vhd") == 0) {
> +format = g_strdup("vpc");
> +} else {
> +format = g_strdup(v[0]);
> +}
> +file = g_strdup(v[1]);
> +}
> +
> +g_strfreev(v);
> +}
> +
> +if (!file) {
> +error_setg(errp, "no file parameter");
> +return;
> +}
> +
> +drive_optstr = g_strdup_printf("id=%s", id);
> +drive_opts = drive_def(drive_optstr);
> +if (!drive_opts) {
> +error_setg(errp, "failed to create drive options");
> +goto done;
> +}
> +
> +qemu_opt_set(drive_opts, "file", file, _err);
> +if (local_err) {
> +error_propagate_prepend(errp, local_err, "failed to set 'file': ");
> +goto done;
> +}
> +
> +qemu_opt_set(drive_opts, "media", device_type, _err);
> +if (local_err) {
> +error_propagate_prepend(errp, local_err,
> +"failed to set 'media': ");
> +goto done;
> +}
> +
> +if (format) {
> +qemu_opt_set(drive_opts, "format", format, _err);
> +if (local_err) {
> +error_propagate_prepend(errp, local_err,
> +"failed to set 'format': ");
> +goto done;
> +}
> +}
> +
> +if (mode && *mode != 'w') {
> +qemu_opt_set_bool(drive_opts, BDRV_OPT_READ_ONLY, true, _err);
> +if (local_err) {
> +error_propagate_prepend(errp, local_err, "failed to set '%s': ",
> +BDRV_OPT_READ_ONLY);
> +goto done;
> +}
> +}
> +
> +/*
> + * It is necessary to turn file locking off as an emulated device
> + * my have already opened the same image file.
> + */
> +qemu_opt_set(drive_opts, "file.locking", "off", _err);
> +if (local_err) {
> +error_propagate_prepend(errp, local_err,
> +"failed to set

Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests

2018-12-13 Thread Roger Pau Monné

On Wed, Dec 12, 2018 at 09:07:14AM -0700, Jan Beulich wrote:
> >>> On 12.12.18 at 15:54,  wrote:
> > Fix this by releasing the target paging lock before attempting to
> > perform the copy of the dirty bitmap, and then forcing a restart of
> > the whole process in case there have been changes to the dirty bitmap
> > tables.
> 
> I'm afraid it's not that simple: The writer side (paging_mark_pfn_dirty())
> uses the same lock, and I think the assumption is that while the copying
> takes place no updates to the bitmap can occur. Then again the subject
> domain gets paused anyway, so updates probably can't happen (the
> "probably" of course needs to be eliminated).

I've looked into this, and I think you are right, the copy must be
done with the paging lock held. Even if the domain is paused, the
device model can still mark gfns as dirty using
XEN_DMOP_modified_memory, so the only option would be to copy to a
temporary page while holding the lock, release the lock and copy the
temporary page to the called buffer.

> > @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d,
> >  bytes = (unsigned int)((sc->pages - pages + 7) >> 3);
> >  if ( likely(peek) )
> >  {
> > +if ( paging_mode_enabled(current->domain) )
> > +/*
> > + * Drop the target p2m lock, or else Xen will panic
> > + * when trying to acquire the p2m lock of the 
> > caller
> > + * due to invalid lock order. Note that there are 
> > no
> > + * lock ordering issues here, and the panic is due 
> > to
> > + * the fact that the lock level tracking doesn't 
> > record
> > + * the domain the lock belongs to.
> > + */
> > +paging_unlock(d);
> 
> This makes it sound as if tracking the domain would help. It doesn't,
> at least not as long as there is not also some sort of ordering or
> hierarchy between domains. IOW I'd prefer if the "doesn't" became
> "can't".

Well, Just keeping correct order between each domain locks should be
enough?

Ie: exactly the same that Xen currently does but on a per-domain
basis. This is feasible, but each CPU would need to store the lock
order of each possible domain:

DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]);

This would consume ~32KB per CPU, which is not that much but seems a
waste when most of the time a single entry will be used.

> Now before we go this route I think we need to consider whether
> this really is the only place where the two locks get into one
> another's way. That's because I don't think we want to introduce
> more of such, well, hackery, and hence we'd otherwise need a
> better solution. For example the locking model could be adjusted
> to allow such nesting in the general case: Dom0 and any domain
> whose ->target matches the subject domain here could be given
> a slightly different "weight" in the lock order violation check logic.

So locks from domains != current would be given a lower order, let's
say:

#define MM_LOCK_ORDER_nestedp2m   8
#define MM_LOCK_ORDER_p2m16
#define MM_LOCK_ORDER_per_page_sharing   24
#define MM_LOCK_ORDER_altp2mlist 32
#define MM_LOCK_ORDER_altp2m 40
#define MM_LOCK_ORDER_pod48
#define MM_LOCK_ORDER_page_alloc 56
#define MM_LOCK_ORDER_paging 64
#define MM_LOCK_ORDER_MAXMM_LOCK_ORDER_paging

If domain != current, the above values are used. If domain == current
the values above are used + MM_LOCK_ORDER_MAX? So in that case the
order of MM_LOCK_ORDER_p2m against the current domain would be 16 + 64
= 80?

This has the slight inconvenience that not all mm lock call sites have
the target domain available, but can be solved.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] xen_disk: fix memory leak

2018-12-13 Thread Anthony PERARD

On Tue, Dec 11, 2018 at 05:02:24PM +0100, Olaf Hering wrote:
> There are some code paths that clobber ioreq->buf, which leads to a huge
> memory leak after a few hours of runtime. One code path is
> qemu_aio_complete, which might be called recursive. Another one is

I think it's s/recursive/recursively/.

> ioreq_reset, which might clobber ioreq->buf as well.
> 
> Add wrappers to free ioreq->buf before reassignment.
> 
> Signed-off-by: Olaf Hering 

That patch seems fine, with a few coding style issues, and is going to
be needed to be forward ported to Paul's reimplementation (not yet
merged).

> ---
>  hw/block/xen_disk.c | 22 +-
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 36eff94f84..e15eefe625 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -103,12 +103,24 @@ struct XenBlkDev {
>  
>  /* - */
>  
> +static void ioreq_buf_alloc(struct ioreq *ioreq, size_t alignment)

You have the parameter `alignment` but don't actually use it, I don't
think it's needed.

> +{
> +if (ioreq->buf)
> +qemu_vfree(ioreq->buf);

You could call ioreq_buf_free here instead of duplicating the code.

> +ioreq->buf = qemu_memalign(XC_PAGE_SIZE, ioreq->size);
> +}
> +static void ioreq_buf_free(struct ioreq *ioreq)
> +{
> +if (ioreq->buf)
> +qemu_vfree(ioreq->buf);
> +ioreq->buf = NULL;
> +}

Thanks,

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v1] xen: preserve COMPAT in CFLAGS

2018-12-13 Thread Anthony PERARD

Hi,

Ian, we have those XC_WANT_COMPAT_* #defines to allow consumers of Xen
libs be able to use old interfaces. Do you think it's a good idea to
have this consumers (QEMU here) #undef the flag when it has implemented
the newer interface?

I guess the issue we have here is that when libxc interface are
re-implemented into a xen libs, the meaning of XC_WANT_COMPAT_* flags is
changed. And the QEMU fails to build even with the flags supplied in
cflags.

There is another thread that Olaf have started here:
<20181025140808.13eefc21.o...@aepfle.de>
https://lists.xenproject.org/archives/html/xen-devel/2018-10/msg01950.html

On Fri, Oct 26, 2018 at 12:10:16PM +0200, Olaf Hering wrote:
> A given Qemu version can not predict what version of Xen it will run on.
> There are some checks in configure to decide what Xen libraries and
> functions are available. How exactly these functions must be accessed
> has to be decided by configure and the user who is compiling Qemu.
> In no way some random header file must override this decision.
> 
> Remove the breakage introduced by commit 5eeb39c24b, which would always
> hide the libxc interfaces the given version of Qemu knows about.
> 
> The current symptom of such breakage is a build failure with qemu-2.9
> and older, in combination with Xen 4.12.
> 
> Fixes: 5eeb39c24b7d4da5d129bfdd9c4fd21cfb3d28d6
> Signed-off-by: Olaf Hering 
> ---
>  include/hw/xen/xen_common.h | 9 -
>  1 file changed, 9 deletions(-)
> 
> diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
> index 5f1402b494..33fa2d3497 100644
> --- a/include/hw/xen/xen_common.h
> +++ b/include/hw/xen/xen_common.h
> @@ -1,15 +1,6 @@
>  #ifndef QEMU_HW_XEN_COMMON_H
>  #define QEMU_HW_XEN_COMMON_H
>  
> -/*
> - * If we have new enough libxenctrl then we do not want/need these compat
> - * interfaces, despite what the user supplied cflags might say. They
> - * must be undefined before including xenctrl.h
> - */
> -#undef XC_WANT_COMPAT_EVTCHN_API
> -#undef XC_WANT_COMPAT_GNTTAB_API
> -#undef XC_WANT_COMPAT_MAP_FOREIGN_API
> -
>  #include 
>  #include 
>  #include 

-- 
Anthony PERARD

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation

2018-12-13 Thread Olaf Hering

Am Thu, 13 Dec 2018 03:41:44 -0700
schrieb "Jan Beulich" :

> In a second step, let's consider what impact errors in calibration have.
> Between two systems with exactly the same hardware crystal
> frequency there of course is going to be some drift. The problem
> though is - between the two calibrated clock values from two systems
> you can't easily tell what part of the difference is a result of the
> calibration being imprecise, and what part of it is because of the
> crystals not providing the exact same frequency. Without knowing
> the possible range of both errors, the argumentation of _one of
> them_ being tolerable within a certain range to consider _both_
> systems sufficiently equal is at least questionable.

We already have a knob for that, if a reference clock can not be provided:
 tsc_mode=always_emulate
This would exactly cover the case there the assumed frequency that is
available during the start of domU will be used after migration.

Olaf


pgpJRJytqkhcN.pgp
Description: Digitale Signatur von OpenPGP
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH for-4.12] dt: fix integer check in dt_device_is_compatible

2018-12-13 Thread Julien Grall


Hi Stefano,

On 12/12/18 11:53 PM, Stefano Stabellini wrote:

On Wed, 12 Dec 2018, Julien Grall wrote:

Hi Stefano,

On 11/12/2018 18:46, Stefano Stabellini wrote:

cplen is unsigned, thus, it can never be < 0. Use a signed integer local
variable instead.


The current code checks > 0. Looking at the code I don't think it can ever be
negative. So can you details what is the problem you are trying to resolve?


Yes, it can be "negative" (not actually negative because it is defined
as unsigned), in fact it happens with the nodes added dynamically by grub
at boot. This patches fixes booting from grub.

Specifically ilen is initialed to 16, but strlen+1 is 17, so length
becomes -1. The length of the property generated by grub seems to be
short by 1.
Such explanation should have been in the commit message, it helps to 
diagnostics where the problem is coming from.


Looking at the specification [1] section 2.3.1, a compatible property is 
a concatenated list of null terminated strings. So the current code is 
actually correct and there is a bug in GRUB.


This bug was actually fixed by commit ae5817f1d "arm64/xen_boot: Fix Xen 
boot using GRUB2 on AARCH64".


I assume you are using Grub 2.02 which does not contain the full support 
for Xen. So I would not even consider to work-around it in Xen.


Instead, I would recommend you to upgrade to a more recent GRUB. It 
would be more likely staging as there are no new GRUB version.


Another solution is to use chainloading.

Cheers,

[1] 
https://github.com/devicetree-org/devicetree-specification/releases/tag/v0.2






Signed-off-by: Stefano Stabellini 
---
   xen/common/device_tree.c | 7 +--
   1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index 8fc401d..df274cc 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -213,17 +213,20 @@ bool_t dt_device_is_compatible(const struct
dt_device_node *device,
   {
   const char* cp;
   u32 cplen, l;
+s64 ilen;
 cp = dt_get_property(device, "compatible", );
   if ( cp == NULL )
   return 0;
-while ( cplen > 0 )
+
+ilen = cplen;
+while ( ilen > 0 )
   {
   if ( dt_compat_cmp(cp, compat) == 0 )
   return 1;
   l = strlen(cp) + 1;
   cp += l;
-cplen -= l;
+ilen -= l;
   }
 return 0;



--
Julien Grall



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V11 4/5] p2m: Always use hostp2m when clipping rangesets

2018-12-13 Thread Jan Beulich

>>> On 13.12.18 at 11:22,  wrote:
> For my own part, I see no reason why not clipping end should not work
> when updating the ranges only (as long as start continues to be <=
> unclipped_end).
> 
> Would that modification + testing of it help this series continue?

I think so, at least as far as I'm concerned. But I think we really need
George's opinion as well.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation

2018-12-13 Thread Jan Beulich

>>> On 13.12.18 at 10:04,  wrote:
> Am Thu, 13 Dec 2018 01:45:39 -0700
> schrieb "Jan Beulich" :
> 
>> >>> On 13.12.18 at 09:18,  wrote:  
>> > Am Wed, 12 Dec 2018 09:39:25 -0700
>> > schrieb "Jan Beulich" :
>> >   
>> >> >>> On 12.12.18 at 16:20,  wrote:
>> >> > If a domU uses TSC as clocksoure it also must run NTP in some way to
>> >> > avoid the potential drift what will most likely happen, independent of
>> >> > any migration.
>> >> Which drift? While anyone's well advised to run NTP, a completely
>> >> isolated set of systems may have no need to, if their interactions don't
>> >> depend on exactly matching time.  
>> > 
>> > If these hosts do not sync time to some reference host, the advancing of 
>> > time
>> > is undefined before and after my change.  
>> 
>> I'm lost. I simply don't understand what you're trying to tell me,
>> or how your answer relates to my question.
> 
> Then please rephrase the question?

I was asking the drift of what you are talking about.

> I do not see how my path affects the
> advancing of time in domUs running on isolated systems. If their hardware
> clocks advance at the same speed, my patch will not affect it. And if they
> do advance at a different speed, what can emulation do to help with 
> "correctness"?

In a first step, let's assume clock calibration produces a precise result.
In such a case, are we in agreement that emulation is needed after
migration if clock speeds differ, even if just slightly?

In a second step, let's consider what impact errors in calibration have.
Between two systems with exactly the same hardware crystal
frequency there of course is going to be some drift. The problem
though is - between the two calibrated clock values from two systems
you can't easily tell what part of the difference is a result of the
calibration being imprecise, and what part of it is because of the
crystals not providing the exact same frequency. Without knowing
the possible range of both errors, the argumentation of _one of
them_ being tolerable within a certain range to consider _both_
systems sufficiently equal is at least questionable.

And further argumentation that everyone is using NTP anyway
doesn't make it any better, when it's no-where written down that
Xen is unusable with NTP running in all guests (I'm exaggerating
here just to get the point over). Don't forget that e.g. with
XenoLinux'es independent-wallclock setting defaulting to false, we've
been suggesting that people _don't_ need to use NTP inside their
(admittedly PV) guests. IOW - your change may not break people
not using NTP. Hence I don't think the mode you introduce can be
a default-on one, which in turn means a per-domain or at least
global enable control is needed (as over previous iterations we
seem to have been agreeing).

Jan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V11 4/5] p2m: Always use hostp2m when clipping rangesets

2018-12-13 Thread Razvan Cojocaru

On 12/5/18 6:30 PM, Jan Beulich wrote:
 On 05.12.18 at 10:18,  wrote:
>> --- a/xen/arch/x86/mm/p2m.c
>> +++ b/xen/arch/x86/mm/p2m.c
>> @@ -1002,30 +1002,43 @@ int p2m_change_type_one(struct domain *d, unsigned 
>> long gfn_l,
>>  return rc;
>>  }
>>  
>> -/* Modify the p2m type of a range of gfns from ot to nt. */
>> +/* Modify the p2m type of [start, end) from ot to nt. */
>>  static void change_type_range(struct p2m_domain *p2m,
>>unsigned long start, unsigned long end,
>>p2m_type_t ot, p2m_type_t nt)
>>  {
>> -unsigned long gfn = start;
>>  struct domain *d = p2m->domain;
>> +const unsigned long host_max_pfn = p2m_get_hostp2m(d)->max_mapped_pfn;
>>  int rc = 0;
>>  
>> -if ( unlikely(end > p2m->max_mapped_pfn) )
>> -{
>> -if ( !gfn )
>> -{
>> -p2m->change_entry_type_global(p2m, ot, nt);
>> -gfn = end;
>> -}
>> -end = p2m->max_mapped_pfn + 1;
>> -}
>> -if ( gfn < end )
>> -rc = p2m->change_entry_type_range(p2m, ot, nt, gfn, end - 1);
>> +--end;
>> +
>> +if ( start >= host_max_pfn )
>> +printk(XENLOG_G_WARNING "Dom%d logdirty rangeset clipped to 
>> max_mapped_pfn\n",
>> +   d->domain_id);
>> +
>> +/* Always clip the rangeset down to the host p2m. */
>> +if ( unlikely(end > host_max_pfn) )
>> +end = host_max_pfn;
>> +
>> +/* If the requested range is out of scope, return doing nothing. */
>> +if ( start > end )
>> +return;
> 
> My prior comment remains: Even if there's no change in behavior
> (and you avoid the assertion), especially due to the comment the
> impression results (at least to me) that all is well here, when it
> really is a (latent) bug to "do nothing" in this case. George, so far
> this was a discussion between Razvan and me - do you have an
> opinion either way here?

Obviously I can't speak for George, but to reiterate my previous
analysis, it looks like this patch has added the clipping:

http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=437f54d3a33d3787a7cc485eb2b3451e8be49ca7

and this patch has added the global_logdirty ranges code:

http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=90ac32559bfbd08127638ba13f99b5ed565cfc2b

but left the clipping code in (possibly accidentally). You may have some
insight into that, being their author, although it's been a few years
since then.

For my own part, I see no reason why not clipping end should not work
when updating the ranges only (as long as start continues to be <=
unclipped_end).

Would that modification + testing of it help this series continue?


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function

2018-12-13 Thread Jan Beulich

>>> On 13.12.18 at 10:14,  wrote:
> On Thu, Dec 13, 2018 at 12:45:07AM -0700, Jan Beulich wrote:
>> >>> On 12.12.18 at 18:05,  wrote:
>> > On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote:
>> >> The MMIO side of things of course still remains unclear.
>> > 
>> > Right, for the MMIO and the handling of grant and foreign mappings it's
>> > not clear how we want to proceed.
>> > 
>> > Maybe account for all host RAM (total_pages) plus MMIO BARs?
>> 
>> Well, I thought we've already settled on it being impossible to
>> account for all MMIO BARs at this point.
> 
> Well, I could iterate over all the registered PCI devices and size
> the BARs (without VF BARs at least initially). This is quite
> cumbersome, my other option would be using max_page and hope that
> there are enough holes to make up for BAR MMIO regions.

Well, maybe we could live with this for now. I certainly would
prefer to have a 3rd opinion though, as I continue to feel uneasy
with this rather imprecise estimation (i.e. I'd much prefer a more
dynamic / on-demand approach).

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/3] drm/xen-front: Use Xen common shared buffer implementation

2018-12-13 Thread Oleksandr Andrushchenko


bump

On 12/5/18 10:20 AM, Oleksandr Andrushchenko wrote:

Hello, Daniel!

Could you please ack/nack the patch, so either we can merge the

series or I can address your comments if any

Thank you,

Oleksandr

On 11/30/18 9:42 AM, Oleksandr Andrushchenko wrote:

From: Oleksandr Andrushchenko 

Use page directory based shared buffer implementation
now available as common code for Xen frontend drivers.

Remove flushing of shared buffer on page flip as this
workaround needs a proper fix.

Signed-off-by: Oleksandr Andrushchenko 


---
  drivers/gpu/drm/xen/Kconfig   |   1 +
  drivers/gpu/drm/xen/Makefile  |   1 -
  drivers/gpu/drm/xen/xen_drm_front.c   |  65 ++--
  drivers/gpu/drm/xen/xen_drm_front_gem.c   |   1 -
  drivers/gpu/drm/xen/xen_drm_front_shbuf.c | 414 --
  drivers/gpu/drm/xen/xen_drm_front_shbuf.h |  64 
  6 files changed, 26 insertions(+), 520 deletions(-)
  delete mode 100644 drivers/gpu/drm/xen/xen_drm_front_shbuf.c
  delete mode 100644 drivers/gpu/drm/xen/xen_drm_front_shbuf.h

diff --git a/drivers/gpu/drm/xen/Kconfig b/drivers/gpu/drm/xen/Kconfig
index 4cca160782ab..f969d486855d 100644
--- a/drivers/gpu/drm/xen/Kconfig
+++ b/drivers/gpu/drm/xen/Kconfig
@@ -12,6 +12,7 @@ config DRM_XEN_FRONTEND
  select DRM_KMS_HELPER
  select VIDEOMODE_HELPERS
  select XEN_XENBUS_FRONTEND
+    select XEN_FRONT_PGDIR_SHBUF
  help
    Choose this option if you want to enable a para-virtualized
    frontend DRM/KMS driver for Xen guest OSes.
diff --git a/drivers/gpu/drm/xen/Makefile b/drivers/gpu/drm/xen/Makefile
index 712afff5ffc3..825905f67faa 100644
--- a/drivers/gpu/drm/xen/Makefile
+++ b/drivers/gpu/drm/xen/Makefile
@@ -4,7 +4,6 @@ drm_xen_front-objs := xen_drm_front.o \
    xen_drm_front_kms.o \
    xen_drm_front_conn.o \
    xen_drm_front_evtchnl.o \
-  xen_drm_front_shbuf.o \
    xen_drm_front_cfg.o \
    xen_drm_front_gem.o
  diff --git a/drivers/gpu/drm/xen/xen_drm_front.c 
b/drivers/gpu/drm/xen/xen_drm_front.c

index 6b6d5ab82ec3..4d3d36fc3a5d 100644
--- a/drivers/gpu/drm/xen/xen_drm_front.c
+++ b/drivers/gpu/drm/xen/xen_drm_front.c
@@ -19,6 +19,7 @@
  #include 
  #include 
  +#include 
  #include 
    #include "xen_drm_front.h"
@@ -26,28 +27,20 @@
  #include "xen_drm_front_evtchnl.h"
  #include "xen_drm_front_gem.h"
  #include "xen_drm_front_kms.h"
-#include "xen_drm_front_shbuf.h"
    struct xen_drm_front_dbuf {
  struct list_head list;
  u64 dbuf_cookie;
  u64 fb_cookie;
-    struct xen_drm_front_shbuf *shbuf;
+
+    struct xen_front_pgdir_shbuf shbuf;
  };
  -static int dbuf_add_to_list(struct xen_drm_front_info *front_info,
-    struct xen_drm_front_shbuf *shbuf, u64 dbuf_cookie)
+static void dbuf_add_to_list(struct xen_drm_front_info *front_info,
+ struct xen_drm_front_dbuf *dbuf, u64 dbuf_cookie)
  {
-    struct xen_drm_front_dbuf *dbuf;
-
-    dbuf = kzalloc(sizeof(*dbuf), GFP_KERNEL);
-    if (!dbuf)
-    return -ENOMEM;
-
  dbuf->dbuf_cookie = dbuf_cookie;
-    dbuf->shbuf = shbuf;
  list_add(>list, _info->dbuf_list);
-    return 0;
  }
    static struct xen_drm_front_dbuf *dbuf_get(struct list_head 
*dbuf_list,
@@ -62,15 +55,6 @@ static struct xen_drm_front_dbuf *dbuf_get(struct 
list_head *dbuf_list,

  return NULL;
  }
  -static void dbuf_flush_fb(struct list_head *dbuf_list, u64 fb_cookie)
-{
-    struct xen_drm_front_dbuf *buf, *q;
-
-    list_for_each_entry_safe(buf, q, dbuf_list, list)
-    if (buf->fb_cookie == fb_cookie)
-    xen_drm_front_shbuf_flush(buf->shbuf);
-}
-
  static void dbuf_free(struct list_head *dbuf_list, u64 dbuf_cookie)
  {
  struct xen_drm_front_dbuf *buf, *q;
@@ -78,8 +62,8 @@ static void dbuf_free(struct list_head *dbuf_list, 
u64 dbuf_cookie)

  list_for_each_entry_safe(buf, q, dbuf_list, list)
  if (buf->dbuf_cookie == dbuf_cookie) {
  list_del(>list);
-    xen_drm_front_shbuf_unmap(buf->shbuf);
-    xen_drm_front_shbuf_free(buf->shbuf);
+    xen_front_pgdir_shbuf_unmap(>shbuf);
+    xen_front_pgdir_shbuf_free(>shbuf);
  kfree(buf);
  break;
  }
@@ -91,8 +75,8 @@ static void dbuf_free_all(struct list_head *dbuf_list)
    list_for_each_entry_safe(buf, q, dbuf_list, list) {
  list_del(>list);
-    xen_drm_front_shbuf_unmap(buf->shbuf);
-    xen_drm_front_shbuf_free(buf->shbuf);
+    xen_front_pgdir_shbuf_unmap(>shbuf);
+    xen_front_pgdir_shbuf_free(>shbuf);
  kfree(buf);
  }
  }
@@ -171,9 +155,9 @@ int xen_drm_front_dbuf_create(struct 
xen_drm_front_info *front_info,

    u32 bpp, u64 size, struct page **pages)
  {
  struct xen_drm_front_evtchnl *evtchnl;
-    struct xen_drm_front_shbuf *shbuf;
+    struct xen_drm_front_dbuf *dbuf;
  struct xendispl_req *req;
-    struct xen_drm_front_shbuf_cfg

[Xen-devel] [xen-4.10-testing test] 131257: regressions - FAIL

2018-12-13 Thread osstest service owner

flight 131257 xen-4.10-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/131257/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stopfail REGR. vs. 129676

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemut-ws16-amd64 16 guest-localmigrate/x10 fail pass in 
131223

Tests which did not succeed, but are not blocking:
 test-xtf-amd64-amd64-1   69 xtf/test-hvm64-xsa-278  fail blocked in 129676
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop   fail in 131223 never pass
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install 
fail never pass
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install 
fail never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass

version targeted for testing:
 xen  b6e203bc80e9d3e1dc7eb579d9665a77700d78cc
baseline version:
 xen  e907460fd61c350487ffee5d8aa375bef56bc81c

Last test of basis   129676  2018-11-09 01:56:32 Z   34 days
Testing same since   130611  2018-11-20 15:07:52 Z   22 days   13 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Roger Pau Monné 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass

Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function

2018-12-13 Thread Roger Pau Monné

On Thu, Dec 13, 2018 at 12:45:07AM -0700, Jan Beulich wrote:
> >>> On 12.12.18 at 18:05,  wrote:
> > On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote:
> >> The MMIO side of things of course still remains unclear.
> > 
> > Right, for the MMIO and the handling of grant and foreign mappings it's
> > not clear how we want to proceed.
> > 
> > Maybe account for all host RAM (total_pages) plus MMIO BARs?
> 
> Well, I thought we've already settled on it being impossible to
> account for all MMIO BARs at this point.

Well, I could iterate over all the registered PCI devices and size
the BARs (without VF BARs at least initially). This is quite
cumbersome, my other option would be using max_page and hope that
there are enough holes to make up for BAR MMIO regions.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation

2018-12-13 Thread Olaf Hering

Am Thu, 13 Dec 2018 01:45:39 -0700
schrieb "Jan Beulich" :

> >>> On 13.12.18 at 09:18,  wrote:  
> > Am Wed, 12 Dec 2018 09:39:25 -0700
> > schrieb "Jan Beulich" :
> >   
> >> >>> On 12.12.18 at 16:20,  wrote:
> >> > If a domU uses TSC as clocksoure it also must run NTP in some way to
> >> > avoid the potential drift what will most likely happen, independent of
> >> > any migration.
> >> Which drift? While anyone's well advised to run NTP, a completely
> >> isolated set of systems may have no need to, if their interactions don't
> >> depend on exactly matching time.  
> > 
> > If these hosts do not sync time to some reference host, the advancing of 
> > time
> > is undefined before and after my change.  
> 
> I'm lost. I simply don't understand what you're trying to tell me,
> or how your answer relates to my question.

Then please rephrase the question? I do not see how my path affects the
advancing of time in domUs running on isolated systems. If their hardware
clocks advance at the same speed, my patch will not affect it. And if they
do advance at a different speed, what can emulation do to help with 
"correctness"?

Olaf


pgpHzXPq_VPK0.pgp
Description: Digitale Signatur von OpenPGP
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware

2018-12-13 Thread Tian, Kevin

> From: Roger Pau Monné [mailto:roger@citrix.com]
> Sent: Thursday, December 13, 2018 4:40 PM
> 
> On Thu, Dec 13, 2018 at 02:44:00AM +, Tian, Kevin wrote:
> > btw can you also capture ISR/IRR/PPR right before local_irq_enable()?
> > though I didn't see a reason why code in-between may impact those
> > bits, it doesn't hurt to capture the context right before interrupt is
> > raised. :-)
> 
> I've done that and the result is the same as the ones that are
> currently printed on the trace, there's no change to the registers
> between the point where they are printed and the call to
> local_irq_enable.
> 

Thanks for confirmation.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware

2018-12-13 Thread Tian, Kevin

> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, December 13, 2018 4:37 PM
> 
> >>> On 13.12.18 at 02:28,  wrote:
> > btw I checked your original mail:
> >
> > (XEN)[] mwait-idle.c#mwait_idle+0x2a5/0x381
> > xen/arch/x86/cpu/mwait-idle.c:802
> >
> >788  if (cpu_is_haltable(cpu))
> >789  mwait_idle_with_hints(eax,
> MWAIT_ECX_INTERRUPT_BREAK);
> >790
> >791  after = cpuidle_get_tick();
> >792
> >793  cstate_restore_tsc();
> >794  trace_exit_reason(irq_traced);
> >795  TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after,
> >796  irq_traced[0], irq_traced[1], irq_traced[2],
> irq_traced[3]);
> >797
> >798  /* Now back in C0. */
> >799  update_idle_stats(power, cx, before, after);
> >800  local_irq_enable();
> >801
> > -> 802  if (!(lapic_timer_reliable_states & (1 << cstate)))
> >803  lapic_timer_on();
> >804
> >805  sched_tick_resume();
> >806  cpufreq_dbs_timer_resume();
> >
> > Looks above code is different from staging:
> >
> > acpi_processor_idle:
> > acpi_idle_do_entry:
> > acpi_processor_ffh_cstate_enter:
> > mwait_idle_with_hints
> >
> > there is no mwait_idle alone. and even with compiler optimization I didn't
> > find code sequence like above...
> 
> You're looking at two entirely different code paths, only one of
> which can be in use in any particular case: Either the idle
> entering routine used is acpi_processor_idle(), or (when the
> processor is supported by that driver code) it is mwait_idle().
> See mwait_idle_init() for when the latter gets used; the former
> may get installed at the point the Dom0 kernel reports ACPI
> C-state data (and only when mwait_idle() is not in use).
> 

yes, I missed the other path.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware

2018-12-13 Thread Roger Pau Monné

On Thu, Dec 13, 2018 at 01:28:23AM +, Tian, Kevin wrote:
> > From: Roger Pau Monné [mailto:roger@citrix.com]
> > Sent: Wednesday, December 12, 2018 8:18 PM
> > 
> > On Wed, Dec 12, 2018 at 11:48:52AM +, Tian, Kevin wrote:
> > > > From: Roger Pau Monné [mailto:roger@citrix.com]
> > > > Sent: Wednesday, December 12, 2018 7:25 PM
> > > >
> > > > On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote:
> > > > > > From: Roger Pau Monné [mailto:roger@citrix.com]
> > > > > > Sent: Monday, October 15, 2018 6:30 PM
> > > > > > (XEN)   [22642] POWERTYPE 4
> > > > > > (XEN)   [22643] IDLE PPR 0x0020
> > > > > > (XEN)IRR
> > > > > >
> > > >
> > 00
> > > > > > 00
> > > > > > (XEN)ISR
> > > > > >
> > > >
> > 02
> > > > > > 00
> > > > > > (XEN)   [22644] WAKE PPR 0x0020
> > > > > > (XEN)IRR
> > > > > >
> > > >
> > 02
> > > > > > 00
> > > > > > (XEN)ISR
> > > > > >
> > > >
> > 02
> > > > > > 00
> > > > >
> > > > > looks pending IRR (0x21) doesn't always trigger a spurious interrupt?
> > > >
> > > > Yes, that's correct. Having a pending IRR and going idle doesn't
> > > > always trigger the spurious interrupt re-injection.
> > > >
> > > > > is it a fixed pattern after how many rounds of Cstate enter/exit with
> > > > > pending IRR(0x21) then you see assertion happened (in this example
> > > > > it happens at 3rd time)?
> > > >
> > > > It's not a fixed pattern, here's another trace with IRR(0x21) being
> > > > pending just once during the Cstate transitions:
> > >
> > > did you observe a case where such asset may occur when IRR(0x21)
> > > is cleared but ISR (0x21) is set?
> > 
> > No, I've always seen both ISR and IRR set when the interrupt injection
> > happens. This of course doesn't mean it's not possible, but I have not
> > seen any trace with ISR(0x21) set and IRR(0x21) clear.
> > 
> 
> sorry but let me double confirm. You always see ISR[21]/IRR[21] being
> set "before and after entering C3" to hit the problem, right?

Yes, that's correct.

> When 
> interrupt injection happens later, ISR[21] is set but IRR[21] is cleared (as 
> expected for normal interrupt delivery process).
> 
> btw I checked your original mail:
> 
> (XEN)[] mwait-idle.c#mwait_idle+0x2a5/0x381
> xen/arch/x86/cpu/mwait-idle.c:802
> 
>788if (cpu_is_haltable(cpu))
>789mwait_idle_with_hints(eax, 
> MWAIT_ECX_INTERRUPT_BREAK);
>790
>791after = cpuidle_get_tick();
>792
>793cstate_restore_tsc();
>794trace_exit_reason(irq_traced);
>795TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after,
>796irq_traced[0], irq_traced[1], irq_traced[2], 
> irq_traced[3]);
>797
>798/* Now back in C0. */
>799update_idle_stats(power, cx, before, after);
>800local_irq_enable();
>801
> -> 802if (!(lapic_timer_reliable_states & (1 << cstate)))
>803lapic_timer_on();
>804
>805sched_tick_resume();
>806cpufreq_dbs_timer_resume();
> 
> Looks above code is different from staging:

The code matches staging at the point where I posted the original bug
report, 2 months ago.

> acpi_processor_idle:
>   acpi_idle_do_entry:
>   acpi_processor_ffh_cstate_enter:
>   mwait_idle_with_hints

There's another caller of mwait_idle_with_hints, the mwait_idle
function. This gets setup by mwait_idle_init.

> there is no mwait_idle alone.

There's a mwait_idle in current staging code, check:

xen/arch/x86/cpu/mwait-idle.c:718

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/cpu/mwait-idle.c;h=f89c52f2565e8deb7d4aec309124fe6fbd57b27d;hb=refs/heads/staging#l718

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation

2018-12-13 Thread Jan Beulich

>>> On 13.12.18 at 09:18,  wrote:
> Am Wed, 12 Dec 2018 09:39:25 -0700
> schrieb "Jan Beulich" :
> 
>> >>> On 12.12.18 at 16:20,  wrote:  
>> > If a domU uses TSC as clocksoure it also must run NTP in some way to
>> > avoid the potential drift what will most likely happen, independent of
>> > any migration.  
>> Which drift? While anyone's well advised to run NTP, a completely
>> isolated set of systems may have no need to, if their interactions don't
>> depend on exactly matching time.
> 
> If these hosts do not sync time to some reference host, the advancing of time
> is undefined before and after my change.

I'm lost. I simply don't understand what you're trying to tell me,
or how your answer relates to my question.

>> > The calculation of the drift is based on the time
>> > returned by remote servers versus how fast the local clock advances. NTP
>> > can handle a drift up to 500PPM. This means the local clocksource can
>> > run up to 500us slower or faster. This calculation is based on the TSC
>> > frequency of the host where the domU was started. Once a domU is
>> > migrated to a host of a different class, like from "2.3GHz" to "2.4GHz",
>> > the TSC frequency changes, but the domU kernel may not recalibrate
>> > itself.  
>> 
>> That's why we switch to emulated (or hardware scaling) mode in that
>> case. It's anyway not really clear to me what this entire ...
>> 
>> > As a result, the drift will be larger and might be outside of
>> > the 500 PPM range. In addition, the kernel may notice the change of
>> > speed in which the TSC advances and could change the clocksource. All
>> > this depends of course on the type of OS that is running in the domU.  
>> 
>> ... (up to here) paragraph is supposed to tell the reader.
>> 
>> > @@ -1885,6 +1888,16 @@ void __init early_time_init(void)
>> >  printk("Detected %lu.%03lu MHz processor.\n", 
>> > cpu_khz / 1000, cpu_khz % 1000);
>> >  
>> > +tmp = 1000 * 1000;
>> > +tmp += VTSC_NTP_PPM_TOLERANCE;
>> > +tmp *= cpu_khz;
>> > +tmp /= 1000 * 1000;
>> > +tmp -= cpu_khz;
>> > +if (tmp >= VTSC_JITTER_RANGE_KHZ)
>> > +tmp -= VTSC_JITTER_RANGE_KHZ;  
>> 
>> Besides the style issue in the if() - how can this be correct? This
>> clearly introduces a discontinuity (just consider the case where
>> tmp is exactly VTSC_JITTER_RANGE_KHZ before the if()). And
>> I also can't see how it guarantees the resulting value to be
>> below VTSC_JITTER_RANGE_KHZ. Did you perhaps mean to
>> cap the value (i.e. = instead of -= )?
> 
> This is supposed to make sure the value of Hz for 500us is always larger
> than the assumed jitter. So for a 2GHz host, with a theoretical tolerance
> of 1000, the actual tolerance is set to 800. tmp will be larger than the
> assumed jitter for cpu_khz > 400.

Again you don't appear to answer my question regarding the
discontinuity you introduce.

>> > +vtsc_tolerance_khz = (unsigned int)tmp;  
>> Stray cast.
> 
> Copy from the cpu_khz assignment, perhaps a remainder from i386 
> support?

Copy-and-paste is an explanation but not an excuse. Style
violations should not be cloned and thus furthered.

>> > +printk("Tolerating vtsc jitter for domUs: %u kHz.\n", 
> vtsc_tolerance_khz);  
>> Please omit the full stop; the printk() in context above shouldn't
>> have one either.
> 
> You mean the trailing dot, or what means "full stop" in this context?

Yes, "full stop" means the final period in a sentence.

>> > +disable_vtsc = khz_diff <= vtsc_tolerance_khz;
>> > +
>> > +printk(XENLOG_G_INFO "d%d: host has %lu kHz,"
>> > +   " domU expects %u kHz,"
>> > +   " difference of %ld is %s tolerance of %u\n",
>> > +   d->domain_id, cpu_khz, gtsc_khz, khz_diff,
>> > +   disable_vtsc ? "within" : "outside",
>> > +   vtsc_tolerance_khz);
>> > +}
>> > +
>> >  if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
>> > - (d->arch.tsc_khz == cpu_khz ||
>> > + (disable_vtsc ||
>> >(is_hvm_domain(d) &&
>> > hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
>> >  {  
>> 
>> In any event I don't follow why all of the sudden this becomes
>> an always-on mode, with not even a boot command line override.
> 
> Perhaps I failed to explain why there is no need to make this a knob.
> 
> If a domU uses TSC as its timesource, and if it also uses NTP to make
> sure the time advances correctly, then this change will make sure the
> advancing of time will be withing the bounds that NTP can correct.
> If it does use TSC, but does not use NTP then the advancing will be
> undefined before and after my change.

As per above - I'm lost. I simply don't understand. All I notice is that
you talk about one specific use case of the TSC in a guest, without
(apparently) considering uses for other than what Linux calls its
clocksource. I in particular don't understand

Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware

2018-12-13 Thread Roger Pau Monné

On Thu, Dec 13, 2018 at 02:44:00AM +, Tian, Kevin wrote:
> btw can you also capture ISR/IRR/PPR right before local_irq_enable()?
> though I didn't see a reason why code in-between may impact those 
> bits, it doesn't hurt to capture the context right before interrupt is
> raised. :-)

I've done that and the result is the same as the ones that are
currently printed on the trace, there's no change to the registers
between the point where they are printed and the call to
local_irq_enable.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware

2018-12-13 Thread Jan Beulich

>>> On 13.12.18 at 02:28,  wrote:
> btw I checked your original mail:
> 
> (XEN)[] mwait-idle.c#mwait_idle+0x2a5/0x381
> xen/arch/x86/cpu/mwait-idle.c:802
> 
>788if (cpu_is_haltable(cpu))
>789mwait_idle_with_hints(eax, 
> MWAIT_ECX_INTERRUPT_BREAK);
>790
>791after = cpuidle_get_tick();
>792
>793cstate_restore_tsc();
>794trace_exit_reason(irq_traced);
>795TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after,
>796irq_traced[0], irq_traced[1], irq_traced[2], 
> irq_traced[3]);
>797
>798/* Now back in C0. */
>799update_idle_stats(power, cx, before, after);
>800local_irq_enable();
>801
> -> 802if (!(lapic_timer_reliable_states & (1 << cstate)))
>803lapic_timer_on();
>804
>805sched_tick_resume();
>806cpufreq_dbs_timer_resume();
> 
> Looks above code is different from staging:
> 
> acpi_processor_idle:
>   acpi_idle_do_entry:
>   acpi_processor_ffh_cstate_enter:
>   mwait_idle_with_hints
> 
> there is no mwait_idle alone. and even with compiler optimization I didn't
> find code sequence like above...

You're looking at two entirely different code paths, only one of
which can be in use in any particular case: Either the idle
entering routine used is acpi_processor_idle(), or (when the
processor is supported by that driver code) it is mwait_idle().
See mwait_idle_init() for when the latter gets used; the former
may get installed at the point the Dom0 kernel reports ACPI
C-state data (and only when mwait_idle() is not in use).

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [linux-3.18 bisection] complete test-amd64-amd64-pair

2018-12-13 Thread Jan Beulich

>>> On 12.12.18 at 22:41,  wrote:
> branch xen-unstable
> xenbranch xen-unstable
> job test-amd64-amd64-pair
> testid xen-boot/src_host
> 
> Tree: linux 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git
> Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git
> Tree: qemuu git://xenbits.xen.org/qemu-xen.git
> Tree: xen git://xenbits.xen.org/xen.git
> 
> *** Found and reproduced problem changeset ***
> 
>   Bug is in tree:  linux 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
>   Bug introduced:  7b8052e19304865477e03a0047062d977309a22f
>   Bug not present: d255d18a34a8d53ccc4a019dc07e17b6e8cf6bd1
>   Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/131278/ 
> 
> 
>   commit 7b8052e19304865477e03a0047062d977309a22f
>   Author: Jan Beulich 
>   Date:   Mon Oct 19 04:23:29 2015 -0600
>   
>   igb: fix NULL derefs due to skipped SR-IOV enabling

_Very_ interesting. An over three years old commit was determined
to cause whatever regression it is. But wait - that's the date of the
mainline commit, not that of the backport (which was done a month
ago). I notice that of the two original commits the combination of
which the one here is supposed to fix, only one actually got
backported. Hence I wonder whether backporting the one here
was actually appropriate.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation

2018-12-13 Thread Olaf Hering

Am Wed, 12 Dec 2018 09:39:25 -0700
schrieb "Jan Beulich" :

> >>> On 12.12.18 at 16:20,  wrote:  
> > If a domU uses TSC as clocksoure it also must run NTP in some way to
> > avoid the potential drift what will most likely happen, independent of
> > any migration.  
> Which drift? While anyone's well advised to run NTP, a completely
> isolated set of systems may have no need to, if their interactions don't
> depend on exactly matching time.

If these hosts do not sync time to some reference host, the advancing of time
is undefined before and after my change.

> > The calculation of the drift is based on the time
> > returned by remote servers versus how fast the local clock advances. NTP
> > can handle a drift up to 500PPM. This means the local clocksource can
> > run up to 500us slower or faster. This calculation is based on the TSC
> > frequency of the host where the domU was started. Once a domU is
> > migrated to a host of a different class, like from "2.3GHz" to "2.4GHz",
> > the TSC frequency changes, but the domU kernel may not recalibrate
> > itself.  
> 
> That's why we switch to emulated (or hardware scaling) mode in that
> case. It's anyway not really clear to me what this entire ...
> 
> > As a result, the drift will be larger and might be outside of
> > the 500 PPM range. In addition, the kernel may notice the change of
> > speed in which the TSC advances and could change the clocksource. All
> > this depends of course on the type of OS that is running in the domU.  
> 
> ... (up to here) paragraph is supposed to tell the reader.
> 
> > @@ -1885,6 +1888,16 @@ void __init early_time_init(void)
> >  printk("Detected %lu.%03lu MHz processor.\n", 
> > cpu_khz / 1000, cpu_khz % 1000);
> >  
> > +tmp = 1000 * 1000;
> > +tmp += VTSC_NTP_PPM_TOLERANCE;
> > +tmp *= cpu_khz;
> > +tmp /= 1000 * 1000;
> > +tmp -= cpu_khz;
> > +if (tmp >= VTSC_JITTER_RANGE_KHZ)
> > +tmp -= VTSC_JITTER_RANGE_KHZ;  
> 
> Besides the style issue in the if() - how can this be correct? This
> clearly introduces a discontinuity (just consider the case where
> tmp is exactly VTSC_JITTER_RANGE_KHZ before the if()). And
> I also can't see how it guarantees the resulting value to be
> below VTSC_JITTER_RANGE_KHZ. Did you perhaps mean to
> cap the value (i.e. = instead of -= )?

This is supposed to make sure the value of Hz for 500us is always larger
than the assumed jitter. So for a 2GHz host, with a theoretical tolerance
of 1000, the actual tolerance is set to 800. tmp will be larger than the
assumed jitter for cpu_khz > 400.

> > +vtsc_tolerance_khz = (unsigned int)tmp;  
> Stray cast.

Copy from the cpu_khz assignment, perhaps a remainder from i386 support?

> > +printk("Tolerating vtsc jitter for domUs: %u kHz.\n", 
> > vtsc_tolerance_khz);  
> Please omit the full stop; the printk() in context above shouldn't
> have one either.

You mean the trailing dot, or what means "full stop" in this context?

> > +disable_vtsc = khz_diff <= vtsc_tolerance_khz;
> > +
> > +printk(XENLOG_G_INFO "d%d: host has %lu kHz,"
> > +   " domU expects %u kHz,"
> > +   " difference of %ld is %s tolerance of %u\n",
> > +   d->domain_id, cpu_khz, gtsc_khz, khz_diff,
> > +   disable_vtsc ? "within" : "outside",
> > +   vtsc_tolerance_khz);
> > +}
> > +
> >  if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() &&
> > - (d->arch.tsc_khz == cpu_khz ||
> > + (disable_vtsc ||
> >(is_hvm_domain(d) &&
> > hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) )
> >  {  
> 
> In any event I don't follow why all of the sudden this becomes
> an always-on mode, with not even a boot command line override.

Perhaps I failed to explain why there is no need to make this a knob.

If a domU uses TSC as its timesource, and if it also uses NTP to make
sure the time advances correctly, then this change will make sure the
advancing of time will be withing the bounds that NTP can correct.
If it does use TSC, but does not use NTP then the advancing will be
undefined before and after my change.


Olaf


pgpYPccpEZ4nz.pgp
Description: Digitale Signatur von OpenPGP
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events

2018-12-13 Thread Razvan Cojocaru

On 12/13/18 8:54 AM, Tian, Kevin wrote:
>> From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com]
>> Sent: Tuesday, December 11, 2018 8:33 PM
>>
>>> In any case, I think you want to rename the function and/or document
>>> more that expected behavior.
>>
>> You're right, I should probably rename that function / variable to
>> better reflect what it signifies - that sync vm_event processing is in
>> progress. For VMX and SVM, that simply means that interrupts will be
>> blocked, and the value of the variable will be correct and possibly
>> useful for ARM as well.
>>
> 
> what about vm_event_block_interrupt_injection? in that case
> it's injection instead of interrupt itself being blocked. blocking
> injection should mean same thing cross archs?

Of course, if Julien agrees with the change I'll rename it as suggested.


Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

80 matches

Mail list logo