Re: [Xen-devel] [PATCH v3 1/1] cameraif: add ABI for para-virtual camera
PFA the diff between v2 and v3 for your convenience On 12/12/18 11:49 AM, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko This is the ABI for the two halves of a para-virtualized camera driver which extends Xen's reach multimedia capabilities even farther enabling it for video conferencing, In-Vehicle Infotainment, high definition maps etc. The initial goal is to support most needed functionality with the final idea to make it possible to extend the protocol if need be: 1. Provide means for base virtual device configuration: - pixel formats - resolutions - frame rates 2. Support basic camera controls: - contrast - brightness - hue - saturation 3. Support streaming control Signed-off-by: Oleksandr Andrushchenko --- xen/include/public/io/cameraif.h | 1374 ++ 1 file changed, 1374 insertions(+) create mode 100644 xen/include/public/io/cameraif.h diff --git a/xen/include/public/io/cameraif.h b/xen/include/public/io/cameraif.h new file mode 100644 index ..9aae0f47743b --- /dev/null +++ b/xen/include/public/io/cameraif.h @@ -0,0 +1,1374 @@ +/** + * cameraif.h + * + * Unified camera device I/O interface for Xen guest OSes. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (C) 2018 EPAM Systems Inc. + * + * Author: Oleksandr Andrushchenko + */ + +#ifndef __XEN_PUBLIC_IO_CAMERAIF_H__ +#define __XEN_PUBLIC_IO_CAMERAIF_H__ + +#include "ring.h" +#include "../grant_table.h" + +/* + ** + * Protocol version + ** + */ +#define XENCAMERA_PROTOCOL_VERSION "1" + +/* + ** + * Feature and Parameter Negotiation + ** + * + * Front->back notifications: when enqueuing a new request, sending a + * notification can be made conditional on xencamera_req (i.e., the generic + * hold-off mechanism provided by the ring macros). Backends must set + * xencamera_req appropriately (e.g., using RING_FINAL_CHECK_FOR_REQUESTS()). + * + * Back->front notifications: when enqueuing a new response, sending a + * notification can be made conditional on xencamera_resp (i.e., the generic + * hold-off mechanism provided by the ring macros). Frontends must set + * xencamera_resp appropriately (e.g., using RING_FINAL_CHECK_FOR_RESPONSES()). + * + * The two halves of a para-virtual camera driver utilize nodes within + * XenStore to communicate capabilities and to negotiate operating parameters. + * This section enumerates these nodes which reside in the respective front and + * backend portions of XenStore, following the XenBus convention. + * + * All data in XenStore is stored as strings. Nodes specifying numeric + * values are encoded in decimal. Integer value ranges listed below are + * expressed as fixed sized integer types capable of storing the conversion + * of a properly formatted node string, without loss of information. + * + ** + *Example configuration + ** + * + * This is an example of backend and frontend configuration: + * + *- Backend --- + * + * /local/domain/0/backend/vcamera/1/0/frontend-id = "1" + * /local/domain/0/backend/vcamera/1/0/frontend = "/local/domain/1/device/vcamera/0" + * /local/domain/0/backend/vcamera/1/0/state = "4" + * /local/domain/0/backend/vcamera/1/0/versions = "1,2" + * + *- Frontend
Re: [Xen-devel] [PATCH] drm/xen-front: Make shmem backed display buffer coherent
On 12/13/18 5:48 PM, Daniel Vetter wrote: On Thu, Dec 13, 2018 at 12:17:54PM +0200, Oleksandr Andrushchenko wrote: Daniel, could you please comment? Cross-revieweing someone else's stuff would scale better, fair enough I don't think I'll get around to anything before next year. I put you on CC explicitly because you had comments on other patch [1] and this one tries to solve the issue raised (I tried to figure out at [2] if this is the way to go, but it seems I have no alternative here). While at it [3] (I hope) addresses your comments and the series just needs your single ack/nack to get in: all the rest ack/r-b are already there. Do you mind looking at it? -Daniel Thank you very much for your time, Oleksandr Thank you On 11/27/18 12:32 PM, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko When GEM backing storage is allocated with drm_gem_get_pages the backing pages may be cached, thus making it possible that the backend sees only partial content of the buffer which may lead to screen artifacts. Make sure that the frontend's memory is coherent and the backend always sees correct display buffer content. Fixes: c575b7eeb89f ("drm/xen-front: Add support for Xen PV display frontend") Signed-off-by: Oleksandr Andrushchenko --- drivers/gpu/drm/xen/xen_drm_front_gem.c | 62 +++-- 1 file changed, 48 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c b/drivers/gpu/drm/xen/xen_drm_front_gem.c index 47ff019d3aef..c592735e49d2 100644 --- a/drivers/gpu/drm/xen/xen_drm_front_gem.c +++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c @@ -33,8 +33,11 @@ struct xen_gem_object { /* set for buffers allocated by the backend */ bool be_alloc; - /* this is for imported PRIME buffer */ - struct sg_table *sgt_imported; + /* +* this is for imported PRIME buffer or the one allocated via +* drm_gem_get_pages. +*/ + struct sg_table *sgt; }; static inline struct xen_gem_object * @@ -77,10 +80,21 @@ static struct xen_gem_object *gem_create_obj(struct drm_device *dev, return xen_obj; } +struct sg_table *xen_drm_front_gem_get_sg_table(struct drm_gem_object *gem_obj) +{ + struct xen_gem_object *xen_obj = to_xen_gem_obj(gem_obj); + + if (!xen_obj->pages) + return ERR_PTR(-ENOMEM); + + return drm_prime_pages_to_sg(xen_obj->pages, xen_obj->num_pages); +} + static struct xen_gem_object *gem_create(struct drm_device *dev, size_t size) { struct xen_drm_front_drm_info *drm_info = dev->dev_private; struct xen_gem_object *xen_obj; + struct address_space *mapping; int ret; size = round_up(size, PAGE_SIZE); @@ -113,10 +127,14 @@ static struct xen_gem_object *gem_create(struct drm_device *dev, size_t size) xen_obj->be_alloc = true; return xen_obj; } + /* * need to allocate backing pages now, so we can share those * with the backend */ + mapping = xen_obj->base.filp->f_mapping; + mapping_set_gfp_mask(mapping, GFP_USER | __GFP_DMA32); + xen_obj->num_pages = DIV_ROUND_UP(size, PAGE_SIZE); xen_obj->pages = drm_gem_get_pages(_obj->base); if (IS_ERR_OR_NULL(xen_obj->pages)) { @@ -125,8 +143,27 @@ static struct xen_gem_object *gem_create(struct drm_device *dev, size_t size) goto fail; } + xen_obj->sgt = xen_drm_front_gem_get_sg_table(_obj->base); + if (IS_ERR_OR_NULL(xen_obj->sgt)){ + ret = PTR_ERR(xen_obj->sgt); + xen_obj->sgt = NULL; + goto fail_put_pages; + } + + if (!dma_map_sg(dev->dev, xen_obj->sgt->sgl, xen_obj->sgt->nents, + DMA_BIDIRECTIONAL)) { + ret = -EFAULT; + goto fail_free_sgt; + } + return xen_obj; +fail_free_sgt: + sg_free_table(xen_obj->sgt); + xen_obj->sgt = NULL; +fail_put_pages: + drm_gem_put_pages(_obj->base, xen_obj->pages, true, false); + xen_obj->pages = NULL; fail: DRM_ERROR("Failed to allocate buffer with size %zu\n", size); return ERR_PTR(ret); @@ -149,7 +186,7 @@ void xen_drm_front_gem_free_object_unlocked(struct drm_gem_object *gem_obj) struct xen_gem_object *xen_obj = to_xen_gem_obj(gem_obj); if (xen_obj->base.import_attach) { - drm_prime_gem_destroy(_obj->base, xen_obj->sgt_imported); + drm_prime_gem_destroy(_obj->base, xen_obj->sgt); gem_free_pages_array(xen_obj); } else { if (xen_obj->pages) { @@ -158,6 +195,13 @@ void xen_drm_front_gem_free_object_unlocked(struct drm_gem_object *gem_obj) xen_obj->pages); gem_free_pages_array(xen_obj); } else { +
[Xen-devel] [qemu-mainline test] 131285: regressions - FAIL
flight 131285 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/131285/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 131172 test-armhf-armhf-libvirt-raw 7 xen-boot fail REGR. vs. 131172 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 131172 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 131172 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 131172 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 131172 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass version targeted for testing: qemuu4b3aab204204ca742836219b97b538d90584f4f2 baseline version: qemuu4f818e7b7f8ecb5c166d093b8859fec2ddeca2ef Last test of basis 131172 2018-12-09 12:08:42 Z4 days Failing since131240 2018-12-11 17:37:01 Z2 days2 attempts Testing same since 131285 2018-12-13 00:27:23 Z1 days1 attempts People who touched revisions under test: Aleksandar Markovic Alex Williamson Corey Minyard David Gibson David Hildenbrand Dongli Zhang Eduardo Habkost Eric Blake Fabrice Desclaux fabrice.descl...@cea.fr Fam Zheng Fam Zheng Gerd Hoffmann Kashyap Chamarthy Laurent Vivier Li Qiang Li Qiang Marc-André Lureau Peter Maydell Philippe Mathieu-Daudé Thomas Huth Yuval Shaia Zhang Yi jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-xl pass test-armhf-armhf-xl pass test-amd64-i386-xl pass
Re: [Xen-devel] [PATCH 2/2] avoid TABs in files that only contain a few
On Thu, Dec 13, 2018 at 11:37:37PM +0100, Paolo Bonzini wrote: > Most files that have TABs only contain a handful of them. Change > them to spaces so that we don't confuse people. > > disas, standard-headers, linux-headers and libdecnumber are imported > from other projects and probably should be exempted from the check. For sure for standard-headers, linux-headers since if someone does contribute a patch we want them to contribue upstream. > Outside those, after this patch the following files still contain both > 8-space and TAB sequences at the beginning of the line. Many of them > have a majority of TABs, or were initially committed with all tabs. > > bsd-user/i386/target_syscall.h > bsd-user/x86_64/target_syscall.h > crypto/aes.c > hw/audio/fmopl.c > hw/audio/fmopl.h > hw/block/tc58128.c > hw/display/cirrus_vga.c > hw/display/xenfb.c > hw/dma/etraxfs_dma.c > hw/intc/sh_intc.c > hw/misc/mst_fpga.c > hw/net/pcnet.c > hw/sh4/sh7750.c > hw/timer/m48t59.c > hw/timer/sh_timer.c > include/crypto/aes.h > include/disas/bfd.h > include/hw/sh4/sh.h > libdecnumber/decNumber.c > linux-headers/asm-generic/unistd.h > linux-headers/linux/kvm.h > linux-user/alpha/target_syscall.h > linux-user/arm/nwfpe/double_cpdo.c > linux-user/arm/nwfpe/fpa11_cpdt.c > linux-user/arm/nwfpe/fpa11_cprt.c > linux-user/arm/nwfpe/fpa11.h > linux-user/flat.h > linux-user/flatload.c > linux-user/i386/target_syscall.h > linux-user/ppc/target_syscall.h > linux-user/sparc/target_syscall.h > linux-user/syscall.c > linux-user/syscall_defs.h > linux-user/x86_64/target_syscall.h > slirp/cksum.c > slirp/if.c > slirp/ip.h > slirp/ip_icmp.c > slirp/ip_icmp.h > slirp/ip_input.c > slirp/ip_output.c > slirp/mbuf.c > slirp/misc.c > slirp/sbuf.c > slirp/socket.c > slirp/socket.h > slirp/tcp_input.c > slirp/tcpip.h > slirp/tcp_output.c > slirp/tcp_subr.c > slirp/tcp_timer.c > slirp/tftp.c > slirp/udp.c > slirp/udp.h > target/cris/cpu.h > target/cris/mmu.c > target/cris/op_helper.c > target/sh4/helper.c > target/sh4/op_helper.c > target/sh4/translate.c > tcg/sparc/tcg-target.inc.c > tests/tcg/cris/check_addo.c > tests/tcg/cris/check_moveq.c > tests/tcg/cris/check_swap.c > tests/tcg/multiarch/test-mmap.c > ui/vnc-enc-hextile-template.h > ui/vnc-enc-zywrle.h > util/envlist.c > util/readline.c > > The following have only TABs: > > bsd-user/i386/target_signal.h > bsd-user/sparc64/target_signal.h > bsd-user/sparc64/target_syscall.h > bsd-user/sparc/target_signal.h > bsd-user/sparc/target_syscall.h > bsd-user/x86_64/target_signal.h > crypto/desrfb.c > hw/audio/intel-hda-defs.h > hw/core/uboot_image.h > hw/sh4/sh7750_regnames.c > hw/sh4/sh7750_regs.h > include/hw/cris/etraxfs_dma.h > linux-user/alpha/termbits.h > linux-user/arm/nwfpe/fpopcode.h > linux-user/arm/nwfpe/fpsr.h > linux-user/arm/syscall_nr.h > linux-user/arm/target_signal.h > linux-user/cris/target_signal.h > linux-user/i386/target_signal.h > linux-user/linux_loop.h > linux-user/m68k/target_signal.h > linux-user/microblaze/target_signal.h > linux-user/mips64/target_signal.h > linux-user/mips/target_signal.h > linux-user/mips/target_syscall.h > linux-user/mips/termbits.h > linux-user/ppc/target_signal.h > linux-user/sh4/target_signal.h > linux-user/sh4/termbits.h > linux-user/sparc64/target_syscall.h > linux-user/sparc/target_signal.h > linux-user/x86_64/target_signal.h > linux-user/x86_64/termbits.h > pc-bios/optionrom/optionrom.h > slirp/mbuf.h > slirp/misc.h > slirp/sbuf.h > slirp/tcp.h > slirp/tcp_timer.h > slirp/tcp_var.h > target/i386/svm.h > target/sparc/asi.h > target/xtensa/core-dc232b/xtensa-modules.inc.c > target/xtensa/core-dc233c/xtensa-modules.inc.c > target/xtensa/core-de212/core-isa.h > target/xtensa/core-de212/xtensa-modules.inc.c > target/xtensa/core-fsf/xtensa-modules.inc.c > target/xtensa/core-sample_controller/core-isa.h > target/xtensa/core-sample_controller/xtensa-modules.inc.c > target/xtensa/core-test_kc705_be/core-isa.h > target/xtensa/core-test_kc705_be/xtensa-modules.inc.c > tests/tcg/cris/check_abs.c > tests/tcg/cris/check_addc.c > tests/tcg/cris/check_addcm.c > tests/tcg/cris/check_addoq.c > tests/tcg/cris/check_bound.c > tests/tcg/cris/check_ftag.c > tests/tcg/cris/check_int64.c > tests/tcg/cris/check_lz.c > tests/tcg/cris/check_openpf5.c > tests/tcg/cris/check_sigalrm.c > tests/tcg/cris/crisutils.h > tests/tcg/cris/sys.c > tests/tcg/i386/test-i386-ssse3.c > ui/vgafont.h > > Signed-off-by: Paolo Bonzini > --- > block/bochs.c
[Xen-devel] [PATCH v2 6/5] tools/docs: Remove PVRDTSCP remnants
PVRDTSCP is believed-unused, and its implementation has adverse consequences on unrelated functionality in the hypervisor. As a result, support has been removed. Modify libxl to provide a slightly more helpful error message if it encounters PVRDTSCP being selected. While adjusting TSC handling, make libxl check for errors from the set_tsc hypercall. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Wei Liu CC: Roger Pau Monné --- docs/man/xen-tscmode.pod.7| 94 +--- docs/man/xl.cfg.pod.5.in | 9 +- docs/misc/pvrdtscp.c | 307 -- tools/libxl/libxl_x86.c | 13 +- tools/python/xen/lowlevel/xc/xc.c | 2 +- 5 files changed, 19 insertions(+), 406 deletions(-) delete mode 100644 docs/misc/pvrdtscp.c diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7 index 819c61d..1d81a3f 100644 --- a/docs/man/xen-tscmode.pod.7 +++ b/docs/man/xen-tscmode.pod.7 @@ -77,9 +77,7 @@ highest performance is required. =item * B (PVRDTSCP). -High-TSC-frequency apps may be paravirtualized (modified) to -obtain both correctness and highest performance; any unmodified -apps must be TSC-resilient. +This mode has been removed. =back @@ -215,30 +213,6 @@ is emulated. Note that, though emulated, the "apparent" TSC frequency will be the TSC frequency of the initial physical machine, even after migration. -For environments where both TSC-safeness AND highest performance -even across migration is a requirement, application code can be specially -modified to use an algorithm explicitly designed into Xen for this purpose. -This mode (tsc_mode==3) is called PVRDTSCP, because it requires -app paravirtualization (awareness by the app that it may be running -on top of Xen), and utilizes a variation of the rdtsc instruction -called rdtscp that is available on most recent generation processors. -(The rdtscp instruction differs from the rdtsc instruction in that it -reads not only the TSC but an additional register set by system software.) -When a pvrdtscp-modified app is running on a processor that is both TSC-safe -and supports the rdtscp instruction, information can be obtained -about migration and TSC frequency/offset adjustment to allow the -vast majority of timestamps to be obtained at top performance; when -running on a TSC-unsafe processor or a processor that doesn't support -the rdtscp instruction, rdtscp is emulated. - -PVRDTSCP (tsc_mode==3) has two limitations. First, it applies to -all apps running in this virtual machine. This means that all -apps must either be TSC-resilient or pvrdtscp-modified. Second, -highest performance is only obtained on TSC-safe machines that -support the rdtscp instruction; when running on older machines, -rdtscp is emulated and thus slower. For more information on PVRDTSCP, -see below. - Finally, tsc_mode==1 always enables TSC emulation, regardless of the underlying physical hardware. The "apparent" TSC frequency will be the TSC frequency of the initial physical machine, even after migration. @@ -287,56 +261,7 @@ have been replaced by a paravirtualized equivalent of the cpuid instruction ("pvcpuid") and also trap to Xen. But apps in a PV guest that use a cpuid instruction execute it directly, without a trap to Xen. As a result, an app may directly examine the physical TSC Invariant cpuid bit and make -decisions based on that bit. This is still an unsolved problem, though -a workaround exists as part of the PVRDTSCP tsc_mode for apps that -can be modified. - -=head1 MORE ON PVRDTSCP - -Paravirtualized OS's use the "pvclock" algorithm to manage the passing -of time. This sophisticated algorithm obtains information from a memory -page shared between Xen and the OS and selects information from this -page based on the current virtual CPU (vcpu) in order to properly adapt to -TSC-unsafe systems and changes that occur across migration. Neither -this shared page nor the vcpu information is available to a userland -app so the pvclock algorithm cannot be directly used by an app, at least -without performance degradation roughly equal to the cost of just -emulating an rdtsc. - -As a result, as of 4.0, Xen provides capabilities for a userland app -to obtain key time values similar to the information accessible -to the PV OS pvclock algorithm. The app uses the rdtscp instruction -which is defined in recent processors to obtain both the TSC and an -auxiliary value called TSC_AUX. Xen is responsible for setting TSC_AUX -to the same value on all vcpus running any domain with tsc_mode==3; -further, Xen tools are responsible for monotonically incrementing TSC_AUX -anytime the domain is restored/migrated (thus changing key time values); -and, when the domain is running on a physical machine that either -is not TSC-safe or does not support the rdtscp instruction, Xen -is responsible for emulating the rdtscp instruction and for setting -TSC_AUX to zero on all processors. - -Xen also
Re: [Xen-devel] [PATCH 2/2] avoid TABs in files that only contain a few
On Thu, Dec 13, 2018 at 11:37:37PM +0100, Paolo Bonzini wrote: > Most files that have TABs only contain a handful of them. Change > them to spaces so that we don't confuse people. > > disas, standard-headers, linux-headers and libdecnumber are imported > from other projects and probably should be exempted from the check. > Outside those, after this patch the following files still contain both > 8-space and TAB sequences at the beginning of the line. Many of them > have a majority of TABs, or were initially committed with all tabs. > > bsd-user/i386/target_syscall.h > bsd-user/x86_64/target_syscall.h > crypto/aes.c > hw/audio/fmopl.c > hw/audio/fmopl.h > hw/block/tc58128.c > hw/display/cirrus_vga.c > hw/display/xenfb.c > hw/dma/etraxfs_dma.c > hw/intc/sh_intc.c > hw/misc/mst_fpga.c > hw/net/pcnet.c > hw/sh4/sh7750.c > hw/timer/m48t59.c > hw/timer/sh_timer.c > include/crypto/aes.h > include/disas/bfd.h > include/hw/sh4/sh.h > libdecnumber/decNumber.c > linux-headers/asm-generic/unistd.h > linux-headers/linux/kvm.h > linux-user/alpha/target_syscall.h > linux-user/arm/nwfpe/double_cpdo.c > linux-user/arm/nwfpe/fpa11_cpdt.c > linux-user/arm/nwfpe/fpa11_cprt.c > linux-user/arm/nwfpe/fpa11.h > linux-user/flat.h > linux-user/flatload.c > linux-user/i386/target_syscall.h > linux-user/ppc/target_syscall.h > linux-user/sparc/target_syscall.h > linux-user/syscall.c > linux-user/syscall_defs.h > linux-user/x86_64/target_syscall.h > slirp/cksum.c > slirp/if.c > slirp/ip.h > slirp/ip_icmp.c > slirp/ip_icmp.h > slirp/ip_input.c > slirp/ip_output.c > slirp/mbuf.c > slirp/misc.c > slirp/sbuf.c > slirp/socket.c > slirp/socket.h > slirp/tcp_input.c > slirp/tcpip.h > slirp/tcp_output.c > slirp/tcp_subr.c > slirp/tcp_timer.c > slirp/tftp.c > slirp/udp.c > slirp/udp.h > target/cris/cpu.h > target/cris/mmu.c > target/cris/op_helper.c > target/sh4/helper.c > target/sh4/op_helper.c > target/sh4/translate.c > tcg/sparc/tcg-target.inc.c > tests/tcg/cris/check_addo.c > tests/tcg/cris/check_moveq.c > tests/tcg/cris/check_swap.c > tests/tcg/multiarch/test-mmap.c > ui/vnc-enc-hextile-template.h > ui/vnc-enc-zywrle.h > util/envlist.c > util/readline.c > > The following have only TABs: > > bsd-user/i386/target_signal.h > bsd-user/sparc64/target_signal.h > bsd-user/sparc64/target_syscall.h > bsd-user/sparc/target_signal.h > bsd-user/sparc/target_syscall.h > bsd-user/x86_64/target_signal.h > crypto/desrfb.c > hw/audio/intel-hda-defs.h > hw/core/uboot_image.h > hw/sh4/sh7750_regnames.c > hw/sh4/sh7750_regs.h > include/hw/cris/etraxfs_dma.h > linux-user/alpha/termbits.h > linux-user/arm/nwfpe/fpopcode.h > linux-user/arm/nwfpe/fpsr.h > linux-user/arm/syscall_nr.h > linux-user/arm/target_signal.h > linux-user/cris/target_signal.h > linux-user/i386/target_signal.h > linux-user/linux_loop.h > linux-user/m68k/target_signal.h > linux-user/microblaze/target_signal.h > linux-user/mips64/target_signal.h > linux-user/mips/target_signal.h > linux-user/mips/target_syscall.h > linux-user/mips/termbits.h > linux-user/ppc/target_signal.h > linux-user/sh4/target_signal.h > linux-user/sh4/termbits.h > linux-user/sparc64/target_syscall.h > linux-user/sparc/target_signal.h > linux-user/x86_64/target_signal.h > linux-user/x86_64/termbits.h > pc-bios/optionrom/optionrom.h > slirp/mbuf.h > slirp/misc.h > slirp/sbuf.h > slirp/tcp.h > slirp/tcp_timer.h > slirp/tcp_var.h > target/i386/svm.h > target/sparc/asi.h > target/xtensa/core-dc232b/xtensa-modules.inc.c > target/xtensa/core-dc233c/xtensa-modules.inc.c > target/xtensa/core-de212/core-isa.h > target/xtensa/core-de212/xtensa-modules.inc.c > target/xtensa/core-fsf/xtensa-modules.inc.c > target/xtensa/core-sample_controller/core-isa.h > target/xtensa/core-sample_controller/xtensa-modules.inc.c > target/xtensa/core-test_kc705_be/core-isa.h > target/xtensa/core-test_kc705_be/xtensa-modules.inc.c > tests/tcg/cris/check_abs.c > tests/tcg/cris/check_addc.c > tests/tcg/cris/check_addcm.c > tests/tcg/cris/check_addoq.c > tests/tcg/cris/check_bound.c > tests/tcg/cris/check_ftag.c > tests/tcg/cris/check_int64.c > tests/tcg/cris/check_lz.c > tests/tcg/cris/check_openpf5.c > tests/tcg/cris/check_sigalrm.c > tests/tcg/cris/crisutils.h > tests/tcg/cris/sys.c > tests/tcg/i386/test-i386-ssse3.c > ui/vgafont.h > > Signed-off-by: Paolo Bonzini ppc parts Acked-by: David Gibson > --- > block/bochs.c | 22 ++--- > block/file-posix.c | 2 +- >
Re: [Xen-devel] [PATCH 2/2] avoid TABs in files that only contain a few
On 12/13/18 4:37 PM, Paolo Bonzini wrote: > Most files that have TABs only contain a handful of them. Change > them to spaces so that we don't confuse people. Acked-by: Richard Henderson r~ ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH 2/2] avoid TABs in files that only contain a few
Most files that have TABs only contain a handful of them. Change them to spaces so that we don't confuse people. disas, standard-headers, linux-headers and libdecnumber are imported from other projects and probably should be exempted from the check. Outside those, after this patch the following files still contain both 8-space and TAB sequences at the beginning of the line. Many of them have a majority of TABs, or were initially committed with all tabs. bsd-user/i386/target_syscall.h bsd-user/x86_64/target_syscall.h crypto/aes.c hw/audio/fmopl.c hw/audio/fmopl.h hw/block/tc58128.c hw/display/cirrus_vga.c hw/display/xenfb.c hw/dma/etraxfs_dma.c hw/intc/sh_intc.c hw/misc/mst_fpga.c hw/net/pcnet.c hw/sh4/sh7750.c hw/timer/m48t59.c hw/timer/sh_timer.c include/crypto/aes.h include/disas/bfd.h include/hw/sh4/sh.h libdecnumber/decNumber.c linux-headers/asm-generic/unistd.h linux-headers/linux/kvm.h linux-user/alpha/target_syscall.h linux-user/arm/nwfpe/double_cpdo.c linux-user/arm/nwfpe/fpa11_cpdt.c linux-user/arm/nwfpe/fpa11_cprt.c linux-user/arm/nwfpe/fpa11.h linux-user/flat.h linux-user/flatload.c linux-user/i386/target_syscall.h linux-user/ppc/target_syscall.h linux-user/sparc/target_syscall.h linux-user/syscall.c linux-user/syscall_defs.h linux-user/x86_64/target_syscall.h slirp/cksum.c slirp/if.c slirp/ip.h slirp/ip_icmp.c slirp/ip_icmp.h slirp/ip_input.c slirp/ip_output.c slirp/mbuf.c slirp/misc.c slirp/sbuf.c slirp/socket.c slirp/socket.h slirp/tcp_input.c slirp/tcpip.h slirp/tcp_output.c slirp/tcp_subr.c slirp/tcp_timer.c slirp/tftp.c slirp/udp.c slirp/udp.h target/cris/cpu.h target/cris/mmu.c target/cris/op_helper.c target/sh4/helper.c target/sh4/op_helper.c target/sh4/translate.c tcg/sparc/tcg-target.inc.c tests/tcg/cris/check_addo.c tests/tcg/cris/check_moveq.c tests/tcg/cris/check_swap.c tests/tcg/multiarch/test-mmap.c ui/vnc-enc-hextile-template.h ui/vnc-enc-zywrle.h util/envlist.c util/readline.c The following have only TABs: bsd-user/i386/target_signal.h bsd-user/sparc64/target_signal.h bsd-user/sparc64/target_syscall.h bsd-user/sparc/target_signal.h bsd-user/sparc/target_syscall.h bsd-user/x86_64/target_signal.h crypto/desrfb.c hw/audio/intel-hda-defs.h hw/core/uboot_image.h hw/sh4/sh7750_regnames.c hw/sh4/sh7750_regs.h include/hw/cris/etraxfs_dma.h linux-user/alpha/termbits.h linux-user/arm/nwfpe/fpopcode.h linux-user/arm/nwfpe/fpsr.h linux-user/arm/syscall_nr.h linux-user/arm/target_signal.h linux-user/cris/target_signal.h linux-user/i386/target_signal.h linux-user/linux_loop.h linux-user/m68k/target_signal.h linux-user/microblaze/target_signal.h linux-user/mips64/target_signal.h linux-user/mips/target_signal.h linux-user/mips/target_syscall.h linux-user/mips/termbits.h linux-user/ppc/target_signal.h linux-user/sh4/target_signal.h linux-user/sh4/termbits.h linux-user/sparc64/target_syscall.h linux-user/sparc/target_signal.h linux-user/x86_64/target_signal.h linux-user/x86_64/termbits.h pc-bios/optionrom/optionrom.h slirp/mbuf.h slirp/misc.h slirp/sbuf.h slirp/tcp.h slirp/tcp_timer.h slirp/tcp_var.h target/i386/svm.h target/sparc/asi.h target/xtensa/core-dc232b/xtensa-modules.inc.c target/xtensa/core-dc233c/xtensa-modules.inc.c target/xtensa/core-de212/core-isa.h target/xtensa/core-de212/xtensa-modules.inc.c target/xtensa/core-fsf/xtensa-modules.inc.c target/xtensa/core-sample_controller/core-isa.h target/xtensa/core-sample_controller/xtensa-modules.inc.c target/xtensa/core-test_kc705_be/core-isa.h target/xtensa/core-test_kc705_be/xtensa-modules.inc.c tests/tcg/cris/check_abs.c tests/tcg/cris/check_addc.c tests/tcg/cris/check_addcm.c tests/tcg/cris/check_addoq.c tests/tcg/cris/check_bound.c tests/tcg/cris/check_ftag.c tests/tcg/cris/check_int64.c tests/tcg/cris/check_lz.c tests/tcg/cris/check_openpf5.c tests/tcg/cris/check_sigalrm.c tests/tcg/cris/crisutils.h tests/tcg/cris/sys.c tests/tcg/i386/test-i386-ssse3.c ui/vgafont.h Signed-off-by: Paolo Bonzini --- block/bochs.c | 22 ++--- block/file-posix.c | 2 +- block/file-win32.c | 8 +- block/linux-aio.c | 4 +- block/qcow2-cluster.c | 2 +- block/vpc.c| 2 +- bsd-user/elfload.c | 2 +- contrib/elf2dmp/main.c | 2 +- hw/alpha/typhoon.c | 12 +-- hw/arm/stellaris.c
Re: [Xen-devel] [PATCH for-4.12] dt: fix integer check in dt_device_is_compatible
On Thu, 13 Dec 2018, Julien Grall wrote: > Hi Stefano, > > On 12/12/18 11:53 PM, Stefano Stabellini wrote: > > On Wed, 12 Dec 2018, Julien Grall wrote: > > > Hi Stefano, > > > > > > On 11/12/2018 18:46, Stefano Stabellini wrote: > > > > cplen is unsigned, thus, it can never be < 0. Use a signed integer local > > > > variable instead. > > > > > > The current code checks > 0. Looking at the code I don't think it can ever > > > be > > > negative. So can you details what is the problem you are trying to > > > resolve? > > > > Yes, it can be "negative" (not actually negative because it is defined > > as unsigned), in fact it happens with the nodes added dynamically by grub > > at boot. This patches fixes booting from grub. > > > > Specifically ilen is initialed to 16, but strlen+1 is 17, so length > > becomes -1. The length of the property generated by grub seems to be > > short by 1. > Such explanation should have been in the commit message, it helps to > diagnostics where the problem is coming from. > > Looking at the specification [1] section 2.3.1, a compatible property is a > concatenated list of null terminated strings. So the current code is actually > correct and there is a bug in GRUB. > > This bug was actually fixed by commit ae5817f1d "arm64/xen_boot: Fix Xen boot > using GRUB2 on AARCH64". > > I assume you are using Grub 2.02 which does not contain the full support for > Xen. So I would not even consider to work-around it in Xen. > > Instead, I would recommend you to upgrade to a more recent GRUB. It would be > more likely staging as there are no new GRUB version. > > Another solution is to use chainloading. it is true that it was a "temporary" bug in grub ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 1/3] x86/svm: Simplify svm_get_insn_len()
The existing __get_instruction_length_from_list() has a single user which uses the list functionality. That user however should be looking specifically for INVD or WBINVD, as reported by the vmexit exit reason. Modify svm_vmexit_do_invalidate_cache() to ask for the correct instruction, and drop all list functionality from the helper. Take the opportunity to rename it to svm_get_insn_len(), and drop the IOIO length handling whch has never been used. Signed-off-by: Andrew Cooper --- CC: Jan Beulich CC: Wei Liu CC: Roger Pau Monné CC: Boris Ostrovsky CC: Suravee Suthikulpanit CC: Brian Woods v2: * New --- xen/arch/x86/hvm/svm/emulate.c| 65 --- xen/arch/x86/hvm/svm/nestedsvm.c | 9 ++--- xen/arch/x86/hvm/svm/svm.c| 34 +- xen/include/asm-x86/hvm/svm/emulate.h | 9 + 4 files changed, 51 insertions(+), 66 deletions(-) diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c index 3d04af0..3f695b9 100644 --- a/xen/arch/x86/hvm/svm/emulate.c +++ b/xen/arch/x86/hvm/svm/emulate.c @@ -83,13 +83,12 @@ static const struct { [INSTR_CPUID] = { X86EMUL_OPC(0x0f, 0xa2) }, }; -int __get_instruction_length_from_list(struct vcpu *v, -const enum instruction_index *list, unsigned int list_count) +int svm_get_insn_len(struct vcpu *v, enum instruction_index insn) { struct vmcb_struct *vmcb = v->arch.hvm.svm.vmcb; struct hvm_emulate_ctxt ctxt; struct x86_emulate_state *state; -unsigned long inst_len, j; +unsigned long nrip_len, emul_len; unsigned int modrm_rm, modrm_reg; int modrm_mod; @@ -98,13 +97,10 @@ int __get_instruction_length_from_list(struct vcpu *v, * hardware. */ #ifdef NDEBUG -if ( (inst_len = svm_nextrip_insn_length(v)) > MAX_INST_LEN ) -gprintk(XENLOG_WARNING, "NRip reported inst_len %lu\n", inst_len); -else if ( inst_len != 0 ) -return inst_len; - -if ( vmcb->exitcode == VMEXIT_IOIO ) -return vmcb->exitinfo2 - vmcb->rip; +if ( (nrip_len = svm_nextrip_insn_length(v)) > MAX_INST_LEN ) +gprintk(XENLOG_WARNING, "NRip reported inst_len %lu\n", nrip_len); +else if ( nrip_len != 0 ) +return nrip_len; #endif ASSERT(v == current); @@ -114,46 +110,43 @@ int __get_instruction_length_from_list(struct vcpu *v, if ( IS_ERR_OR_NULL(state) ) return 0; -inst_len = x86_insn_length(state, ); +emul_len = x86_insn_length(state, ); modrm_mod = x86_insn_modrm(state, _rm, _reg); x86_emulate_free_state(state); + #ifndef NDEBUG -if ( vmcb->exitcode == VMEXIT_IOIO ) -j = vmcb->exitinfo2 - vmcb->rip; -else -j = svm_nextrip_insn_length(v); -if ( j && j != inst_len ) +nrip_len = svm_nextrip_insn_length(v); +if ( nrip_len && nrip_len != emul_len ) { gprintk(XENLOG_WARNING, "insn-len[%02x]=%lu (exp %lu)\n", -ctxt.ctxt.opcode, inst_len, j); -return j; +ctxt.ctxt.opcode, nrip_len, emul_len); +return nrip_len; } #endif -for ( j = 0; j < list_count; j++ ) +if ( (unsigned int)insn >= ARRAY_SIZE(opc_tab) ) { -unsigned int instr = list[j]; - -if ( instr >= ARRAY_SIZE(opc_tab) ) -{ -ASSERT_UNREACHABLE(); -break; -} -if ( opc_tab[instr].opcode == ctxt.ctxt.opcode ) -{ -if ( !opc_tab[instr].modrm.mod ) -return inst_len; - -if ( modrm_mod == opc_tab[instr].modrm.mod && - (modrm_rm & 7) == opc_tab[instr].modrm.rm && - (modrm_reg & 7) == opc_tab[instr].modrm.reg ) -return inst_len; -} +gdprintk(XENLOG_ERR, "insn %d out of range\n", insn); +ASSERT_UNREACHABLE(); +goto out; +} + +if ( opc_tab[insn].opcode == ctxt.ctxt.opcode ) +{ +if ( !opc_tab[insn].modrm.mod ) +return emul_len; + +if ( modrm_mod == opc_tab[insn].modrm.mod && + (modrm_rm & 7) == opc_tab[insn].modrm.rm && + (modrm_reg & 7) == opc_tab[insn].modrm.reg ) +return emul_len; } gdprintk(XENLOG_WARNING, "%s: Mismatch between expected and actual instruction: " "eip = %lx\n", __func__, (unsigned long)vmcb->rip); + + out: hvm_inject_hw_exception(TRAP_gp_fault, 0); return 0; } diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c index 9660202..35c1a04 100644 --- a/xen/arch/x86/hvm/svm/nestedsvm.c +++ b/xen/arch/x86/hvm/svm/nestedsvm.c @@ -743,8 +743,9 @@ nsvm_vcpu_vmrun(struct vcpu *v, struct cpu_user_regs *regs) struct nestedvcpu *nv = _nestedhvm(v); struct nestedsvm *svm = _nestedsvm(v); -inst_len = __get_instruction_length(v, INSTR_VMRUN); -if (inst_len == 0) { +inst_len = svm_get_insn_len(v, INSTR_VMRUN); +if ( inst_len == 0 ) +{
[Xen-devel] [PATCH v2 3/3] x86/hvm: Corrections to RDTSCP intercept handling
For both VT-x and SVM, the RDTSCP intercept will trigger if the pipeline supports the instruction, but the guest may have not have rdtscp in its featureset. Bring the vmexit handlers in line with the main emulator behaviour by optionally handing back #UD. Next on the AMD side, if RDTSCP actually ends up being intercepted on a debug build, we first update regs->rcx, then call __get_instruction_length() asking for RDTSC. As the two instructions are different (and indeed, different lengths!), __get_instruction_length_from_list() fails and hands back a #GP fault. This can demonstrated by putting a guest into tsc_mode="always emulate" and executing an rdtscp instruction: (d1) --- Xen Test Framework --- (d1) Environment: HVM 64bit (Long mode 4 levels) (d1) Test rdtscp (d1) TSC mode 1 (XEN) emulate.c:147:d1v0 svm_get_insn_len: Mismatch between expected and actual instruction: (XEN) emulate.c:152:d1v0 insn_index 8, opcode 0xf0031 modrm 0 (XEN) emulate.c:154:d1v0 rip 0x10475f, nextrip 0x104762, len 3 (XEN) SVM insn len emulation failed (1): d1v0 64bit @ 0008:0010475f -> 0f 01 f9 0f 31 5b 31 ff 31 c0 e9 c2 db ff ff 00 (d1) ** (d1) PANIC: Unhandled exception at 0008:0010475f (d1) Vec 13 #GP[] (d1) ** First, teach svm_get_insn_len() to cope with RDTSCP, and improve svm_vmexit_do_rdtsc() to ask for the correct instruction. Move the regs->rcx adjustment into this function to ensure it gets done after we are done potentially raising faults. Reported-by: Paul Durrant Signed-off-by: Andrew Cooper Reviewed-by: Brian Woods Reviewed-by: Jan Beulich Reviewed-by: Kevin Tian --- CC: Wei Liu CC: Roger Pau Monné CC: Paul Durrant v2: * Rebase --- xen/arch/x86/hvm/svm/emulate.c| 1 + xen/arch/x86/hvm/svm/svm.c| 22 +- xen/arch/x86/hvm/vmx/vmx.c| 8 xen/include/asm-x86/hvm/svm/emulate.h | 1 + 4 files changed, 27 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c index 73cef5b..0778dc7 100644 --- a/xen/arch/x86/hvm/svm/emulate.c +++ b/xen/arch/x86/hvm/svm/emulate.c @@ -75,6 +75,7 @@ static const struct { [INSTR_STGI]= { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 4) }, [INSTR_CLGI]= { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 5) }, [INSTR_INVLPGA] = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 3, 7) }, +[INSTR_RDTSCP] = { X86EMUL_OPC(0x0f, 0x01), MODRM(3, 7, 1) }, [INSTR_INVD]= { X86EMUL_OPC(0x0f, 0x08) }, [INSTR_WBINVD] = { X86EMUL_OPC(0x0f, 0x09) }, [INSTR_WRMSR] = { X86EMUL_OPC(0x0f, 0x30) }, diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index f8b7e8b..d2c0d64 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -2279,14 +2279,28 @@ static void svm_vmexit_do_hlt(struct vmcb_struct *vmcb, hvm_hlt(regs->eflags); } -static void svm_vmexit_do_rdtsc(struct cpu_user_regs *regs) +static void svm_vmexit_do_rdtsc(struct cpu_user_regs *regs, bool rdtscp) { +struct vcpu *curr = current; +const struct domain *currd = curr->domain; +enum instruction_index insn = rdtscp ? INSTR_RDTSCP : INSTR_RDTSC; unsigned int inst_len; -if ( (inst_len = svm_get_insn_len(current, INSTR_RDTSC)) == 0 ) +if ( rdtscp && !currd->arch.cpuid->extd.rdtscp && + currd->arch.tsc_mode != TSC_MODE_PVRDTSCP ) +{ +hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC); return; +} + +if ( (inst_len = svm_get_insn_len(curr, insn)) == 0 ) +return; + __update_guest_eip(regs, inst_len); +if ( rdtscp ) +regs->rcx = hvm_msr_tsc_aux(curr); + hvm_rdtsc_intercept(regs); } @@ -2966,10 +2980,8 @@ void svm_vmexit_handler(struct cpu_user_regs *regs) break; case VMEXIT_RDTSCP: -regs->rcx = hvm_msr_tsc_aux(v); -/* fall through */ case VMEXIT_RDTSC: -svm_vmexit_do_rdtsc(regs); +svm_vmexit_do_rdtsc(regs, exit_reason == VMEXIT_RDTSCP); break; case VMEXIT_MONITOR: diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 7f77d1f..2166b0d 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -3589,6 +3589,7 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0; unsigned int vector = 0, mode; struct vcpu *v = current; +struct domain *currd = v->domain; __vmread(GUEST_RIP,>rip); __vmread(GUEST_RSP,>rsp); @@ -3956,6 +3957,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) vmx_invlpg_intercept(exit_qualification); break; case EXIT_REASON_RDTSCP: +if ( !currd->arch.cpuid->extd.rdtscp && + currd->arch.tsc_mode != TSC_MODE_PVRDTSCP ) +{ +hvm_inject_hw_exception(TRAP_invalid_op, X86_EVENT_NO_EC); +
[Xen-devel] [PATCH v2 2/3] x86/svm: Improve diagnostics when svm_get_insn_len() fails
Sadly, a lone: (XEN) emulate.c:156:d2v0 svm_get_insn_len: Mismatch between expected and actual instruction: eip = f804564139c0 on the console is of no use trying to identify what went wrong. Dump as much state as we can to help identify what went wrong. Reported-by: Paul Durrant Signed-off-by: Andrew Cooper Acked-by: Brian Woods --- CC: Jan Beulich CC: Wei Liu CC: Roger Pau Monné CC: Paul Durrant CC: Boris Ostrovsky CC: Suravee Suthikulpanit CC: Brian Woods v2: * Drop anonymous union * Rebase --- xen/arch/x86/hvm/svm/emulate.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/hvm/svm/emulate.c b/xen/arch/x86/hvm/svm/emulate.c index 3f695b9..73cef5b 100644 --- a/xen/arch/x86/hvm/svm/emulate.c +++ b/xen/arch/x86/hvm/svm/emulate.c @@ -143,8 +143,17 @@ int svm_get_insn_len(struct vcpu *v, enum instruction_index insn) } gdprintk(XENLOG_WARNING, - "%s: Mismatch between expected and actual instruction: " - "eip = %lx\n", __func__, (unsigned long)vmcb->rip); + "%s: Mismatch between expected and actual instruction:\n", + __func__); +gdprintk(XENLOG_WARNING, + " insn_index %d, opcode %#x modrm %#x\n", + insn, opc_tab[insn].opcode, ((opc_tab[insn].modrm.rm << 6) | + (opc_tab[insn].modrm.reg << 3) | + (opc_tab[insn].modrm.mod))); +gdprintk(XENLOG_WARNING, " rip %#lx, nextrip %#lx, len %lu\n", + vmcb->rip, vmcb->nextrip, vmcb->nextrip - vmcb->rip); +hvm_dump_emulation_state(XENLOG_G_WARNING, "SVM Insn len", + , X86EMUL_UNHANDLEABLE); out: hvm_inject_hw_exception(TRAP_gp_fault, 0); -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v2 0/3] Fixes to RDTSCP interception
Updates in v2: * Rework svm_get_insn_len() to be more simple. * Drop anonymous union which will surely fail to compile on RHEL 6.x vintage compilers. * Rebase other changes Andrew Cooper (3): x86/svm: Simplify svm_get_insn_len() x86/svm: Improve diagnostics when svm_get_insn_len() fails x86/hvm: Corrections to RDTSCP intercept handling xen/arch/x86/hvm/svm/emulate.c| 79 ++- xen/arch/x86/hvm/svm/nestedsvm.c | 9 ++-- xen/arch/x86/hvm/svm/svm.c| 54 ++-- xen/arch/x86/hvm/vmx/vmx.c| 8 xen/include/asm-x86/hvm/svm/emulate.h | 10 + 5 files changed, 88 insertions(+), 72 deletions(-) -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [linux-next test] 131263: regressions - FAIL
flight 131263 linux-next real [real] http://logs.test-lab.xenproject.org/osstest/logs/131263/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-qemuu-debianhvm-amd64 7 xen-bootfail REGR. vs. 131190 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-qemuu-win10-i386 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-credit2 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-qemut-ws16-amd64 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-qemuu-ws16-amd64 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-freebsd10-amd64 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-qemuu-nested-intel 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-amd64-pvgrub 7 xen-bootfail REGR. vs. 131190 test-amd64-amd64-xl-shadow7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-i386-pvgrub 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-qemut-win7-amd64 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-qemuu-rhel6hvm-intel 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-qemuu-ovmf-amd64 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-freebsd10-i386 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-qemuu-nested-amd 7 xen-bootfail REGR. vs. 131190 test-amd64-amd64-xl-qemut-win10-i386 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-qemut-debianhvm-amd64 7 xen-bootfail REGR. vs. 131190 test-amd64-i386-qemut-rhel6hvm-amd 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-qemut-ws16-amd64 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-qemuu-rhel6hvm-amd 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-xsm 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-pvshim7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-libvirt 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-qemuu-ws16-amd64 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-qemuu-win10-i386 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-qemuu-ovmf-amd64 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-pvshim 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-qemut-win7-amd64 7 xen-boot fail REGR. vs. 131190 test-amd64-i386-xl-qemuu-win7-amd64 7 xen-boot fail REGR. vs. 131190 test-amd64-amd64-xl-pvhv2-amd 7 xen-bootfail REGR. vs. 131190 test-amd64-amd64-xl 7 xen-boot fail REGR. vs. 131190 test-armhf-armhf-libvirt-raw 7 xen-boot fail REGR. vs. 131190 test-armhf-armhf-xl-vhd 7 xen-boot fail REGR. vs. 131190 test-armhf-armhf-examine 8 reboot fail REGR. vs. 131190 test-armhf-armhf-xl-credit1 7 xen-boot fail REGR. vs. 131190 test-armhf-armhf-xl-cubietruck 7 xen-boot fail REGR. vs. 131190 test-armhf-armhf-xl-multivcpu 7 xen-bootfail REGR. vs. 131190 test-armhf-armhf-libvirt 7 xen-boot fail REGR. vs. 131190 test-armhf-armhf-xl-credit2 7 xen-boot fail REGR. vs. 131190 test-armhf-armhf-xl 7 xen-boot fail REGR. vs. 131190 test-armhf-armhf-xl-arndale 7 xen-boot fail REGR. vs. 131190 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds 7 xen-boot fail REGR. vs. 131190 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-pvhv2-intel 7 xen-boot fail like 131190 test-amd64-amd64-libvirt-vhd 7 xen-boot fail like 131190 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 7 xen-boot fail like 131190 test-amd64-i386-rumprun-i386 7 xen-boot fail like
Re: [Xen-devel] arm: xl vcpu-pin leads to oom-killer slashing processes
On 12/13/18 3:13 PM, Andrii Anisov wrote: Hello All, OK, I've discovered a mechanism of the issue. It is because of `d->max_pages = ~0U;` in a `construct_dom0()`. When I do vcpu-pin, libxl updates memory nodes in xenstore for Dom0. Then kernel watch sees those changes and trying to set new target for ballon, but the target becomes extremely high, and baloon sucks all the pages. In my kernel (4.14) in `watch_target()` function there is a code: target_diff = xen_pv_domain() ? 0 : static_max - balloon_stats.target_pages; Here we have `xen_pv_domain()` equal to zero, so `target_diff` big. Then, few lines below: balloon_set_new_target(new_target - target_diff); `balloon_set_new_target()` receives a value wrapped over 64bit what kills the system. Now I'm looking for an appropriate kernel patch for the kernel, to fix that. Any suggestions? You should use linux kernel commit 3596924a233e45aa918. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 6/7] xen/arm: zynqmp: implement zynqmp_eemi
Hi, On 12/12/18 11:56 PM, Stefano Stabellini wrote: On Wed, 12 Dec 2018, Julien Grall wrote: On 11/12/2018 22:23, Stefano Stabellini wrote: On Tue, 11 Dec 2018, Julien Grall wrote: Furthermore, you are forwarding unsanitized values to the firmware. For instance, what would happen if the number of parameters of the call are increased? How are you sure this will not open a hole? EEMI is backward compatible and the implementation is tested with Xen regularly. A change like the one you describe should be considered a backward compatibility breakage. I disagree, you can still make backward compatible. For instance, the new parameters could be gated by a flag in an existing parameter. Another way is Xilinx promise that non-existing arguments should always be 0 and then re-purpose the value for non-zero case. While it is backward compatible, you may end up passing unsanitized value to the firmware. Not sure this is what we want... Backward compatibility is not only about avoiding breaking existing call parameters and return values. It is also about not breaking the semantics of the calls. If a call is deemed guest safe (when a guest has a device assigned) so Xen forwards the call to firmware, but then firmware introduces a new parameter (before it was ignored) and with it, allows more extensive operations that go beyong a single device, I think that is semantics breakage. In any case, this is almost impossible to do with EEMI because calls take a node_id parameter or similar that identifies the area of the effect of the operation, and we already check on the node_id to see if the guest is allowed to make the call. Please write it down in the code so we know why we don't need to worry on parameters. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 5/7] xen/arm: zynqmp: eemi access control
Hi Stefano, On 12/12/18 11:57 PM, Stefano Stabellini wrote: On Tue, 11 Dec 2018, Julien Grall wrote: Hi, On 03/12/2018 21:03, Stefano Stabellini wrote: From: "Edgar E. Iglesias" From: Edgar E. Iglesias Introduce data structs to implement basic access controls. Introduce the following three functions: domain_has_node_access: check access to the node domain_has_reset_access: check access to the reset line domain_has_mmio_access: check access to the register Signed-off-by: Edgar E. Iglesias Signed-off-by: Stefano Stabellini --- Statically defines: - pm_node_access It encodes the relationship between a node id and the start of the MMIO region of a device in the corresponding power domain. It is used for permission checking. Although the MMIO region start address is available on device tree and could be derived from there (we plan to improve that in the future), the relationship between a node id and corresponding devices is not described and needs to be hardcoded. - pm_reset_access Same as pm_node_access for reset lines. --- Changes in v5: - improve in-code comments - use mfn_t in struct pm_access - remove mmio_access table Changes in v4: - add #include as needed - add #if 0 for bisectability - use mfn_t in pm_check_access - add wrap-around ASSERT in domain_has_mmio_access - use GENMASK in domain_has_mmio_access - proper bound checks (== ARRAY_SIZE is out of bound) --- xen/arch/arm/platforms/xilinx-zynqmp-eemi.c | 348 1 file changed, 348 insertions(+) diff --git a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c index 369bb3f..92a02df 100644 --- a/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c +++ b/xen/arch/arm/platforms/xilinx-zynqmp-eemi.c @@ -16,9 +16,357 @@ * GNU General Public License for more details. */ +/* + * EEMI Power Management API access + * + * Refs: + * https://www.xilinx.com/support/documentation/user_guides/ug1200-eemi-api.pdf + * + * Background: + * The ZynqMP has a subsystem named the PMU with a CPU and special devices + * dedicated to running Power Management Firmware. Other masters in the + * system need to send requests to the PMU in order to for example: + * * Manage power state + * * Configure clocks + * * Program bitstreams for the programmable logic + * * etc + * + * Although the details of the setup are configurable, in the common case + * the PMU lives in the Secure world. NS World cannot directly communicate + * with it and must use proxy services from ARM Trusted Firmware to reach + * the PMU. + * + * Power Management on the ZynqMP is implemented in a layered manner. + * The PMU knows about various masters and will enforce access controls + * based on a pre-configured partitioning. This configuration dictates + * which devices are owned by the various masters and the PMU FW makes sure + * that a given master cannot turn off a device that it does not own or that + * is in use by other masters. + * + * The PMU is not aware of multiple execution states in masters. + * For example, it treats the ARMv8 cores as single units and does not + * distinguish between Secure vs NS OS's nor does it know about Hypervisors + * and multiple guests. It is up to software on the ARMv8 cores to present + * a unified view of its power requirements. + * + * To implement this unified view, ARM Trusted Firmware at EL3 provides + * access to the PM API via SMC calls. ARM Trusted Firmware is responsible + * for mediating between the Secure and the NS world, rejecting SMC calls + * that request changes that are not allowed. + * + * Xen running above ATF owns the NS world and is responsible for presenting + * unified PM requests taking all guests and the hypervisor into account. + * + * Implementation: + * The PM API contains different classes of calls. + * Certain calls are harmless to expose to any guest. + * These include calls to get the PM API Version, or to read out the version + * of the chip we're running on. + * + * In order to correctly virtualize these calls, we need to know if + * guests issuing these calls have ownership of the given device. + * The approach taken here is to map PM API Nodes identifying + * a device into base addresses for registers that belong to that + * same device. + * + * If the guest has access to devices registers, we give the guest + * access to PM API calls that affect that device. This is implemented + * by pm_node_access and domain_has_node_access(). + */ + +#include +#include #include #include +#if 0 +struct pm_access +{ +mfn_t mfn; +bool hwdom_access;/* HW domain gets access regardless. */ +}; + +/* + * This table maps a node into a memory address. Some of the nodes below don't have memory address. So this comment has to be updated. Yes, I'll update and improve the comment + * If a guest has access to the address, it has enough control + * over the node to grant it access to EEMI calls for that node. + */ +static const struct
Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests
>>> On 13.12.18 at 16:34, wrote: > On Thu, Dec 13, 2018 at 07:52:16AM -0700, Jan Beulich wrote: >> >>> On 13.12.18 at 15:14, wrote: >> > On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote: >> >> >>> On 13.12.18 at 12:39, wrote: >> >> > Well, Just keeping correct order between each domain locks should be >> >> > enough? >> >> > >> >> > Ie: exactly the same that Xen currently does but on a per-domain >> >> > basis. This is feasible, but each CPU would need to store the lock >> >> > order of each possible domain: >> >> > >> >> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]); >> >> > >> >> > This would consume ~32KB per CPU, which is not that much but seems a >> >> > waste when most of the time a single entry will be used. >> >> >> >> Well, tracking by domain ID wouldn't help you - the controlling >> >> domain may well have a higher ID than the being controlled one, >> >> i.e. the nesting you want needs to be independent of domain ID. >> > >> > It's not tracking the domain ID, but rather tracking the lock level of >> > each different domain, hence the need for the array in the pcpu >> > structure. The lock checker would take a domain id and a level, and >> > perform the check as: >> > >> > if ( mm_lock_level[domid] > level ) >> > panic >> >> But this would open things up for deadlocks because of intermixed >> lock usage between the calling domain's and the subject one's. >> There needs to be a linear sequence of locks (of all involved >> domains) describing the one and only order in which they may be >> acquired. > > Well, my plan was to only check for deadlocks between the locks of the > same domain, without taking into account intermixed domain locking. > > I guess at this point I will need some input from Tim and George about > how to proceed, because I'm not sure how to weight locks when using > intermixed domain locks, neither what is the correct order. The order > in paging_log_dirty_op looks like a valid order that we want to > support, but are there any others? > > Is it possible to have multiple valid interdomain lock orders that > cannot be expressed using the current weighted lock ordering? Well, first of all I'm afraid I didn't look closely enough at you original mail: We're not talking about the paging lock of two domains here, but about he paging lock of the subject domain and dom0's p2m lock. Second I then notice that (XEN) mm locking order violation: 64 > 16 indicates that it might not have complained when two similar locks of different domains were acquired in a nested fashion, which I'd call a shortcoming that would be nice to eliminate at this same occasion. And third, to answer your question, I can't see anything conceptually wrong with an arbitrary intermix of locks from two different domains, as long their inner-domain ordering is correct. E.g. a Dom0 hypercall may find a need to acquire - the subject domain's p2m lock - its own p2m lock - the subject domain's PoD lock - its own paging lock Of course it may be possible to determine that "own" locks are not supposed to be acquired outside of any "subject domain" ones, in which case we'd have a workable hierarchy (along the lines of what you had earlier suggested). But I'm not sure how expensive (in terms of code auditing) such determination is going be, which is why for the moment I'm trying to think of a solution (ordering criteria) for the general case. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v5 3/7] xen/arm: zynqmp: introduce zynqmp specific defines
Hi Stefano, On 12/12/18 11:55 PM, Stefano Stabellini wrote: On Wed, 12 Dec 2018, Julien Grall wrote: Hi Stefano, On 11/12/2018 19:22, Stefano Stabellini wrote: On Tue, 11 Dec 2018, Julien Grall wrote: Hi, On 03/12/2018 21:03, Stefano Stabellini wrote: What is the plan there? The plan is to extract the node_id from a power-domain node on device tree. Each device would have a phandler to link to the right power-domain node which contains a power-domain-id attribute. The power-domain-id attribute is the node_id here. The power-domain-id changes to the Xilinx MPSoC device tree are under discussion with the device tree community. If I understand correctly, we will never be able to remove the hardcoded values in Xen. This is because some device-tree may not have the bindings. Am I correct? If we want to support running on existing hardware and firmware releases, then you are correct. We won't be able to remove the hardcoded values. That ship has sailed, not much we can do about it. If in the future we decide to drop support for older firmware releases and ask users to update their firmare/devicetrees, then we'll be able to remove the hardcoded values. I think it is something we can consider. In fact, I would rather break compatibility than not providing support for basic power management functionalities at all. With your suggestion this means that Xen and the firmware are tied together. So you can't update Xen without updating the firmware. We already ruled out that behavior for x-gene. See the partial revert you did 420596c868 to revert back the quirk. I don't think this is very different here. We are introducing a feature, that we know will be broken afterwards unless you update your firmware. It is not like it was not planned... We should aim to support most of the official firmware unless there are a strong argument not to do. Basic power management for other than Dom0 is not a valid enough reason for me to break compatibility. I think it is better to introduce basic EEMI support now, even if in a couple of Xen releases from now we'll make the decision to drop the hardcoded values and require new Xilinx device trees. Xilinx users are used to updating firmware on these boards every 6 months, and it would beneficial for them to be able to take a Xen Project release rather than be forced to use a Xilinx Xen release to have support for power management. When the new bindings become available, as we introduce support for them in Xen, we can decide whether to keep the hardcoded values or whether we should take the opportunity to drop them. This is the right moment to discuss about it. It will be too late once we get merged. I don't want to see 2 solutions in Xen if we know that bindings are work-in-progress. I am happy to consider EEMI for Dom0. For the guests, it will have to wait the device-tree bindings. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] drm/xen-front: Make shmem backed display buffer coherent
On Thu, Dec 13, 2018 at 12:17:54PM +0200, Oleksandr Andrushchenko wrote: > Daniel, could you please comment? Cross-revieweing someone else's stuff would scale better, I don't think I'll get around to anything before next year. -Daniel > > Thank you > > On 11/27/18 12:32 PM, Oleksandr Andrushchenko wrote: > > From: Oleksandr Andrushchenko > > > > When GEM backing storage is allocated with drm_gem_get_pages > > the backing pages may be cached, thus making it possible that > > the backend sees only partial content of the buffer which may > > lead to screen artifacts. Make sure that the frontend's > > memory is coherent and the backend always sees correct display > > buffer content. > > > > Fixes: c575b7eeb89f ("drm/xen-front: Add support for Xen PV display > > frontend") > > > > Signed-off-by: Oleksandr Andrushchenko > > --- > > drivers/gpu/drm/xen/xen_drm_front_gem.c | 62 +++-- > > 1 file changed, 48 insertions(+), 14 deletions(-) > > > > diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c > > b/drivers/gpu/drm/xen/xen_drm_front_gem.c > > index 47ff019d3aef..c592735e49d2 100644 > > --- a/drivers/gpu/drm/xen/xen_drm_front_gem.c > > +++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c > > @@ -33,8 +33,11 @@ struct xen_gem_object { > > /* set for buffers allocated by the backend */ > > bool be_alloc; > > - /* this is for imported PRIME buffer */ > > - struct sg_table *sgt_imported; > > + /* > > +* this is for imported PRIME buffer or the one allocated via > > +* drm_gem_get_pages. > > +*/ > > + struct sg_table *sgt; > > }; > > static inline struct xen_gem_object * > > @@ -77,10 +80,21 @@ static struct xen_gem_object *gem_create_obj(struct > > drm_device *dev, > > return xen_obj; > > } > > +struct sg_table *xen_drm_front_gem_get_sg_table(struct drm_gem_object > > *gem_obj) > > +{ > > + struct xen_gem_object *xen_obj = to_xen_gem_obj(gem_obj); > > + > > + if (!xen_obj->pages) > > + return ERR_PTR(-ENOMEM); > > + > > + return drm_prime_pages_to_sg(xen_obj->pages, xen_obj->num_pages); > > +} > > + > > static struct xen_gem_object *gem_create(struct drm_device *dev, size_t > > size) > > { > > struct xen_drm_front_drm_info *drm_info = dev->dev_private; > > struct xen_gem_object *xen_obj; > > + struct address_space *mapping; > > int ret; > > size = round_up(size, PAGE_SIZE); > > @@ -113,10 +127,14 @@ static struct xen_gem_object *gem_create(struct > > drm_device *dev, size_t size) > > xen_obj->be_alloc = true; > > return xen_obj; > > } > > + > > /* > > * need to allocate backing pages now, so we can share those > > * with the backend > > */ > > + mapping = xen_obj->base.filp->f_mapping; > > + mapping_set_gfp_mask(mapping, GFP_USER | __GFP_DMA32); > > + > > xen_obj->num_pages = DIV_ROUND_UP(size, PAGE_SIZE); > > xen_obj->pages = drm_gem_get_pages(_obj->base); > > if (IS_ERR_OR_NULL(xen_obj->pages)) { > > @@ -125,8 +143,27 @@ static struct xen_gem_object *gem_create(struct > > drm_device *dev, size_t size) > > goto fail; > > } > > + xen_obj->sgt = xen_drm_front_gem_get_sg_table(_obj->base); > > + if (IS_ERR_OR_NULL(xen_obj->sgt)){ > > + ret = PTR_ERR(xen_obj->sgt); > > + xen_obj->sgt = NULL; > > + goto fail_put_pages; > > + } > > + > > + if (!dma_map_sg(dev->dev, xen_obj->sgt->sgl, xen_obj->sgt->nents, > > + DMA_BIDIRECTIONAL)) { > > + ret = -EFAULT; > > + goto fail_free_sgt; > > + } > > + > > return xen_obj; > > +fail_free_sgt: > > + sg_free_table(xen_obj->sgt); > > + xen_obj->sgt = NULL; > > +fail_put_pages: > > + drm_gem_put_pages(_obj->base, xen_obj->pages, true, false); > > + xen_obj->pages = NULL; > > fail: > > DRM_ERROR("Failed to allocate buffer with size %zu\n", size); > > return ERR_PTR(ret); > > @@ -149,7 +186,7 @@ void xen_drm_front_gem_free_object_unlocked(struct > > drm_gem_object *gem_obj) > > struct xen_gem_object *xen_obj = to_xen_gem_obj(gem_obj); > > if (xen_obj->base.import_attach) { > > - drm_prime_gem_destroy(_obj->base, xen_obj->sgt_imported); > > + drm_prime_gem_destroy(_obj->base, xen_obj->sgt); > > gem_free_pages_array(xen_obj); > > } else { > > if (xen_obj->pages) { > > @@ -158,6 +195,13 @@ void xen_drm_front_gem_free_object_unlocked(struct > > drm_gem_object *gem_obj) > > xen_obj->pages); > > gem_free_pages_array(xen_obj); > > } else { > > + if (xen_obj->sgt) { > > + dma_unmap_sg(xen_obj->base.dev->dev, > > +xen_obj->sgt->sgl, > > +xen_obj->sgt->nents, > > +
Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests
On Thu, Dec 13, 2018 at 07:52:16AM -0700, Jan Beulich wrote: > >>> On 13.12.18 at 15:14, wrote: > > On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote: > >> >>> On 13.12.18 at 12:39, wrote: > >> > Well, Just keeping correct order between each domain locks should be > >> > enough? > >> > > >> > Ie: exactly the same that Xen currently does but on a per-domain > >> > basis. This is feasible, but each CPU would need to store the lock > >> > order of each possible domain: > >> > > >> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]); > >> > > >> > This would consume ~32KB per CPU, which is not that much but seems a > >> > waste when most of the time a single entry will be used. > >> > >> Well, tracking by domain ID wouldn't help you - the controlling > >> domain may well have a higher ID than the being controlled one, > >> i.e. the nesting you want needs to be independent of domain ID. > > > > It's not tracking the domain ID, but rather tracking the lock level of > > each different domain, hence the need for the array in the pcpu > > structure. The lock checker would take a domain id and a level, and > > perform the check as: > > > > if ( mm_lock_level[domid] > level ) > > panic > > But this would open things up for deadlocks because of intermixed > lock usage between the calling domain's and the subject one's. > There needs to be a linear sequence of locks (of all involved > domains) describing the one and only order in which they may be > acquired. Well, my plan was to only check for deadlocks between the locks of the same domain, without taking into account intermixed domain locking. I guess at this point I will need some input from Tim and George about how to proceed, because I'm not sure how to weight locks when using intermixed domain locks, neither what is the correct order. The order in paging_log_dirty_op looks like a valid order that we want to support, but are there any others? Is it possible to have multiple valid interdomain lock orders that cannot be expressed using the current weighted lock ordering? Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] HVM driver domains do not appear to be usable with stubdomains
Jason Andryuk writes: >> So if I understand correctly, the problem is that PCI passthrough >> doesn't work with stubdomains, unless qemu is patched? > > Hi, Chris. > > I pulled in the QEMU patch because I found that my Intel wired > ethernet device didn't work without it. I believe my Intel wireless > NIC did though. Maybe it was the opposite... I should have documented > it more. However the device was passed-through - it just wasn't > operational. > > What device is :05:00.0? Is it listed by `xl pci-assignable-list`? Hi Jason, :05:00.0 was an Intel wired ethernet device. Yes it showed up in xl pci-assignable-list. In fact, my driver domain worked fine without a stubdomain; the card was passed through and I had it passing traffic with no problem at all. -- Chris ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events
On 12/13/18 4:58 PM, Jan Beulich wrote: On 13.12.18 at 14:18, wrote: >> So, long story short, on VMX we first send out the vm_event, while >> processing it an interrupt / exception may become pending, before >> resuming the VCPU that has sent out the vm_event there's a Xen function >> that picks up the pending interrupt and schedules it (writes it in the >> VMCS), and only then we attempt the emulation, which may overwrite it >> (because there's only one place we can write to schedule interrupts / >> exceptions). > > So perhaps the solution is indeed to change the order of how things > get done, instead of blocking interrupts? You seem to think this way > too, as per ... > >> 2. Interrupts are not blocked indefinitely - only until the emulation is >> done. It could be argued that that's really the proper place for them to >> be processed anyway - on an instruction boundary, _after_ the >> in-progress instruction has finished executing. It's just that with the >> vm_event introspection thing you could say that executing the current >> instruction may take a bit longer. > > ... this. Quite possibly, following the lead of the singlestepping code just seemed like the most straightforward way out of the problem. Of course an alternative way of handling interrupts would probably be preferrable, however we'd need a bit of guidance on how to go about it and in the meantime I don't see how this small fix would hurt. I remember Andrew suggesting taking something like this on at the Xen Developer Summit in Budapest but then of course much more important things happened with Meltdown & al. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 16/18] xen: automatically create XenBlockDevice-s
Am 13.12.2018 um 13:44 hat Paul Durrant geschrieben: > > Essentially, what I'm wondering is whether we have anything that could > > be treated more or less like another monitor besides QMP and HMP, which > > would internally work similar to HMP, i.e. map (almost) everything to > > QMP commands. > > Yes, it would be possible to have a separate 'compatibility' daemon to > watch xenstore and then formulate the correct sequence of QMP commands > to instantiate the backend, but that is more complicated and the right > answer of course is to have the toolstack send the QMP commands in the > first place. Okay, if someone is working on actually using QMP instead of xenstore, that's even better, of course. Disregard this point then. > > > +drive_new(drive_opts, IF_NONE, _err); > > > +if (local_err) { > > > +error_propagate_prepend(errp, local_err, > > > +"failed to create drive: "); > > > +goto done; > > > +} > > > > The other major point is that you're using the legacy drive_*() > > infrastructure, which should not only go away as soon as we can, but > > which is also full of magic and nasty surprises. > > > > I think the best way would be to create only a block node > > (BlockDriverState) here, and get an automatically created anonymous > > BlockBackend from the qdev drive property. > > > > There are two ways to achieve this: qmp_blockdev_add() would be optimal > > because that's a stable external interface. It would require you to > > specify a node-name (you already have the id parameter), and you'd use > > this node-name for the qdev drive property. > > > > qmp_blockdev_add() requires a BlockdevOptions object, which you can > > either construct manually in C or use a visitor to convert from an > > options QDict. Maybe in this case, converting from a QDict is better > > because otherwise you need special code for each block driver. > > > > I was using the legacy interfaces because this code is, as I said > above, supposed to be a mechanism only required for compatibility with > the way toolstacks currently operate (and so is essentially 'legacy') > but using the top-level QMP entry point to construct the does sound > do-able as long as the underlying file locking can still be avoided > with that mechanism. Since BlockdevOptions seems to be an > auto-generated structure, figuring out how to fill it in manually is > somewhat tricky so the QDict approach is preferable but I'll have to > figure out how to use a visitor to do the translation. You can grep for qobject_input_visitor_new() for examples. Some block drivers use this internally to convert .bdrv_create options to a QAPI object, it looks like this: /* Now get the QAPI type BlockdevCreateOptions */ v = qobject_input_visitor_new_flat_confused(qdict, errp); if (!v) { ret = -EINVAL; goto finish; } visit_type_BlockdevCreateOptions(v, NULL, _options, _err); visit_free(v); if (local_err) { error_propagate(errp, local_err); ret = -EINVAL; goto finish; } Obviously, you'd use visit_type_BlockdevOptions instead. You also might not need the _flat_confused variant, which is about QDicts where we don't know whether values are stored as their actual type or as strings. In your code, you have control over the types in the QDict, so this shouldn't be a problem. > > The other way would be calling bdrv_open() directly, which gives you a > > BlockDriverState, but it risks using legacy functionality that will be > > deprecated soon. Again, you'd take the node-name and pass it to the qdev > > drive option below. > > Yes, xen_disk does things this way but then we end up with legacy > block device and still fall foul of the assertions buried in the code. Yes and no. xen_disk is better in that it avoids the drive_* things (which internally call bdrv_open() anyway), but it's worse in that it directly assigns blkdev->blk instead of using the qdev property. What I meant here is that you create the BDS with bdrv_open(), but then you wouldn't assign it directly to some field in the device state, but just put the node-name of the BDS into the qdev property 'drive'. This would be almost like qmp_blockdev_add(), except without validation against the QAPI schema. But if using qmp_blockdev_add() is easy enough, that's preferable. > > > > > + > > > +done: > > > +g_free(drive_optstr); > > > +g_free(format); > > > +g_free(file); > > > +} > > > + > > > +static void xen_block_device_create(BusState *bus, const char *name, > > > +QDict *opts, Error **errp) > > > +{ > > > +unsigned long number; > > > +const char *vdev, *device_type; > > > +BlockBackend *blk = NULL; > > > +IOThread *iothread = NULL; > > > +DeviceState *dev = NULL; > > > +Error *local_err = NULL; > > > +const char *type; > > > +XenBlockDevice *blockdev; > > > + > > > +
Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events
>>> On 13.12.18 at 14:18, wrote: > So, long story short, on VMX we first send out the vm_event, while > processing it an interrupt / exception may become pending, before > resuming the VCPU that has sent out the vm_event there's a Xen function > that picks up the pending interrupt and schedules it (writes it in the > VMCS), and only then we attempt the emulation, which may overwrite it > (because there's only one place we can write to schedule interrupts / > exceptions). So perhaps the solution is indeed to change the order of how things get done, instead of blocking interrupts? You seem to think this way too, as per ... > 2. Interrupts are not blocked indefinitely - only until the emulation is > done. It could be argued that that's really the proper place for them to > be processed anyway - on an instruction boundary, _after_ the > in-progress instruction has finished executing. It's just that with the > vm_event introspection thing you could say that executing the current > instruction may take a bit longer. ... this. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests
>>> On 13.12.18 at 15:14, wrote: > On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote: >> >>> On 13.12.18 at 12:39, wrote: >> > Well, Just keeping correct order between each domain locks should be >> > enough? >> > >> > Ie: exactly the same that Xen currently does but on a per-domain >> > basis. This is feasible, but each CPU would need to store the lock >> > order of each possible domain: >> > >> > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]); >> > >> > This would consume ~32KB per CPU, which is not that much but seems a >> > waste when most of the time a single entry will be used. >> >> Well, tracking by domain ID wouldn't help you - the controlling >> domain may well have a higher ID than the being controlled one, >> i.e. the nesting you want needs to be independent of domain ID. > > It's not tracking the domain ID, but rather tracking the lock level of > each different domain, hence the need for the array in the pcpu > structure. The lock checker would take a domain id and a level, and > perform the check as: > > if ( mm_lock_level[domid] > level ) > panic But this would open things up for deadlocks because of intermixed lock usage between the calling domain's and the subject one's. There needs to be a linear sequence of locks (of all involved domains) describing the one and only order in which they may be acquired. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable-smoke test] 131293: tolerable all pass - PUSHED
flight 131293 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/131293/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass version targeted for testing: xen 5c08550ff4f3804df471b12c29ae170de981fc13 baseline version: xen 00c96d77422a4b84247bec5dadf434363d312cac Last test of basis 131284 2018-12-13 00:00:41 Z0 days Testing same since 131293 2018-12-13 12:00:46 Z0 days1 attempts People who touched revisions under test: Brian Woods Paul Durrant jobs: build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-amd64-amd64-xl-qemuu-debianhvm-i386 pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 00c96d7742..5c08550ff4 5c08550ff4f3804df471b12c29ae170de981fc13 -> smoke ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function
>>> On 13.12.18 at 15:20, wrote: > On Thu, Dec 13, 2018 at 03:17:05AM -0700, Jan Beulich wrote: >> >>> On 13.12.18 at 10:14, wrote: >> > On Thu, Dec 13, 2018 at 12:45:07AM -0700, Jan Beulich wrote: >> >> >>> On 12.12.18 at 18:05, wrote: >> >> > On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote: >> >> >> The MMIO side of things of course still remains unclear. >> >> > >> >> > Right, for the MMIO and the handling of grant and foreign mappings it's >> >> > not clear how we want to proceed. >> >> > >> >> > Maybe account for all host RAM (total_pages) plus MMIO BARs? >> >> >> >> Well, I thought we've already settled on it being impossible to >> >> account for all MMIO BARs at this point. >> > >> > Well, I could iterate over all the registered PCI devices and size >> > the BARs (without VF BARs at least initially). This is quite >> > cumbersome, my other option would be using max_page and hope that >> > there are enough holes to make up for BAR MMIO regions. >> >> Well, maybe we could live with this for now. I certainly would >> prefer to have a 3rd opinion though, as I continue to feel uneasy >> with this rather imprecise estimation (i.e. I'd much prefer a more >> dynamic / on-demand approach). > > I agree it's not a perfect solution, but I think what's currently done > is even worse, and we already had bug reports of users seeing Xen > panic at PVH Dom0 build time if no dom0_mem parameter is specified. > > Would you be OK with using max_page then? I'm not going to say yes or no here without having seen a (qualified) 3rd opinion. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation
>>> On 13.12.18 at 14:25, wrote: > The question is, how much drift can be tolerated even without my patch. This depends on what a system is used for. A few seconds off may be fine for one purpose, but a significant problem for another. Similarly a sudden (however small) change to the TSC tick rate may be a problem for one purpose, but not for another. If otoh (and just to give an example) it was determined that the TSC tick rate on a physical system varies over time within certain boundaries, then leveraging this to change (slowly, i.e. with no steeper a gradient than might be observable on bare hardware) the rate to match the new host's might be an option. Emulation then could be turned off after a certain period of time. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 25/25] argo: implement the get_config op to query notification config
>>> On 01.12.18 at 02:33, wrote: > --- a/xen/common/argo.c > +++ b/xen/common/argo.c > @@ -1656,6 +1656,46 @@ argo_sendv(struct domain *src_d, const argo_addr_t > *src_addr, > return ( ret < 0 ) ? ret : len; > } > > +static void > +argo_get_config(struct domain *d, argo_get_config_t *get_config) > +{ > +unsigned int method = argo_signal_method(d); > + > +get_config->signal_method = method; > + > +switch ( method ) > +{ > +case ARGO_SIGNAL_METHOD_EVTCHN: > +{ > +read_lock(_lock); > +read_lock(>argo->lock); > + > +get_config->signal.evtchn = d->argo->evtchn_port; > + > +read_unlock(>argo->lock); > +read_unlock(_lock); > + > +argo_dprintk("signal for dom:%d evtchn %u\n", d->domain_id, > + get_config->signal.evtchn); > + > +break; > +} > +case ARGO_SIGNAL_METHOD_VIRQ: > +{ > +get_config->signal.virq = VIRQ_ARGO; > + > +argo_dprintk("signal for dom:%d virq %u\n", d->domain_id, > + get_config->signal.virq); > +break; > +} > +default: > +{ > +BUG(); > +break; > +} There are quite a few stray braces here. > +typedef struct argo_get_config > +{ > +uint32_t signal_method; > +union > +{ > +evtchn_port_t evtchn; > +uint32_t virq; > +} signal; > +uint32_t reserved; Judging from the description, did you perhaps mean to put uint32_t reserved[2] inside the union? Then again "get_config" sounds much more generic than just obtaining the notification method. > @@ -244,6 +257,21 @@ struct argo_ring_message_header > */ > #define ARGO_MESSAGE_OP_notify 4 > > +/* > + * ARGO_MESSAGE_OP_get_config > + * > + * Queries Xen for argo configuration values. > + * > + * Used by a guest to obtain the signal method in use for Argo notifications > + * and the event channel port or isa irq in use. ISA IRQ? It's a vIRQ that you have as alternative to bare event-channel signaling. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [ovmf test] 131283: regressions - FAIL
flight 131283 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/131283/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-i3866 xen-buildfail REGR. vs. 129475 build-i386-xsm6 xen-buildfail REGR. vs. 129475 build-amd64 6 xen-buildfail REGR. vs. 129475 build-amd64-xsm 6 xen-buildfail REGR. vs. 129475 Tests which did not succeed, but are not blocking: build-amd64-libvirt 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a build-i386-libvirt1 build-check(1) blocked n/a version targeted for testing: ovmf 0d68ce514b922f887da28c2a12b8d37cf23903f0 baseline version: ovmf 5ae3184d8c59f7bbb84bad482df6b8020ba58188 Last test of basis 129475 2018-11-05 21:13:11 Z 37 days Failing since129526 2018-11-06 20:49:26 Z 36 days 153 attempts Testing same since 131283 2018-12-12 23:03:20 Z0 days1 attempts People who touched revisions under test: Achin Gupta Ard Biesheuvel Bob Feng bob.c.f...@intel.com BobCF Chasel Chiu Chasel, Chiu Chen A Chen Dandan Bi David Wei Eric Dong Feng, Bob C Fu Siyuan Gary Lin Hao Wu Jaben Carsey Jeff Brasen Jian J Wang Jiaxin Wu Jiewen Yao Laszlo Ersek Leif Lindholm Liming Gao Liu Yu Marc Zyngier Marcin Wojtas Ming Huang Pedroa Liu Ruiyu Ni shenglei Shenglei Zhang Star Zeng Sughosh Ganu Sumit Garg Sun, Zailiang Thomas Abraham Tomasz Michalec Vijayenthiran Subramaniam Wang BinX A Wu Jiaxin Yonghong Zhu yuchenlin Zailiang Sun Zhang, Chao B Zhao, ZhiqiangX ZhiqiangX Zhao zwei4 jobs: build-amd64-xsm fail build-i386-xsm fail build-amd64 fail build-i386 fail build-amd64-libvirt blocked build-i386-libvirt blocked build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 blocked test-amd64-i386-xl-qemuu-ovmf-amd64 blocked sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 3421 lines long.) ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 24/25] argo: unmap rings on suspend and send signal to ring-owners on resume
>>> On 01.12.18 at 02:33, wrote: > so that the guest may re-register the rings on resume with current mappings. Is this something guests really need help with, rather than managing it on their own? What does "current mappings" here mean, i.e. why do rings need re-registration in the first place? > +void > +argo_resume(struct domain *d) > +{ > +bool send_wakeup; > + > +if ( !d ) > +return; > + > +if ( !get_domain(d) ) > +return; > + > +read_lock(_lock); > + > +read_lock(>argo->lock); > +send_wakeup = ( d->argo->ring_count > 0 ); > +read_unlock(>argo->lock); > + > +if ( send_wakeup ) > +argo_signal_domain(d); > + > +read_unlock(_lock); > + > +put_domain(d); > +} domain_resume() also gets called from domain_soft_reset(). Do you really want such handling in that case as well, when after a soft-reset the domain is supposed to be "blank"? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function
On Thu, Dec 13, 2018 at 03:17:05AM -0700, Jan Beulich wrote: > >>> On 13.12.18 at 10:14, wrote: > > On Thu, Dec 13, 2018 at 12:45:07AM -0700, Jan Beulich wrote: > >> >>> On 12.12.18 at 18:05, wrote: > >> > On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote: > >> >> The MMIO side of things of course still remains unclear. > >> > > >> > Right, for the MMIO and the handling of grant and foreign mappings it's > >> > not clear how we want to proceed. > >> > > >> > Maybe account for all host RAM (total_pages) plus MMIO BARs? > >> > >> Well, I thought we've already settled on it being impossible to > >> account for all MMIO BARs at this point. > > > > Well, I could iterate over all the registered PCI devices and size > > the BARs (without VF BARs at least initially). This is quite > > cumbersome, my other option would be using max_page and hope that > > there are enough holes to make up for BAR MMIO regions. > > Well, maybe we could live with this for now. I certainly would > prefer to have a 3rd opinion though, as I continue to feel uneasy > with this rather imprecise estimation (i.e. I'd much prefer a more > dynamic / on-demand approach). I agree it's not a perfect solution, but I think what's currently done is even worse, and we already had bug reports of users seeing Xen panic at PVH Dom0 build time if no dom0_mem parameter is specified. Would you be OK with using max_page then? This is the less complex option to implement ATM, and BAR sizing can be added later together with a more dynamic p2m memory management. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ
>>> On 01.12.18 at 02:33, wrote: > * x86 PV domains are notified via event channel. > > PV guests are known to have the event channel software present in the guest > kernel, so it is fine to depend on and use it. > > * x86 HVM domains and all ARM domains are notified via VIRQ. > > The intent is to remove the requirement for event channel software to be > installed within these guests in order to use Argo. VIRQ signalling is also > the method that has been in use for the longest period with this hypercall > in both XenClient and OpenXT. I'm afraid I don't follow: send_guest_global_virq() uses, well, evtchn_port_set_pending(), just like evtchn_send() does. Therefore how does sending a vIRQ help with a guest without event channel awareness? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests
On Thu, Dec 13, 2018 at 05:51:51AM -0700, Jan Beulich wrote: > >>> On 13.12.18 at 12:39, wrote: > > On Wed, Dec 12, 2018 at 09:07:14AM -0700, Jan Beulich wrote: > >> >>> On 12.12.18 at 15:54, wrote: > >> > @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d, > >> > bytes = (unsigned int)((sc->pages - pages + 7) >> > >> > 3); > >> > if ( likely(peek) ) > >> > { > >> > +if ( paging_mode_enabled(current->domain) ) > >> > +/* > >> > + * Drop the target p2m lock, or else Xen will > >> > panic > >> > + * when trying to acquire the p2m lock of the > >> > caller > >> > + * due to invalid lock order. Note that there > >> > are no > >> > + * lock ordering issues here, and the panic is > >> > due to > >> > + * the fact that the lock level tracking > >> > doesn't record > >> > + * the domain the lock belongs to. > >> > + */ > >> > +paging_unlock(d); > >> > >> This makes it sound as if tracking the domain would help. It doesn't, > >> at least not as long as there is not also some sort of ordering or > >> hierarchy between domains. IOW I'd prefer if the "doesn't" became > >> "can't". > > > > Well, Just keeping correct order between each domain locks should be > > enough? > > > > Ie: exactly the same that Xen currently does but on a per-domain > > basis. This is feasible, but each CPU would need to store the lock > > order of each possible domain: > > > > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]); > > > > This would consume ~32KB per CPU, which is not that much but seems a > > waste when most of the time a single entry will be used. > > Well, tracking by domain ID wouldn't help you - the controlling > domain may well have a higher ID than the being controlled one, > i.e. the nesting you want needs to be independent of domain ID. It's not tracking the domain ID, but rather tracking the lock level of each different domain, hence the need for the array in the pcpu structure. The lock checker would take a domain id and a level, and perform the check as: if ( mm_lock_level[domid] > level ) panic > >> Now before we go this route I think we need to consider whether > >> this really is the only place where the two locks get into one > >> another's way. That's because I don't think we want to introduce > >> more of such, well, hackery, and hence we'd otherwise need a > >> better solution. For example the locking model could be adjusted > >> to allow such nesting in the general case: Dom0 and any domain > >> whose ->target matches the subject domain here could be given > >> a slightly different "weight" in the lock order violation check logic. > > > > So locks from domains != current would be given a lower order, let's > > say: > > > > #define MM_LOCK_ORDER_nestedp2m 8 > > #define MM_LOCK_ORDER_p2m16 > > #define MM_LOCK_ORDER_per_page_sharing 24 > > #define MM_LOCK_ORDER_altp2mlist 32 > > #define MM_LOCK_ORDER_altp2m 40 > > #define MM_LOCK_ORDER_pod48 > > #define MM_LOCK_ORDER_page_alloc 56 > > #define MM_LOCK_ORDER_paging 64 > > #define MM_LOCK_ORDER_MAXMM_LOCK_ORDER_paging > > > > If domain != current, the above values are used. If domain == current > > the values above are used + MM_LOCK_ORDER_MAX? So in that case the > > order of MM_LOCK_ORDER_p2m against the current domain would be 16 + 64 > > = 80? > > > > This has the slight inconvenience that not all mm lock call sites have > > the target domain available, but can be solved. > > No, I wasn't envisioning a really generic approach, i.e. > permitting this for all of the locks. Do we need this? Till now > I was rather considering to do this for just the paging lock, > with Dom0 and the controlling domains using just a small > (+/- 2 and +/- 1) offset from the normal paging one. > However, I now realize that indeed there may be more of > such nesting, and you've merely hit a specific case of it. So far that's the only case I've hit, but I'm quite sure there are a non-trivial amount of hypercalls that where only used by a PV Dom0, so I don't discard finding other such instances. > In any event it seems to me that you've got the ordering > inversed, i.e. domain == current (or really: the general case, > i.e. after excluding the two special cases) would use base > values, domain == currd->target would use some offset, and > Dom0 would use double the offset. > > With it extended to all locks I'm no longer sure though that > the end result would be safe. I think it would be helpful to > rope in whoever did originally design this locking model. I'm adding Tim which I think is the original author of
Re: [Xen-devel] [PATCH 22/25] xen/evtchn: expose send_guest_global_virq for use within Xen
>>> On 01.12.18 at 02:33, wrote: > To be used by Argo for delivery of notifications to some guests. Better not to make this a separate patch: By folding it into where it's needed it is easier for everyone to judge whether the exposure is indeed necessary, and it also eliminates the risk of the series getting committed up to here, and the function then being pointlessly non- static for an extended period of time. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] arm: xl vcpu-pin leads to oom-killer slashing processes
Hello All, OK, I've discovered a mechanism of the issue. It is because of `d->max_pages = ~0U;` in a `construct_dom0()`. When I do vcpu-pin, libxl updates memory nodes in xenstore for Dom0. Then kernel watch sees those changes and trying to set new target for ballon, but the target becomes extremely high, and baloon sucks all the pages. In my kernel (4.14) in `watch_target()` function there is a code: target_diff = xen_pv_domain() ? 0 : static_max - balloon_stats.target_pages; Here we have `xen_pv_domain()` equal to zero, so `target_diff` big. Then, few lines below: balloon_set_new_target(new_target - target_diff); `balloon_set_new_target()` receives a value wrapped over 64bit what kills the system. Now I'm looking for an appropriate kernel patch for the kernel, to fix that. Any suggestions? -- Sincerely, Andrii Anisov. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 21/25] argo: add array_index_nospec to guard the result of the hash func
>>> On 01.12.18 at 02:33, wrote: > This is out of an abundance of caution, since this is a very basic hash > function, chosen more for its bucket distribution properties to cluster > related > rings rather than for cryptographic strength or any uniformness of output, > and it operates upon values supplied by the guest just before being used as an > array index. Same here: Better to put this in place right away for new code than to incrementally add it later. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 18/25] argo: limit the max number of rings that a domain may register.
>>> On 01.12.18 at 02:32, wrote: > Very basic implementation: a fixed limit of 128. Such restrictions to limit resource use would better be implemented right away for code that can be used (in a limited way) already with just the initial parts of the series applied. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 16/25] argo: implement the notify op
>>> On 01.12.18 at 02:32, wrote: > +static uint32_t > +argo_ringbuf_payload_space(struct domain *d, struct argo_ring_info > *ring_info) > +{ > +argo_ring_t ring; > +int32_t ret; > + > +ASSERT(spin_is_locked(_info->lock)); > + > +ring.len = ring_info->len; > +if ( !ring.len ) > +return 0; > + > +ring.tx_ptr = ring_info->tx_ptr; > + > +if ( argo_ringbuf_get_rx_ptr(ring_info, _ptr) ) > +return 0; > + > +argo_dprintk("argo_ringbuf_payload_space: tx_ptr=%d rx_ptr=%d\n", > + ring.tx_ptr, ring.rx_ptr); > + > +if ( ring.rx_ptr == ring.tx_ptr ) > +return ring.len - sizeof(struct argo_ring_message_header); > + > +ret = ring.rx_ptr - ring.tx_ptr; > +if ( ret < 0 ) > +ret += ring.len; Seeing these two if()-s - how is an empty ring distinguished from a completely full one? I'm getting the impression that ring.rx_ptr == ring.tx_ptr in both cases. > +ret -= sizeof(struct argo_ring_message_header); > +ret -= ARGO_ROUNDUP(1); Wouldn't you instead better round ret to a suitable multiple of whatever granularity you try to arrange for here? Otherwise what is this extra subtraction supposed to do? > @@ -627,6 +679,43 @@ argo_pending_remove_all(struct argo_ring_info *ring_info) > } > } > > +static void > +argo_pending_notify(struct hlist_head *to_notify) > +{ > +struct hlist_node *node, *next; > +struct argo_pending_ent *pending_ent; > + > +ASSERT(rw_is_locked(_lock)); > + > +hlist_for_each_entry_safe(pending_ent, node, next, to_notify, node) > +{ > +hlist_del(_ent->node); > +argo_signal_domid(pending_ent->id); > +xfree(pending_ent); > +} > +} > + > +static void > +argo_pending_find(const struct domain *d, struct argo_ring_info *ring_info, > + uint32_t payload_space, struct hlist_head *to_notify) > +{ > +struct hlist_node *node, *next; > +struct argo_pending_ent *ent; > + > +ASSERT(rw_is_locked(>argo->lock)); > + > +spin_lock(_info->lock); > +hlist_for_each_entry_safe(ent, node, next, _info->pending, node) > +{ > +if ( payload_space >= ent->len ) > +{ > +hlist_del(>node); > +hlist_add_head(>node, to_notify); > +} > +} So if there's space available to fit e.g. just the first pending entry, you'd continue the loop and also signal all others, provided their lengths aren't too big? What good does producing such a burst of notifications do, when only one of the interested parties is actually going to be able to put something on the ring? > @@ -705,6 +812,107 @@ argo_ring_remove_info(struct domain *d, struct > argo_ring_info *ring_info) > xfree(ring_info); > } > > +/*ring data*/ Now this comment is malformed in any event. > +static int > +argo_fill_ring_data(struct domain *src_d, const? > +static long > +argo_notify(struct domain *d, > +XEN_GUEST_HANDLE_PARAM(argo_ring_data_t) ring_data_hnd) > +{ > +argo_ring_data_t ring_data; > +int ret = 0; > + > +read_lock(_lock); > + > +if ( !d->argo ) > +{ > +read_unlock(_lock); > +argo_dprintk("!d->argo, ENODEV\n"); > +return -ENODEV; > +} > + > +argo_notify_check_pending(d); > + > +do { > +if ( !guest_handle_is_null(ring_data_hnd) ) > +{ > +/* Quick sanity check on ring_data_hnd */ > +ret = copy_field_from_guest_errno(_data, ring_data_hnd, > magic); > +if ( ret ) > +break; > + > +if ( ring_data.magic != ARGO_RING_DATA_MAGIC ) > +{ > +argo_dprintk( > +"ring.magic(%"PRIx64") != ARGO_RING_MAGIC(%llx), > EINVAL\n", > +ring_data.magic, ARGO_RING_MAGIC); > +ret = -EINVAL; > +break; > +} > + > +ret = copy_from_guest_errno(_data, ring_data_hnd, 1); > +if ( ret ) > +break; > + > +{ > +/* > + * This is a guest pointer passed as a field in a struct > + * so XEN_GUEST_HANDLE is used. > + */ > +XEN_GUEST_HANDLE(argo_ring_data_ent_t) ring_data_ent_hnd; > +ring_data_ent_hnd = guest_handle_for_field(ring_data_hnd, > + > argo_ring_data_ent_t, > + data[0]); > +ret = argo_fill_ring_data_array(d, ring_data.nent, > +ring_data_ent_hnd); > +} Stray braces and bogus indentation: The declaration can move up and then you don't need the braces and the extra level of indentation. > @@ -103,6 +104,40 @@ typedef struct argo_ring > */ > #define ARGO_ROUNDUP(a) (((a) + 0xf) & ~(typeof(a))0xf) > > +/* > + * Notify flags > + */ > +/* Ring is empty */ > +#define
Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation
Am Thu, 13 Dec 2018 03:41:44 -0700 schrieb "Jan Beulich" : > And further argumentation that everyone is using NTP anyway > doesn't make it any better, when it's no-where written down that > Xen is unusable with NTP running in all guests (I'm exaggerating > here just to get the point over). Don't forget that e.g. with > XenoLinux'es independent-wallclock setting defaulting to false, we've > been suggesting that people _don't_ need to use NTP inside their > (admittedly PV) guests. IOW - your change may not break people > not using NTP. Hence I don't think the mode you introduce can be > a default-on one, which in turn means a per-domain or at least > global enable control is needed (as over previous iterations we > seem to have been agreeing). Regarding the possible unexpected drift if NTP is not used, and the domU uses TSC as clocksource anyway: If one host is on one edge of the assumed cpu_khz value and the other host is on the opposite edge, and the range will be 200 as it is in my patch, the daily drift on a 2.3GHz host would be 7.4seconds per day. This number is based on the fact that 200kz happen within a time span of 86us, which I think is correct. The question is, how much drift can be tolerated even without my patch. Looking through 5 different hosts I use, the range is between -11 and +22, and it happens to be +56 on my Laptop. So even that host with a calculated drift of +22 would be off by 2 seconds per day if ntpd would not run. Olaf pgpizy58joV2T.pgp Description: Digitale Signatur von OpenPGP ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events
On 12/13/18 2:39 PM, Julien Grall wrote: > Hi, > > On 12/13/18 12:15 PM, Razvan Cojocaru wrote: >> On 12/13/18 2:04 PM, Julien Grall wrote: >>> Hi, >>> >>> On 12/13/18 8:03 AM, Razvan Cojocaru wrote: On 12/13/18 8:54 AM, Tian, Kevin wrote: >> From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com] >> Sent: Tuesday, December 11, 2018 8:33 PM >> >>> In any case, I think you want to rename the function and/or document >>> more that expected behavior. >> >> You're right, I should probably rename that function / variable to >> better reflect what it signifies - that sync vm_event processing >> is in >> progress. For VMX and SVM, that simply means that interrupts will be >> blocked, and the value of the variable will be correct and possibly >> useful for ARM as well. >> > > what about vm_event_block_interrupt_injection? in that case > it's injection instead of interrupt itself being blocked. blocking > injection should mean same thing cross archs? >>> >>> Why would you want to block all interrupts injections? When I looked at >>> the details, it feels more you want to block exceptions. >>> >>> I can see use for blocking exception on Arm, blocking all the interrupts >>> is likely going to bring more issues than solving anything. >>> >>> So a better name would be vm_event_block_exception_injection. >> >> I'd like to block the writing of anything, by vmx_intr_assist(), into >> VM_ENTRY_INTR_INFO, because an emulation attempt that happens >> post-vmx_intr_assist() (because the vm_event client application has >> requested it) may write an exception of its own there. >> >> Since vmx_intr_assist() is called on VMX between the time of sending out >> the vm_event and the emulation (which happens in >> hvm_vm_event_do_resume()), we want to block everything that it may write >> in the VMCS until the emulation is done. I think that's more than just >> exceptions. > > I don't know in details how x86 virtualization works, so it is a bit > hard to comment on that. However, it feels to me that you are > introducing in common code a function that will workaround an > architecture specific problem. > > Can you try to explain it in agnostic word? I'll certainly do my best. :) Assume the following scenario: 1. A guest instruction tries to write into read-only memory (as set in the EPT), with monitoring active for the domain. 2. An EPT violation exit occurs, and in the course of it, the VCPU that was running the code that produced the violation is paused and a vm_event is sent to an introspection application. 3. The introspection application is processing the vm_event. During this time, some event may occur (which will remain pending). 4. The introspection agent replies with "please emulate" the current instruction - since the emulator (currently) does not care about EPT restrictions, so this is a cheap way of proceeding without lifting them. 5. With x86, we have {vmx,svm}_intr_assist(), which is guaranteed to be called _before_ the VCPU is woken up again. This is the designated "pick up pending interrupts" function on x86. Perhaps this (real-life) backtrace is helpful: (XEN) Xen call trace: (XEN)[] vmx.c#__vmx_inject_exception+0xa1/0xda (XEN)[] vmx_inject_extint+0x94/0x9f (XEN)[] vmx_intr_assist+0x4ee/0x5ad (XEN)[] vmx_asm_vmexit_handler+0xff/0x270 On x86, there can (currently) be only one scheduled interrupt / exception, and that is written into VM_ENTRY_INTR_INFO in the VMCS on Intel. 6. _After_ vmx_intr_assist() has run, we are now trying to emulate the current instruction, which may cause an exception, which will overwrite the pending interrupt. So, long story short, on VMX we first send out the vm_event, while processing it an interrupt / exception may become pending, before resuming the VCPU that has sent out the vm_event there's a Xen function that picks up the pending interrupt and schedules it (writes it in the VMCS), and only then we attempt the emulation, which may overwrite it (because there's only one place we can write to schedule interrupts / exceptions). > To expand what I said above, I think it is reasonable to request > blocking exception (e.g page-fault...) because they can be generated by > an instruction. However, all interrupts generated by the interrupt > controller (e.g device, IPI..) should not be blocked. > > AFAIU your description, it is the same path to handle the two on x86, > right? Pretty much, yes. Technically speaking there are two that I am aware of, the second of them being the IDT vectoring case just before the EPT fault exit - but that's outside the scope of this patch. Just mentioning it for completeness. Also, it is worth mentioning that: 1. This is the exact same strategy employed by the single-stepping functionality on VMX / Intel. In fact, if you look at xen/arch/x86/hvm/vmx/intr.c, in vmx_intr_assist(), you'll see an early return blocking injection of interrupts for the duration of
Re: [Xen-devel] [PATCH 05/25] argo: Add initial argo_init and argo_destroy
>>> On 01.12.18 at 02:32, wrote: > --- /dev/null > +++ b/xen/include/public/argo.h > @@ -0,0 +1,55 @@ > +/** > + * Argo : Hypervisor-Mediated data eXchange > + * > + * Derived from v4v, the version 2 of v2v. > + * > + * Copyright (c) 2010, Citrix Systems > + * Copyright (c) 2018, BAE Systems > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of the GNU General Public License > + * along with this program; if not, write to the Free Software > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA > + */ > + > +#ifndef __XEN_PUBLIC_ARGO_H__ > +#define __XEN_PUBLIC_ARGO_H__ > + > +#include "xen.h" > + > +typedef struct argo_addr > +{ > +uint32_t port; > +domid_t domain_id; > +uint16_t pad; > +} argo_addr_t; > + > +typedef struct argo_ring_id > +{ > +struct argo_addr addr; > +domid_t partner; > +uint16_t pad; > +} argo_ring_id_t; > + > +typedef struct argo_ring > +{ > +uint64_t magic; > +argo_ring_id_t id; > +uint32_t len; > +/* Guests should use atomic operations to access rx_ptr */ > +uint32_t rx_ptr; > +/* Guests should use atomic operations to access tx_ptr */ > +uint32_t tx_ptr; > +uint8_t reserved[32]; > +#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L > +uint8_t ring[]; > +#elif defined(__GNUC__) > +uint8_t ring[0]; > +#endif > +} argo_ring_t; Btw, for all structure types you define, and with your desire to avoid compat mode translation, you should add ?-prefixed entries to xen/include/xlat.lst and invoke the produced CHECK_* macros from somewhere. If, for reference, you'd look at existing instances, you'll then also find another reason why all these would better have xen_ prefixes. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v9 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 12/12/18 21:39, Borislav Petkov wrote: > On Tue, Dec 11, 2018 at 11:29:21AM -0800, Maran Wilson wrote: >> Is your question about what options you need to provide to Qemu? Or is your >> question about the SW implementation choices? >> >> Assuming the former... > Yeah, that's what I wanted to know. But looking at it, I'm booting > bzImage here just as quickly and as flexible so I don't see the > advantage of this new method for my use case here of booting kernels > in qemu. It's not firmware that is slow, decompression is. Unlike Xen, which is using PVH with a regular bzImage and decompression in the host, KVM is using PVH to boot a vmlinux with no decompression at all. Paolo > But maybe there's a good use case where firmware is slow and one doesn't > really wanna noodle through it or when one does start a gazillion VMs > per second or whatever... ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v8 0/7] KVM: x86: Allow Qemu/KVM to use PVH entry point
On 12/12/18 19:09, Maran Wilson wrote: > On 12/6/2018 1:21 PM, Paolo Bonzini wrote: >> On 06/12/18 07:02, Maran Wilson wrote: >>> For certain applications it is desirable to rapidly boot a KVM virtual >>> machine. In cases where legacy hardware and software support within the >>> guest is not needed, Qemu should be able to boot directly into the >>> uncompressed Linux kernel binary without the need to run firmware. >>> >>> There already exists an ABI to allow this for Xen PVH guests and the ABI >>> is supported by Linux and FreeBSD: >>> >>> https://xenbits.xen.org/docs/unstable/misc/pvh.html >>> >>> This patch series would enable Qemu to use that same entry point for >>> booting KVM guests. >> Thanks! I should be able to post a Tested-by next Monday. Boris, are >> you going to pick it up for 4.21? > > Hi Paolo, > > Are you still planning on running some testing of your own for these > patches? Should Boris wait to hear from you before moving forward? He can go ahead with v9, I was just curious about it. Paolo ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen: xen-pciback: Reset MSI-X state when exposing a device
On Thu, Dec 13, 2018 at 12:54:52AM -0700, Jan Beulich wrote: On 13.12.18 at 04:46, wrote: >> On Wed, Dec 12, 2018 at 08:21:39AM -0700, Jan Beulich wrote: >> On 12.12.18 at 16:18, wrote: On Wed, Dec 12, 2018 at 01:51:01AM -0700, Jan Beulich wrote: On 12.12.18 at 08:06, wrote: >> On Wed, Dec 05, 2018 at 09:01:33AM -0500, Boris Ostrovsky wrote: >>>On 12/5/18 4:32 AM, Roger Pau Monné wrote: On Wed, Dec 05, 2018 at 10:19:17AM +0800, Chao Gao wrote: > I find some pass-thru devices don't work any more across guest reboot. > Assigning it to another guest also meets the same issue. And the only > way to make it work again is un-binding and binding it to pciback. > Someone reported this issue one year ago [1]. More detail also can be > found in [2]. > > The root-cause is Xen's internal MSI-X state isn't reset properly > during reboot or re-assignment. In the above case, Xen set maskall bit > to mask all MSI interrupts after it detected a potential security > issue. Even after device reset, Xen didn't reset its internal maskall > bit. As a result, maskall bit would be set again in next write to > MSI-X message control register. > > Given that PHYSDEVOPS_prepare_msix() also triggers Xen resetting MSI-X > internal state of a device, we employ it to fix this issue rather than > introducing another dedicated sub-hypercall. > > Note that PHYSDEVOPS_release_msix() will fail if the mapping between > the device's msix and pirq has been created. This limitation prevents > us calling this function when detaching a device from a guest during > guest shutdown. Thus it is called right before calling > PHYSDEVOPS_prepare_msix(). s/PHYSDEVOPS/PHYSDEVOP/ (no final S). And then I would also drop the () at the end of the hypercall name since it's not a function. I'm also wondering why the release can't be done when the device is detached from the guest (or the guest has been shut down). This makes me worry about the raciness of the attach/detach procedure: if there's a state where pciback assumes the device has been detached from the guest, but there are still pirqs bound, an attempt to attach to another guest in such state will fail. >>> >>>I wonder whether this additional reset functionality could be done out >>>of xen_pcibk_xenbus_remove(). We first do a (best effort) device reset >>>and then do the extra things that are not properly done there. >> >> No. It cannot be done in xen_pcibk_xenbus_remove() without modifying >> the handler of PHYSDEVOP_release_msix. To do a successful Xen internal >> MSI-X state reset, PHYSDEVOP_{release, prepare}_msix should be finished >> without error. But ATM, xen expects that no msi is bound to pirq when >> doing PHYSDEVOP_release_msix. Otherwise it fails with error code -EBUSY. >> However, the expectation isn't guaranteed in xen_pcibk_xenbus_remove(). >> In some cases, if qemu fails to unmap MSIs, MSIs are unmapped by Xen >> at last minute, which happens after device reset in >> xen_pcibk_xenbus_remove(). > >But that may need taking care of: I don't think it is a good idea to have >anything left from the prior owning domain when the device gets reset. >I.e. left over IRQ bindings should perhaps be forcibly cleared before >invoking the reset; Agree. How about pciback to track the established IRQ bindings? Then pciback can clear irq binding before invoking the reset. >>> >>>How would pciback even know of those mappings, when it's qemu >>>who establishes (and manages) them? >> >> I meant to expose some interfaces from pciback. And pciback serves >> as the proxy of IRQ (un)binding APIs. > >If at all possible we should avoid having to change more parties (qemu, >libxc, kernel, hypervisor) than really necessary. Remember that such >a bug fix may want backporting, and making sure affected people have >all relevant components updated is increasingly difficult with their >number growing. > >in fact I'd expect this to happen in the course of >domain destruction, and I'd expect the device reset to come after the >domain was cleaned up. Perhaps simply an ordering issue in the tool >stack? I don't think reversing the sequences of device reset and domain destruction would be simple. Furthermore, during device hot-unplug, device reset is done when the owner is alive. So if we use domain destruction to enforce all irq binding cleared, in theory, it won't be applicable to hot-unplug case (if qemu's hot-unplug logic is compromised). >>> >>>Even in the hot-unplug case the tool stack could issue unbind >>>requests, behind the back of the possibly compromised qemu, >>>once
Re: [Xen-devel] [PATCH 23/25] argo: signal x86 HVM and ARM via VIRQ
I think there are two issues: 1) VIRQ vs some other sort of event channel For PV guests we originally chose a VIRQ in order to have a well known number against which the kernel driver could bind, so that it wasn't dependent on any of the other interdomain communication systems (such as xenstore) to find the correct channel. VIRQs and events in general make sense for PV domains since the up call mechanism fits well into the way argo expects scheduling to work. I dont see any pressing reason to not use a VIRQ for PV or PVH domains, perhaps I've missed something. 2) VIRQ vs direct injection of vector in hvm case. for HVM guests - you can make the argument for injection via a (potentially hardware) emulated LAPIC. In this case the best performance is obtained by not having to clear the interrupt (requiring another VMEXIT). The only two sorts of interrupts that have that property are 1) MSIs, and 2) edge interrupts via the IOAPIC: Not all operating systems had/have good MSI support, PCI doesn't [really] support edge interrupts on LNK[ABCD], and even if you go the PCI route almost no OSes do the right thing anyway. However it is possible to specify a device via ACPI with a fixed GSI with an edge interrupt. Windows from at least XP, and linux both handle that well and it's the perfect fit for sending the idempotent "you have work to do" type interrupt that argo needs. One of the design goals was that multiple independent actors in a VM should be able to run without knowledge of each other, and using a standard GSI allows something like an EDK II, grub and linux to all use it without having to hand forward state. J. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
Hi Liam, in order to support PVH also with SeaBIOS, I'm going to work on a new option rom (like linuxboot/multiboot) that can be used in this case. I'll keep you updated on it! Cheers, Stefano On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick wrote: > > These changes (along with corresponding qboot and Linux kernel changes) > enable a guest to be booted using the x86/HVM direct boot ABI. > > This commit adds a load_elfboot() routine to pass the size and > location of the kernel entry point to qboot (which will fill in > the start_info struct information needed to to boot the guest). > Having loaded the ELF binary, load_linux() will run qboot > which continues the boot. > > The address for the kernel entry point has already been read > from an ELF Note in the uncompressed kernel binary earlier > in pc_memory_init(). > > Signed-off-by: George Kennedy > Signed-off-by: Liam Merwick > --- > hw/i386/pc.c | 72 > > 1 file changed, 72 insertions(+) > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index 056aa46d99b9..d3012cbd8597 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -54,6 +54,7 @@ > #include "sysemu/qtest.h" > #include "kvm_i386.h" > #include "hw/xen/xen.h" > +#include "hw/xen/start_info.h" > #include "ui/qemu-spice.h" > #include "exec/memory.h" > #include "exec/address-spaces.h" > @@ -1098,6 +1099,50 @@ done: > return pvh_start_addr != 0; > } > > +static bool load_elfboot(const char *kernel_filename, > + int kernel_file_size, > + uint8_t *header, > + size_t pvh_xen_start_addr, > + FWCfgState *fw_cfg) > +{ > +uint32_t flags = 0; > +uint32_t mh_load_addr = 0; > +uint32_t elf_kernel_size = 0; > +uint64_t elf_entry; > +uint64_t elf_low, elf_high; > +int kernel_size; > + > +if (ldl_p(header) != 0x464c457f) { > +return false; /* no elfboot */ > +} > + > +bool elf_is64 = header[EI_CLASS] == ELFCLASS64; > +flags = elf_is64 ? > +((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags; > + > +if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */ > +error_report("elfboot unsupported flags = %x", flags); > +exit(1); > +} > + > +kernel_size = load_elf(kernel_filename, NULL, NULL, _entry, > + _low, _high, 0, I386_ELF_MACHINE, > + 0, 0); > + > +if (kernel_size < 0) { > +error_report("Error while loading elf kernel"); > +exit(1); > +} > +mh_load_addr = elf_low; > +elf_kernel_size = elf_high - elf_low; > + > +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr); > +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr); > +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size); > + > +return true; > +} > + > static void load_linux(PCMachineState *pcms, > FWCfgState *fw_cfg) > { > @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms, > if (ldl_p(header+0x202) == 0x53726448) { > protocol = lduw_p(header+0x206); > } else { > +/* If the kernel address for using the x86/HVM direct boot ABI has > + * been saved then proceed with booting the uncompressed kernel */ > +if (pvh_start_addr) { > +if (load_elfboot(kernel_filename, kernel_size, > + header, pvh_start_addr, fw_cfg)) { > +struct hvm_modlist_entry ramdisk_mod = { 0 }; > + > +fclose(f); > + > +fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, > +strlen(kernel_cmdline) + 1); > +fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, > kernel_cmdline); > + > +assert(machine->device_memory != NULL); > +ramdisk_mod.paddr = machine->device_memory->base; > +ramdisk_mod.size = > +memory_region_size(>device_memory->mr); > + > +fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, _mod, > + sizeof(ramdisk_mod)); > +fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header)); > +fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, > + header, sizeof(header)); > + > +return; > +} > +} > /* This looks like a multiboot kernel. If it is, let's stop > treating it like a Linux kernel. */ > if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename, > -- > 1.8.3.1 > -- Stefano Garzarella Red Hat ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
On Tue, Dec 11, 2018 at 7:35 PM Maran Wilson wrote: > > On 12/11/2018 9:11 AM, Stefano Garzarella wrote: > > Hi Liam, > > in order to support PVH also with SeaBIOS, I'm going to work on a new > > option rom (like linuxboot/multiboot) that can be used in this case. > > That is awesome. Yes, please keep us posted when you have something working. Yes, I'll keep you updated! > > Just FYI, before switching over to using Qemu+qboot, we had been using a > Qemu only solution (but not using an option rom) internally that worked > very well using no FW at all. We had Qemu simply parse the ELF file and > jump to the PVH entry point if one is found. The only gotcha was that we > had to include a pair of patches that were originally written by folks > at Intel as part of the clear containers work. Specifically, in order to > be able to skip firmware entirely, we had to do 2 additional things: (1) > ACPI tables generated by Qemu are usually patched up by FW. Since we > were running no FW, we needed to do that patching up of the ACPI tables > in Qemu when it was detected that we were going to enter the OS via the > PVH entry point. (2) We also needed to add a patch to Qemu to enable a > few PM registers -- something typically done by FW. I had a look of qemu-lite, are you referring to this? > > But if SeaBIOS is involved in the solution you are working on, I guess > you won't really need those extra patches. Just figured I'd mention it > so you have the full picture. Thank you very much to share with me these details! Cheers, Stefano > > Thanks, > -Maran > > > I'll keep you updated on it! > > > > Cheers, > > Stefano > > On Wed, Dec 5, 2018 at 11:38 PM Liam Merwick > > wrote: > >> These changes (along with corresponding qboot and Linux kernel changes) > >> enable a guest to be booted using the x86/HVM direct boot ABI. > >> > >> This commit adds a load_elfboot() routine to pass the size and > >> location of the kernel entry point to qboot (which will fill in > >> the start_info struct information needed to to boot the guest). > >> Having loaded the ELF binary, load_linux() will run qboot > >> which continues the boot. > >> > >> The address for the kernel entry point has already been read > >> from an ELF Note in the uncompressed kernel binary earlier > >> in pc_memory_init(). > >> > >> Signed-off-by: George Kennedy > >> Signed-off-by: Liam Merwick > >> --- > >> hw/i386/pc.c | 72 > >> > >> 1 file changed, 72 insertions(+) > >> > >> diff --git a/hw/i386/pc.c b/hw/i386/pc.c > >> index 056aa46d99b9..d3012cbd8597 100644 > >> --- a/hw/i386/pc.c > >> +++ b/hw/i386/pc.c > >> @@ -54,6 +54,7 @@ > >> #include "sysemu/qtest.h" > >> #include "kvm_i386.h" > >> #include "hw/xen/xen.h" > >> +#include "hw/xen/start_info.h" > >> #include "ui/qemu-spice.h" > >> #include "exec/memory.h" > >> #include "exec/address-spaces.h" > >> @@ -1098,6 +1099,50 @@ done: > >> return pvh_start_addr != 0; > >> } > >> > >> +static bool load_elfboot(const char *kernel_filename, > >> + int kernel_file_size, > >> + uint8_t *header, > >> + size_t pvh_xen_start_addr, > >> + FWCfgState *fw_cfg) > >> +{ > >> +uint32_t flags = 0; > >> +uint32_t mh_load_addr = 0; > >> +uint32_t elf_kernel_size = 0; > >> +uint64_t elf_entry; > >> +uint64_t elf_low, elf_high; > >> +int kernel_size; > >> + > >> +if (ldl_p(header) != 0x464c457f) { > >> +return false; /* no elfboot */ > >> +} > >> + > >> +bool elf_is64 = header[EI_CLASS] == ELFCLASS64; > >> +flags = elf_is64 ? > >> +((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags; > >> + > >> +if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */ > >> +error_report("elfboot unsupported flags = %x", flags); > >> +exit(1); > >> +} > >> + > >> +kernel_size = load_elf(kernel_filename, NULL, NULL, _entry, > >> + _low, _high, 0, I386_ELF_MACHINE, > >> + 0, 0); > >> + > >> +if (kernel_size < 0) { > >> +error_report("Error while loading elf kernel"); > >> +exit(1); > >> +} > >> +mh_load_addr = elf_low; > >> +elf_kernel_size = elf_high - elf_low; > >> + > >> +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr); > >> +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr); > >> +fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size); > >> + > >> +return true; > >> +} > >> + > >> static void load_linux(PCMachineState *pcms, > >> FWCfgState *fw_cfg) > >> { > >> @@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms, > >> if (ldl_p(header+0x202) == 0x53726448) { > >> protocol = lduw_p(header+0x206); > >> } else { > >> +/* If the kernel address for using the x86/HVM direct boot ABI has > >> +
Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests
>>> On 13.12.18 at 12:39, wrote: > On Wed, Dec 12, 2018 at 09:07:14AM -0700, Jan Beulich wrote: >> >>> On 12.12.18 at 15:54, wrote: >> > @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d, >> > bytes = (unsigned int)((sc->pages - pages + 7) >> 3); >> > if ( likely(peek) ) >> > { >> > +if ( paging_mode_enabled(current->domain) ) >> > +/* >> > + * Drop the target p2m lock, or else Xen will >> > panic >> > + * when trying to acquire the p2m lock of the >> > caller >> > + * due to invalid lock order. Note that there are >> > no >> > + * lock ordering issues here, and the panic is >> > due to >> > + * the fact that the lock level tracking doesn't >> > record >> > + * the domain the lock belongs to. >> > + */ >> > +paging_unlock(d); >> >> This makes it sound as if tracking the domain would help. It doesn't, >> at least not as long as there is not also some sort of ordering or >> hierarchy between domains. IOW I'd prefer if the "doesn't" became >> "can't". > > Well, Just keeping correct order between each domain locks should be > enough? > > Ie: exactly the same that Xen currently does but on a per-domain > basis. This is feasible, but each CPU would need to store the lock > order of each possible domain: > > DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]); > > This would consume ~32KB per CPU, which is not that much but seems a > waste when most of the time a single entry will be used. Well, tracking by domain ID wouldn't help you - the controlling domain may well have a higher ID than the being controlled one, i.e. the nesting you want needs to be independent of domain ID. >> Now before we go this route I think we need to consider whether >> this really is the only place where the two locks get into one >> another's way. That's because I don't think we want to introduce >> more of such, well, hackery, and hence we'd otherwise need a >> better solution. For example the locking model could be adjusted >> to allow such nesting in the general case: Dom0 and any domain >> whose ->target matches the subject domain here could be given >> a slightly different "weight" in the lock order violation check logic. > > So locks from domains != current would be given a lower order, let's > say: > > #define MM_LOCK_ORDER_nestedp2m 8 > #define MM_LOCK_ORDER_p2m16 > #define MM_LOCK_ORDER_per_page_sharing 24 > #define MM_LOCK_ORDER_altp2mlist 32 > #define MM_LOCK_ORDER_altp2m 40 > #define MM_LOCK_ORDER_pod48 > #define MM_LOCK_ORDER_page_alloc 56 > #define MM_LOCK_ORDER_paging 64 > #define MM_LOCK_ORDER_MAXMM_LOCK_ORDER_paging > > If domain != current, the above values are used. If domain == current > the values above are used + MM_LOCK_ORDER_MAX? So in that case the > order of MM_LOCK_ORDER_p2m against the current domain would be 16 + 64 > = 80? > > This has the slight inconvenience that not all mm lock call sites have > the target domain available, but can be solved. No, I wasn't envisioning a really generic approach, i.e. permitting this for all of the locks. Do we need this? Till now I was rather considering to do this for just the paging lock, with Dom0 and the controlling domains using just a small (+/- 2 and +/- 1) offset from the normal paging one. However, I now realize that indeed there may be more of such nesting, and you've merely hit a specific case of it. In any event it seems to me that you've got the ordering inversed, i.e. domain == current (or really: the general case, i.e. after excluding the two special cases) would use base values, domain == currd->target would use some offset, and Dom0 would use double the offset. With it extended to all locks I'm no longer sure though that the end result would be safe. I think it would be helpful to rope in whoever did originally design this locking model. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 16/18] xen: automatically create XenBlockDevice-s
> -Original Message- > From: Kevin Wolf [mailto:kw...@redhat.com] > Sent: 13 December 2018 11:52 > To: Paul Durrant > Cc: qemu-de...@nongnu.org; qemu-bl...@nongnu.org; xen- > de...@lists.xenproject.org; Max Reitz ; Stefano > Stabellini > Subject: Re: [PATCH v4 16/18] xen: automatically create XenBlockDevice-s > > Am 11.12.2018 um 16:57 hat Paul Durrant geschrieben: > > This patch adds a creator function for XenBlockDevice-s so that they can > > be created automatically when the Xen toolstack instantiates a new > > PV backend. When the XenBlockDevice is created this way it is also > > necessary to create a drive which matches the configuration that the Xen > > toolstack has written into xenstore. This drive is marked 'auto_del' so > > that it will be removed when the XenBlockDevice is destroyed. Also, for > > compatibility with the legacy 'xen_disk' implementation, an iothread > > is automatically created for the new XenBlockDevice. This will also be > > removed when the XenBlockDevice is destroyed. > > > > Correspondingly the legacy backend scan for 'qdisk' is removed. > > > > After this patch is applied the legacy 'xen_disk' code is redundant. It > > will be removed by a subsequent patch. > > > > Signed-off-by: Paul Durrant > > Reviewed-by: Anthony Perard > > So I have two points for this patch. > > The first is that devices creating their own backends feels so wrong. Indeed it does feel wrong, but this is the only way to deal with existing Xen toolstacks. Once toolstacks have been ported to using QMP to instantiate backends then this code can go away. > I > know that the old xen_disk did the same, and fixing it might neither be > easy nor directly related to the qdevification, so requiring that from > you would probably be unfair. But I still have to make the note, and > hopefully we can get to it eventually (or maybe it is even easy enough > that we can indeed address it in this series). > No, it's not easy but once this series is committed the work can be started. > My problem here is that I don't really understand the Xen mechanisms. > Could you give me a very high-level overview of how adding a disk works > and which component communicates with which other component to get the > information down to QEMU and eventually the newly added > xen_block_device_create()? Xen toolstacks instantiate PV backends by just writing values into xenstore. It is up to the entity implementing the backend to set 'watches' so that it gets notified when these values appear. Currently that entity may be QEMU, or it may be a kernel driver such as Linux xen-blkback.ko. > > Essentially, what I'm wondering is whether we have anything that could > be treated more or less like another monitor besides QMP and HMP, which > would internally work similar to HMP, i.e. map (almost) everything to > QMP commands. Yes, it would be possible to have a separate 'compatibility' daemon to watch xenstore and then formulate the correct sequence of QMP commands to instantiate the backend, but that is more complicated and the right answer of course is to have the toolstack send the QMP commands in the first place. > I see that there is this XenWatch infrastructure to get > notified about changes (which would be treated like monitor commands), > but I'm not sure if everything would be covered by this mechanism or > whether some things must be fetched explicitly. > > Anyway, this is probably for later. > > > +static void xen_block_drive_create(const char *id, const char > *device_type, > > + QDict *opts, Error **errp) > > +{ > > +const char *params = qdict_get_try_str(opts, "params"); > > +const char *mode = qdict_get_try_str(opts, "mode"); > > +const char *direct_io_safe = qdict_get_try_str(opts, "direct-io- > safe"); > > +const char *discard_enable = qdict_get_try_str(opts, "discard- > enable"); > > +char *format = NULL; > > +char *file = NULL; > > +char *drive_optstr = NULL; > > +QemuOpts *drive_opts; > > +Error *local_err = NULL; > > + > > +if (params) { > > +char **v = g_strsplit(params, ":", 2); > > + > > +if (v[1] == NULL) { > > +file = g_strdup(v[0]); > > +} else { > > +if (strcmp(v[0], "aio") == 0) { > > +format = g_strdup("raw"); > > +} else if (strcmp(v[0], "vhd") == 0) { > > +format = g_strdup("vpc"); > > +} else { > > +format = g_strdup(v[0]); > > +} > > +file = g_strdup(v[1]); > > +} > > + > > +g_strfreev(v); > > +} > > + > > +if (!file) { > > +error_setg(errp, "no file parameter"); > > +return; > > +} > > + > > +drive_optstr = g_strdup_printf("id=%s", id); > > +drive_opts = drive_def(drive_optstr); > > +if (!drive_opts) { > > +error_setg(errp, "failed to create drive options"); > > +goto done; > > +} > > + > >
Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events
Hi, On 12/13/18 12:15 PM, Razvan Cojocaru wrote: On 12/13/18 2:04 PM, Julien Grall wrote: Hi, On 12/13/18 8:03 AM, Razvan Cojocaru wrote: On 12/13/18 8:54 AM, Tian, Kevin wrote: From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com] Sent: Tuesday, December 11, 2018 8:33 PM In any case, I think you want to rename the function and/or document more that expected behavior. You're right, I should probably rename that function / variable to better reflect what it signifies - that sync vm_event processing is in progress. For VMX and SVM, that simply means that interrupts will be blocked, and the value of the variable will be correct and possibly useful for ARM as well. what about vm_event_block_interrupt_injection? in that case it's injection instead of interrupt itself being blocked. blocking injection should mean same thing cross archs? Why would you want to block all interrupts injections? When I looked at the details, it feels more you want to block exceptions. I can see use for blocking exception on Arm, blocking all the interrupts is likely going to bring more issues than solving anything. So a better name would be vm_event_block_exception_injection. I'd like to block the writing of anything, by vmx_intr_assist(), into VM_ENTRY_INTR_INFO, because an emulation attempt that happens post-vmx_intr_assist() (because the vm_event client application has requested it) may write an exception of its own there. Since vmx_intr_assist() is called on VMX between the time of sending out the vm_event and the emulation (which happens in hvm_vm_event_do_resume()), we want to block everything that it may write in the VMCS until the emulation is done. I think that's more than just exceptions. I don't know in details how x86 virtualization works, so it is a bit hard to comment on that. However, it feels to me that you are introducing in common code a function that will workaround an architecture specific problem. Can you try to explain it in agnostic word? To expand what I said above, I think it is reasonable to request blocking exception (e.g page-fault...) because they can be generated by an instruction. However, all interrupts generated by the interrupt controller (e.g device, IPI..) should not be blocked. AFAIU your description, it is the same path to handle the two on x86, right? On Arm, there are 2 distinct paths, interrupt generated by the interrupt controller are queued. The exceptions will be generated using multiple different paths. Yet exceptions can still override each other. I can't see any reason for Arm to block interrupt generated by the interrupt controller. This would actually be dangerous due to the way we handle them in Xen currently. Instead we may want to block the exception as they can be generated by an instruction. I've probably been confusing when I was talking about the exceptions that emulating the current instruction may trigger - we don't want to block those. I understood that bit. Thanks, Razvan -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v3 0/3] Performance improvements for xen_disk^Wxen-block
On Wed, Dec 12, 2018 at 11:16:23AM +, Paul Durrant wrote: > This series is a re-base of Tim's v2 series [1] on top of my series [2]. > > [1] https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg00243.html > [2] https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02271.html For the series: Acked-by: Anthony PERARD And I've pushed that here: https://xenbits.xen.org/gitweb/?p=people/aperard/qemu-dm.git;a=shortlog;h=refs/heads/xen-next -- Anthony PERARD ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events
On 12/13/18 2:04 PM, Julien Grall wrote: > Hi, > > On 12/13/18 8:03 AM, Razvan Cojocaru wrote: >> On 12/13/18 8:54 AM, Tian, Kevin wrote: From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com] Sent: Tuesday, December 11, 2018 8:33 PM > In any case, I think you want to rename the function and/or document > more that expected behavior. You're right, I should probably rename that function / variable to better reflect what it signifies - that sync vm_event processing is in progress. For VMX and SVM, that simply means that interrupts will be blocked, and the value of the variable will be correct and possibly useful for ARM as well. >>> >>> what about vm_event_block_interrupt_injection? in that case >>> it's injection instead of interrupt itself being blocked. blocking >>> injection should mean same thing cross archs? > > Why would you want to block all interrupts injections? When I looked at > the details, it feels more you want to block exceptions. > > I can see use for blocking exception on Arm, blocking all the interrupts > is likely going to bring more issues than solving anything. > > So a better name would be vm_event_block_exception_injection. I'd like to block the writing of anything, by vmx_intr_assist(), into VM_ENTRY_INTR_INFO, because an emulation attempt that happens post-vmx_intr_assist() (because the vm_event client application has requested it) may write an exception of its own there. Since vmx_intr_assist() is called on VMX between the time of sending out the vm_event and the emulation (which happens in hvm_vm_event_do_resume()), we want to block everything that it may write in the VMCS until the emulation is done. I think that's more than just exceptions. I've probably been confusing when I was talking about the exceptions that emulating the current instruction may trigger - we don't want to block those. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [Qemu-devel] [PATCH for-3.2 v5 07/19] hw: apply accel compat properties without touching globals
On Wed, 12 Dec 2018 16:00:06 +0400 Marc-André Lureau wrote: > Hi > On Mon, Dec 10, 2018 at 8:55 PM Igor Mammedov wrote: > > > > On Mon, 10 Dec 2018 17:45:22 +0100 > > Igor Mammedov wrote: > > > > > On Tue, 4 Dec 2018 18:20:11 +0400 > > > Marc-André Lureau wrote: > > > > > > > Instead of registering compat properties as globals, let's keep them > > > > in their own array, to avoid mixing with user globals. > > > > > > > > Introduce object_apply_global_props() function, to apply compatibility > > > > properties from a GPtrArray. > > > > > > > > Signed-off-by: Marc-André Lureau > > > > --- > > > > include/hw/qdev-core.h | 10 ++ > > > > include/qom/object.h | 3 +++ > > > > include/sysemu/accel.h | 4 +--- > > > > accel/accel.c | 12 > > > > hw/core/qdev.c | 9 + > > > > hw/xen/xen-common.c| 9 ++--- > > > > qom/object.c | 25 + > > > > vl.c | 1 - > > > > 8 files changed, 54 insertions(+), 19 deletions(-) > > > > > > > [...] > > > > diff --git a/hw/xen/xen-common.c b/hw/xen/xen-common.c > > > > index 6ec14c73ca..4532aa8632 100644 > > > > --- a/hw/xen/xen-common.c > > > > +++ b/hw/xen/xen-common.c > > > > @@ -174,18 +174,21 @@ static GlobalProperty xen_compat_props[] = { > > > > .driver = "migration", > > > > .property = "send-section-footer", > > > > .value = "off", > > > > -}, > > > > -{ /* end of list */ }, > > > > +} > > > > }; > > > > > > > > static void xen_accel_class_init(ObjectClass *oc, void *data) > > > > { > > > > AccelClass *ac = ACCEL_CLASS(oc); > > > > + > > > > ac->name = "Xen"; > > > > ac->init_machine = xen_init; > > > > ac->setup_post = xen_setup_post; > > > > ac->allowed = _allowed; > > > > -ac->global_props = xen_compat_props; > > > > +ac->compat_props = g_ptr_array_new(); > > > where is matching free for that? > > can we at least annotate it somehow so that valgrind won't complain about > > this leak? > > If you check my commits on qemu, you should see that I do care (too > much?) about leaks :) > > In this case though, I don't see valgrind or asan complaining, I guess > it's still a reachable reference. > Do you think a FIXME comment would be helpful? I've looked at other cases were we leak, and well we leak a lot so it's probably futile exercise. But the comment won't hurt and will work as remainder. > (/me wish we had a proper object system, GObject, but that ship as > long sailed..) > > > > > > > > > > + > > > > +compat_props_add(ac->compat_props, > > > > + xen_compat_props, G_N_ELEMENTS(xen_compat_props)); > > > > } > > > > > > > > #define TYPE_XEN_ACCEL ACCEL_CLASS_NAME("xen") > > > > diff --git a/qom/object.c b/qom/object.c > > > > index 17921c0a71..dbdab0aead 100644 > > > > --- a/qom/object.c > > > > +++ b/qom/object.c > > > > @@ -370,6 +370,31 @@ static void object_post_init_with_type(Object > > > > *obj, TypeImpl *ti) > > > > } > > > > } > > > > > > > > +void object_apply_global_props(Object *obj, const GPtrArray *props, > > > > Error **errp) > > > > +{ > > > > +Error *err = NULL; > > > > +int i; > > > > + > > > > +if (!props) { > > > > +return; > > > > +} > > > > + > > > > +for (i = 0; i < props->len; i++) { > > > > +GlobalProperty *p = g_ptr_array_index(props, i); > > > > + > > > > +if (object_dynamic_cast(obj, p->driver) == NULL) { > > > > +continue; > > > > +} > > > > +p->used = true; > > > > +object_property_parse(obj, p->value, p->property, ); > > > > +if (err != NULL) { > > > > +error_prepend(, "can't apply global %s.%s=%s: ", > > > > + p->driver, p->property, p->value); > > > > +error_propagate(errp, err); > > > > +} > > > > +} > > > > +} > > > > + > > > > static void object_initialize_with_type(void *data, size_t size, > > > > TypeImpl *type) > > > > { > > > > Object *obj = data; > > > > diff --git a/vl.c b/vl.c > > > > index a5ae5f23d2..88ba658572 100644 > > > > --- a/vl.c > > > > +++ b/vl.c > > > > @@ -2968,7 +2968,6 @@ static void user_register_global_props(void) > > > > */ > > > > static void register_global_properties(MachineState *ms) > > > > { > > > > -accel_register_compat_props(ms->accelerator); > > > > machine_register_compat_props(ms); > > > > user_register_global_props(); > > > > } > > > > > > > > > > > > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events
Hi, On 12/13/18 8:03 AM, Razvan Cojocaru wrote: On 12/13/18 8:54 AM, Tian, Kevin wrote: From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com] Sent: Tuesday, December 11, 2018 8:33 PM In any case, I think you want to rename the function and/or document more that expected behavior. You're right, I should probably rename that function / variable to better reflect what it signifies - that sync vm_event processing is in progress. For VMX and SVM, that simply means that interrupts will be blocked, and the value of the variable will be correct and possibly useful for ARM as well. what about vm_event_block_interrupt_injection? in that case it's injection instead of interrupt itself being blocked. blocking injection should mean same thing cross archs? Why would you want to block all interrupts injections? When I looked at the details, it feels more you want to block exceptions. I can see use for blocking exception on Arm, blocking all the interrupts is likely going to bring more issues than solving anything. So a better name would be vm_event_block_exception_injection. Cheers, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1] xen_disk: fix memory leak
> -Original Message- > From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf > Of Anthony PERARD > Sent: 13 December 2018 11:34 > To: Olaf Hering > Cc: Kevin Wolf ; Stefano Stabellini > ; open list:Block layer core bl...@nongnu.org>; qemu-de...@nongnu.org; Max Reitz ; > open list:X86 > Subject: Re: [Xen-devel] [PATCH v1] xen_disk: fix memory leak > > On Tue, Dec 11, 2018 at 05:02:24PM +0100, Olaf Hering wrote: > > There are some code paths that clobber ioreq->buf, which leads to a huge > > memory leak after a few hours of runtime. One code path is > > qemu_aio_complete, which might be called recursive. Another one is > > I think it's s/recursive/recursively/. > > > ioreq_reset, which might clobber ioreq->buf as well. > > > > Add wrappers to free ioreq->buf before reassignment. > > > > Signed-off-by: Olaf Hering > > That patch seems fine, with a few coding style issues, and is going to > be needed to be forward ported to Paul's reimplementation (not yet > merged). I already posted a patch from Tim Smith (re-based to the new xen-block datapath) that should fix this issue. Paul > > > --- > > hw/block/xen_disk.c | 22 +- > > 1 file changed, 17 insertions(+), 5 deletions(-) > > > > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c > > index 36eff94f84..e15eefe625 100644 > > --- a/hw/block/xen_disk.c > > +++ b/hw/block/xen_disk.c > > @@ -103,12 +103,24 @@ struct XenBlkDev { > > > > /* - */ > > > > +static void ioreq_buf_alloc(struct ioreq *ioreq, size_t alignment) > > You have the parameter `alignment` but don't actually use it, I don't > think it's needed. > > > +{ > > +if (ioreq->buf) > > +qemu_vfree(ioreq->buf); > > You could call ioreq_buf_free here instead of duplicating the code. > > > +ioreq->buf = qemu_memalign(XC_PAGE_SIZE, ioreq->size); > > +} > > +static void ioreq_buf_free(struct ioreq *ioreq) > > +{ > > +if (ioreq->buf) > > +qemu_vfree(ioreq->buf); > > +ioreq->buf = NULL; > > +} > > Thanks, > > -- > Anthony PERARD > > ___ > Xen-devel mailing list > Xen-devel@lists.xenproject.org > https://lists.xenproject.org/mailman/listinfo/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 16/18] xen: automatically create XenBlockDevice-s
Am 11.12.2018 um 16:57 hat Paul Durrant geschrieben: > This patch adds a creator function for XenBlockDevice-s so that they can > be created automatically when the Xen toolstack instantiates a new > PV backend. When the XenBlockDevice is created this way it is also > necessary to create a drive which matches the configuration that the Xen > toolstack has written into xenstore. This drive is marked 'auto_del' so > that it will be removed when the XenBlockDevice is destroyed. Also, for > compatibility with the legacy 'xen_disk' implementation, an iothread > is automatically created for the new XenBlockDevice. This will also be > removed when the XenBlockDevice is destroyed. > > Correspondingly the legacy backend scan for 'qdisk' is removed. > > After this patch is applied the legacy 'xen_disk' code is redundant. It > will be removed by a subsequent patch. > > Signed-off-by: Paul Durrant > Reviewed-by: Anthony Perard So I have two points for this patch. The first is that devices creating their own backends feels so wrong. I know that the old xen_disk did the same, and fixing it might neither be easy nor directly related to the qdevification, so requiring that from you would probably be unfair. But I still have to make the note, and hopefully we can get to it eventually (or maybe it is even easy enough that we can indeed address it in this series). My problem here is that I don't really understand the Xen mechanisms. Could you give me a very high-level overview of how adding a disk works and which component communicates with which other component to get the information down to QEMU and eventually the newly added xen_block_device_create()? Essentially, what I'm wondering is whether we have anything that could be treated more or less like another monitor besides QMP and HMP, which would internally work similar to HMP, i.e. map (almost) everything to QMP commands. I see that there is this XenWatch infrastructure to get notified about changes (which would be treated like monitor commands), but I'm not sure if everything would be covered by this mechanism or whether some things must be fetched explicitly. Anyway, this is probably for later. > +static void xen_block_drive_create(const char *id, const char *device_type, > + QDict *opts, Error **errp) > +{ > +const char *params = qdict_get_try_str(opts, "params"); > +const char *mode = qdict_get_try_str(opts, "mode"); > +const char *direct_io_safe = qdict_get_try_str(opts, "direct-io-safe"); > +const char *discard_enable = qdict_get_try_str(opts, "discard-enable"); > +char *format = NULL; > +char *file = NULL; > +char *drive_optstr = NULL; > +QemuOpts *drive_opts; > +Error *local_err = NULL; > + > +if (params) { > +char **v = g_strsplit(params, ":", 2); > + > +if (v[1] == NULL) { > +file = g_strdup(v[0]); > +} else { > +if (strcmp(v[0], "aio") == 0) { > +format = g_strdup("raw"); > +} else if (strcmp(v[0], "vhd") == 0) { > +format = g_strdup("vpc"); > +} else { > +format = g_strdup(v[0]); > +} > +file = g_strdup(v[1]); > +} > + > +g_strfreev(v); > +} > + > +if (!file) { > +error_setg(errp, "no file parameter"); > +return; > +} > + > +drive_optstr = g_strdup_printf("id=%s", id); > +drive_opts = drive_def(drive_optstr); > +if (!drive_opts) { > +error_setg(errp, "failed to create drive options"); > +goto done; > +} > + > +qemu_opt_set(drive_opts, "file", file, _err); > +if (local_err) { > +error_propagate_prepend(errp, local_err, "failed to set 'file': "); > +goto done; > +} > + > +qemu_opt_set(drive_opts, "media", device_type, _err); > +if (local_err) { > +error_propagate_prepend(errp, local_err, > +"failed to set 'media': "); > +goto done; > +} > + > +if (format) { > +qemu_opt_set(drive_opts, "format", format, _err); > +if (local_err) { > +error_propagate_prepend(errp, local_err, > +"failed to set 'format': "); > +goto done; > +} > +} > + > +if (mode && *mode != 'w') { > +qemu_opt_set_bool(drive_opts, BDRV_OPT_READ_ONLY, true, _err); > +if (local_err) { > +error_propagate_prepend(errp, local_err, "failed to set '%s': ", > +BDRV_OPT_READ_ONLY); > +goto done; > +} > +} > + > +/* > + * It is necessary to turn file locking off as an emulated device > + * my have already opened the same image file. > + */ > +qemu_opt_set(drive_opts, "file.locking", "off", _err); > +if (local_err) { > +error_propagate_prepend(errp, local_err, > +"failed to set
Re: [Xen-devel] [PATCH] x86: fix paging_log_dirty_op to work with paging guests
On Wed, Dec 12, 2018 at 09:07:14AM -0700, Jan Beulich wrote: > >>> On 12.12.18 at 15:54, wrote: > > Fix this by releasing the target paging lock before attempting to > > perform the copy of the dirty bitmap, and then forcing a restart of > > the whole process in case there have been changes to the dirty bitmap > > tables. > > I'm afraid it's not that simple: The writer side (paging_mark_pfn_dirty()) > uses the same lock, and I think the assumption is that while the copying > takes place no updates to the bitmap can occur. Then again the subject > domain gets paused anyway, so updates probably can't happen (the > "probably" of course needs to be eliminated). I've looked into this, and I think you are right, the copy must be done with the paging lock held. Even if the domain is paused, the device model can still mark gfns as dirty using XEN_DMOP_modified_memory, so the only option would be to copy to a temporary page while holding the lock, release the lock and copy the temporary page to the called buffer. > > @@ -488,6 +490,16 @@ static int paging_log_dirty_op(struct domain *d, > > bytes = (unsigned int)((sc->pages - pages + 7) >> 3); > > if ( likely(peek) ) > > { > > +if ( paging_mode_enabled(current->domain) ) > > +/* > > + * Drop the target p2m lock, or else Xen will panic > > + * when trying to acquire the p2m lock of the > > caller > > + * due to invalid lock order. Note that there are > > no > > + * lock ordering issues here, and the panic is due > > to > > + * the fact that the lock level tracking doesn't > > record > > + * the domain the lock belongs to. > > + */ > > +paging_unlock(d); > > This makes it sound as if tracking the domain would help. It doesn't, > at least not as long as there is not also some sort of ordering or > hierarchy between domains. IOW I'd prefer if the "doesn't" became > "can't". Well, Just keeping correct order between each domain locks should be enough? Ie: exactly the same that Xen currently does but on a per-domain basis. This is feasible, but each CPU would need to store the lock order of each possible domain: DECLARE_PER_CPU(uint8_t, mm_lock_level[DOMID_FIRST_RESERVED]); This would consume ~32KB per CPU, which is not that much but seems a waste when most of the time a single entry will be used. > Now before we go this route I think we need to consider whether > this really is the only place where the two locks get into one > another's way. That's because I don't think we want to introduce > more of such, well, hackery, and hence we'd otherwise need a > better solution. For example the locking model could be adjusted > to allow such nesting in the general case: Dom0 and any domain > whose ->target matches the subject domain here could be given > a slightly different "weight" in the lock order violation check logic. So locks from domains != current would be given a lower order, let's say: #define MM_LOCK_ORDER_nestedp2m 8 #define MM_LOCK_ORDER_p2m16 #define MM_LOCK_ORDER_per_page_sharing 24 #define MM_LOCK_ORDER_altp2mlist 32 #define MM_LOCK_ORDER_altp2m 40 #define MM_LOCK_ORDER_pod48 #define MM_LOCK_ORDER_page_alloc 56 #define MM_LOCK_ORDER_paging 64 #define MM_LOCK_ORDER_MAXMM_LOCK_ORDER_paging If domain != current, the above values are used. If domain == current the values above are used + MM_LOCK_ORDER_MAX? So in that case the order of MM_LOCK_ORDER_p2m against the current domain would be 16 + 64 = 80? This has the slight inconvenience that not all mm lock call sites have the target domain available, but can be solved. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1] xen_disk: fix memory leak
On Tue, Dec 11, 2018 at 05:02:24PM +0100, Olaf Hering wrote: > There are some code paths that clobber ioreq->buf, which leads to a huge > memory leak after a few hours of runtime. One code path is > qemu_aio_complete, which might be called recursive. Another one is I think it's s/recursive/recursively/. > ioreq_reset, which might clobber ioreq->buf as well. > > Add wrappers to free ioreq->buf before reassignment. > > Signed-off-by: Olaf Hering That patch seems fine, with a few coding style issues, and is going to be needed to be forward ported to Paul's reimplementation (not yet merged). > --- > hw/block/xen_disk.c | 22 +- > 1 file changed, 17 insertions(+), 5 deletions(-) > > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c > index 36eff94f84..e15eefe625 100644 > --- a/hw/block/xen_disk.c > +++ b/hw/block/xen_disk.c > @@ -103,12 +103,24 @@ struct XenBlkDev { > > /* - */ > > +static void ioreq_buf_alloc(struct ioreq *ioreq, size_t alignment) You have the parameter `alignment` but don't actually use it, I don't think it's needed. > +{ > +if (ioreq->buf) > +qemu_vfree(ioreq->buf); You could call ioreq_buf_free here instead of duplicating the code. > +ioreq->buf = qemu_memalign(XC_PAGE_SIZE, ioreq->size); > +} > +static void ioreq_buf_free(struct ioreq *ioreq) > +{ > +if (ioreq->buf) > +qemu_vfree(ioreq->buf); > +ioreq->buf = NULL; > +} Thanks, -- Anthony PERARD ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1] xen: preserve COMPAT in CFLAGS
Hi, Ian, we have those XC_WANT_COMPAT_* #defines to allow consumers of Xen libs be able to use old interfaces. Do you think it's a good idea to have this consumers (QEMU here) #undef the flag when it has implemented the newer interface? I guess the issue we have here is that when libxc interface are re-implemented into a xen libs, the meaning of XC_WANT_COMPAT_* flags is changed. And the QEMU fails to build even with the flags supplied in cflags. There is another thread that Olaf have started here: <20181025140808.13eefc21.o...@aepfle.de> https://lists.xenproject.org/archives/html/xen-devel/2018-10/msg01950.html On Fri, Oct 26, 2018 at 12:10:16PM +0200, Olaf Hering wrote: > A given Qemu version can not predict what version of Xen it will run on. > There are some checks in configure to decide what Xen libraries and > functions are available. How exactly these functions must be accessed > has to be decided by configure and the user who is compiling Qemu. > In no way some random header file must override this decision. > > Remove the breakage introduced by commit 5eeb39c24b, which would always > hide the libxc interfaces the given version of Qemu knows about. > > The current symptom of such breakage is a build failure with qemu-2.9 > and older, in combination with Xen 4.12. > > Fixes: 5eeb39c24b7d4da5d129bfdd9c4fd21cfb3d28d6 > Signed-off-by: Olaf Hering > --- > include/hw/xen/xen_common.h | 9 - > 1 file changed, 9 deletions(-) > > diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h > index 5f1402b494..33fa2d3497 100644 > --- a/include/hw/xen/xen_common.h > +++ b/include/hw/xen/xen_common.h > @@ -1,15 +1,6 @@ > #ifndef QEMU_HW_XEN_COMMON_H > #define QEMU_HW_XEN_COMMON_H > > -/* > - * If we have new enough libxenctrl then we do not want/need these compat > - * interfaces, despite what the user supplied cflags might say. They > - * must be undefined before including xenctrl.h > - */ > -#undef XC_WANT_COMPAT_EVTCHN_API > -#undef XC_WANT_COMPAT_GNTTAB_API > -#undef XC_WANT_COMPAT_MAP_FOREIGN_API > - > #include > #include > #include -- Anthony PERARD ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation
Am Thu, 13 Dec 2018 03:41:44 -0700 schrieb "Jan Beulich" : > In a second step, let's consider what impact errors in calibration have. > Between two systems with exactly the same hardware crystal > frequency there of course is going to be some drift. The problem > though is - between the two calibrated clock values from two systems > you can't easily tell what part of the difference is a result of the > calibration being imprecise, and what part of it is because of the > crystals not providing the exact same frequency. Without knowing > the possible range of both errors, the argumentation of _one of > them_ being tolerable within a certain range to consider _both_ > systems sufficiently equal is at least questionable. We already have a knob for that, if a reference clock can not be provided: tsc_mode=always_emulate This would exactly cover the case there the assumed frequency that is available during the start of domU will be used after migration. Olaf pgpJRJytqkhcN.pgp Description: Digitale Signatur von OpenPGP ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.12] dt: fix integer check in dt_device_is_compatible
Hi Stefano, On 12/12/18 11:53 PM, Stefano Stabellini wrote: On Wed, 12 Dec 2018, Julien Grall wrote: Hi Stefano, On 11/12/2018 18:46, Stefano Stabellini wrote: cplen is unsigned, thus, it can never be < 0. Use a signed integer local variable instead. The current code checks > 0. Looking at the code I don't think it can ever be negative. So can you details what is the problem you are trying to resolve? Yes, it can be "negative" (not actually negative because it is defined as unsigned), in fact it happens with the nodes added dynamically by grub at boot. This patches fixes booting from grub. Specifically ilen is initialed to 16, but strlen+1 is 17, so length becomes -1. The length of the property generated by grub seems to be short by 1. Such explanation should have been in the commit message, it helps to diagnostics where the problem is coming from. Looking at the specification [1] section 2.3.1, a compatible property is a concatenated list of null terminated strings. So the current code is actually correct and there is a bug in GRUB. This bug was actually fixed by commit ae5817f1d "arm64/xen_boot: Fix Xen boot using GRUB2 on AARCH64". I assume you are using Grub 2.02 which does not contain the full support for Xen. So I would not even consider to work-around it in Xen. Instead, I would recommend you to upgrade to a more recent GRUB. It would be more likely staging as there are no new GRUB version. Another solution is to use chainloading. Cheers, [1] https://github.com/devicetree-org/devicetree-specification/releases/tag/v0.2 Signed-off-by: Stefano Stabellini --- xen/common/device_tree.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c index 8fc401d..df274cc 100644 --- a/xen/common/device_tree.c +++ b/xen/common/device_tree.c @@ -213,17 +213,20 @@ bool_t dt_device_is_compatible(const struct dt_device_node *device, { const char* cp; u32 cplen, l; +s64 ilen; cp = dt_get_property(device, "compatible", ); if ( cp == NULL ) return 0; -while ( cplen > 0 ) + +ilen = cplen; +while ( ilen > 0 ) { if ( dt_compat_cmp(cp, compat) == 0 ) return 1; l = strlen(cp) + 1; cp += l; -cplen -= l; +ilen -= l; } return 0; -- Julien Grall -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V11 4/5] p2m: Always use hostp2m when clipping rangesets
>>> On 13.12.18 at 11:22, wrote: > For my own part, I see no reason why not clipping end should not work > when updating the ranges only (as long as start continues to be <= > unclipped_end). > > Would that modification + testing of it help this series continue? I think so, at least as far as I'm concerned. But I think we really need George's opinion as well. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation
>>> On 13.12.18 at 10:04, wrote: > Am Thu, 13 Dec 2018 01:45:39 -0700 > schrieb "Jan Beulich" : > >> >>> On 13.12.18 at 09:18, wrote: >> > Am Wed, 12 Dec 2018 09:39:25 -0700 >> > schrieb "Jan Beulich" : >> > >> >> >>> On 12.12.18 at 16:20, wrote: >> >> > If a domU uses TSC as clocksoure it also must run NTP in some way to >> >> > avoid the potential drift what will most likely happen, independent of >> >> > any migration. >> >> Which drift? While anyone's well advised to run NTP, a completely >> >> isolated set of systems may have no need to, if their interactions don't >> >> depend on exactly matching time. >> > >> > If these hosts do not sync time to some reference host, the advancing of >> > time >> > is undefined before and after my change. >> >> I'm lost. I simply don't understand what you're trying to tell me, >> or how your answer relates to my question. > > Then please rephrase the question? I was asking the drift of what you are talking about. > I do not see how my path affects the > advancing of time in domUs running on isolated systems. If their hardware > clocks advance at the same speed, my patch will not affect it. And if they > do advance at a different speed, what can emulation do to help with > "correctness"? In a first step, let's assume clock calibration produces a precise result. In such a case, are we in agreement that emulation is needed after migration if clock speeds differ, even if just slightly? In a second step, let's consider what impact errors in calibration have. Between two systems with exactly the same hardware crystal frequency there of course is going to be some drift. The problem though is - between the two calibrated clock values from two systems you can't easily tell what part of the difference is a result of the calibration being imprecise, and what part of it is because of the crystals not providing the exact same frequency. Without knowing the possible range of both errors, the argumentation of _one of them_ being tolerable within a certain range to consider _both_ systems sufficiently equal is at least questionable. And further argumentation that everyone is using NTP anyway doesn't make it any better, when it's no-where written down that Xen is unusable with NTP running in all guests (I'm exaggerating here just to get the point over). Don't forget that e.g. with XenoLinux'es independent-wallclock setting defaulting to false, we've been suggesting that people _don't_ need to use NTP inside their (admittedly PV) guests. IOW - your change may not break people not using NTP. Hence I don't think the mode you introduce can be a default-on one, which in turn means a per-domain or at least global enable control is needed (as over previous iterations we seem to have been agreeing). Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V11 4/5] p2m: Always use hostp2m when clipping rangesets
On 12/5/18 6:30 PM, Jan Beulich wrote: On 05.12.18 at 10:18, wrote: >> --- a/xen/arch/x86/mm/p2m.c >> +++ b/xen/arch/x86/mm/p2m.c >> @@ -1002,30 +1002,43 @@ int p2m_change_type_one(struct domain *d, unsigned >> long gfn_l, >> return rc; >> } >> >> -/* Modify the p2m type of a range of gfns from ot to nt. */ >> +/* Modify the p2m type of [start, end) from ot to nt. */ >> static void change_type_range(struct p2m_domain *p2m, >>unsigned long start, unsigned long end, >>p2m_type_t ot, p2m_type_t nt) >> { >> -unsigned long gfn = start; >> struct domain *d = p2m->domain; >> +const unsigned long host_max_pfn = p2m_get_hostp2m(d)->max_mapped_pfn; >> int rc = 0; >> >> -if ( unlikely(end > p2m->max_mapped_pfn) ) >> -{ >> -if ( !gfn ) >> -{ >> -p2m->change_entry_type_global(p2m, ot, nt); >> -gfn = end; >> -} >> -end = p2m->max_mapped_pfn + 1; >> -} >> -if ( gfn < end ) >> -rc = p2m->change_entry_type_range(p2m, ot, nt, gfn, end - 1); >> +--end; >> + >> +if ( start >= host_max_pfn ) >> +printk(XENLOG_G_WARNING "Dom%d logdirty rangeset clipped to >> max_mapped_pfn\n", >> + d->domain_id); >> + >> +/* Always clip the rangeset down to the host p2m. */ >> +if ( unlikely(end > host_max_pfn) ) >> +end = host_max_pfn; >> + >> +/* If the requested range is out of scope, return doing nothing. */ >> +if ( start > end ) >> +return; > > My prior comment remains: Even if there's no change in behavior > (and you avoid the assertion), especially due to the comment the > impression results (at least to me) that all is well here, when it > really is a (latent) bug to "do nothing" in this case. George, so far > this was a discussion between Razvan and me - do you have an > opinion either way here? Obviously I can't speak for George, but to reiterate my previous analysis, it looks like this patch has added the clipping: http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=437f54d3a33d3787a7cc485eb2b3451e8be49ca7 and this patch has added the global_logdirty ranges code: http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=90ac32559bfbd08127638ba13f99b5ed565cfc2b but left the clipping code in (possibly accidentally). You may have some insight into that, being their author, although it's been a few years since then. For my own part, I see no reason why not clipping end should not work when updating the ranges only (as long as start continues to be <= unclipped_end). Would that modification + testing of it help this series continue? Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function
>>> On 13.12.18 at 10:14, wrote: > On Thu, Dec 13, 2018 at 12:45:07AM -0700, Jan Beulich wrote: >> >>> On 12.12.18 at 18:05, wrote: >> > On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote: >> >> The MMIO side of things of course still remains unclear. >> > >> > Right, for the MMIO and the handling of grant and foreign mappings it's >> > not clear how we want to proceed. >> > >> > Maybe account for all host RAM (total_pages) plus MMIO BARs? >> >> Well, I thought we've already settled on it being impossible to >> account for all MMIO BARs at this point. > > Well, I could iterate over all the registered PCI devices and size > the BARs (without VF BARs at least initially). This is quite > cumbersome, my other option would be using max_page and hope that > there are enough holes to make up for BAR MMIO regions. Well, maybe we could live with this for now. I certainly would prefer to have a 3rd opinion though, as I continue to feel uneasy with this rather imprecise estimation (i.e. I'd much prefer a more dynamic / on-demand approach). Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 2/3] drm/xen-front: Use Xen common shared buffer implementation
bump On 12/5/18 10:20 AM, Oleksandr Andrushchenko wrote: Hello, Daniel! Could you please ack/nack the patch, so either we can merge the series or I can address your comments if any Thank you, Oleksandr On 11/30/18 9:42 AM, Oleksandr Andrushchenko wrote: From: Oleksandr Andrushchenko Use page directory based shared buffer implementation now available as common code for Xen frontend drivers. Remove flushing of shared buffer on page flip as this workaround needs a proper fix. Signed-off-by: Oleksandr Andrushchenko --- drivers/gpu/drm/xen/Kconfig | 1 + drivers/gpu/drm/xen/Makefile | 1 - drivers/gpu/drm/xen/xen_drm_front.c | 65 ++-- drivers/gpu/drm/xen/xen_drm_front_gem.c | 1 - drivers/gpu/drm/xen/xen_drm_front_shbuf.c | 414 -- drivers/gpu/drm/xen/xen_drm_front_shbuf.h | 64 6 files changed, 26 insertions(+), 520 deletions(-) delete mode 100644 drivers/gpu/drm/xen/xen_drm_front_shbuf.c delete mode 100644 drivers/gpu/drm/xen/xen_drm_front_shbuf.h diff --git a/drivers/gpu/drm/xen/Kconfig b/drivers/gpu/drm/xen/Kconfig index 4cca160782ab..f969d486855d 100644 --- a/drivers/gpu/drm/xen/Kconfig +++ b/drivers/gpu/drm/xen/Kconfig @@ -12,6 +12,7 @@ config DRM_XEN_FRONTEND select DRM_KMS_HELPER select VIDEOMODE_HELPERS select XEN_XENBUS_FRONTEND + select XEN_FRONT_PGDIR_SHBUF help Choose this option if you want to enable a para-virtualized frontend DRM/KMS driver for Xen guest OSes. diff --git a/drivers/gpu/drm/xen/Makefile b/drivers/gpu/drm/xen/Makefile index 712afff5ffc3..825905f67faa 100644 --- a/drivers/gpu/drm/xen/Makefile +++ b/drivers/gpu/drm/xen/Makefile @@ -4,7 +4,6 @@ drm_xen_front-objs := xen_drm_front.o \ xen_drm_front_kms.o \ xen_drm_front_conn.o \ xen_drm_front_evtchnl.o \ - xen_drm_front_shbuf.o \ xen_drm_front_cfg.o \ xen_drm_front_gem.o diff --git a/drivers/gpu/drm/xen/xen_drm_front.c b/drivers/gpu/drm/xen/xen_drm_front.c index 6b6d5ab82ec3..4d3d36fc3a5d 100644 --- a/drivers/gpu/drm/xen/xen_drm_front.c +++ b/drivers/gpu/drm/xen/xen_drm_front.c @@ -19,6 +19,7 @@ #include #include +#include #include #include "xen_drm_front.h" @@ -26,28 +27,20 @@ #include "xen_drm_front_evtchnl.h" #include "xen_drm_front_gem.h" #include "xen_drm_front_kms.h" -#include "xen_drm_front_shbuf.h" struct xen_drm_front_dbuf { struct list_head list; u64 dbuf_cookie; u64 fb_cookie; - struct xen_drm_front_shbuf *shbuf; + + struct xen_front_pgdir_shbuf shbuf; }; -static int dbuf_add_to_list(struct xen_drm_front_info *front_info, - struct xen_drm_front_shbuf *shbuf, u64 dbuf_cookie) +static void dbuf_add_to_list(struct xen_drm_front_info *front_info, + struct xen_drm_front_dbuf *dbuf, u64 dbuf_cookie) { - struct xen_drm_front_dbuf *dbuf; - - dbuf = kzalloc(sizeof(*dbuf), GFP_KERNEL); - if (!dbuf) - return -ENOMEM; - dbuf->dbuf_cookie = dbuf_cookie; - dbuf->shbuf = shbuf; list_add(>list, _info->dbuf_list); - return 0; } static struct xen_drm_front_dbuf *dbuf_get(struct list_head *dbuf_list, @@ -62,15 +55,6 @@ static struct xen_drm_front_dbuf *dbuf_get(struct list_head *dbuf_list, return NULL; } -static void dbuf_flush_fb(struct list_head *dbuf_list, u64 fb_cookie) -{ - struct xen_drm_front_dbuf *buf, *q; - - list_for_each_entry_safe(buf, q, dbuf_list, list) - if (buf->fb_cookie == fb_cookie) - xen_drm_front_shbuf_flush(buf->shbuf); -} - static void dbuf_free(struct list_head *dbuf_list, u64 dbuf_cookie) { struct xen_drm_front_dbuf *buf, *q; @@ -78,8 +62,8 @@ static void dbuf_free(struct list_head *dbuf_list, u64 dbuf_cookie) list_for_each_entry_safe(buf, q, dbuf_list, list) if (buf->dbuf_cookie == dbuf_cookie) { list_del(>list); - xen_drm_front_shbuf_unmap(buf->shbuf); - xen_drm_front_shbuf_free(buf->shbuf); + xen_front_pgdir_shbuf_unmap(>shbuf); + xen_front_pgdir_shbuf_free(>shbuf); kfree(buf); break; } @@ -91,8 +75,8 @@ static void dbuf_free_all(struct list_head *dbuf_list) list_for_each_entry_safe(buf, q, dbuf_list, list) { list_del(>list); - xen_drm_front_shbuf_unmap(buf->shbuf); - xen_drm_front_shbuf_free(buf->shbuf); + xen_front_pgdir_shbuf_unmap(>shbuf); + xen_front_pgdir_shbuf_free(>shbuf); kfree(buf); } } @@ -171,9 +155,9 @@ int xen_drm_front_dbuf_create(struct xen_drm_front_info *front_info, u32 bpp, u64 size, struct page **pages) { struct xen_drm_front_evtchnl *evtchnl; - struct xen_drm_front_shbuf *shbuf; + struct xen_drm_front_dbuf *dbuf; struct xendispl_req *req; - struct xen_drm_front_shbuf_cfg
[Xen-devel] [xen-4.10-testing test] 131257: regressions - FAIL
flight 131257 xen-4.10-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/131257/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stopfail REGR. vs. 129676 Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qemut-ws16-amd64 16 guest-localmigrate/x10 fail pass in 131223 Tests which did not succeed, but are not blocking: test-xtf-amd64-amd64-1 69 xtf/test-hvm64-xsa-278 fail blocked in 129676 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail in 131223 never pass test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass version targeted for testing: xen b6e203bc80e9d3e1dc7eb579d9665a77700d78cc baseline version: xen e907460fd61c350487ffee5d8aa375bef56bc81c Last test of basis 129676 2018-11-09 01:56:32 Z 34 days Testing same since 130611 2018-11-20 15:07:52 Z 22 days 13 attempts People who touched revisions under test: Andrew Cooper Jan Beulich Roger Pau Monné jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64-xtf pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass
Re: [Xen-devel] [PATCH v2 1/2] x86/dom0: rename paging function
On Thu, Dec 13, 2018 at 12:45:07AM -0700, Jan Beulich wrote: > >>> On 12.12.18 at 18:05, wrote: > > On Wed, Dec 12, 2018 at 09:15:09AM -0700, Jan Beulich wrote: > >> The MMIO side of things of course still remains unclear. > > > > Right, for the MMIO and the handling of grant and foreign mappings it's > > not clear how we want to proceed. > > > > Maybe account for all host RAM (total_pages) plus MMIO BARs? > > Well, I thought we've already settled on it being impossible to > account for all MMIO BARs at this point. Well, I could iterate over all the registered PCI devices and size the BARs (without VF BARs at least initially). This is quite cumbersome, my other option would be using max_page and hope that there are enough holes to make up for BAR MMIO regions. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation
Am Thu, 13 Dec 2018 01:45:39 -0700 schrieb "Jan Beulich" : > >>> On 13.12.18 at 09:18, wrote: > > Am Wed, 12 Dec 2018 09:39:25 -0700 > > schrieb "Jan Beulich" : > > > >> >>> On 12.12.18 at 16:20, wrote: > >> > If a domU uses TSC as clocksoure it also must run NTP in some way to > >> > avoid the potential drift what will most likely happen, independent of > >> > any migration. > >> Which drift? While anyone's well advised to run NTP, a completely > >> isolated set of systems may have no need to, if their interactions don't > >> depend on exactly matching time. > > > > If these hosts do not sync time to some reference host, the advancing of > > time > > is undefined before and after my change. > > I'm lost. I simply don't understand what you're trying to tell me, > or how your answer relates to my question. Then please rephrase the question? I do not see how my path affects the advancing of time in domUs running on isolated systems. If their hardware clocks advance at the same speed, my patch will not affect it. And if they do advance at a different speed, what can emulation do to help with "correctness"? Olaf pgpHzXPq_VPK0.pgp Description: Digitale Signatur von OpenPGP ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
> From: Roger Pau Monné [mailto:roger@citrix.com] > Sent: Thursday, December 13, 2018 4:40 PM > > On Thu, Dec 13, 2018 at 02:44:00AM +, Tian, Kevin wrote: > > btw can you also capture ISR/IRR/PPR right before local_irq_enable()? > > though I didn't see a reason why code in-between may impact those > > bits, it doesn't hurt to capture the context right before interrupt is > > raised. :-) > > I've done that and the result is the same as the ones that are > currently printed on the trace, there's no change to the registers > between the point where they are printed and the call to > local_irq_enable. > Thanks for confirmation. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
> From: Jan Beulich [mailto:jbeul...@suse.com] > Sent: Thursday, December 13, 2018 4:37 PM > > >>> On 13.12.18 at 02:28, wrote: > > btw I checked your original mail: > > > > (XEN)[] mwait-idle.c#mwait_idle+0x2a5/0x381 > > xen/arch/x86/cpu/mwait-idle.c:802 > > > >788 if (cpu_is_haltable(cpu)) > >789 mwait_idle_with_hints(eax, > MWAIT_ECX_INTERRUPT_BREAK); > >790 > >791 after = cpuidle_get_tick(); > >792 > >793 cstate_restore_tsc(); > >794 trace_exit_reason(irq_traced); > >795 TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after, > >796 irq_traced[0], irq_traced[1], irq_traced[2], > irq_traced[3]); > >797 > >798 /* Now back in C0. */ > >799 update_idle_stats(power, cx, before, after); > >800 local_irq_enable(); > >801 > > -> 802 if (!(lapic_timer_reliable_states & (1 << cstate))) > >803 lapic_timer_on(); > >804 > >805 sched_tick_resume(); > >806 cpufreq_dbs_timer_resume(); > > > > Looks above code is different from staging: > > > > acpi_processor_idle: > > acpi_idle_do_entry: > > acpi_processor_ffh_cstate_enter: > > mwait_idle_with_hints > > > > there is no mwait_idle alone. and even with compiler optimization I didn't > > find code sequence like above... > > You're looking at two entirely different code paths, only one of > which can be in use in any particular case: Either the idle > entering routine used is acpi_processor_idle(), or (when the > processor is supported by that driver code) it is mwait_idle(). > See mwait_idle_init() for when the latter gets used; the former > may get installed at the point the Dom0 kernel reports ACPI > C-state data (and only when mwait_idle() is not in use). > yes, I missed the other path. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
On Thu, Dec 13, 2018 at 01:28:23AM +, Tian, Kevin wrote: > > From: Roger Pau Monné [mailto:roger@citrix.com] > > Sent: Wednesday, December 12, 2018 8:18 PM > > > > On Wed, Dec 12, 2018 at 11:48:52AM +, Tian, Kevin wrote: > > > > From: Roger Pau Monné [mailto:roger@citrix.com] > > > > Sent: Wednesday, December 12, 2018 7:25 PM > > > > > > > > On Wed, Dec 12, 2018 at 10:36:44AM +, Tian, Kevin wrote: > > > > > > From: Roger Pau Monné [mailto:roger@citrix.com] > > > > > > Sent: Monday, October 15, 2018 6:30 PM > > > > > > (XEN) [22642] POWERTYPE 4 > > > > > > (XEN) [22643] IDLE PPR 0x0020 > > > > > > (XEN)IRR > > > > > > > > > > > > 00 > > > > > > 00 > > > > > > (XEN)ISR > > > > > > > > > > > > 02 > > > > > > 00 > > > > > > (XEN) [22644] WAKE PPR 0x0020 > > > > > > (XEN)IRR > > > > > > > > > > > > 02 > > > > > > 00 > > > > > > (XEN)ISR > > > > > > > > > > > > 02 > > > > > > 00 > > > > > > > > > > looks pending IRR (0x21) doesn't always trigger a spurious interrupt? > > > > > > > > Yes, that's correct. Having a pending IRR and going idle doesn't > > > > always trigger the spurious interrupt re-injection. > > > > > > > > > is it a fixed pattern after how many rounds of Cstate enter/exit with > > > > > pending IRR(0x21) then you see assertion happened (in this example > > > > > it happens at 3rd time)? > > > > > > > > It's not a fixed pattern, here's another trace with IRR(0x21) being > > > > pending just once during the Cstate transitions: > > > > > > did you observe a case where such asset may occur when IRR(0x21) > > > is cleared but ISR (0x21) is set? > > > > No, I've always seen both ISR and IRR set when the interrupt injection > > happens. This of course doesn't mean it's not possible, but I have not > > seen any trace with ISR(0x21) set and IRR(0x21) clear. > > > > sorry but let me double confirm. You always see ISR[21]/IRR[21] being > set "before and after entering C3" to hit the problem, right? Yes, that's correct. > When > interrupt injection happens later, ISR[21] is set but IRR[21] is cleared (as > expected for normal interrupt delivery process). > > btw I checked your original mail: > > (XEN)[] mwait-idle.c#mwait_idle+0x2a5/0x381 > xen/arch/x86/cpu/mwait-idle.c:802 > >788if (cpu_is_haltable(cpu)) >789mwait_idle_with_hints(eax, > MWAIT_ECX_INTERRUPT_BREAK); >790 >791after = cpuidle_get_tick(); >792 >793cstate_restore_tsc(); >794trace_exit_reason(irq_traced); >795TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after, >796irq_traced[0], irq_traced[1], irq_traced[2], > irq_traced[3]); >797 >798/* Now back in C0. */ >799update_idle_stats(power, cx, before, after); >800local_irq_enable(); >801 > -> 802if (!(lapic_timer_reliable_states & (1 << cstate))) >803lapic_timer_on(); >804 >805sched_tick_resume(); >806cpufreq_dbs_timer_resume(); > > Looks above code is different from staging: The code matches staging at the point where I posted the original bug report, 2 months ago. > acpi_processor_idle: > acpi_idle_do_entry: > acpi_processor_ffh_cstate_enter: > mwait_idle_with_hints There's another caller of mwait_idle_with_hints, the mwait_idle function. This gets setup by mwait_idle_init. > there is no mwait_idle alone. There's a mwait_idle in current staging code, check: xen/arch/x86/cpu/mwait-idle.c:718 http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/cpu/mwait-idle.c;h=f89c52f2565e8deb7d4aec309124fe6fbd57b27d;hb=refs/heads/staging#l718 Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation
>>> On 13.12.18 at 09:18, wrote: > Am Wed, 12 Dec 2018 09:39:25 -0700 > schrieb "Jan Beulich" : > >> >>> On 12.12.18 at 16:20, wrote: >> > If a domU uses TSC as clocksoure it also must run NTP in some way to >> > avoid the potential drift what will most likely happen, independent of >> > any migration. >> Which drift? While anyone's well advised to run NTP, a completely >> isolated set of systems may have no need to, if their interactions don't >> depend on exactly matching time. > > If these hosts do not sync time to some reference host, the advancing of time > is undefined before and after my change. I'm lost. I simply don't understand what you're trying to tell me, or how your answer relates to my question. >> > The calculation of the drift is based on the time >> > returned by remote servers versus how fast the local clock advances. NTP >> > can handle a drift up to 500PPM. This means the local clocksource can >> > run up to 500us slower or faster. This calculation is based on the TSC >> > frequency of the host where the domU was started. Once a domU is >> > migrated to a host of a different class, like from "2.3GHz" to "2.4GHz", >> > the TSC frequency changes, but the domU kernel may not recalibrate >> > itself. >> >> That's why we switch to emulated (or hardware scaling) mode in that >> case. It's anyway not really clear to me what this entire ... >> >> > As a result, the drift will be larger and might be outside of >> > the 500 PPM range. In addition, the kernel may notice the change of >> > speed in which the TSC advances and could change the clocksource. All >> > this depends of course on the type of OS that is running in the domU. >> >> ... (up to here) paragraph is supposed to tell the reader. >> >> > @@ -1885,6 +1888,16 @@ void __init early_time_init(void) >> > printk("Detected %lu.%03lu MHz processor.\n", >> > cpu_khz / 1000, cpu_khz % 1000); >> > >> > +tmp = 1000 * 1000; >> > +tmp += VTSC_NTP_PPM_TOLERANCE; >> > +tmp *= cpu_khz; >> > +tmp /= 1000 * 1000; >> > +tmp -= cpu_khz; >> > +if (tmp >= VTSC_JITTER_RANGE_KHZ) >> > +tmp -= VTSC_JITTER_RANGE_KHZ; >> >> Besides the style issue in the if() - how can this be correct? This >> clearly introduces a discontinuity (just consider the case where >> tmp is exactly VTSC_JITTER_RANGE_KHZ before the if()). And >> I also can't see how it guarantees the resulting value to be >> below VTSC_JITTER_RANGE_KHZ. Did you perhaps mean to >> cap the value (i.e. = instead of -= )? > > This is supposed to make sure the value of Hz for 500us is always larger > than the assumed jitter. So for a 2GHz host, with a theoretical tolerance > of 1000, the actual tolerance is set to 800. tmp will be larger than the > assumed jitter for cpu_khz > 400. Again you don't appear to answer my question regarding the discontinuity you introduce. >> > +vtsc_tolerance_khz = (unsigned int)tmp; >> Stray cast. > > Copy from the cpu_khz assignment, perhaps a remainder from i386 > support? Copy-and-paste is an explanation but not an excuse. Style violations should not be cloned and thus furthered. >> > +printk("Tolerating vtsc jitter for domUs: %u kHz.\n", > vtsc_tolerance_khz); >> Please omit the full stop; the printk() in context above shouldn't >> have one either. > > You mean the trailing dot, or what means "full stop" in this context? Yes, "full stop" means the final period in a sentence. >> > +disable_vtsc = khz_diff <= vtsc_tolerance_khz; >> > + >> > +printk(XENLOG_G_INFO "d%d: host has %lu kHz," >> > + " domU expects %u kHz," >> > + " difference of %ld is %s tolerance of %u\n", >> > + d->domain_id, cpu_khz, gtsc_khz, khz_diff, >> > + disable_vtsc ? "within" : "outside", >> > + vtsc_tolerance_khz); >> > +} >> > + >> > if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() && >> > - (d->arch.tsc_khz == cpu_khz || >> > + (disable_vtsc || >> >(is_hvm_domain(d) && >> > hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) ) >> > { >> >> In any event I don't follow why all of the sudden this becomes >> an always-on mode, with not even a boot command line override. > > Perhaps I failed to explain why there is no need to make this a knob. > > If a domU uses TSC as its timesource, and if it also uses NTP to make > sure the time advances correctly, then this change will make sure the > advancing of time will be withing the bounds that NTP can correct. > If it does use TSC, but does not use NTP then the advancing will be > undefined before and after my change. As per above - I'm lost. I simply don't understand. All I notice is that you talk about one specific use case of the TSC in a guest, without (apparently) considering uses for other than what Linux calls its clocksource. I in particular don't understand
Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
On Thu, Dec 13, 2018 at 02:44:00AM +, Tian, Kevin wrote: > btw can you also capture ISR/IRR/PPR right before local_irq_enable()? > though I didn't see a reason why code in-between may impact those > bits, it doesn't hurt to capture the context right before interrupt is > raised. :-) I've done that and the result is the same as the ones that are currently printed on the trace, there's no change to the registers between the point where they are printed and the call to local_irq_enable. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware
>>> On 13.12.18 at 02:28, wrote: > btw I checked your original mail: > > (XEN)[] mwait-idle.c#mwait_idle+0x2a5/0x381 > xen/arch/x86/cpu/mwait-idle.c:802 > >788if (cpu_is_haltable(cpu)) >789mwait_idle_with_hints(eax, > MWAIT_ECX_INTERRUPT_BREAK); >790 >791after = cpuidle_get_tick(); >792 >793cstate_restore_tsc(); >794trace_exit_reason(irq_traced); >795TRACE_6D(TRC_PM_IDLE_EXIT, cx->type, after, >796irq_traced[0], irq_traced[1], irq_traced[2], > irq_traced[3]); >797 >798/* Now back in C0. */ >799update_idle_stats(power, cx, before, after); >800local_irq_enable(); >801 > -> 802if (!(lapic_timer_reliable_states & (1 << cstate))) >803lapic_timer_on(); >804 >805sched_tick_resume(); >806cpufreq_dbs_timer_resume(); > > Looks above code is different from staging: > > acpi_processor_idle: > acpi_idle_do_entry: > acpi_processor_ffh_cstate_enter: > mwait_idle_with_hints > > there is no mwait_idle alone. and even with compiler optimization I didn't > find code sequence like above... You're looking at two entirely different code paths, only one of which can be in use in any particular case: Either the idle entering routine used is acpi_processor_idle(), or (when the processor is supported by that driver code) it is mwait_idle(). See mwait_idle_init() for when the latter gets used; the former may get installed at the point the Dom0 kernel reports ACPI C-state data (and only when mwait_idle() is not in use). Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [linux-3.18 bisection] complete test-amd64-amd64-pair
>>> On 12.12.18 at 22:41, wrote: > branch xen-unstable > xenbranch xen-unstable > job test-amd64-amd64-pair > testid xen-boot/src_host > > Tree: linux > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git > Tree: linuxfirmware git://xenbits.xen.org/osstest/linux-firmware.git > Tree: qemu git://xenbits.xen.org/qemu-xen-traditional.git > Tree: qemuu git://xenbits.xen.org/qemu-xen.git > Tree: xen git://xenbits.xen.org/xen.git > > *** Found and reproduced problem changeset *** > > Bug is in tree: linux > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git > Bug introduced: 7b8052e19304865477e03a0047062d977309a22f > Bug not present: d255d18a34a8d53ccc4a019dc07e17b6e8cf6bd1 > Last fail repro: http://logs.test-lab.xenproject.org/osstest/logs/131278/ > > > commit 7b8052e19304865477e03a0047062d977309a22f > Author: Jan Beulich > Date: Mon Oct 19 04:23:29 2015 -0600 > > igb: fix NULL derefs due to skipped SR-IOV enabling _Very_ interesting. An over three years old commit was determined to cause whatever regression it is. But wait - that's the date of the mainline commit, not that of the backport (which was done a month ago). I notice that of the two original commits the combination of which the one here is supposed to fix, only one actually got backported. Hence I wonder whether backporting the one here was actually appropriate. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v11] tolerate jitter in cpu_khz calculation to avoid TSC emulation
Am Wed, 12 Dec 2018 09:39:25 -0700 schrieb "Jan Beulich" : > >>> On 12.12.18 at 16:20, wrote: > > If a domU uses TSC as clocksoure it also must run NTP in some way to > > avoid the potential drift what will most likely happen, independent of > > any migration. > Which drift? While anyone's well advised to run NTP, a completely > isolated set of systems may have no need to, if their interactions don't > depend on exactly matching time. If these hosts do not sync time to some reference host, the advancing of time is undefined before and after my change. > > The calculation of the drift is based on the time > > returned by remote servers versus how fast the local clock advances. NTP > > can handle a drift up to 500PPM. This means the local clocksource can > > run up to 500us slower or faster. This calculation is based on the TSC > > frequency of the host where the domU was started. Once a domU is > > migrated to a host of a different class, like from "2.3GHz" to "2.4GHz", > > the TSC frequency changes, but the domU kernel may not recalibrate > > itself. > > That's why we switch to emulated (or hardware scaling) mode in that > case. It's anyway not really clear to me what this entire ... > > > As a result, the drift will be larger and might be outside of > > the 500 PPM range. In addition, the kernel may notice the change of > > speed in which the TSC advances and could change the clocksource. All > > this depends of course on the type of OS that is running in the domU. > > ... (up to here) paragraph is supposed to tell the reader. > > > @@ -1885,6 +1888,16 @@ void __init early_time_init(void) > > printk("Detected %lu.%03lu MHz processor.\n", > > cpu_khz / 1000, cpu_khz % 1000); > > > > +tmp = 1000 * 1000; > > +tmp += VTSC_NTP_PPM_TOLERANCE; > > +tmp *= cpu_khz; > > +tmp /= 1000 * 1000; > > +tmp -= cpu_khz; > > +if (tmp >= VTSC_JITTER_RANGE_KHZ) > > +tmp -= VTSC_JITTER_RANGE_KHZ; > > Besides the style issue in the if() - how can this be correct? This > clearly introduces a discontinuity (just consider the case where > tmp is exactly VTSC_JITTER_RANGE_KHZ before the if()). And > I also can't see how it guarantees the resulting value to be > below VTSC_JITTER_RANGE_KHZ. Did you perhaps mean to > cap the value (i.e. = instead of -= )? This is supposed to make sure the value of Hz for 500us is always larger than the assumed jitter. So for a 2GHz host, with a theoretical tolerance of 1000, the actual tolerance is set to 800. tmp will be larger than the assumed jitter for cpu_khz > 400. > > +vtsc_tolerance_khz = (unsigned int)tmp; > Stray cast. Copy from the cpu_khz assignment, perhaps a remainder from i386 support? > > +printk("Tolerating vtsc jitter for domUs: %u kHz.\n", > > vtsc_tolerance_khz); > Please omit the full stop; the printk() in context above shouldn't > have one either. You mean the trailing dot, or what means "full stop" in this context? > > +disable_vtsc = khz_diff <= vtsc_tolerance_khz; > > + > > +printk(XENLOG_G_INFO "d%d: host has %lu kHz," > > + " domU expects %u kHz," > > + " difference of %ld is %s tolerance of %u\n", > > + d->domain_id, cpu_khz, gtsc_khz, khz_diff, > > + disable_vtsc ? "within" : "outside", > > + vtsc_tolerance_khz); > > +} > > + > > if ( tsc_mode == TSC_MODE_DEFAULT && host_tsc_is_safe() && > > - (d->arch.tsc_khz == cpu_khz || > > + (disable_vtsc || > >(is_hvm_domain(d) && > > hvm_get_tsc_scaling_ratio(d->arch.tsc_khz))) ) > > { > > In any event I don't follow why all of the sudden this becomes > an always-on mode, with not even a boot command line override. Perhaps I failed to explain why there is no need to make this a knob. If a domU uses TSC as its timesource, and if it also uses NTP to make sure the time advances correctly, then this change will make sure the advancing of time will be withing the bounds that NTP can correct. If it does use TSC, but does not use NTP then the advancing will be undefined before and after my change. Olaf pgpYPccpEZ4nz.pgp Description: Digitale Signatur von OpenPGP ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH V2] x86/vm_event: block interrupt injection for sync vm_events
On 12/13/18 8:54 AM, Tian, Kevin wrote: >> From: Razvan Cojocaru [mailto:rcojoc...@bitdefender.com] >> Sent: Tuesday, December 11, 2018 8:33 PM >> >>> In any case, I think you want to rename the function and/or document >>> more that expected behavior. >> >> You're right, I should probably rename that function / variable to >> better reflect what it signifies - that sync vm_event processing is in >> progress. For VMX and SVM, that simply means that interrupts will be >> blocked, and the value of the variable will be correct and possibly >> useful for ARM as well. >> > > what about vm_event_block_interrupt_injection? in that case > it's injection instead of interrupt itself being blocked. blocking > injection should mean same thing cross archs? Of course, if Julien agrees with the change I'll rename it as suggested. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel