[Xen-devel] Redhat 6 VM crash on Xen when cpu number reaches 64
Hi all: Now a days, we tested Redhat 6.2(6.4) on Xen(version 4.1.2). If we config the cpu number more than 32, it'll show 32 in the VM, and if we config it 64 cpus, the VM will crash and the log is list below. Can someone tell us why is this happening? thanks ===crash log= CPU: CPU feature rdtscp disabled on xen guest CPU: CPU feature constant_tsc disabled on xen guest mce: CPU supports 0 MCE banks #32 CPU32: Stuck ?? #33 CPU33: Stuck ?? #34 CPU34: Stuck ?? #35 CPU35: Stuck ?? #36 CPU36: Stuck ?? #37 CPU37: Stuck ?? #38 CPU38: Stuck ?? #39 CPU39: Stuck ?? #40 CPU40: Stuck ?? #41 CPU41: Stuck ?? #42 CPU42: Stuck ?? #43 CPU43: Stuck ?? #44 CPU44: Stuck ?? #45 CPU45: Stuck ?? #46 CPU46: Stuck ?? #47 CPU47: Stuck ?? #48 CPU48: Stuck ?? #49 CPU49: Stuck ?? #50 CPU50: Stuck ?? #51 CPU51: Stuck ?? #52 CPU52: Stuck ?? #53 CPU53: Stuck ?? #54 CPU54: Stuck ?? #55 CPU55: Stuck ?? #56 CPU56: Stuck ?? #57 CPU57: Stuck ?? #58 CPU58: Stuck ?? #59 CPU59: Stuck ?? #60 CPU60: Stuck ?? #61 CPU61: Stuck ?? #62 CPU62: Stuck ?? #63 CPU63: Stuck ?? Brought up 32 CPUs Total of 32 processors activated (153583.72 BogoMIPS). devtmpfs: initialized regulator: core version 0.5 NET: Registered protocol family 16 ACPI: bus type pci registered PCI: Using configuration type 1 for base access bio: create slab at 0 ACPI: Interpreter enabled ACPI: (supports S0 S5) ACPI: Using IOAPIC for interrupt routing ACPI: No dock devices found. HEST: Table not found. ACPI: PCI Root Bridge [PCI0] (:00) pci :00:01.3: quirk: region b000-b03f claimed by PIIX4 ACPI pci :00:01.3: quirk: region b100-b10f claimed by PIIX4 SMB ACPI: PCI Interrupt Link [LNKA] (IRQs *5 10 11) ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11) ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11) ACPI: PCI Interrupt Link [LNKD] (IRQs *5 10 11) vgaarb: device added: PCI::00:02.0,decodes=io+mem,owns=io+mem,locks=none vgaarb: loaded vgaarb: bridge control possible :00:02.0 SCSI subsystem initialized usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb PCI: Using ACPI for IRQ routing NetLabel: Initializing NetLabel: domain hash size = 128 NetLabel: protocols = UNLABELED CIPSOv4 NetLabel: unlabeled traffic allowed by default Switching to clocksource jiffies pnp: PnP ACPI init ACPI: bus type pnp registered pnp: PnP ACPI: found 11 devices ACPI: ACPI bus type pnp unregistered system 00:00: iomem range 0x0-0x9 could not be reserved system 00:02: ioport range 0x8a0-0x8a3 has been reserved system 00:02: ioport range 0xcc0-0xccf has been reserved system 00:02: ioport range 0x4d0-0x4d1 has been reserved system 00:0a: ioport range 0xae00-0xae0f has been reserved system 00:0a: ioport range 0xb044-0xb047 has been reserved NET: Registered protocol family 2 IP route cache hash table entries: 65536 (order: 7, 524288 bytes) TCP established hash table entries: 262144 (order: 10, 4194304 bytes) TCP bind hash table entries: 65536 (order: 8, 1048576 bytes) TCP: Hash tables configured (established 262144 bind 65536) TCP reno registered NET: Registered protocol family 1 pci :00:00.0: Limiting direct PCI/PCI transfers pci :00:01.0: PIIX3: Enabling Passive Release pci :00:01.0: Activating ISA DMA hang workarounds Trying to unpack rootfs image as initramfs... Freeing initrd memory: 15508k freed audit: initializing netlink socket (disabled) type=2000 audit(1427337937.473:1): initialized HugeTLB registered 2 MB page size, pre-allocated 0 pages VFS: Disk quotas dquot_6.5.2 Dquot-cache hash table entries: 512 (order 0, 4096 bytes) msgmni has been set to 2912 alg: No test for stdrng (krng) ksign: Installing public key data Loading keyring - Added public key 4BB1E63D18B42658 - User ID: Red Hat, Inc. (Kernel Module GPG key) - Added public key D4A26C9CCD09BEDA - User ID: Red Hat Enterprise Linux Driver Update Program Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252) io scheduler noop registered io scheduler anticipatory registered io scheduler deadline registered io scheduler cfq registered (default) pci_hotplug: PCI Hot Plug PCI Core version: 0.5 pciehp: PCI Express Hot Plug Controller Driver version: 0.4 acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5 acpiphp: Slot [4] registered acpiphp: Slot [5] registered acpiphp: Slot [6] registered acpiphp: Slot [7] registered acpiphp: Slot [8] registered acpiphp: Slot [9] registered acpiphp: Slot [10] registered acpiphp: Slot [11] registered acpiphp: Slot [12] registered acpiphp: Slot [13] registered acpiphp: Slot [14] registered acpiphp: Slot [15] registered acpiphp: Slot [16] registered acpiphp: Slot [17] registered acpiphp: Slot [18] registered acpiphp: Slot [19] registered acpiphp: Slot [20] registered acpiphp: Slot [21] registered acpiphp: Slot [22] registered acpiphp: Slot [23] registered acpiphp: Slot [24] registered acpiphp: Slot [25] r
[Xen-devel] linux-next: manual merge of the xen-tip tree with the arm64-acpi tree
Hi all, Today's linux-next merge of the xen-tip tree got a conflict in drivers/xen/Kconfig between commit 94ccae47e02d ("XEN / ACPI: Make XEN ACPI depend on X86") from the arm64-acpi tree and commit 628c28eefd6f ("xen: unify foreign GFN map/unmap for auto-xlated physmap guests") from the xen-tip tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc drivers/xen/Kconfig index a31cd29b68a8,afc39ca5cc4f.. --- a/drivers/xen/Kconfig +++ b/drivers/xen/Kconfig @@@ -253,8 -253,10 +253,14 @@@ config XEN_EF def_bool y depends on X86_64 && EFI +config XEN_ACPI + def_bool y + depends on X86 && ACPI + + config XEN_AUTO_XLATE + def_bool y + depends on ARM || ARM64 || XEN_PVHVM + help + Support for auto-translated physmap guests. + endmenu pgpySq275iekB.pgp Description: OpenPGP digital signature ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [libvirt test] 36729: tolerable FAIL - PUSHED
flight 36729 libvirt real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/36729/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-i386-libvirt-xsm 9 guest-start fail never pass test-amd64-amd64-libvirt-xsm 9 guest-start fail never pass test-amd64-amd64-libvirt 10 migrate-support-checkfail never pass test-armhf-armhf-libvirt 10 migrate-support-checkfail never pass test-amd64-i386-libvirt 10 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 5 xen-boot fail never pass version targeted for testing: libvirt 6cf1e11cc09212f71991f1061681ea0b1ee2bc1b baseline version: libvirt 3b289a81eaa49fd96b829cf47ea5c6d84e4f65c2 People who touched revisions under test: Ján Tomko Laine Stump Luyao Huang Pavel Hrdina jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-xsm fail test-armhf-armhf-libvirt-xsm fail test-amd64-i386-libvirt-xsm fail test-amd64-amd64-libvirt pass test-armhf-armhf-libvirt pass test-amd64-i386-libvirt pass sg-report-flight on osstest.cam.xci-test.com logs: /home/xc_osstest/logs images: /home/xc_osstest/images Logs, config files, etc. are available at http://www.chiark.greenend.org.uk/~xensrcts/logs Test harness code can be found at http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary Pushing revision : + branch=libvirt + revision=6cf1e11cc09212f71991f1061681ea0b1ee2bc1b + . cri-lock-repos ++ . cri-common +++ . cri-getconfig +++ umask 002 +++ getconfig Repos +++ perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; ' ++ repos=/export/home/osstest/repos ++ repos_lock=/export/home/osstest/repos/lock ++ '[' x '!=' x/export/home/osstest/repos/lock ']' ++ OSSTEST_REPOS_LOCK_LOCKED=/export/home/osstest/repos/lock ++ exec with-lock-ex -w /export/home/osstest/repos/lock ./ap-push libvirt 6cf1e11cc09212f71991f1061681ea0b1ee2bc1b + branch=libvirt + revision=6cf1e11cc09212f71991f1061681ea0b1ee2bc1b + . cri-lock-repos ++ . cri-common +++ . cri-getconfig +++ umask 002 +++ getconfig Repos +++ perl -e ' use Osstest; readglobalconfig(); print $c{"Repos"} or die $!; ' ++ repos=/export/home/osstest/repos ++ repos_lock=/export/home/osstest/repos/lock ++ '[' x/export/home/osstest/repos/lock '!=' x/export/home/osstest/repos/lock ']' + . cri-common ++ . cri-getconfig ++ umask 002 + select_xenbranch + case "$branch" in + tree=libvirt + xenbranch=xen-unstable + '[' xlibvirt = xlinux ']' + linuxbranch= + '[' x = x ']' + qemuubranch=qemu-upstream-unstable + : tested/2.6.39.x + . ap-common ++ : osst...@xenbits.xensource.com ++ : git://xenbits.xen.org/xen.git ++ : osst...@xenbits.xensource.com:/home/xen/git/xen.git ++ : git://xenbits.xen.org/staging/qemu-xen-unstable.git ++ : git://git.kernel.org ++ : git://git.kernel.org/pub/scm/linux/kernel/git ++ : git ++ : git://libvirt.org/libvirt.git ++ : osst...@xenbits.xensource.com:/home/xen/git/libvirt.git ++ : git://xenbits.xen.org/libvirt.git ++ : git://xenbits.xen.org/rumpuser-xen.git ++ : git ++ : git://xenbits.xen.org/rumpuser-xen.git ++ : osst...@xenbits.xensource.com:/home/xen/git/rumpuser-xen.git +++ besteffort_repo https://github.com/rumpkernel/rumpkernel-netbsd-src +++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src +++ cached_repo https://github.com/rumpkernel/rumpkernel-netbsd-src '[fetch=try]' +++ local repo=https://github.com/rumpkernel/rumpkernel-netbsd-src +++ local 'options=[fetch=try]' getconfig GitCacheProxy perl -e ' use Osstest; readglobalconfig(); print $c{"GitCache
Re: [Xen-devel] [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
On 03/25/2015 08:59 PM, Konrad Rzeszutek Wilk wrote: On Fri, Mar 20, 2015 at 04:17:52PM -0700, Luis R. Rodriguez wrote: From: "Luis R. Rodriguez" It is possible to enable CONFIG_MTRR and up with it disabled at run time and yet CONFIG_X86_PAT continues to kick through fully functionally. This can happen s/fully/full/ ? for instance on Xen where MTRR is not supported but PAT is, this can happen now on Linux as of commit 47591df50 by Juergen introduced as of v3.19. s/3.19/4.0/ No, 3.19 is correct. Juergen Technically we should assume the proper CPU bits would be set to disable MTRR but we can't always rely on this. At least on the Xen Hypervisor for instance only X86_FEATURE_MTRR was disabled as of Xen 4.4 through Xen commit 586ab6a [0], but not X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR, or X86_FEATURE_CYRIX_ARR for instance. Oh, could you send an patch for that to Xen please? x86 mtrr code relies on quite a bit of checks for mtrr_if being set to check to see if MTRR did get set up, instead of using that lets provide a generic setter which when set we know MTRR is enabled. This s/we know MTRR is enabled/will let us know that MTRR is enabled/ also adds a few checks where they were not before which could potentially safeguard ourselves against incorrect usage of MTRR where this was not desirable. Where possible match error codes as if MTRR was disabled on arch/x86/include/asm/mtrr.h. Lastly, since disabling MTRR can happen at run time and we could end up with PAT enabled best record now on our logs when MTRR is disabled. [0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a 4.4.0-rc1~18 Cc: Andy Lutomirski Cc: Suresh Siddha Cc: Venkatesh Pallipadi Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Juergen Gross Cc: Daniel Vetter Cc: Dave Airlie Cc: Antonino Daplas Cc: Jean-Christophe Plagniol-Villard Cc: Tomi Valkeinen Cc: Dave Hansen Cc: venkatesh.pallip...@intel.com Cc: Stefan Bader Cc: konrad.w...@oracle.com Cc: ville.syrj...@linux.intel.com Cc: david.vra...@citrix.com Cc: jbeul...@suse.com Cc: toshi.k...@hp.com Cc: bhelg...@google.com Cc: Roger Pau Monné Cc: linux-fb...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: xen-de...@lists.xensource.com Signed-off-by: Luis R. Rodriguez --- arch/x86/include/asm/mtrr.h| 2 ++ arch/x86/kernel/cpu/mtrr/cleanup.c | 2 +- arch/x86/kernel/cpu/mtrr/generic.c | 5 +++-- arch/x86/kernel/cpu/mtrr/if.c | 3 +++ arch/x86/kernel/cpu/mtrr/main.c| 31 ++- 5 files changed, 31 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h index f768f62..cade917 100644 --- a/arch/x86/include/asm/mtrr.h +++ b/arch/x86/include/asm/mtrr.h @@ -31,6 +31,7 @@ * arch_phys_wc_add and arch_phys_wc_del. */ # ifdef CONFIG_MTRR +extern int mtrr_enabled; extern u8 mtrr_type_lookup(u64 addr, u64 end); extern void mtrr_save_fixed_ranges(void *); extern void mtrr_save_state(void); @@ -50,6 +51,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn); extern int amd_special_default_mtrr(void); extern int phys_wc_to_mtrr_index(int handle); # else +static const int mtrr_enabled; static inline u8 mtrr_type_lookup(u64 addr, u64 end) { /* diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c b/arch/x86/kernel/cpu/mtrr/cleanup.c index 5f90b85..784dc55 100644 --- a/arch/x86/kernel/cpu/mtrr/cleanup.c +++ b/arch/x86/kernel/cpu/mtrr/cleanup.c @@ -880,7 +880,7 @@ int __init mtrr_trim_uncached_memory(unsigned long end_pfn) * Make sure we only trim uncachable memory on machines that * support the Intel MTRR architecture: */ - if (!is_cpu(INTEL) || disable_mtrr_trim) + if (!is_cpu(INTEL) || disable_mtrr_trim || !mtrr_enabled) return 0; rdmsr(MSR_MTRRdefType, def, dummy); diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c index 09c82de..df321b2 100644 --- a/arch/x86/kernel/cpu/mtrr/generic.c +++ b/arch/x86/kernel/cpu/mtrr/generic.c @@ -116,7 +116,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat) u8 prev_match, curr_match; *repeat = 0; - if (!mtrr_state_set) + /* generic_mtrr_ops is only set for generic_mtrr_ops */ + if (!mtrr_state_set || !mtrr_enabled) return 0xFF; if (!mtrr_state.enabled) @@ -290,7 +291,7 @@ static void get_fixed_ranges(mtrr_type *frs) void mtrr_save_fixed_ranges(void *info) { - if (cpu_has_mtrr) + if (mtrr_enabled && cpu_has_mtrr) get_fixed_ranges(mtrr_state.fixed_ranges); } diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c index d76f13d..e9e001a 100644 --- a/arch/x86/kernel/cpu/mtrr/if.c +++ b/arch/x86/kernel/cpu/mtrr/if.c @@ -436,6 +436,9 @@ static int __init mtrr_if_init(void) { struct cpuinfo_x86 *c = &boot_cpu_data; + if (!mtrr_enabled) +
[Xen-devel] [xen-unstable test] 36728: regressions - FAIL
flight 36728 xen-unstable real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/36728/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-credit2 12 guest-localmigratefail REGR. vs. 36514 test-amd64-amd64-xl 12 guest-localmigratefail REGR. vs. 36514 test-amd64-amd64-xl-multivcpu 12 guest-localmigrate fail REGR. vs. 36514 test-amd64-amd64-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. 36514 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-sedf 12 guest-localmigratefail REGR. vs. 36514 test-amd64-amd64-xl-sedf-pin 12 guest-localmigratefail REGR. vs. 36514 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 36514 Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-armhf-armhf-xl-xsm 1 build-check(1) blocked n/a test-amd64-i386-xl-xsm1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 1 build-check(1)blocked n/a test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-amd64-xl-xsm 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 1 build-check(1)blocked n/a build-armhf-xsm 5 xen-buildfail never pass test-amd64-amd64-rumpuserxen-amd64 13 rumpuserxen-demo-xenstorels/xenstorels.repeat fail never pass test-armhf-armhf-xl-sedf 10 migrate-support-checkfail never pass test-armhf-armhf-xl-sedf-pin 10 migrate-support-checkfail never pass test-armhf-armhf-xl 10 migrate-support-checkfail never pass test-amd64-amd64-xl-pvh-intel 9 guest-start fail never pass test-amd64-i386-libvirt 10 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 10 migrate-support-checkfail never pass test-amd64-amd64-xl-pvh-amd 9 guest-start fail never pass test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass test-amd64-amd64-libvirt 10 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 10 migrate-support-checkfail never pass test-armhf-armhf-libvirt 10 migrate-support-checkfail never pass test-armhf-armhf-xl-midway 10 migrate-support-checkfail never pass test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass build-amd64-xsm 5 xen-buildfail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail never pass version targeted for testing: xen 84066dd4ef4bb5983e246c629a26ef4f3394e5d5 baseline version: xen 3a28f760508fb35c430edac17a9efde5aff6d1d5 People who touched revisions under test: Andrew Cooper Boris Ostrovsky Daniel De Graaf Dario Faggioli Don Slutz George Dunlap Ian Campbell Ian Jackson Jan Beulich Jim Fehlig Juergen Gross Kevin Tian Konrad Rzeszutek Wilk Koushik Chakravarty Olaf Hering Pramod Devendra Quan Xu Riku Voipio Ross Lagerwall Vijaya Kumar K Vijaya Kumar K Wei Liu Wen Congyang jobs: build-amd64-xsm fail build-armhf-xsm fail build-i386-xsm pass
[Xen-devel] [rumpuserxen test] 36748: regressions - FAIL
flight 36748 rumpuserxen real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/36748/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-rumpuserxen 5 rumpuserxen-build fail REGR. vs. 33866 build-i386-rumpuserxen5 rumpuserxen-build fail REGR. vs. 33866 Tests which did not succeed, but are not blocking: test-amd64-i386-rumpuserxen-i386 1 build-check(1) blocked n/a test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a version targeted for testing: rumpuserxen 80294081448af74270ca2dd2ba758f0ed308f46c baseline version: rumpuserxen 30d72f3fc5e35cd53afd82c8179cc0e0b11146ad People who touched revisions under test: Antti Kantee Ian Jackson Martin Lucina Wei Liu jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-pvopspass build-i386-pvops pass build-amd64-rumpuserxen fail build-i386-rumpuserxen fail test-amd64-amd64-rumpuserxen-amd64 blocked test-amd64-i386-rumpuserxen-i386 blocked sg-report-flight on osstest.cam.xci-test.com logs: /home/xc_osstest/logs images: /home/xc_osstest/images Logs, config files, etc. are available at http://www.chiark.greenend.org.uk/~xensrcts/logs Test harness code can be found at http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 527 lines long.) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1 05/47] pci: add pci_iomap_wc() variants
On Mon, Mar 23, 2015 at 12:20:47PM -0500, Bjorn Helgaas wrote: > Hi Luis, > > This seems OK to me, Great. > but I'm curious about a few things. > > On Fri, Mar 20, 2015 at 6:17 PM, Luis R. Rodriguez > wrote: > > From: "Luis R. Rodriguez" > > > > This allows drivers to take advantage of write-combining > > when possible. Ideally we'd have pci_read_bases() just > > peg an IORESOURCE_WC flag for us > > We do set IORESOURCE_PREFETCH. Do you mean something different? I did not think we had a WC IORESOURCE flag. Are you saying that we can use IORESOURCE_PREFETCH for that purpose? If so then great. As I read a PCI BAR can have PCI_BASE_ADDRESS_MEM_PREFETCH and when that's the case we peg IORESOURCE_PREFETCH. That seems to be what I want indeed. Questions below. > > but where exactly > > video devices memory lie varies *largely* and at times things > > are mixed with MMIO registers, sometimes we can address > > the changes in drivers, other times the change requires > > intrusive changes. > > What does a video device address have to do with this? I do see that > if a BAR maps only a frame buffer, the device might be able to mark it > prefetchable, while if the BAR mapped both a frame buffer and some > registers, it might not be able to make it prefetchable. But that > doesn't seem like it depends on the *address*. I meant the offsets for each of those, either registers or framebuffer, and that typically they are mixed (primarily on older devices), so indeed your summary of the problem is what I meant. Let's remember that we are trying to take advantage of PAT here when available and avoid MTRR in that case, do we know that the same PCI BARs that have always historically used MTRRs had IORESOURCE_PREFETCH set, is that a fair assumption ? I realize they are different things -- but its precisely why I ask. > pci_iomap_range() already makes a cacheable mapping if > IORESOURCE_CACHEABLE; I'm guessing that you would like it to > automatically use WC if the BAR if IORESOURCE_PREFETCH, e.g., > > if (flags & IORESOURCE_CACHEABLE) > return ioremap(start, len); > if (flags & IORESOURCE_PREFETCH) > return ioremap_wc(start, len); > return ioremap_nocache(start, len); Indeed, that's exactly what I think we should strive towards. > Is there a reason not to do that? This depends on the exact defintion of IORESOURCE_PREFETCH and PCI_BASE_ADDRESS_MEM_PREFETCH and how they are used all over and accross *all devices*. This didn't look promising for starters: include/uapi/linux/pci_regs.h:#define PCI_BASE_ADDRESS_MEM_PREFETCH0x08 /* prefetchable? */ PCI_BASE_ADDRESS_MEM_PREFETCH seems to be BAR specific, so a few questions: 1) Can we rest assured for instance that if we check for PCI_BASE_ADDRESS_MEM_PREFETCH and if set that it will *only* be set on a full PCI BAR if the full PCI BAR does want WC? If not this can regress functionality. That seems risky. It however would not be risky if we used another API that did look for IORESOURCE_PREFETCH and if so use ioremap_wc() -- that way only drivers we know that do use the full PCI bar would use this API. There's a bit of a problem with this though: 2) Do we know that if a *full PCI BAR* is used for WC that PCI_BASE_ADDRESS_MEM_PREFETCH *was* definitely set for the PCI BAR? If so then the API usage would be restricted only to devices that we know *do* adhere to this. That reduces the possible uses for older drivers and can create regressions if used loosely without verification... but.. 3) If from now on we get folks to commit to uset PCI_BASE_ADDRESS_MEM_PREFETCH for full PCI BARs that do want WC perhaps newer devices / drivers will use this very consistently ? Can we bank on that and is it worth it ? 4) If a PCI BAR *does not* have PCI_BASE_ADDRESS_MEM_PREFETCH do we know it must not never want WC ? If we don't have certainty on any of the above I'm afraid we can't do much right now but perhaps we can push towards better use of PCI_BASE_ADDRESS_MEM_PREFETCH and hope folks will only use this for the full PCI BAR only if WC is desired. Thoughts? > > Although there is also arch_phys_wc_add() that makes use of > > architecture specific write-combinging alternatives (MTRR on > > x86 when a system does not have PAT) we void polluting > > pci_iomap() space with it and force drivers and subsystems > > that want to use it to be explicit. > > > > There are a few motivations for this: > > > > a) Take advantage of PAT when available > > > > b) Help bury MTRR code away, MTRR is architecture specific and on > >x86 its replaced by PAT > > > > c) Help with the goal of eventually using _PAGE_CACHE_UC over > >_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e) > > ... > > > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev, > > +int bar, > > +unsigned long offset, > > +unsigned long maxlen) > > +{ > > + resource_size_t start = pci
[Xen-devel] [PATCH] libxl: Fix memory leak if pthread_create fails.
If we fail to create the thread we leak the shutdown_info structure. Signed-off-by: Konrad Rzeszutek Wilk --- src/libxl/libxl_domain.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c index 774b070..0ac5c62 100644 --- a/src/libxl/libxl_domain.c +++ b/src/libxl/libxl_domain.c @@ -482,7 +482,7 @@ libxlDomainEventHandler(void *data, VIR_LIBXL_EVENT_CONST libxl_event *event) libxlDriverPrivatePtr driver = data; virDomainObjPtr vm = NULL; libxl_shutdown_reason xl_reason = event->u.domain_shutdown.shutdown_reason; -struct libxlShutdownThreadInfo *shutdown_info; +struct libxlShutdownThreadInfo *shutdown_info = NULL; virThread thread; libxlDriverConfigPtr cfg; @@ -535,6 +535,7 @@ libxlDomainEventHandler(void *data, VIR_LIBXL_EVENT_CONST libxl_event *event) virObjectUnref(cfg); if (vm) virObjectUnlock(vm); +VIR_FREE(shutdown_info); } void -- 1.8.4.2 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] libxl: acquire a job when destroying a domain
On Wed, Mar 25, 2015 at 02:08:35PM -0600, Jim Fehlig wrote: > A job should be acquired at the beginning of a domain destroy operation, > not at the end when cleaning up the domain. Fix two occurances of this > late job acquisition in the libxl driver. Doing so renders > libxlDomainCleanup unused, so it is removed. > > Signed-off-by: Jim Fehlig > --- > src/libxl/libxl_domain.c | 49 > +--- > src/libxl/libxl_domain.h | 4 > src/libxl/libxl_driver.c | 20 > 3 files changed, 29 insertions(+), 44 deletions(-) > > diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c > index eca1ff0..e240eae 100644 > --- a/src/libxl/libxl_domain.c > +++ b/src/libxl/libxl_domain.c > @@ -371,6 +371,9 @@ libxlDomainShutdownThread(void *opaque) > > cfg = libxlDriverConfigGet(driver); > > +if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) > +goto cleanup; > + Won't this a deadlock with 'libxlDomainAutoCoreDump' (line 410) (which also calls : 727 if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) 728 goto cleanup; well, not deadlock - but spin up to 30 seconds? and then give up? Perhaps this patch should be folded in? >From 9f2bac0c28815fc51850290c4b1962881691bfca Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 25 Mar 2015 22:34:51 -0400 Subject: [PATCH] squash --- src/libxl/libxl_domain.c | 4 1 file changed, 4 deletions(-) diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c index aef0157..774b070 100644 --- a/src/libxl/libxl_domain.c +++ b/src/libxl/libxl_domain.c @@ -723,15 +723,11 @@ libxlDomainAutoCoreDump(libxlDriverPrivatePtr driver, timestr) < 0) goto cleanup; -if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) -goto cleanup; - /* Unlock virDomainObj while dumping core */ virObjectUnlock(vm); libxl_domain_core_dump(cfg->ctx, vm->def->id, dumpfile, NULL); virObjectLock(vm); -ignore_value(libxlDomainObjEndJob(driver, vm)); ret = 0; cleanup: -- 1.8.4.2 > if (xl_reason == LIBXL_SHUTDOWN_REASON_POWEROFF) { > dom_event = virDomainEventLifecycleNewFromObj(vm, > VIR_DOMAIN_EVENT_STOPPED, > @@ -384,7 +387,7 @@ libxlDomainShutdownThread(void *opaque) > goto restart; > case VIR_DOMAIN_LIFECYCLE_PRESERVE: > case VIR_DOMAIN_LIFECYCLE_LAST: > -goto cleanup; > +goto endjob; > } > } else if (xl_reason == LIBXL_SHUTDOWN_REASON_CRASH) { > dom_event = virDomainEventLifecycleNewFromObj(vm, > @@ -399,7 +402,7 @@ libxlDomainShutdownThread(void *opaque) > goto restart; > case VIR_DOMAIN_LIFECYCLE_CRASH_PRESERVE: > case VIR_DOMAIN_LIFECYCLE_CRASH_LAST: > -goto cleanup; > +goto endjob; > case VIR_DOMAIN_LIFECYCLE_CRASH_COREDUMP_DESTROY: > libxlDomainAutoCoreDump(driver, vm); > goto destroy; > @@ -420,11 +423,11 @@ libxlDomainShutdownThread(void *opaque) > goto restart; > case VIR_DOMAIN_LIFECYCLE_PRESERVE: > case VIR_DOMAIN_LIFECYCLE_LAST: > -goto cleanup; > +goto endjob; > } > } else { > VIR_INFO("Unhandled shutdown_reason %d", xl_reason); > -goto cleanup; > +goto endjob; > } > > destroy: > @@ -433,13 +436,11 @@ libxlDomainShutdownThread(void *opaque) > dom_event = NULL; > } > libxl_domain_destroy(cfg->ctx, vm->def->id, NULL); > -if (libxlDomainCleanupJob(driver, vm, reason)) { > -if (!vm->persistent) { > -virDomainObjListRemove(driver->domains, vm); > -vm = NULL; > -} > -} > -goto cleanup; > +libxlDomainCleanup(driver, vm, reason); > +if (!vm->persistent) > +virDomainObjListRemove(driver->domains, vm); > + > +goto endjob; > > restart: > if (dom_event) { > @@ -447,13 +448,17 @@ libxlDomainShutdownThread(void *opaque) > dom_event = NULL; > } > libxl_domain_destroy(cfg->ctx, vm->def->id, NULL); > -libxlDomainCleanupJob(driver, vm, VIR_DOMAIN_SHUTOFF_SHUTDOWN); > +libxlDomainCleanup(driver, vm, VIR_DOMAIN_SHUTOFF_SHUTDOWN); > if (libxlDomainStart(driver, vm, false, -1) < 0) { > virErrorPtr err = virGetLastError(); > VIR_ERROR(_("Failed to restart VM '%s': %s"), >vm->def->name, err ? err->message : _("unknown error")); > } > > + endjob: > +if (!libxlDomainObjEndJob(driver, vm)) > +vm = NULL; > + > cleanup: > if (vm) > virObjectUnlock(vm); > @@ -690,26 +695,6 @@ libxlDomainCleanup(libxlDriverPrivatePtr driver, > } > > /* > - * Cleanup function for domain that has reached shutoff state. > - * Executed in the context of a job. > - * > - * virDomainObjP
Re: [Xen-devel] [PATCH v6 04/30] xen/PCI: Don't use deprecated function pci_scan_bus_parented()
That seems OK to me. Probably still wrong, but no worse than it was before. >>> >>> Interesting. The mechanism for PCI passthrough can either synthesize >>> and PCI bus number starting at zero (so first device is always 0:0:0.0) >>> or it can replicate the backend PCI topology. That means you >>> could have segment values passed in, so: ab:ff:00.1). I've to admin >>> I hadn't tried the 'physical' replication on an machine with >>> domains (err, segments). >>> >>> Is there an git tree with this so I can just try it out? >> >> git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git >> pci/enumeration-yw6 has similar code (it exports the single > > I presume now it is bjorn/pci/enumeration-yw8 ? Going to test this out > this week. Yes, it's the latest version. thanks! >> busn_resource and makes xen use it). That should be functionally >> identical to what v4.0-rc1 does. >> >> Yijing hasn't posted the static busn_res proposal above yet, so I >> don't have a branch with that in it. >> >> Bjorn > > . > -- Thanks! Yijing ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [v3][PATCH 2/2] libxl: introduce gfx_passthru_kind
On 2015/3/25 18:32, Ian Campbell wrote: On Wed, 2015-03-25 at 09:10 +0800, Chen, Tiejun wrote: +But when given as a string the B option describes the type +of device to enable. Not this behavior is only supported with upstream "Note" and "the upstream..." Fixed. +=item "igd" + +Enables graphics device PCI passthrough but force set the type of device +with the Intel Graphics Device. "but force set the type" doesn't scan very well. "... forcing the type of device to Intel Graphics Device" works I think. Fine to me as well. I understand what you mean but that table just includes IGDs existed on BDW and HSW. Because in the case of qemu upstream we're just covering these platforms, and with our discussion we don't have any plan to add those legacy platforms in the future. But qemu-xen-traditional still covers those platforms. So I'm afraid its not good to check this with that table as well. Hrm, OK. I suppose we can live with autodetect and igd both meaning igd and whoever adds a new type will have to remember to add a check for qemu-trad then. When we really have to introduce a new type, this means we probably need to change something inside qemu codes. So a new type should just go into that table to support qemu upstream since now we shouldn't refactor anything in qemu-xen-traditional, right? Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] One question to lowlevel/xl/xl.c and lowlevel/xc/xc.c
On 2015/3/25 18:26, Ian Campbell wrote: On Wed, 2015-03-25 at 09:18 +0800, Chen, Tiejun wrote: Actually my problem is that, I need to add a new parameter, 'flag', like this, xc_assign_device(xxx,xxx,flag). So if I don't refine xc.c, tools can't be compiled successfully. Or maybe you're suggesting I may isolate this file while building tools, right? I answered this in my original reply: it is acceptable for the new parameter to not be plumbed up to the Python bindings until someone comes along with a requirement to use it from Python. IOW you can just pass whatever the nop value is for the new argument. Yes, I knew this but I'm just getting a little confusion again after we're starting to talk if xc.c is compiled... And I will try to do this. The "nop value" is whatever value should be passed to retain the current behaviour. Sure. Thanks Tiejun ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [qemu-upstream-4.4-testing test] 36726: regressions - FAIL
flight 36726 qemu-upstream-4.4-testing real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/36726/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qemuu-winxpsp3 7 windows-install fail REGR. vs. 31663 Tests which did not succeed, but are not blocking: test-amd64-i386-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass test-amd64-amd64-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xend-winxpsp3 17 leak-check/check fail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/checkfail never pass version targeted for testing: qemuud173a0c20d7970c17fa593cf86abc1791a8a4a3a baseline version: qemuub04df88d41f64fc6b56d193b6e90fb840cedb1d3 People who touched revisions under test: Benoit Canet Benoît Canet Dmitry Fleytman Gerd Hoffmann Jason Wang Jeff Cody Juan Quintela Kevin Wolf Laszlo Ersek Michael Roth Michael S. Tsirkin Peter Maydell Petr Matousek Stefan Hajnoczi Stefano Stabellini jobs: build-amd64-xend pass build-i386-xend pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl pass test-amd64-i386-xl pass test-amd64-i386-rhel6hvm-amd pass test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-freebsd10-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass test-amd64-amd64-xl-qemut-win7-amd64 fail test-amd64-i386-xl-qemut-win7-amd64 fail test-amd64-amd64-xl-qemuu-win7-amd64 fail test-amd64-i386-xl-qemuu-win7-amd64 fail test-amd64-amd64-xl-win7-amd64 fail test-amd64-i386-xl-win7-amd64fail test-amd64-amd64-xl-credit2 pass test-amd64-i386-freebsd10-i386 pass test-amd64-amd64-xl-pcipt-intel fail test-amd64-i386-rhel6hvm-intel pass test-amd64-i386-qemut-rhel6hvm-intel pass test-amd64-i386-qemuu-rhel6hvm-intel pass test-amd64-amd64-libvirt pass test-amd64-i386-libvirt pass test-amd64-amd64-xl-multivcpupass test-amd64-amd64-pairpass test-amd64-i386-pair pass test-amd64-amd64-xl-sedf-pin pass test-amd64-amd64-pv pass test-amd64-i386-pv pass test-amd64-amd64-xl-sedf pass test-amd64-i386-xl-qemut-w
Re: [Xen-devel] [PATCH 3/3] xen: arm: handle thumb mode guest in continuations
Hi Ian, On 25/03/2015 15:34, Ian Campbell wrote: PC only needs adjusting by 2, otherwise we rerun the instruction prior to the hvc as well. I don't understand why you have to adjust PC by 2 for thumb. The spec encodes the HVC thumb instruction on 32 bits (i.e 4 bytes). Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] xen: arm: correctly handle continuations for 64-bit guests
Hi Ian, On 25/03/2015 15:34, Ian Campbell wrote: The 64-bit ABI is different to 32-bit: - uses x16 as the op register rather than r12. - arguments in x0..x5 and not r0..r5. Using rN here potentially truncates. - return value goes in x0, not r0. Hypercalls can only be made directly from kernel space, so checking the domain's size is sufficient. The update of regs->pc is duplicated in both halves because the 32-bit case is going to need fixing to handle Thumb mode (next patch). Spotted due to spurious -EFAULT when destroying a domain, due to the hypercall's pointer argument being truncated. I'm unclear why I am only seeing this now. Good catch! x16 would still contain the valid operation, because we are (most of the time?) continuing on the same hypercall. So the only issue would be argument truncation. I guess that we don't have big value (i.e > 32 bits) to store. Signed-off-by: Ian Campbell --- I imagine this needs backporting everywhere... Agree for Xen 4.4 and Xen 4.5. Reviewed-by: Julien Grall Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [linux-3.10 test] 36723: regressions - FAIL
flight 36723 linux-3.10 real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/36723/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qemut-winxpsp3 7 windows-install fail REGR. vs. 26303 Tests which are failing intermittently (not blocking): test-amd64-amd64-xl-qemuu-ovmf-amd64 7 debian-hvm-install fail pass in 36615 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-rumpuserxen-amd64 11 rumpuserxen-demo-xenstorels/xenstorels fail blocked in 26303 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 26303 test-amd64-amd64-xl-winxpsp3 7 windows-install fail like 26303 test-amd64-amd64-rumpuserxen-amd64 8 guest-start fail in 36615 blocked in 26303 test-amd64-amd64-xl-qemuu-debianhvm-amd64 7 debian-hvm-install fail in 36615 blocked in 26303 Tests which did not succeed, but are not blocking: test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-amd64-xl-pvh-intel 9 guest-start fail never pass test-amd64-amd64-xl-xsm 9 guest-start fail never pass test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-i386-libvirt-xsm 9 guest-start fail never pass test-amd64-i386-xl-xsm9 guest-start fail never pass test-amd64-amd64-libvirt-xsm 9 guest-start fail never pass test-amd64-i386-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass test-amd64-amd64-xl-pvh-amd 9 guest-start fail never pass test-amd64-amd64-libvirt 10 migrate-support-checkfail never pass test-armhf-armhf-libvirt-xsm 5 xen-boot fail never pass test-armhf-armhf-xl-multivcpu 5 xen-boot fail never pass test-armhf-armhf-xl-sedf-pin 5 xen-boot fail never pass test-armhf-armhf-xl-midway5 xen-boot fail never pass test-armhf-armhf-xl-credit2 5 xen-boot fail never pass test-armhf-armhf-libvirt 5 xen-boot fail never pass test-armhf-armhf-xl-xsm 5 xen-boot fail never pass test-armhf-armhf-xl 5 xen-boot fail never pass test-armhf-armhf-xl-sedf 5 xen-boot fail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass version targeted for testing: linux7f4e64246049cef5ae1eca37eec1701a9477799e baseline version: linuxbe67db109090b17b56eb8eb2190cd70700f107aa 997 people touched revisions under test, not listing them all jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-armhf-pvopspass build-i386-pvops pass
[Xen-devel] [linux-3.16 test] 36722: regressions - FAIL
flight 36722 linux-3.16 real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/36722/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: build-amd64-pvops 5 kernel-build fail REGR. vs. 34167 test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. 34167 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 1 build-check(1)blocked n/a test-amd64-amd64-xl-xsm 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 1 build-check(1)blocked n/a test-amd64-amd64-libvirt-xsm 1 build-check(1) blocked n/a test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-amd64-xl-pvh-intel 1 build-check(1) blocked n/a test-amd64-i386-libvirt-xsm 9 guest-start fail never pass test-amd64-amd64-rumpuserxen-amd64 1 build-check(1) blocked n/a test-armhf-armhf-xl-sedf 10 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 10 migrate-support-checkfail never pass test-armhf-armhf-xl 10 migrate-support-checkfail never pass test-armhf-armhf-libvirt 10 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 10 migrate-support-checkfail never pass test-amd64-i386-xl-xsm9 guest-start fail never pass test-armhf-armhf-xl-sedf-pin 10 migrate-support-checkfail never pass test-amd64-amd64-libvirt 1 build-check(1) blocked n/a test-armhf-armhf-xl-midway 10 migrate-support-checkfail never pass test-amd64-amd64-xl-pvh-amd 1 build-check(1) blocked n/a test-amd64-amd64-xl-pcipt-intel 1 build-check(1) blocked n/a test-amd64-i386-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-xl-multivcpu 1 build-check(1) blocked n/a test-amd64-amd64-xl-credit2 1 build-check(1) blocked n/a test-armhf-armhf-xl-xsm 5 xen-boot fail never pass test-amd64-amd64-xl 1 build-check(1) blocked n/a test-armhf-armhf-libvirt-xsm 5 xen-boot fail never pass test-amd64-i386-freebsd10-i386 7 freebsd-install fail never pass test-amd64-amd64-xl-sedf 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemut-debianhvm-amd64 1 build-check(1)blocked n/a test-amd64-amd64-xl-qemuu-debianhvm-amd64 1 build-check(1)blocked n/a test-amd64-i386-freebsd10-amd64 7 freebsd-install fail never pass test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-pair 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemut-winxpsp3 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-ovmf-amd64 1 build-check(1) blocked n/a test-amd64-amd64-xl-win7-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass test-amd64-amd64-xl-qemuu-win7-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-sedf-pin 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemut-win7-amd64 1 build-check(1) blocked n/a test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 1 build-check(1) blocked n/a test-amd64-amd64-xl-qemuu-winxpsp3 1 build-check(1) blocked n/a version targeted for testing: linux4923505b93e073f70380557cb360997e5b84b4f9 baseline version: linux19583ca584d6f574384e17fe7613dfaeadcdc4a6 1129 people touched revisions under test, not listing them all jobs: build-amd64-xsm pass build-armhf-xsm pass build-i386-xsm pass build-amd64 pass build-armhf pass bu
Re: [Xen-devel] [PATCH OSSTEST 04/12] toolstack: distinguish local and remote migration support
Ian Campbell wrote: > On Mon, 2015-02-09 at 11:09 +, Wei Liu wrote: > >> Libvirt supports migrating a guest to remote host but not local host. >> > > Jim, is that right? > Opps, I missed this mail. Sorry for the delay. Yes, that is correct # virsh migrate --live test-pv xen+ssh://localhost error: Requested operation is not valid: domain 'test-pv' is already active I suppose the code could be fixed up to allow localhost migration, but doing so would be quite low on my todo list. Regards, Jim ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] 3.18 xen-pcifront regression?
Konrad Rzeszutek Wilk wrote on 03/25/2015 04:27:00 PM: > From: Konrad Rzeszutek Wilk > To: Bjorn Helgaas , > Cc: Michael D Labriola , xen- > de...@lists.xenproject.org, Stuart Wehrly , > michael.d.labri...@gmail.com, Jayson A Dyke , "Rafael > J. Wysocki" , linux-...@vger.kernel.org, linux- > a...@vger.kernel.org > Date: 03/25/2015 04:27 PM > Subject: Re: [Xen-devel] 3.18 xen-pcifront regression? > > On Tue, Mar 24, 2015 at 12:27:02PM -0500, Bjorn Helgaas wrote: > > [+cc Rafael, linux-pci, linux-acpi] > > > > On Tue, Mar 24, 2015 at 11:28:06AM -0400, Konrad Rzeszutek Wilk wrote: > > > On Tue, Mar 24, 2015 at 11:14:29AM -0400, Michael D Labriola wrote: > > > > I'm having problems booting a 3.18 or newer domU w/ PCI devices passed > > > > through. It only seems to be the domU kernel that's upset > (i.e., Behavior > > > > is identical whether I'm running 3.19 or 3.13 dom0). I'm running a 32bit > > > > dom0 (3.13.11) w/ 64bit 4.4.0 hypervisor and 32bit domU. I get the > > > > following Oops when trying to boot my domU with a couple PCI cards passed > > > > through: > > > > > > > > BUG: unable to handle kernel paging request at 0030303e > > > > IP: [] acpi_ns_validate_handle+0x12/0x1a > > > > *pdpt = 019f1027 *pde = > > > > Oops: [#1] PREEMPT SMP > > > > Modules linked in: xen_pcifront(+) pcspkr xen_blkfront loop > > > > CPU: 0 PID: 18 Comm: xenwatch Not tainted 3.17.0-test+ #6 > > > > task: cb869950 ti: cb8ae000 task.ti: cb8ae000 > > > > EIP: 0061:[] EFLAGS: 00010246 CPU: 0 > > > > EIP is at acpi_ns_validate_handle+0x12/0x1a > > > > EAX: EBX: cb895dc0 ECX: EDX: 0030303a > > > > ESI: c0a6bccd EDI: EBP: 0004 ESP: cb8afd00 > > > > DS: 007b ES: 007b FS: 00d8 GS: SS: 0069 > > > > CR0: 8005003b CR2: 0030303e CR3: 0a68e000 CR4: 00040660 > > > > Stack: > > > > c06eda4d c096a21d 6462 c0c102c0 > > > > 0030303a 00040004 0030303a cb8afd94 cb8afdec cb8afd60 c06b78e1 cb8afd60 > > > > 0061 0246 c0407bc7 c0c102c0 cb8afda0 cb8afda8 cb8afdb0 > > > > Call Trace: > > > > [] ? acpi_evaluate_object+0x31/0x1fc > > > > > > We should not be calling in any acpi code in PV domU guests. > > > > > > WE actually disable it (acpi=0) to make sure we don't call it - as > > > there is no ACPI AML data at all in the guest. > > > > > > CC-ing Bjorn. > > > > [] ? resume_kernel+0x5/0x7 > > > > [] ? pci_get_hp_params+0x111/0x4e0 > > > > [] ? xen_force_evtchn_callback+0x17/0x30 > > > > [] ? xen_restore_fl_direct_reloc+0x4/0x4 > > > > [] ? pci_device_add+0x24/0x450 > > > > [] ? pci_bus_read_config_word+0x6e/0x80 > > > > [] ? pci_scan_single_device+0x8d/0xb0 > > > > [] ? pci_scan_slot+0x3c/0xf0 > > > > [] ? pci_scan_child_bus+0x1c/0x90 > > > > [] ? pci_scan_bus_parented+0x6a/0x90 > > > > [] ? pcifront_scan_root+0x91/0x130 [xen_pcifront] > > > > [] ? pcifront_backend_changed+0x4af/0x654 [xen_pcifront] > > > > [] ? xenbus_gather+0x5f/0x90 > > > > [] ? xenbus_gather+0x5f/0x90 > > > > [] ? xenbus_read_driver_state+0x33/0x50 > > > > [] ? xenbus_otherend_changed+0x95/0xa0 > > > > [] ? backend_changed+0xf/0x20 > > > > [] ? xenwatch_thread+0x72/0x110 > > > > [] ? bit_waitqueue+0x50/0x50 > > > > [] ? join+0x70/0x70 > > > > [] ? kthread+0xab/0xd0 > > > > [] ? ret_from_kernel_thread+0x21/0x30 > > > > [] ? flush_kthread_worker+0xa0/0xa0 > > > > Code: 03 10 00 00 eb 0e 46 83 c2 04 4b 85 db 75 b9 c6 02 00 31 > c0 5b 5e 5f > > > > 5d c3 89 c2 8d 40 ff 83 f8 fd 76 06 a1 2c 32 c1 c0 c3 31 c0 <80>7a 04 0f > > > > 0f 44 c2 c3 83 ec 10 83 f8 1d 76 24 89 44 24 0c c7 > > > > EIP: [] acpi_ns_validate_handle+0x12/0x1a SS:ESP 0069:cb8afd00 > > > > CR2: 0030303e > > > > ---[ end trace d4ddeb038cbcbdf7 ]--- > > > > > > > > > > > > I've bisected down to the following commit in 3.18, which breaks my > > > > system. > > > > > > > > 6cd33649fa83d97ba7b66f1d871a360e867c5220 is the first bad commit > > > > commit 6cd33649fa83d97ba7b66f1d871a360e867c5220 > > > > Author: Bjorn Helgaas > > > > Date: Wed Aug 27 14:29:47 2014 -0600 > > > > > > > > PCI: Add pci_configure_device() during enumeration > > > > > > > > Some platforms can tell the OS how to configure PCI devices,e.g., how > > > > to > > > > set cache line size, error reporting enables, etc. ACPI defines _HPP > > > > and > > > > _HPX methods for this purpose. > > > > > > > > This configuration was previously done by some of the hotplug drivers > > > > using > > > > pci_configure_slot(). But not all hotplug drivers did this, and per > > > > the > > > > spec (ACPI rev 5.0, sec 6.2.7), we can also do it for "devices not > > > > configured by the BIOS at system boot." > > > > > > > > Move this configuration into the PCI core by adding > > > > pci_configure_device() > > > > and calling it from pci_device_add(), so we do this for all > devices as > > > > we > > > > enumerate
Re: [Xen-devel] [PATCH 05/11] x86/altp2m: basic data structures and support routines.
>>> diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c >>> index abf3d7a..8fe0650 100644 >>> --- a/xen/arch/x86/mm/hap/hap.c >>> +++ b/xen/arch/x86/mm/hap/hap.c >>> @@ -439,7 +439,7 @@ void hap_domain_init(struct domain *d) >>> int hap_enable(struct domain *d, u32 mode) >>> { >>> unsigned int old_pages; >>> -uint8_t i; >>> +uint16_t i; >>> int rv = 0; >>> >>> domain_pause(d); >>> @@ -485,6 +485,23 @@ int hap_enable(struct domain *d, u32 mode) >>> goto out; >>> } >>> >>> +/* Init alternate p2m data */ >>> +if ( (d->arch.altp2m_eptp = alloc_xenheap_page()) == NULL ) >> >> This memory should be allocated from some domain-accounted pool, >> probably the paging pool (d->->arch.paging.alloc_page()). You can use >> map_domain_page_global() to get a safe pointer to anchor in >> d->arch.altp2m_eptp for hardware. I tried this but could not get it to work due to panics in Xen. Looking at the current VMX code, all the existing structures shared with hardware (VMCS, exception bitmap, etc.) are allocated using alloc_xenheap_page(), which is what induced me to write the code this way. Ed ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] 3.18 xen-pcifront regression?
> > Thanks for the report, Michael, and sorry for the inconvenience. I think > > the patch below will fix it, but I don't think it's the right fix either > > because it seems a little ad hoc to sprinkle "acpi_pci_disabled" tests > > around like fairy dust. I wonder if we can set things up so ACPI methods > > would fail gracefully like they do when ACPI is disabled at compile-time. > > > > I can boot with "acpi=off" on qemu just fine, and when we look up the ACPI > > device handles, we just get NULL pointers, so everything works out even > > without a fix like the one below. > > > > > > There must be something different about the way things get set up in that > > domU kernel. I'll try to look into that some more, but I'm going on > > vacation for the next week, so if you learn anything before then, let me > > know. > > And I don't see where 'resume_kernel' gets called. > > > And under my PV guest I see: > # lspci > 00:00.0 USB Controller: NEC Corporation Device 0194 (rev 04) > 00:01.0 USB Controller: NEC Corporation Device 0194 (rev 03) > > when the guest boots up. Hm, what kind of cards are these? > Could you also provide the config file and the guest config please? It helps when I boot an proper kernel (I had booted 3.16 by mistake). With the 4.0 kernel I see this as well: [4.059387] pci :00:00.0: reg 0x10: [mem 0xfbd0-0xfbd01fff 64bit] [4.114184] BUG: unable to handle kernel paging request at 0030303e [4.114199] IP: [] acpi_ns_validate_handle+0x1c/0x26 [4.114216] *pdpt = *pde = c2c2c2c2c2c2c2c2 [4.114230] Oops: [#1] SMP [4.114241] Modules linked in: [4.114252] CPU: 0 PID: 22 Comm: xenwatch Not tainted 4.0.0-rc5upstream-00070-g3a88f16 #1 [4.114268] task: dd557370 ti: dd55a000 task.ti: dd55a000 [4.114278] EIP: e019:[] EFLAGS: 00010246 CPU: 0 [4.114289] EIP is at acpi_ns_validate_handle+0x1c/0x26 [4.114299] EAX: EBX: 0030303a ECX: EDX: [4.114310] ESI: dd66e7c0 EDI: 0030303a EBP: dd55bcd8 ESP: dd55bcd4 [4.114322] DS: 007b ES: 007b FS: 00d8 GS: SS: e021 [4.114334] CR0: 80050033 CR2: 0030303e CR3: 019a4000 CR4: 00042660 [4.114347] Stack: [4.114354] c17ae853 dd55bd14 c137ae18 0010 c1a602c0 dd55bcf0 c109ae9a dd55bcfc [4.114376] c13a9c57 deadbeef dd55bd50 deadbeef 0030303a dd55bd78 dd55bdd0 [4.114397] dd55bd58 c1336d90 dd55bd40 007b 007b 00d8 dd55bd94 dd55bd8c [4.114419] Call Trace: [4.114429] [] acpi_evaluate_object+0x6f/0x366 [4.114443] [] ? irq_exit+0x4a/0xc0 [4.114456] [] ? xen_evtchn_do_upcall+0x27/0x40 [4.114468] [] pci_get_hp_params+0x110/0x4b0 [4.114480] [] pci_device_add+0x26/0x450 [4.114494] [] ? xen_restore_fl_direct_reloc+0x4/0x4 [4.115132] [] ? _raw_spin_unlock_irqrestore+0x16/0x40 [4.115132] [] pci_scan_single_device+0x8b/0xb0 [4.115132] [] pci_scan_slot+0x43/0x100 [4.115132] [] pci_scan_child_bus+0x1c/0xa0 [4.115132] [] pci_scan_bus_parented+0x64/0x90 [4.115132] [] pcifront_scan_root+0x90/0x120 [4.115132] [] pcifront_backend_changed+0x475/0x63c [4.115132] [] ? kmemleak_free+0x20/0x50 [4.115132] [] ? kfree+0x7d/0x100 [4.115132] [] ? __cleanup+0xfd/0x180 [4.115132] [] ? xenbus_gather+0x5d/0x90 [4.115132] [] ? xenbus_read_driver_state+0x35/0x50 [4.115132] [] xenbus_otherend_changed+0x7d/0x80 [4.115132] [] backend_changed+0x12/0x20 [4.115132] [] xenwatch_thread+0x72/0x120 [4.115132] [] ? woken_wake_function+0x20/0x20 [4.115132] [] kthread+0xac/0xd0 [4.115132] [] ? xenbus_transaction_start+0x60/0x60 [4.115132] [] ret_from_kernel_thread+0x21/0x30 [4.115132] [] ? kthread_freezable_should_stop+0x60/0x60 [4.115132] Code: 4f b2 00 00 31 c0 83 c4 2c 5b 5e 5f 5d c3 90 55 89 e5 53 89 c3 e8 45 b0 00 00 8d 43 ff 83 f8 fd 76 07 a1 20 80 a6 c1 eb 09 31 c0 <80> 7b 04 0f 0f 44 c3 5b 5d c3 55 89 e5 53 89 c3 e8 1f b0 00 00 [4.115132] EIP: [] acpi_ns_validate_handle+0x1c/0x26 SS:ESP e021:dd55bcd4 [4.115132] CR2: 0030303e [4.115132] ---[ end trace 21d8bfe52b77b825 ]--- [4.115132] Kernel panic - not syncing: Fatal exception [4.115132] Kernel Offset: 0x0 from 0xc100 (relocation range: 0xc000-0xedbfdfff) The interesting thing is that under 64-bit kernels I see this: [3.304732] pci_bus :00: root bus resource [io 0x-0x]^M [3.304748] pci_bus :00: root bus resource [mem 0x-0xf]^M [3.304764] pci_bus :00: root bus resource [bus 00-ff]^M [3.305023] pci :00:00.0: [1033:0194] type 00 class 0x0c0330^M [3.322181] pci :00:00.0: reg 0x10: [mem 0xfbd0-0xfbd01fff 64bit]^M [3.377359] ACPI Exception: AE_BAD_PARAMETER, Thread 520623344 could not acquire Mutex [0x1] (20150204/utmutex-285)^M [3.377379] ACPI Exception: AE_BAD_PARAMETER, Thread 520623344 could not acquire Mutex [0x1] (20150204/utmutex
Re: [Xen-devel] [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
On Wed, Mar 25, 2015 at 04:03:46PM -0400, Konrad Rzeszutek Wilk wrote: > On Fri, Mar 20, 2015 at 04:17:54PM -0700, Luis R. Rodriguez wrote: > > From: "Luis R. Rodriguez" > > > > This lets drivers take advanate of PAT when available. This > > s/advanate/advantage/ Amended. > > should help with the transition of converting video drivers over > > to ioremap_wc() to help with the goal of eventually using > > _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on > > ioremap_nocache() (de33c442e) > > Please mention the title of the patch too: > > "x86 PAT: fix performance drop for glx, use UC minus for ioremap(), > ioremap_nocache() and pci_mmap_page_range()" Added. > > > > Cc: Suresh Siddha > > Cc: Venkatesh Pallipadi > > Cc: Ingo Molnar > > Cc: Thomas Gleixner > > Cc: Juergen Gross > > Cc: Daniel Vetter > > Cc: Andy Lutomirski > > Cc: Dave Airlie > > Cc: Antonino Daplas > > Cc: Jean-Christophe Plagniol-Villard > > Cc: Tomi Valkeinen > > Cc: linux-fb...@vger.kernel.org > > Cc: linux-ker...@vger.kernel.org > > Signed-off-by: Luis R. Rodriguez > > --- > > drivers/pci/pci.c | 14 ++ > > include/linux/pci.h | 1 + > > 2 files changed, 15 insertions(+) > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index 81f06e8..6afd507 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, > > int bar) > > pci_resource_len(pdev, bar)); > > } > > EXPORT_SYMBOL_GPL(pci_ioremap_bar); > > + > > +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar) > > +{ > > + /* > > +* Make sure the BAR is actually a memory resource, not an IO resource > > +*/ > > + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) { > > + WARN_ON(1); > > Would it be better to use dev_warn ? That way you can see which BDF it is? > > Thought WARN will give a nice stack-trace that should easily point to the > driver so perhaps not.. Either way - up to you. I'm sticking to the style and use as with pci_ioremap_bar(). Whatever we pick we should make both use the same. More information is always better and since we do have dev_warn(), it would be nice to use that however within its use on both pci_ioremap_wc_bar() and pci_ioremap_bar() we have a use of the pdev with pci_resource_flags() and I believe if pdev is NULL we'd get a NULL dereference (dev_driver_string() is used), so it would seem it might be best to stick with a simple WARN_ON(). Arjan, any preference? Obviously if pdev is NULL your driver is dumb but as folks develop drivers this should be expected. Luis ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] 3.18 xen-pcifront regression?
On Tue, Mar 24, 2015 at 12:27:02PM -0500, Bjorn Helgaas wrote: > [+cc Rafael, linux-pci, linux-acpi] > > On Tue, Mar 24, 2015 at 11:28:06AM -0400, Konrad Rzeszutek Wilk wrote: > > On Tue, Mar 24, 2015 at 11:14:29AM -0400, Michael D Labriola wrote: > > > I'm having problems booting a 3.18 or newer domU w/ PCI devices passed > > > through. It only seems to be the domU kernel that's upset (i.e., > > > Behavior > > > is identical whether I'm running 3.19 or 3.13 dom0). I'm running a 32bit > > > dom0 (3.13.11) w/ 64bit 4.4.0 hypervisor and 32bit domU. I get the > > > following Oops when trying to boot my domU with a couple PCI cards passed > > > through: > > > > > > BUG: unable to handle kernel paging request at 0030303e > > > IP: [] acpi_ns_validate_handle+0x12/0x1a > > > *pdpt = 019f1027 *pde = > > > Oops: [#1] PREEMPT SMP > > > Modules linked in: xen_pcifront(+) pcspkr xen_blkfront loop > > > CPU: 0 PID: 18 Comm: xenwatch Not tainted 3.17.0-test+ #6 > > > task: cb869950 ti: cb8ae000 task.ti: cb8ae000 > > > EIP: 0061:[] EFLAGS: 00010246 CPU: 0 > > > EIP is at acpi_ns_validate_handle+0x12/0x1a > > > EAX: EBX: cb895dc0 ECX: EDX: 0030303a > > > ESI: c0a6bccd EDI: EBP: 0004 ESP: cb8afd00 > > > DS: 007b ES: 007b FS: 00d8 GS: SS: 0069 > > > CR0: 8005003b CR2: 0030303e CR3: 0a68e000 CR4: 00040660 > > > Stack: > > > c06eda4d c096a21d 6462 c0c102c0 > > > 0030303a 00040004 0030303a cb8afd94 cb8afdec cb8afd60 c06b78e1 cb8afd60 > > > 0061 0246 c0407bc7 c0c102c0 cb8afda0 cb8afda8 cb8afdb0 > > > Call Trace: > > > [] ? acpi_evaluate_object+0x31/0x1fc > > > > We should not be calling in any acpi code in PV domU guests. > > > > WE actually disable it (acpi=0) to make sure we don't call it - as > > there is no ACPI AML data at all in the guest. > > > > CC-ing Bjorn. > > > [] ? resume_kernel+0x5/0x7 > > > [] ? pci_get_hp_params+0x111/0x4e0 > > > [] ? xen_force_evtchn_callback+0x17/0x30 > > > [] ? xen_restore_fl_direct_reloc+0x4/0x4 > > > [] ? pci_device_add+0x24/0x450 > > > [] ? pci_bus_read_config_word+0x6e/0x80 > > > [] ? pci_scan_single_device+0x8d/0xb0 > > > [] ? pci_scan_slot+0x3c/0xf0 > > > [] ? pci_scan_child_bus+0x1c/0x90 > > > [] ? pci_scan_bus_parented+0x6a/0x90 > > > [] ? pcifront_scan_root+0x91/0x130 [xen_pcifront] > > > [] ? pcifront_backend_changed+0x4af/0x654 [xen_pcifront] > > > [] ? xenbus_gather+0x5f/0x90 > > > [] ? xenbus_gather+0x5f/0x90 > > > [] ? xenbus_read_driver_state+0x33/0x50 > > > [] ? xenbus_otherend_changed+0x95/0xa0 > > > [] ? backend_changed+0xf/0x20 > > > [] ? xenwatch_thread+0x72/0x110 > > > [] ? bit_waitqueue+0x50/0x50 > > > [] ? join+0x70/0x70 > > > [] ? kthread+0xab/0xd0 > > > [] ? ret_from_kernel_thread+0x21/0x30 > > > [] ? flush_kthread_worker+0xa0/0xa0 > > > Code: 03 10 00 00 eb 0e 46 83 c2 04 4b 85 db 75 b9 c6 02 00 31 c0 5b 5e > > > 5f > > > 5d c3 89 c2 8d 40 ff 83 f8 fd 76 06 a1 2c 32 c1 c0 c3 31 c0 <80> 7a 04 0f > > > 0f 44 c2 c3 83 ec 10 83 f8 1d 76 24 89 44 24 0c c7 > > > EIP: [] acpi_ns_validate_handle+0x12/0x1a SS:ESP 0069:cb8afd00 > > > CR2: 0030303e > > > ---[ end trace d4ddeb038cbcbdf7 ]--- > > > > > > > > > I've bisected down to the following commit in 3.18, which breaks my > > > system. > > > > > > 6cd33649fa83d97ba7b66f1d871a360e867c5220 is the first bad commit > > > commit 6cd33649fa83d97ba7b66f1d871a360e867c5220 > > > Author: Bjorn Helgaas > > > Date: Wed Aug 27 14:29:47 2014 -0600 > > > > > > PCI: Add pci_configure_device() during enumeration > > > > > > Some platforms can tell the OS how to configure PCI devices, e.g., > > > how > > > to > > > set cache line size, error reporting enables, etc. ACPI defines _HPP > > > and > > > _HPX methods for this purpose. > > > > > > This configuration was previously done by some of the hotplug drivers > > > using > > > pci_configure_slot(). But not all hotplug drivers did this, and per > > > the > > > spec (ACPI rev 5.0, sec 6.2.7), we can also do it for "devices not > > > configured by the BIOS at system boot." > > > > > > Move this configuration into the PCI core by adding > > > pci_configure_device() > > > and calling it from pci_device_add(), so we do this for all devices > > > as > > > we > > > enumerate them. > > > > > > This is based on pci_configure_slot(), which is used by hotplug > > > drivers. > > > I omitted: > > > > > > - pcie_bus_configure_settings() because it configures MPS and MRRS, > > > which > > > requires global knowledge of the fabric and must be done later, > > > and > > > > > > - configuration of subordinate devices; that will happen when we > > > call > > > pci_device_add() for those devices. > > > > > > Because pci_configure_slot() was only done by hotplug drivers, this > > > initial > > > version of pci_configu
Re: [Xen-devel] [PATCH v1 05/47] pci: add pci_iomap_wc() variants
On Fri, Mar 20, 2015 at 04:17:55PM -0700, Luis R. Rodriguez wrote: > From: "Luis R. Rodriguez" > > This allows drivers to take advantage of write-combining > when possible. Ideally we'd have pci_read_bases() just > peg an IORESOURCE_WC flag for us but where exactly > video devices memory lie varies *largely* and at times things > are mixed with MMIO registers, sometimes we can address > the changes in drivers, other times the change requires > intrusive changes. > > Although there is also arch_phys_wc_add() that makes use of > architecture specific write-combinging alternatives (MTRR on combinging? > x86 when a system does not have PAT) we void polluting > pci_iomap() space with it and force drivers and subsystems > that want to use it to be explicit. > > There are a few motivations for this: > > a) Take advantage of PAT when available > > b) Help bury MTRR code away, MTRR is architecture specific and on >x86 its replaced by PAT > > c) Help with the goal of eventually using _PAGE_CACHE_UC over >_PAGE_CACHE_UC_MINUS on x86 on ioremap_nocache() (de33c442e) > > Cc: Andy Lutomirski > Cc: Suresh Siddha > Cc: Venkatesh Pallipadi > Cc: Ingo Molnar > Cc: Thomas Gleixner > Cc: Juergen Gross > Cc: Daniel Vetter > Cc: Dave Airlie > Cc: Bjorn Helgaas > Cc: Antonino Daplas > Cc: Jean-Christophe Plagniol-Villard > Cc: Tomi Valkeinen > Cc: Dave Hansen > Cc: Arnd Bergmann > Cc: Michael S. Tsirkin > Cc: venkatesh.pallip...@intel.com > Cc: Stefan Bader > Cc: konrad.w...@oracle.com > Cc: ville.syrj...@linux.intel.com > Cc: david.vra...@citrix.com > Cc: jbeul...@suse.com > Cc: toshi.k...@hp.com > Cc: Roger Pau Monné > Cc: linux-fb...@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > Cc: xen-de...@lists.xensource.com > Signed-off-by: Luis R. Rodriguez > --- > include/asm-generic/pci_iomap.h | 14 ++ > lib/pci_iomap.c | 61 > + > 2 files changed, 75 insertions(+) > > diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h > index 7389c87..b1e17fc 100644 > --- a/include/asm-generic/pci_iomap.h > +++ b/include/asm-generic/pci_iomap.h > @@ -15,9 +15,13 @@ struct pci_dev; > #ifdef CONFIG_PCI > /* Create a virtual mapping cookie for a PCI BAR (memory or IO) */ > extern void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long > max); > +extern void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, unsigned > long max); > extern void __iomem *pci_iomap_range(struct pci_dev *dev, int bar, >unsigned long offset, >unsigned long maxlen); > +extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar, > + unsigned long offset, > + unsigned long maxlen); > /* Create a virtual mapping cookie for a port on a given PCI device. > * Do not call this directly, it exists to make it easier for architectures > * to override */ > @@ -34,12 +38,22 @@ static inline void __iomem *pci_iomap(struct pci_dev > *dev, int bar, unsigned lon > return NULL; > } > > +static inline void __iomem *pci_iomap_wc(struct pci_dev *dev, int bar, > unsigned long max) > +{ > + return NULL; > +} > static inline void __iomem *pci_iomap_range(struct pci_dev *dev, int bar, > unsigned long offset, > unsigned long maxlen) > { > return NULL; > } > +static inline void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar, > +unsigned long offset, > +unsigned long maxlen) > +{ > + return NULL; > +} > #endif > > #endif /* __ASM_GENERIC_IO_H */ > diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c > index bcce5f1..30b65ae 100644 > --- a/lib/pci_iomap.c > +++ b/lib/pci_iomap.c > @@ -52,6 +52,46 @@ void __iomem *pci_iomap_range(struct pci_dev *dev, > EXPORT_SYMBOL(pci_iomap_range); > > /** > + * pci_iomap_wc_range - create a virtual WC mapping cookie for a PCI BAR > + * @dev: PCI device that owns the BAR > + * @bar: BAR number > + * @offset: map memory at the given offset in BAR > + * @maxlen: max length of the memory to map > + * > + * Using this function you will get a __iomem address to your device BAR. > + * You can access it using ioread*() and iowrite*(). These functions hide > + * the details if this is a MMIO or PIO address space and will just do what > + * you expect from them in the correct way. When possible write combining > + * is used. > + * > + * @maxlen specifies the maximum length to map. If you want to get access to > + * the complete BAR from offset to the end, pass %0 here. s/%0/0 ? Or is that some special syntax? > + * */ > +void __iomem *pci_iomap_wc_range(struct pci_dev *dev, > + int bar, > + unsigned l
[Xen-devel] [PATCH 2/3] libxl: acquire a job when destroying a domain
A job should be acquired at the beginning of a domain destroy operation, not at the end when cleaning up the domain. Fix two occurances of this late job acquisition in the libxl driver. Doing so renders libxlDomainCleanup unused, so it is removed. Signed-off-by: Jim Fehlig --- src/libxl/libxl_domain.c | 49 +--- src/libxl/libxl_domain.h | 4 src/libxl/libxl_driver.c | 20 3 files changed, 29 insertions(+), 44 deletions(-) diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c index eca1ff0..e240eae 100644 --- a/src/libxl/libxl_domain.c +++ b/src/libxl/libxl_domain.c @@ -371,6 +371,9 @@ libxlDomainShutdownThread(void *opaque) cfg = libxlDriverConfigGet(driver); +if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) +goto cleanup; + if (xl_reason == LIBXL_SHUTDOWN_REASON_POWEROFF) { dom_event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED, @@ -384,7 +387,7 @@ libxlDomainShutdownThread(void *opaque) goto restart; case VIR_DOMAIN_LIFECYCLE_PRESERVE: case VIR_DOMAIN_LIFECYCLE_LAST: -goto cleanup; +goto endjob; } } else if (xl_reason == LIBXL_SHUTDOWN_REASON_CRASH) { dom_event = virDomainEventLifecycleNewFromObj(vm, @@ -399,7 +402,7 @@ libxlDomainShutdownThread(void *opaque) goto restart; case VIR_DOMAIN_LIFECYCLE_CRASH_PRESERVE: case VIR_DOMAIN_LIFECYCLE_CRASH_LAST: -goto cleanup; +goto endjob; case VIR_DOMAIN_LIFECYCLE_CRASH_COREDUMP_DESTROY: libxlDomainAutoCoreDump(driver, vm); goto destroy; @@ -420,11 +423,11 @@ libxlDomainShutdownThread(void *opaque) goto restart; case VIR_DOMAIN_LIFECYCLE_PRESERVE: case VIR_DOMAIN_LIFECYCLE_LAST: -goto cleanup; +goto endjob; } } else { VIR_INFO("Unhandled shutdown_reason %d", xl_reason); -goto cleanup; +goto endjob; } destroy: @@ -433,13 +436,11 @@ libxlDomainShutdownThread(void *opaque) dom_event = NULL; } libxl_domain_destroy(cfg->ctx, vm->def->id, NULL); -if (libxlDomainCleanupJob(driver, vm, reason)) { -if (!vm->persistent) { -virDomainObjListRemove(driver->domains, vm); -vm = NULL; -} -} -goto cleanup; +libxlDomainCleanup(driver, vm, reason); +if (!vm->persistent) +virDomainObjListRemove(driver->domains, vm); + +goto endjob; restart: if (dom_event) { @@ -447,13 +448,17 @@ libxlDomainShutdownThread(void *opaque) dom_event = NULL; } libxl_domain_destroy(cfg->ctx, vm->def->id, NULL); -libxlDomainCleanupJob(driver, vm, VIR_DOMAIN_SHUTOFF_SHUTDOWN); +libxlDomainCleanup(driver, vm, VIR_DOMAIN_SHUTOFF_SHUTDOWN); if (libxlDomainStart(driver, vm, false, -1) < 0) { virErrorPtr err = virGetLastError(); VIR_ERROR(_("Failed to restart VM '%s': %s"), vm->def->name, err ? err->message : _("unknown error")); } + endjob: +if (!libxlDomainObjEndJob(driver, vm)) +vm = NULL; + cleanup: if (vm) virObjectUnlock(vm); @@ -690,26 +695,6 @@ libxlDomainCleanup(libxlDriverPrivatePtr driver, } /* - * Cleanup function for domain that has reached shutoff state. - * Executed in the context of a job. - * - * virDomainObjPtr should be locked on invocation - * Returns true if references remain on virDomainObjPtr, false otherwise. - */ -bool -libxlDomainCleanupJob(libxlDriverPrivatePtr driver, - virDomainObjPtr vm, - virDomainShutoffReason reason) -{ -if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_DESTROY) < 0) -return true; - -libxlDomainCleanup(driver, vm, reason); - -return libxlDomainObjEndJob(driver, vm); -} - -/* * Core dump domain to default dump path. * * virDomainObjPtr must be locked on invocation diff --git a/src/libxl/libxl_domain.h b/src/libxl/libxl_domain.h index a032e46..30855a2 100644 --- a/src/libxl/libxl_domain.h +++ b/src/libxl/libxl_domain.h @@ -108,10 +108,6 @@ libxlDomainCleanup(libxlDriverPrivatePtr driver, virDomainObjPtr vm, virDomainShutoffReason reason); -bool -libxlDomainCleanupJob(libxlDriverPrivatePtr driver, - virDomainObjPtr vm, - virDomainShutoffReason reason); /* * Note: Xen 4.3 removed the const from the event handler signature. * Detect which signature to use based on diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index da4c1c7..a1977c2 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -1238,10 +1238,13 @@ libxlDomainDestroyFlags(virDomainPtr dom, if (virDomainDestroyFlagsEnsureACL(dom->conn, vm->def) < 0)
[Xen-devel] [PATCH 1/3] libxl: Move job acquisition in libxlDomainStart to callers
Let callers of libxlDomainStart decide when it is appropriate to acquire a job on the associated virDomainObj. Signed-off-by: Jim Fehlig --- src/libxl/libxl_domain.c | 24 -- src/libxl/libxl_driver.c | 53 +++- 2 files changed, 52 insertions(+), 25 deletions(-) diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c index 8feb4dc..eca1ff0 100644 --- a/src/libxl/libxl_domain.c +++ b/src/libxl/libxl_domain.c @@ -907,16 +907,13 @@ libxlDomainStart(libxlDriverPrivatePtr driver, virDomainObjPtr vm, libxl_domain_config_init(&d_config); -if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) -return ret; - cfg = libxlDriverConfigGet(driver); /* If there is a managed saved state restore it instead of starting * from scratch. The old state is removed once the restoring succeeded. */ if (restore_fd < 0) { managed_save_path = libxlDomainManagedSavePath(driver, vm); if (managed_save_path == NULL) -goto endjob; +goto cleanup; if (virFileExists(managed_save_path)) { @@ -924,7 +921,7 @@ libxlDomainStart(libxlDriverPrivatePtr driver, virDomainObjPtr vm, managed_save_path, &def, &hdr); if (managed_save_fd < 0) -goto endjob; +goto cleanup; restore_fd = managed_save_fd; @@ -938,7 +935,7 @@ libxlDomainStart(libxlDriverPrivatePtr driver, virDomainObjPtr vm, _("cannot restore domain '%s' uuid %s from a file" " which belongs to domain '%s' uuid %s"), vm->def->name, vm_uuidstr, def->name, def_uuidstr); -goto endjob; +goto cleanup; } virDomainObjAssignDef(vm, def, true, NULL); @@ -955,18 +952,18 @@ libxlDomainStart(libxlDriverPrivatePtr driver, virDomainObjPtr vm, if (libxlBuildDomainConfig(driver->reservedVNCPorts, vm->def, cfg->ctx, &d_config) < 0) -goto endjob; +goto cleanup; if (cfg->autoballoon && libxlDomainFreeMem(cfg->ctx, &d_config) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("libxenlight failed to get free memory for domain '%s'"), d_config.c_info.name); -goto endjob; +goto cleanup; } if (virHostdevPrepareDomainDevices(hostdev_mgr, LIBXL_DRIVER_NAME, vm->def, VIR_HOSTDEV_SP_PCI) < 0) -goto endjob; +goto cleanup; /* Unlock virDomainObj while creating the domain */ virObjectUnlock(vm); @@ -998,7 +995,7 @@ libxlDomainStart(libxlDriverPrivatePtr driver, virDomainObjPtr vm, virReportError(VIR_ERR_INTERNAL_ERROR, _("libxenlight failed to restore domain '%s'"), d_config.c_info.name); -goto endjob; +goto cleanup; } /* @@ -1045,7 +1042,7 @@ libxlDomainStart(libxlDriverPrivatePtr driver, virDomainObjPtr vm, libxlDomainEventQueue(driver, event); ret = 0; -goto endjob; +goto cleanup; cleanup_dom: if (priv->deathW) { @@ -1056,10 +1053,7 @@ libxlDomainStart(libxlDriverPrivatePtr driver, virDomainObjPtr vm, vm->def->id = -1; virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF, VIR_DOMAIN_SHUTOFF_FAILED); - endjob: -if (!libxlDomainObjEndJob(driver, vm)) -vm = NULL; - + cleanup: libxl_domain_config_dispose(&d_config); VIR_FREE(dom_xml); VIR_FREE(managed_save_path); diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 05f6eb1..da4c1c7 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -303,18 +303,26 @@ libxlAutostartDomain(virDomainObjPtr vm, virObjectLock(vm); virResetLastError(); +if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) { +virObjectUnlock(vm); +return ret; +} + if (vm->autostart && !virDomainObjIsActive(vm) && libxlDomainStart(driver, vm, false, -1) < 0) { err = virGetLastError(); VIR_ERROR(_("Failed to autostart VM '%s': %s"), vm->def->name, err ? err->message : _("unknown error")); -goto cleanup; +goto endjob; } ret = 0; - cleanup: -virObjectUnlock(vm); + + endjob: +if (libxlDomainObjEndJob(driver, vm)) +virObjectUnlock(vm); + return ret; } @@ -885,17 +893,26 @@ libxlDomainCreateXML(virConnectPtr conn, const char *xml, goto cleanup; def = NULL; +if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0) { +virDomainObjListRemove(driver->domains, vm); +vm = NULL; +goto cleanup; +} +
[Xen-devel] [PATCH 0/3] libxl: domain destroy fixes
This small series of patches fixes some issues wrt domain destroy in the libxl driver. The primary motivation for this work is to prevent locking the virDomainObj during long running destroy operations on large memory domains. Patch 1 moves job acquisition from libxlDomainStart to it's callers so they have more control over when the job is acquired. Patch 2 fixes a few spots where we never acquired a job during domain destroy. Patch 3 contains the interesting change, where the virDomainObj is unlocked during the long-running destroy operation. This series wraps up my work to improve parallel OpenStack Tempest runs against the libxl driver. With libvirt.git master + this series + a patched libxl [1], I've successfully run a reproducer that was hitting the same issues encountered by Tempest. [1] libxl commits from xen.git: 93699882d, f1335f0d, 4783c99a, 1c91d6fba, and 188e9c54. I'll contact the stable branch maintainers and ask them to include these commits in the next Xen 4.4.x and 4.5.x releases. Jim Fehlig (3): libxl: Move job acquisition in libxlDomainStart to callers libxl: acquire a job when destroying a domain libxl: drop virDomainObj lock when destroying a domain src/libxl/libxl_domain.c | 77 +++ src/libxl/libxl_domain.h | 4 --- src/libxl/libxl_driver.c | 78 3 files changed, 89 insertions(+), 70 deletions(-) -- 1.8.4.5 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 3/3] libxl: drop virDomainObj lock when destroying a domain
A destroy operation can take considerable time on large memory domains due to scrubbing the domain' memory. The operation is running in the context of a job, so unlocking the domain and allowing query operations is safe. Signed-off-by: Jim Fehlig --- src/libxl/libxl_domain.c | 4 src/libxl/libxl_driver.c | 5 - 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c index e240eae..aef0157 100644 --- a/src/libxl/libxl_domain.c +++ b/src/libxl/libxl_domain.c @@ -435,7 +435,9 @@ libxlDomainShutdownThread(void *opaque) libxlDomainEventQueue(driver, dom_event); dom_event = NULL; } +virObjectUnlock(vm); libxl_domain_destroy(cfg->ctx, vm->def->id, NULL); +virObjectLock(vm); libxlDomainCleanup(driver, vm, reason); if (!vm->persistent) virDomainObjListRemove(driver->domains, vm); @@ -447,7 +449,9 @@ libxlDomainShutdownThread(void *opaque) libxlDomainEventQueue(driver, dom_event); dom_event = NULL; } +virObjectUnlock(vm); libxl_domain_destroy(cfg->ctx, vm->def->id, NULL); +virObjectLock(vm); libxlDomainCleanup(driver, vm, VIR_DOMAIN_SHUTOFF_SHUTDOWN); if (libxlDomainStart(driver, vm, false, -1) < 0) { virErrorPtr err = virGetLastError(); diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index a1977c2..21e20bb 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -1250,7 +1250,10 @@ libxlDomainDestroyFlags(virDomainPtr dom, event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED, VIR_DOMAIN_EVENT_STOPPED_DESTROYED); -if (libxl_domain_destroy(cfg->ctx, vm->def->id, NULL) < 0) { +virObjectUnlock(vm); +ret = libxl_domain_destroy(cfg->ctx, vm->def->id, NULL); +virObjectLock(vm); +if (ret < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Failed to destroy domain '%d'"), vm->def->id); goto endjob; -- 1.8.4.5 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
On Fri, Mar 20, 2015 at 04:50:32PM -0700, Andy Lutomirski wrote: > On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez > wrote: > > From: "Luis R. Rodriguez" > > > > This lets drivers take advanate of PAT when available. This > > should help with the transition of converting video drivers over > > to ioremap_wc() to help with the goal of eventually using > > _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on > > ioremap_nocache() (de33c442e) > > > > Cc: Suresh Siddha > > Cc: Venkatesh Pallipadi > > Cc: Ingo Molnar > > Cc: Thomas Gleixner > > Cc: Juergen Gross > > Cc: Daniel Vetter > > Cc: Andy Lutomirski > > Cc: Dave Airlie > > Cc: Antonino Daplas > > Cc: Jean-Christophe Plagniol-Villard > > Cc: Tomi Valkeinen > > Cc: linux-fb...@vger.kernel.org > > Cc: linux-ker...@vger.kernel.org > > Signed-off-by: Luis R. Rodriguez > > --- > > drivers/pci/pci.c | 14 ++ > > include/linux/pci.h | 1 + > > 2 files changed, 15 insertions(+) > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index 81f06e8..6afd507 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, > > int bar) > > pci_resource_len(pdev, bar)); > > } > > EXPORT_SYMBOL_GPL(pci_ioremap_bar); > > + > > +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar) > > +{ > > + /* > > +* Make sure the BAR is actually a memory resource, not an IO > > resource > > +*/ > > + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) { > > + WARN_ON(1); > > + return NULL; > > + } > > if (WARN_ON(...))? Sure, they are equivalent however this follows the same exact style as pci_ioremap_bar() so if we change this one might as well change the style of pci_ioremap_bar() as well. Let me know if there is any preference. I personally don't mind the extra line as it shortens the check. Luis ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1 04/47] pci: add pci_ioremap_wc_bar()
On Fri, Mar 20, 2015 at 04:17:54PM -0700, Luis R. Rodriguez wrote: > From: "Luis R. Rodriguez" > > This lets drivers take advanate of PAT when available. This s/advanate/advantage/ > should help with the transition of converting video drivers over > to ioremap_wc() to help with the goal of eventually using > _PAGE_CACHE_UC over _PAGE_CACHE_UC_MINUS on x86 on > ioremap_nocache() (de33c442e) Please mention the title of the patch too: "x86 PAT: fix performance drop for glx, use UC minus for ioremap(), ioremap_nocache() and pci_mmap_page_range()" > > Cc: Suresh Siddha > Cc: Venkatesh Pallipadi > Cc: Ingo Molnar > Cc: Thomas Gleixner > Cc: Juergen Gross > Cc: Daniel Vetter > Cc: Andy Lutomirski > Cc: Dave Airlie > Cc: Antonino Daplas > Cc: Jean-Christophe Plagniol-Villard > Cc: Tomi Valkeinen > Cc: linux-fb...@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > Signed-off-by: Luis R. Rodriguez > --- > drivers/pci/pci.c | 14 ++ > include/linux/pci.h | 1 + > 2 files changed, 15 insertions(+) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 81f06e8..6afd507 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -137,6 +137,20 @@ void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int > bar) >pci_resource_len(pdev, bar)); > } > EXPORT_SYMBOL_GPL(pci_ioremap_bar); > + > +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar) > +{ > + /* > + * Make sure the BAR is actually a memory resource, not an IO resource > + */ > + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) { > + WARN_ON(1); Would it be better to use dev_warn ? That way you can see which BDF it is? Thought WARN will give a nice stack-trace that should easily point to the driver so perhaps not.. Either way - up to you. > + return NULL; > + } > + return ioremap_wc(pci_resource_start(pdev, bar), > + pci_resource_len(pdev, bar)); > +} > +EXPORT_SYMBOL_GPL(pci_ioremap_wc_bar); > #endif > > #define PCI_FIND_CAP_TTL 48 > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 211e9da..c235b09 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -1667,6 +1667,7 @@ static inline void pci_mmcfg_late_init(void) { } > int pci_ext_cfg_avail(void); > > void __iomem *pci_ioremap_bar(struct pci_dev *pdev, int bar); > +void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar); > > #ifdef CONFIG_PCI_IOV > int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn); > -- > 2.3.2.209.gd67f9d5.dirty > > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/1] hvm.c: Prevent gcc uninitialised var warning
On 03/25/15 11:48, Jan Beulich wrote: On 25.03.15 at 16:02, wrote: >> On 23/03/15 15:01, Don Slutz wrote: >>> gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 reports: >>> -- >>> hvm.c: In function `hvm_create_ioreq_server': >>> hvm.c:487:18: error: `bufioreq_pfn' may be used uninitialised in this >> function >>> [-Werror=uninitialized] >>> hvm.c:718:30: note: `bufioreq_pfn' was declared here >>> -- >>> >>> My code analysis says that gcc is wrong, but initilize the variable >>> to prevent the gcc warning. >>> >>> Reported-by: Ian Murray >>> Signed-off-by: Don Slutz >> >> GCC is correct and there is path through this function where >> bufioreq_pfn is used while uninitialised (non-debug build, is_default = >> 1, handle_bufioreq = 0). > Looks like you are talking about the ASSERT(handle_bufioreq). But that does not leave bufioreq_pfn uninitialised. Currently you cannot get here in that state. The only way to have is_default=1 is: rc = hvm_create_ioreq_server(d, domid, 1, 1, NULL); Which should prevent "handle_bufioreq == 0" > I'm not seeing it: When is_default is set, bufioreq_pfn gets > initialized in the first if()'s body. > >> As an aside, the compiler is in a very easy position to spot this. The >> error means that GCC has positively identified a basic block which does >> use bufioreq_pfn before it has been initialised. >> If the compiler is right, how come the messages says: may be used which to me says that it's determination is not 100% >> I observe that the patch has been committed, but it merely hides the >> problem and doesn't fix it; hvm_free_ioreq_gmfn(d, bufioreq_pfn) will >> still clobber arbitrary memory. > hvm_free_ioreq_gmfn() does have issues. And since it is possible to call it with d->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_PFN] which can be anything. > I committed the patch based on me not being able to find a path > where the variable would actually be used uninitialized (and it is > known that various gcc versions have varying levels of problems > with correctly identifying such issues). If indeed there is a path > that I'm not seeing, then I'd indeed favor reverting and putting > in a proper, backportable fix. > I still cannot find a path where this fails. However I can provide a patch that does 2 things: 1) Change the "ASSERT(handle_bufioreq)" to "BUG_ON(handle_bufioreq);" (not sure it is required). 2) Add checking to hvm_free_ioreq_gmfn() to prevent memory clobber since it's arg may come from an HVM_PARAM. -Don Slutz > Jan > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1 02/47] x86: mtrr: generalize run time disabling of MTRR
On Fri, Mar 20, 2015 at 04:17:52PM -0700, Luis R. Rodriguez wrote: > From: "Luis R. Rodriguez" > > It is possible to enable CONFIG_MTRR and up with it > disabled at run time and yet CONFIG_X86_PAT continues > to kick through fully functionally. This can happen s/fully/full/ ? > for instance on Xen where MTRR is not supported but > PAT is, this can happen now on Linux as of commit > 47591df50 by Juergen introduced as of v3.19. s/3.19/4.0/ > > Technically we should assume the proper CPU > bits would be set to disable MTRR but we can't > always rely on this. At least on the Xen Hypervisor > for instance only X86_FEATURE_MTRR was disabled > as of Xen 4.4 through Xen commit 586ab6a [0], > but not X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR, > or X86_FEATURE_CYRIX_ARR for instance. Oh, could you send an patch for that to Xen please? > > x86 mtrr code relies on quite a bit of checks for > mtrr_if being set to check to see if MTRR did get > set up, instead of using that lets provide a generic > setter which when set we know MTRR is enabled. This s/we know MTRR is enabled/will let us know that MTRR is enabled/ > also adds a few checks where they were not before > which could potentially safeguard ourselves against > incorrect usage of MTRR where this was not desirable. > > Where possible match error codes as if MTRR was > disabled on arch/x86/include/asm/mtrr.h. > > Lastly, since disabling MTRR can happen at run time > and we could end up with PAT enabled best record now > on our logs when MTRR is disabled. > > [0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a > 4.4.0-rc1~18 > > Cc: Andy Lutomirski > Cc: Suresh Siddha > Cc: Venkatesh Pallipadi > Cc: Ingo Molnar > Cc: Thomas Gleixner > Cc: Juergen Gross > Cc: Daniel Vetter > Cc: Dave Airlie > Cc: Antonino Daplas > Cc: Jean-Christophe Plagniol-Villard > Cc: Tomi Valkeinen > Cc: Dave Hansen > Cc: venkatesh.pallip...@intel.com > Cc: Stefan Bader > Cc: konrad.w...@oracle.com > Cc: ville.syrj...@linux.intel.com > Cc: david.vra...@citrix.com > Cc: jbeul...@suse.com > Cc: toshi.k...@hp.com > Cc: bhelg...@google.com > Cc: Roger Pau Monné > Cc: linux-fb...@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > Cc: xen-de...@lists.xensource.com > Signed-off-by: Luis R. Rodriguez > --- > arch/x86/include/asm/mtrr.h| 2 ++ > arch/x86/kernel/cpu/mtrr/cleanup.c | 2 +- > arch/x86/kernel/cpu/mtrr/generic.c | 5 +++-- > arch/x86/kernel/cpu/mtrr/if.c | 3 +++ > arch/x86/kernel/cpu/mtrr/main.c| 31 ++- > 5 files changed, 31 insertions(+), 12 deletions(-) > > diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h > index f768f62..cade917 100644 > --- a/arch/x86/include/asm/mtrr.h > +++ b/arch/x86/include/asm/mtrr.h > @@ -31,6 +31,7 @@ > * arch_phys_wc_add and arch_phys_wc_del. > */ > # ifdef CONFIG_MTRR > +extern int mtrr_enabled; > extern u8 mtrr_type_lookup(u64 addr, u64 end); > extern void mtrr_save_fixed_ranges(void *); > extern void mtrr_save_state(void); > @@ -50,6 +51,7 @@ extern int mtrr_trim_uncached_memory(unsigned long end_pfn); > extern int amd_special_default_mtrr(void); > extern int phys_wc_to_mtrr_index(int handle); > # else > +static const int mtrr_enabled; > static inline u8 mtrr_type_lookup(u64 addr, u64 end) > { > /* > diff --git a/arch/x86/kernel/cpu/mtrr/cleanup.c > b/arch/x86/kernel/cpu/mtrr/cleanup.c > index 5f90b85..784dc55 100644 > --- a/arch/x86/kernel/cpu/mtrr/cleanup.c > +++ b/arch/x86/kernel/cpu/mtrr/cleanup.c > @@ -880,7 +880,7 @@ int __init mtrr_trim_uncached_memory(unsigned long > end_pfn) >* Make sure we only trim uncachable memory on machines that >* support the Intel MTRR architecture: >*/ > - if (!is_cpu(INTEL) || disable_mtrr_trim) > + if (!is_cpu(INTEL) || disable_mtrr_trim || !mtrr_enabled) > return 0; > > rdmsr(MSR_MTRRdefType, def, dummy); > diff --git a/arch/x86/kernel/cpu/mtrr/generic.c > b/arch/x86/kernel/cpu/mtrr/generic.c > index 09c82de..df321b2 100644 > --- a/arch/x86/kernel/cpu/mtrr/generic.c > +++ b/arch/x86/kernel/cpu/mtrr/generic.c > @@ -116,7 +116,8 @@ static u8 __mtrr_type_lookup(u64 start, u64 end, u64 > *partial_end, int *repeat) > u8 prev_match, curr_match; > > *repeat = 0; > - if (!mtrr_state_set) > + /* generic_mtrr_ops is only set for generic_mtrr_ops */ > + if (!mtrr_state_set || !mtrr_enabled) > return 0xFF; > > if (!mtrr_state.enabled) > @@ -290,7 +291,7 @@ static void get_fixed_ranges(mtrr_type *frs) > > void mtrr_save_fixed_ranges(void *info) > { > - if (cpu_has_mtrr) > + if (mtrr_enabled && cpu_has_mtrr) > get_fixed_ranges(mtrr_state.fixed_ranges); > } > > diff --git a/arch/x86/kernel/cpu/mtrr/if.c b/arch/x86/kernel/cpu/mtrr/if.c > index d76f13d..e9e001a 100644 > --- a/arch/x86/kernel/cpu/mtrr/if.c > +++ b/arch/x86/kernel/cpu/mtrr/if.c > @@ -436,6 +436,9 @@
Re: [Xen-devel] [PATCH v1 03/47] devres: add devm_ioremap_wc()
On Fri, Mar 20, 2015 at 04:49:51PM -0700, Andy Lutomirski wrote: > On Fri, Mar 20, 2015 at 4:17 PM, Luis R. Rodriguez > wrote: > > From: "Luis R. Rodriguez" > > > > We have devm_ioremap_nocache() but no devm_ioremap_wc() > > so add that. This will be used later. > > > > Cc: Suresh Siddha > > Cc: Venkatesh Pallipadi > > Cc: Ingo Molnar > > Cc: Thomas Gleixner > > Cc: Juergen Gross > > Cc: Daniel Vetter > > Cc: Andy Lutomirski > > Cc: Dave Airlie > > Cc: Antonino Daplas > > Cc: Jean-Christophe Plagniol-Villard > > Cc: Tomi Valkeinen > > Cc: linux-fb...@vger.kernel.org > > Cc: linux-ker...@vger.kernel.org > > Signed-off-by: Luis R. Rodriguez > > Looks good to me. Thanks, I'll peg a Reviewed-by. Luis ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/9] qspinlock stuff -v15
On Mon, Mar 16, 2015 at 02:16:13PM +0100, Peter Zijlstra wrote: > Hi Waiman, > > As promised; here is the paravirt stuff I did during the trip to BOS last > week. > > All the !paravirt patches are more or less the same as before (the only real > change is the copyright lines in the first patch). > > The paravirt stuff is 'simple' and KVM only -- the Xen code was a little more > convoluted and I've no real way to test that but it should be stright fwd to > make work. > > I ran this using the virtme tool (thanks Andy) on my laptop with a 4x > overcommit on vcpus (16 vcpus as compared to the 4 my laptop actually has) and > it both booted and survived a hackbench run (perf bench sched messaging -g 20 > -l 5000). > > So while the paravirt code isn't the most optimal code ever conceived it does > work. > > Also, the paravirt patching includes replacing the call with "movb $0, %arg1" > for the native case, which should greatly reduce the cost of having > CONFIG_PARAVIRT_SPINLOCKS enabled on actual hardware. Ah nice. That could be spun out as a seperate patch to optimize the existing ticket locks I presume. Now with the old pv ticketlock code an vCPU would only go to sleep once and be woken up when it was its turn. With this new code it is woken up twice (and twice it goes to sleep). With an overcommit scenario this would imply that we will have at least twice as many VMEXIT as with the previous code. I presume when you did benchmarking this did not even register? Thought I wonder if it would if you ran the benchmark for a week or so. > > I feel that if someone were to do a Xen patch we can go ahead and merge this > stuff (finally!). > > These patches do not implement the paravirt spinlock debug stats currently > implemented (separately) by KVM and Xen, but that should not be too hard to do > on top and in the 'generic' code -- no reason to duplicate all that. > > Of course; once this lands people can look at improving the paravirt nonsense. > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/3] xen: arm: Fix typo "Falltrough"
Hi Ian, On 25/03/15 15:34, Ian Campbell wrote: > Signed-off-by: Ian Campbell Reviewed-by: Julien Grall Regards, > --- > xen/arch/arm/domain.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index fdba081..939d8cd 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -742,7 +742,7 @@ int domain_relinquish_resources(struct domain *d) > { > case RELMEM_not_started: > d->arch.relmem = RELMEM_xen; > -/* Falltrough */ > +/* Fallthrough */ > > case RELMEM_xen: > ret = relinquish_memory(d, &d->xenpage_list); > -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 12/12] xen: arm: Dump guest state when invalid trap state is detected
Hi Ian, On 25/03/15 14:22, Ian Campbell wrote: > By adding GUEST_BUG_ON locally to traps.c. > > Signed-off-by: Ian Campbell Reviewed-by: Julien Grall Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 10/12] xen: arm: handle remaining traps from userspace
Hi Ian, On 25/03/15 14:22, Ian Campbell wrote: > CP14 dbg and general CP register access are both handled with > unconditional injection of #undef from their respective handlers, so > allow these even from 32-bit userspace on a 64-bit kernel. > > SMC32 and HVC32 should only come from a guest in AArch32 mode and > SMC64 and HVC64 should only come from a guest in AArch64 mode. Add > appropriate BUG_ONs to all cases. > > After this bad_trap is no longer used. > > Signed-off-by: Ian Campbell Reviewed-by: Julien Grall Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v6 04/30] xen/PCI: Don't use deprecated function pci_scan_bus_parented()
On Fri, Mar 13, 2015 at 09:26:08AM -0500, Bjorn Helgaas wrote: > On Fri, Mar 13, 2015 at 9:01 AM, Konrad Rzeszutek Wilk > wrote: > > On Fri, Mar 13, 2015 at 08:24:58AM -0500, Bjorn Helgaas wrote: > >> On Thu, Mar 12, 2015 at 9:36 PM, Yijing Wang wrote: > >> > + pci_add_resource(&resources, &ioport_resource); > >> > + pci_add_resource(&resources, &iomem_resource); > >> > + pci_add_resource(&resources, &busn_resource); > >> > >> Since I don't want to export busn_resource, you might have to > >> allocate your > >> own struct resource for it here. And, of course, figure out the > >> details of > >> which PCI domain you're in and whether you need to share one struct > >> resource across several host bridges in the same domain. > >> >>> > >> >>> Allocate its own resource here is ok for me, as I mentioned in > >> >>> previous reply, > >> >>> so do we still need to add additional info to figure out which domain > >> >>> own the bus resource ? > >> >> > >> >> That's up to the caller. Only the platform knows which bridges it > >> >> wants to > >> >> have in the same domain. In principle, every host bridge could be in > >> >> its > >> >> own domain, since each bridge is the root of a unique PCI hierarchy. > >> >> But > >> >> some platforms have firmware that assumes otherwise. I have no idea > >> >> what > >> >> xen assumes. > >> > > >> > I'm not xen guy, so I don't know much about it, but because it call > >> > pci_scan_bus_parented() > >> > before, and in which busn_resource is always shared for different host > >> > bridges(same domain or not), > >> > I think add a static bus resource(0,255) should be safe, at least, it > >> > would not introduce new risk. > >> > > >> > Something like: > >> > > >> > diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c > >> > index b1ffebe..a69e529 100644 > >> > --- a/drivers/pci/xen-pcifront.c > >> > +++ b/drivers/pci/xen-pcifront.c > >> > @@ -446,9 +446,15 @@ static int pcifront_scan_root(struct > >> > pcifront_device *pdev, > >> > unsigned int domain, unsigned int bus) > >> > { > >> > struct pci_bus *b; > >> > + LIST_HEAD(resources); > >> > struct pcifront_sd *sd = NULL; > >> > struct pci_bus_entry *bus_entry = NULL; > >> > int err = 0; > >> > + static struct resource busn_res = { > >> > + .start = 0, > >> > + .end = 255, > >> > + .flags = IORESOURCE_BUS, > >> > + }; > >> > > >> > #ifndef CONFIG_PCI_DOMAINS > >> > if (domain != 0) { > >> > @@ -470,17 +476,21 @@ static int pcifront_scan_root(struct > >> > pcifront_device *pdev, > >> > err = -ENOMEM; > >> > goto err_out; > >> > } > >> > + pci_add_resource(&resources, &ioport_resource); > >> > + pci_add_resource(&resources, &iomem_resource); > >> > + pci_add_resource(&resources, &busn_res); > >> > pcifront_init_sd(sd, domain, bus, pdev); > >> > > >> > pci_lock_rescan_remove(); > >> > > >> > - b = pci_scan_bus_parented(&pdev->xdev->dev, bus, > >> > - &pcifront_bus_ops, sd); > >> > + b = pci_scan_root_bus(&pdev->xdev->dev, bus, > >> > + &pcifront_bus_ops, sd, &resources); > >> > if (!b) { > >> > > >> > Bjorn, what do you think about ? > >> > >> That seems OK to me. Probably still wrong, but no worse than it was > >> before. > > > > Interesting. The mechanism for PCI passthrough can either synthesize > > and PCI bus number starting at zero (so first device is always 0:0:0.0) > > or it can replicate the backend PCI topology. That means you > > could have segment values passed in, so: ab:ff:00.1). I've to admin > > I hadn't tried the 'physical' replication on an machine with > > domains (err, segments). > > > > Is there an git tree with this so I can just try it out? > > git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git > pci/enumeration-yw6 has similar code (it exports the single I presume now it is bjorn/pci/enumeration-yw8 ? Going to test this out this week. > busn_resource and makes xen use it). That should be functionally > identical to what v4.0-rc1 does. > > Yijing hasn't posted the static busn_res proposal above yet, so I > don't have a branch with that in it. > > Bjorn ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 09/12] xen: arm: correctly handle sysreg accesses from userspace
Hi Ian, On 25/03/15 14:22, Ian Campbell wrote: > Previously we implemented all registers as RAZ/WI even if they > shouldn't be accessible to userspace. > > Accesses to the *_EL1 registers from EL0 are trapped to EL1 by the > hardware, so add a BUG_ON. Likewise accesses from 32-bit EL1 cannot > happen. It's not true on all *_EL1 registers. See PMINTENSET_EL1 for instance. > PMUSERENR_EL0 and MDCCSR_EL0 are R/O to EL0. Might be worth to explain that we forgot to implement MDCCSR_EL0 before. > > Other PM*_EL0 registers are accessible at EL0 only if PMUSERENR_EL0.EN > is set, since we emulate that as RAZ/WI we know that bit cannot be > set. The ARMv8 spec is confusing on this point Let's take PMCR_EL0 for instance. The PMUSERENR_EL0.EN trapping is not explained. Although, PMCR is trapped when PMUSERENR_EL0.EN == 0. Do you have a paragraph on the spec which clearly explain the behavior? Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 08/12] xen: arm: Handle CP14 32-bit register accesses from userspace
Hi Ian, On 25/03/15 14:22, Ian Campbell wrote: > Accesses to these from 32-bit userspace would cause a hypervisor > exception (host crash) when running a 64-bit kernel, which is worked > around by the fix to XSA-102. On 32-bit kernels they would be > implemented as RAZ/WI which is incorrect but harmless. > > Update as follows: > - DBGDSCRINT should be R/O. > - DBGDSCREXT should be EL1 only. > - DBGOSLAR is RO and EL1 only. This register is WO. Actually the implementation is right, just the commit message. Other than that: Reviewed-by: Julien Grall Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 07/12] xen: arm: Handle CP15 register traps from userspace
Hi Ian, On 25/03/15 14:22, Ian Campbell wrote: > Previously userspace access to PM* would have been incorrectly (but > benignly) implemented as RAZ/WI when running on a 32-bit kernel and > would cause a hypervisor exception (host crash) when running a 64-bit > kernel (this was already solved via the fix to XSA-102). > > CLIDR, CCSIDR, DCCISW, ACTLR, PMINTENSET, PMINTENCLR are EL1 only, > attempts to access from EL0 will trap to EL1 not to us, hence BUG_ON > is appropriate now. For PMINTENSET and PMINTENCLR the spec (ARMv8 DDI0487A rev d) says: "If MDCR_EL2.TPM==1, Non-secure accesses to this register will trap from EL1 and EL0 to EL2." As we set to 1 MDCR_EL1.TPM, EL0 access will trap to Xen. So I think we should replace the BUG_ON to injected a exception. Reading more the spec only ACTLR access from EL0 will trap to EL1. All access from EL0 to the others registers in the list above will trap to EL2. Although, the ARMv7 spec seems to say to only valid access will be trapped. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Design and Question: Eliminate Xen (RTDS) scheduler overhead on dedicated CPU
On Tue, Mar 24, 2015 at 11:27:35AM -0400, Meng Xu wrote: > 2015-03-24 7:54 GMT-04:00 George Dunlap : > > > On Tue, Mar 24, 2015 at 3:50 AM, Meng Xu wrote: > > > Hi Dario and George, > > > > > > I'm exploring the design choice of eliminating the Xen scheduler > > overhead on > > > the dedicated CPU. A dedicated CPU is a PCPU that has a full capacity > > VCPU > > > pinned onto it and no other VCPUs will run on that PCPU. > > > > Hey Meng! This sounds awesome, thanks for looking into it. > > > > :-) I think it is a useful feature for extreme low latency applications. > > > > > > > > > [Problems] > > > The issue I'm encountering is as follows: > > > After I implemented the dedicated cpu feature, I compared the latency of > > a > > > cpu-intensive task in domU on dedicated CPU (denoted as R_dedcpu) and the > > > latency on non-dedicated CPU (denoted as R_nodedcpu). The expected result > > > should be R_dedcpu < R_nodedcpu since we avoid the scheduler overhead. > > > However, the actual result is R_dedcpu > R_nodedcpu, and R_dedcpu - > > > R_nodedcpu ~= 1000 cycles. > > > > > > After adding some trace to every function that may raise the > > > SCHEDULE_SOFTIRQ, I found: > > > When a cpu is not marked as dedicated cpu and the scheduler on it is not > > > disabled, the vcpu_block() is triggered 2896 times during 58280322928ns > > > (i.e., triggered once every 20,124,421ns in average) on the dedicated > > cpu. > > > However, > > > When I disable the scheduler on a dedicated cpu, the function > > > vcpu_block(void) @schedule.c will be triggered very frequently; the > > > vcpu_block(void) is triggered 644824 times during 8,918,636,761ns (i.e., > > > once every 13831ns in average) on the dedicated cpu. > > > > > > To sum up the problem I'm facing, the vcpu_block(void) is trigger much > > > faster and more frequently when the scheduler is disabled on a cpu than > > when > > > the scheduled is enabled. > > > > > > [My question] > > > I'm very confused at the reason why vcpu_block(void) is triggered so > > > frequently when the scheduler is disabled. The vcpu_block(void) is > > called > > > by the SCHEDOP_block hypercall, but why this hypercall will be triggered > > so > > > frequently? > > > > > > It will be great if you know the answer directly. (This is just a pure > > hope > > > and I cannot really expect it. :-) ) > > > But I really appreciate it if you could give me some directions on how I > > > should figure it out. I grepped vcpu_block(void) and SCHEDOP_block in > > the > > > xen code base, but didn't found much call to them. > > > > > > What confused me most is that the dedicated VCPU should be blocked less > > > frequently instead of more frequently when the scheduler is disabled on > > the > > > dedicated CPU, because the dedicated VCPU is always running on the CPU > > now > > > without the hypervisor scheduler's interference. > > > > So if I had to guess, I would guess that you're not actually blocking > > when the guest tries to block. Normally if the guest blocks, it > > blocks in a loop like this: > > > > do { > > enable_irqs(); > > hlt; > > disable_irqs; > > } while (!interrup_pending); > > > > For a PV guest, the hlt() would be replaced with a PV block() hypercall. > > > > Normally, when a guest calls block(), then it's taken off the > > runqueue; and if there's nothing on the runqueue, then the scheduler > > will run the idle domain; it's the idle domain that actually does the > > blocking. > > > > If you've hardwired it always to return the vcpu in question rather > > than the idle domain, then it will never block -- it will busy-wait, > > calling block millions of times. > > > > The simplest way to get your prototype working, in that case, would be > > to return the idle vcpu for that pcpu if the guest is blocked. > > > > Exactly! Thank you so much for pointing this out! I did hardwired it > always to return the vcpu that is supposed to be blocked. Now I totally > understand what happened. :-) Or you could change the kernel to do an idle poll instead of using 'hlt'. And idle poll is just: do { nop;pause; } while (!interrupt_pending); On Linux I believe you do 'idle=poll' to make it do that. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 3/7] libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap.
On Tue, Mar 24, 2015 at 03:22:55PM +, Ian Campbell wrote: > On Mon, 2015-03-23 at 14:20 -0400, Konrad Rzeszutek Wilk wrote: > > There is no sense in trying to online (or offline) CPUs when the size of > > cpumap is greater than the maximum number of VCPUs the guest can go to. > > > > As such fail the operation if the count of CPUs to online is greater > > than what the guest started with. For the offline case we do not > > check (as the bits are unset in the cpumap) and let it go through. > > > > Signed-off-by: Konrad Rzeszutek Wilk > > --- > > tools/libxl/libxl.c | 27 --- > > 1 file changed, 24 insertions(+), 3 deletions(-) > > > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > > index 4152ee4..d2b5ff3 100644 > > --- a/tools/libxl/libxl.c > > +++ b/tools/libxl/libxl.c > > @@ -5449,6 +5449,20 @@ int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, > > uint32_t domid, > > return 0; > > } > > > > +static int libxl__check_max(libxl__gc *gc, libxl_dominfo *info, > > +libxl_bitmap *cpumap) > > +{ > > +int maxcpus = libxl_bitmap_count_set(cpumap); > > + > > +if (maxcpus > info->vcpu_max_id + 1) > > +{ > > +LOGE(ERROR, "You have a max of %d vCPUs and you want %d vCPUs!", > > + info->vcpu_max_id + 1, maxcpus); > > Please avoid personal pronouns in libxl logs (I don't have a max of any > number of vcpus, the domain does). Here "setting %d vcpus, max vcpus is > %d" perhaps. > > Both libxl__set_vcpuonline_qmp and libxl__set_vcpuonline_xenstore start > with the same code to get info. Please could you move that (and the > associated dispose) into libxl_set_vcpuonline and check the limit once > up front (no need for helper then) and pass the info as a pointer to the > specific functions. Would this be OK ? >From a160e64842f36f96f5da610c2498cf1770665dcb Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Fri, 20 Mar 2015 16:29:16 -0400 Subject: [PATCH v3 3/8] libxl: In libxl_set_vcpuonline check for maximum number of VCPUs against the cpumap. There is no sense in trying to online (or offline) CPUs when the size of cpumap is greater than the maximum number of VCPUs the guest can go to. As such fail the operation if the count of CPUs to online is greater than what the guest started with. For the offline case we do not check (as the bits are unset in the cpumap) and let it go through. We coalesce some of the underlaying libxl_set_vcpuonline code together to take advantage for the QMP and XenStore ways. Signed-off-by: Konrad Rzeszutek Wilk --- tools/libxl/libxl.c | 75 - 1 file changed, 45 insertions(+), 30 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index f957713..c37d057 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -5447,28 +5447,34 @@ int libxl_domain_get_nodeaffinity(libxl_ctx *ctx, uint32_t domid, return 0; } +static int libxl__check_max(libxl__gc *gc, libxl_dominfo *info, +libxl_bitmap *cpumap) +{ +int maxcpus = libxl_bitmap_count_set(cpumap); + +if (maxcpus > info->vcpu_max_id + 1) +{ +LOGE(ERROR, "Requested %d VCPUs, however maxcpus is %d!", + maxcpus, info->vcpu_max_id + 1); +return ERROR_FAIL; +} +return 0; +} + static int libxl__set_vcpuonline_xenstore(libxl__gc *gc, uint32_t domid, - libxl_bitmap *cpumap) + libxl_bitmap *cpumap, + libxl_dominfo *info) { -libxl_dominfo info; char *dompath; xs_transaction_t t; -int i, rc; - -libxl_dominfo_init(&info); +int i, rc = ERROR_FAIL; -rc = libxl_domain_info(CTX, &info, domid); -if (rc < 0) { -LOGE(ERROR, "getting domain info list"); -goto out; -} -rc = ERROR_FAIL; if (!(dompath = libxl__xs_get_dompath(gc, domid))) goto out; retry_transaction: t = xs_transaction_start(CTX->xsh); -for (i = 0; i <= info.vcpu_max_id; i++) +for (i = 0; i <= info->vcpu_max_id; i++) libxl__xs_write(gc, t, libxl__sprintf(gc, "%s/cpu/%u/availability", dompath, i), "%s", libxl_bitmap_test(cpumap, i) ? "online" : "offline"); @@ -5478,25 +5484,15 @@ retry_transaction: } else rc = 0; out: -libxl_dominfo_dispose(&info); return rc; } static int libxl__set_vcpuonline_qmp(libxl__gc *gc, uint32_t domid, - libxl_bitmap *cpumap) + libxl_bitmap *cpumap, libxl_dominfo *info) { -libxl_dominfo info; -int i, rc; - -libxl_dominfo_init(&info); +int i; -rc = libxl_domain_info(CTX, &info, domid); -if (rc < 0) { -LOGE(ERROR, "getting domain info list"); -libxl_dominfo_dispose(&info); -return rc; -}
Re: [Xen-devel] [PATCH v3 7/7] libxl/vcpu-set - allow to decrease vcpu count on overcommitted guests (v3)
On Tue, Mar 24, 2015 at 03:41:46PM +, Ian Campbell wrote: > On Mon, 2015-03-23 at 14:21 -0400, Konrad Rzeszutek Wilk wrote: > > We have a check to warn the user if they are overcommitting. > > But the check only checks the hosts CPU amount and does > > not take into account the case when the user is trying to fix > > the overcommit. That is - they want to limit the amount of > > online VCPUs. > > > > This fix allows the user to offline vCPUs without any > > warnings when they are running an overcommitted guest. > > > > Also fix the extra space in the message. > > > > Signed-off-by: Konrad Rzeszutek Wilk > > > > --- > > [v2: Remove crud code as spotted by Boris] > > [v3: Moved crud code removal to another patch] > > --- > > tools/libxl/xl_cmdimpl.c | 15 --- > > 1 file changed, 12 insertions(+), 3 deletions(-) > > > > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > > index b121d75..d7bd5d5 100644 > > --- a/tools/libxl/xl_cmdimpl.c > > +++ b/tools/libxl/xl_cmdimpl.c > > @@ -5238,9 +5238,18 @@ static int vcpuset(uint32_t domid, const char* > > nr_vcpus, int check_host) > > */ > > if (check_host) { > > unsigned int host_cpu = libxl_get_max_cpus(ctx); > > -if (max_vcpus > host_cpu) { > > -fprintf(stderr, "You are overcommmitting! You have %d physical > > " \ > > -" CPUs and want %d vCPUs! Aborting, use --ignore-host > > to " \ > > +libxl_dominfo dominfo; > > + > > +rc = libxl_domain_info(ctx, &dominfo, domid); > > +if (rc) > > +{ > > + if (rc == ERROR_DOMAIN_NOTFOUND) > > +fprintf(stderr, "Domain %u does not exist.\n", domid); > > +return 1; > > Indentation is wrong on the if. > > More generally all of these rc == ERROR_DOMAIN_NOTFOUND check and prints > are going to get rather tiresome, especially if/when there are other > interesting error codes. Perhaps we could arrange for something further > down the stack on libxl to log this sort of thing, such that xl can rely > on it already having been mentioned? > > e.g. libxl_domain_info could do it perhaps? Would this patch work? >From 9a0a0e581b29d9aa893d80962053383c235e9ad9 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 25 Mar 2015 13:36:58 -0400 Subject: [PATCH v3 4/8] libxl/libxl_domain_info: Log if domain not found. If we cannot find the domain - log an error (and still continue returning an error). Signed-off-by: Konrad Rzeszutek Wilk --- tools/libxl/libxl.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index c37d057..181b5bd 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -707,8 +707,10 @@ int libxl_domain_info(libxl_ctx *ctx, libxl_dominfo *info_r, LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "getting domain info list"); return ERROR_FAIL; } -if (ret==0 || xcinfo.domain != domid) return ERROR_DOMAIN_NOTFOUND; - +if (ret==0 || xcinfo.domain != domid) { +LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "Domain %d not found!", domid); +return ERROR_DOMAIN_NOTFOUND; +} if (info_r) xcinfo2xlinfo(ctx, &xcinfo, info_r); return 0; -- 2.1.0 And then this patch would become: >From 37d530f04a266e8d707b811bc7583f9d7b6fb18d Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Mon, 2 Feb 2015 16:18:39 -0500 Subject: [PATCH] libxl/vcpu-set - allow to decrease vcpu count on overcommitted guests (v3) We have a check to warn the user if they are overcommitting. But the check only checks the hosts CPU amount and does not take into account the case when the user is trying to fix the overcommit. That is - they want to limit the amount of online VCPUs. This fix allows the user to offline vCPUs without any warnings when they are running an overcommitted guest. Also fix the extra space in the message. Signed-off-by: Konrad Rzeszutek Wilk --- [v2: Remove crud code as spotted by Boris] [v3: Moved crud code removal to another patch] [v4: Remove the fprintf as it is done in libxl_domain_info] [v5: Fix memory leak] --- tools/libxl/xl_cmdimpl.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index b121d75..c77c691 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -5238,12 +5238,21 @@ static int vcpuset(uint32_t domid, const char* nr_vcpus, int check_host) */ if (check_host) { unsigned int host_cpu = libxl_get_max_cpus(ctx); -if (max_vcpus > host_cpu) { -fprintf(stderr, "You are overcommmitting! You have %d physical " \ -" CPUs and want %d vCPUs! Aborting, use --ignore-host to " \ -" continue\n", host_cpu, max_vcpus); +libxl_dominfo dominfo; + +rc = libxl_domain_info(ctx, &dominfo, domid); +if (rc) ret
Re: [Xen-devel] [PATCH 06/12] xen: arm: correctly handle vtimer traps from userspace
Hi Ian, On 25/03/15 14:22, Ian Campbell wrote: > -static void vtimer_cntp_tval(struct cpu_user_regs *regs, uint32_t *r, int > read) > +static int vtimer_cntp_tval(struct cpu_user_regs *regs, uint32_t *r, int > read) > { > struct vcpu *v = current; > s_time_t now; > > +if ( psr_mode_is_user(regs) && > + !(READ_SYSREG(CNTKCTL_EL1) & CNTKCTL_EL1_EL0PTEN) ) > +return 0; > + Would it make sense to create a macro for this check? The code is pretty much the same on every function except the field to check (EL0PTEN/EL0PCTEN). Either way: Reviewed-by: Julien Grall Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/MSI: fix error handling
On 25/03/15 16:23, Jan Beulich wrote: __setup_msi_irq() needs to undo what it did before calling write_msi_msg() in case that returned an error. map_domain_pirq() needs to get rid of the MSI descriptor it (implicitly) allocated. The case of a setup_msi_irq() failure on a non-initial multi-vector-MSI interrupt needs special handling: While the initial IRQ will get freed by the caller (who also passed it to us), we need to undo the effect setup_msi_irq() had on it. (As a benefit from the added call to msi_free_irq() we no longer need to explicitly call destroy_irq() on the non-initial slots.) Signed-off-by: Jan Beulich Reviewed-by: Andrew Cooper ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 02/12] xen: arm: handle accesses to CNTP_CVAL_EL0
On 25/03/15 14:22, Ian Campbell wrote: > +static int vtimer_cntp_cval(struct cpu_user_regs *regs, uint64_t *r, int > read) > +{ > +struct vcpu *v = current; > + > +if ( psr_mode_is_user(regs) && > + !(READ_SYSREG(CNTKCTL_EL1) & CNTKCTL_EL1_EL0PTEN) ) Sorry, I didn't notice it on my previous review. CNTKCTL_EL1_EL0PTEN and psr_mode_is_user are only defined in respectively patch #6 and #4. Can you invert the patches to avoid build breakage during bisection? Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 03/12] xen: arm: Use ARMv8 names for CNTHCTL_EL2 bits
Hi Ian, On 25/03/15 14:22, Ian Campbell wrote: > Rather than using the v8 register names and the v7 bit names, which > makes things needlessly difficult when reading. > > Signed-off-by: Ian Campbell Reviewed-by: Julien Grall Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 02/12] xen: arm: handle accesses to CNTP_CVAL_EL0
Hi Ian, On 25/03/15 14:22, Ian Campbell wrote: > All OSes we have run on top of Xen use CNTP_TVAL_EL0 but for > completeness we really should handle CVAL too. > > In vtimer_emulate_cp64 pull the propagation of the 64-bit result into > two 32-bit registers out of the switch to avoid duplicating for every > register. We also need to initialise x now since previously the only > register implemented register was R/O. > > While adding HSR_SYSREG_CNTP_CVAL_EL0 also move > HSR_SYSREG_CNTP_CTL_EL0 so it is sorted correctly. > > Signed-off-by: Ian Campbell Reviewed-by: Julien Grall Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86: don't change affinity with interrupt unmasked
On 20/03/15 16:40, Jan Beulich wrote: With ->startup unmasking the IRQ, setting the affinity afterwards without masking the IRQ again is invalid namely for MSI (which can't have their affinity updated atomically). Signed-off-by: Jan Beulich Reviewed-by: Andrew Cooper --- Changing the affinity of non-maskable MSI IRQs seems bogus too Agreed. Their affinity can clearly only be changed safely by a device driver which can guarantee that an interrupt will not be generated during the vulnerable period. This further implies that Xen can't even safely set an affinity to start with... , but I can't immediately see what we can do about this (better than disabling affinity changes for them). If we used interrupt remapping properly (which we don't), we could update the effective affinity without changing any device configuration, but this still doesn't provide a solution for the many systems out there without (functional) interrupt remapping. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/2] x86: simplify non‑atomic bitops
On 20/03/15 14:53, Jan Beulich wrote: - being non-atomic, their pointer arguments shouldn't be volatile- qualified - their (half fake) memory operands can be a single "+m" instead of being both an output and an input Signed-off-by: Jan Beulich After further consideration, would it not be better to change the non-atomic variants to being straight C. e.g. static inline void __set_bit(int nr, void *_addr) { int *addr = _addr; addr[nr / sizeof(int)] |= (1U << (nr % sizeof(int))); } This would drop the memory clobber from the asm statement and allow the compiler to optimise repeated __set_bit() calls to the same word into a single action. ~Andrew (On a separate note, I feel that all of these operations should be acting on unsigned rather than signed ints, but that applies to all of these operations, not just the non-atomic ones) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 00/11] Alternate p2m: support multiple copies of host p2m
>> >> The second thing is how similar some of this is to nested p2m code, >> making me wonder whether it could share more code with that. It's not >> as much duplication as I had feared, but e.g. altp2m_write_p2m_entry() >> is _identical_ to nestedp2m_write_p2m_entry(), (making the >> copyright claim at the top of the file quite dubious, BTW). >> > > I did initially use nestedp2m_write_p2m_entry directly, but I knew > that wouldn't be acceptable! On this specific point, I would be more > inclined to refactor the normal write entry routine so you can call > it everywhere, since both the nested and alternate ones are simply > a copy of a part of the normal one. > I started to look into this again. What I described as the normal write routine is the one for HAP, and that is actually a domain-level routine (akin to the paging mode-specific ones), and takes a domain as its first parameter. The nestedp2m routine is a p2m-level routine that takes a p2m as its first parameter, and that's what I had copied. However, looking at the code, the p2m-level write routine is only ever called from the domain-level one, but the HAP routine doesn't make such calls, and nestedp2m and altp2m are only available in HAP mode. Therefore, I have dropped the altp2m write routine with no ill effects. I don't think the nestedp2m one can ever be called. Ed ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] xen/passthrough: Support a single iommu_domain per xen domain per SMMU
On Wed, 25 Mar 2015, Manish Jaggi wrote: > On Tuesday 24 March 2015 07:34 PM, Robert VanVossen wrote: > > > > On 3/24/2015 7:07 AM, Manish Jaggi wrote: > > > On Monday 23 March 2015 07:16 PM, Robbie VanVossen wrote: > > > > If multiple devices are being passed through to the same domain and they > > > > share a single SMMU, then they only require a single iommu_domain. > > > > > > > > In arm_smmu_assign_dev, before a new iommu_domain is created, the > > > > xen_domain->contexts is checked for any iommu_domains that are already > > > > assigned to device that uses the same SMMU as the current device. If one > > > > is found, attach the device to that iommu_domain. If a new one isn't > > > > found, create a new iommu_domain just like before. > > > > > > > > The arm_smmu_deassign_dev function assumes that there is a single > > > > device per iommu_domain. This meant that when the first device was > > > > deassigned, the iommu_domain was freed and when another device was > > > > deassigned a crash occured in xen. > > > > > > > > To fix this, a reference counter was added to the iommu_domain struct. > > > > When an arm_smmu_xen_device references an iommu_domain, the > > > > iommu_domains ref is incremented. When that reference is removed, the > > > > iommu_domains ref is decremented. The iommu_domain will only be freed > > > > when the ref is 0. > > > > > > > > Signed-off-by: Robbie VanVossen > > > Hi, > > > Are you adding a PCI passthrough support to Xen ?. I am in process of > > > sending smmu driver patches based on juliens latest code. > > > > > Nope, I am just working on what this patch describes > Your patch is doing similar thing what am working on. As per the Xen 4.6 > release I took smmu for pci-passthorugh. I didnt send smmu patches so far as > there are other dependency patches I need to send first like pci_host_bridge > register hypercall patch. > > Julien/Ian could you please hold the merge of this patch. Sorry Manish, but the Xen community works on a first come first served basis: if/when this patch is considered in good condition and ready to be merged, is likely to be merged. Unless you have a specific concern about the code of course. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] iommu VT-d: separate rmrr addition function
On Wed, Mar 25, 2015 at 05:04:38PM +, Jan Beulich wrote: > >>> On 24.03.15 at 17:08, wrote: > > +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru) > > +{ > > +bool_t ignore = 0; > > +unsigned int i = 0; > > +int ret = 0; > > + > > +/* Skip checking if segment is not accessible yet. */ > > +if ( !pci_known_segment(rmrru->segment) ) > > +i = UINT_MAX; > > + > > +for ( ; i < rmrru->scope.devices_cnt; i++ ) > > +{ > > +u8 b, d, f; > > + > > +b = d = f = 0; > > ??? > > > +b = PCI_BUS(rmrru->scope.devices[i]); > > +d = PCI_SLOT(rmrru->scope.devices[i]); > > +f = PCI_FUNC(rmrru->scope.devices[i]); > > Quoting my reply on v1: "Please make this the declarations (with > initializers) of these variables; they don't appear to be used outside > this scope." I.e. > > u8 b = PCI_BUS(...); Got it, I did read it differently first time. Example helps for sure! > > > +if ( !ret && (rmrru->scope.devices_cnt != 0) ) > > +ret = register_one_rmrr(rmrru); > > +if ( ret ) > > xfree(rmrru); > > Indentation is screwed up here. Ok, will fix. > > Jan > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86: properly parenthesize {read, write}_atomic()
On 20/03/15 15:54, Jan Beulich wrote: ... at once eliminating some redundancy from the read variant (the cast to the destination type can be done once outside the switch). Signed-off-by: Jan Beulich Reviewed-by: Andrew Cooper ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 02/11] VMX: implement suppress #VE.
On 01/15/2015 10:46 AM, Ed White wrote: > On 01/15/2015 08:25 AM, Tim Deegan wrote: >> Hi, >> >> At 13:26 -0800 on 09 Jan (1420806392), Ed White wrote: >>> static inline bool_t is_epte_valid(ept_entry_t *e) >>> { >>> -return (e->epte != 0 && e->sa_p2mt != p2m_invalid); >>> +return (e->valid != 0 && e->sa_p2mt != p2m_invalid); >> >> This test for 0 is just catching uninitialised entries in freshly >> allocated pages. Rather than changing it to ignore bit 63, this loop... >> >>> } >>> >>> /* returns : 0 for success, -errno otherwise */ >>> @@ -194,6 +194,19 @@ static int ept_set_middle_entry(struct p2m_domain >>> *p2m, ept_entry_t *ept_entry) >>> >>> ept_entry->r = ept_entry->w = ept_entry->x = 1; >>> >>> +/* Disable #VE on all entries */ >>> +if ( cpu_has_vmx_virt_exceptions ) >>> +{ >>> +ept_entry_t *table = __map_domain_page(pg); >>> + >>> +for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ ) >>> +table[i].suppress_ve = 1; >> >> ...should set the type of the empty entries to p2m_invalid as it goes. >> >>> +/* Disable #VE on all entries */ >>> +if ( cpu_has_vmx_virt_exceptions ) >>> +{ >>> +ept_entry_t *table = >>> +map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m))); >>> + >>> +for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ ) >>> +table[i].suppress_ve = 1; >> >> And the same here. I have some time to work on this patch series again, and although I tried this it doesn't eliminate all the instances of epte being zero with the possible exception of suppress_ve. I spent some time trying to find all cases where that happens without success, so I've used Andrew's suggestion of using a mask. Ed ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/2] printk adjustments
On 20/03/15 16:04, Jan Beulich wrote: 1: introduce gprintk() 2: make {,g}dprintk() a no-op in non-debug builds Signed-off-by: Jan Beulich Both Reviewed-by: Andrew Cooper It is likely that some other shifting of messages will happen, but lets see what this looks like as a first pass. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] ARM: KVM/XEN: how should we support virt-what?
On Wed, 25 Mar 2015, Ian Campbell wrote: > On Wed, 2015-03-25 at 10:44 +0100, Andrew Jones wrote: > > Hello ARM virt maintainers, > > > > I'd like to start a discussion about supporting virt-what[1]. virt-what > > allows userspace to determine if the system it's running on is running > > in a guest, and of what type (KVM, Xen, etc.). Despite it being a best > > effort tool, see the Caveat emptor in [1], it has become quite a useful > > tool, and is showing up in different places, such as OpenStack. If you > > look at the code[2], specifically [3], then you'll see how it works on > > x86, which is to use the dedicated hypervisor cpuid leaves. I'm > > wondering what equivalent we have, or can develop, for arm. > > Here are some thoughts; > > 0) there's already something we can use, and I just need to be told > >about it. > > 1) be as similar as possible to x86 by dedicating some currently > >undefined sysreg bits. This would take buy-in from lots of parties, > >so is not likely the way to go. > > 2) create a specific DT node that will get exposed through sysfs, or > >somewhere. > > 3) same as (2), but just use the nodes currently in mach-virt's DT > >as the indication we're a guest. This would just be a heuristic, > >i.e. "have virtio mmio" && psci.method == hvc, or something, > >and we'd still need a way to know if we're kvm vs. xen vs. ??. > > FWIW Xen has a specific node, > https://git.kernel.org/cgit/linux/kernel/git/devicetree/devicetree-rebasing.git/tree/Bindings/arm/xen.txt > > Doesn't help you with ACPI systems though. IIRC there will be a Xen > table of some sort. This is the Xen specific ACPI table: http://wiki.xenproject.org/mediawiki/images/c/c4/Xen-environment-table.pdf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3 1/3] libxl/cpumap: Add xc_cpumap_[setcpu, clearcpu, testcpu] to complement xc_cpumap_alloc.
On Wed, Mar 25, 2015 at 08:53:01AM +, Dario Faggioli wrote: > On Tue, 2015-03-24 at 16:29 -0400, Konrad Rzeszutek Wilk wrote: > > On Tue, Mar 24, 2015 at 05:46:04PM +, Ian Campbell wrote: > > > Is it necessary to worry about alignment here, since xc_cpumap_t is > > > actually a uint8_t*. > > > > We can also do and not worry about it: > > > > diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c > > index 7514b84..19a1b18 100644 > > --- a/tools/libxc/xc_misc.c > > +++ b/tools/libxc/xc_misc.c > > @@ -94,19 +94,22 @@ xc_cpumap_t xc_cpumap_alloc(xc_interface *xch) > > return calloc(1, sz); > > } > > > > +#define BITS_PER_CPUMAP(map) (sizeof(*map) * 8) > > +#define CPUMAP_ENTRY(cpu, map) ((map))[(cpu) / BITS_PER_CPUMAP(map)] > > +#define CPUMAP_SHIFT(cpu, map) ((cpu) % BITS_PER_CPUMAP(map)) > > > > [...] > > > Maybe it's only me, but I really find it a bit hard to figure out what > the differences between this and what's in xc_bitops.h are. > > If going for this, I'd say that the reasons why we need these, and such > differences between these and BITMAP_* should be made evident somehow > (changelog, comments, etc.). Here is an updated patch: >From dbb1bf2d12b24a6a91ad95abef0d4310e7141b79 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Mon, 23 Mar 2015 11:52:54 -0400 Subject: [PATCH] libxl/cpumap: Add xc_cpumap_[setcpu, clearcpu, testcpu] to complement xc_cpumap_alloc. We export the xc_cpumap_alloc but not the bit operations. One could include 'xc_bitops.h' but that is naughty - so instead we just export the proper functions to do it on the xc_cpumap_t typedef. Signed-off-by: Konrad Rzeszutek Wilk v2: Use our own macro to make sure ARM is not affected negatively --- tools/libxc/include/xenctrl.h | 9 + tools/libxc/xc_misc.c | 30 ++ 2 files changed, 39 insertions(+) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 4e9537e..565f098 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -394,6 +394,15 @@ int xc_get_cpumap_size(xc_interface *xch); /* allocate a cpumap */ xc_cpumap_t xc_cpumap_alloc(xc_interface *xch); +/* clear an CPU from the cpumap. */ +void xc_cpumap_clearcpu(int cpu, xc_cpumap_t map); + +/* set an CPU in the cpumap. */ +void xc_cpumap_setcpu(int cpu, xc_cpumap_t map); + +/* Test whether the CPU in cpumap is set. */ +int xc_cpumap_testcpu(int cpu, xc_cpumap_t map); + /* * NODEMAP handling */ diff --git a/tools/libxc/xc_misc.c b/tools/libxc/xc_misc.c index be68291..e113385 100644 --- a/tools/libxc/xc_misc.c +++ b/tools/libxc/xc_misc.c @@ -18,6 +18,7 @@ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA */ +#include "xc_bitops.h" #include "xc_private.h" #include @@ -93,6 +94,35 @@ xc_cpumap_t xc_cpumap_alloc(xc_interface *xch) return calloc(1, sz); } +/* + * xc_bitops.h has macros that do this as well - however xc_cpumap_t + * is an array of uint8 (1byte) while said macros expect an array + * of unsigned long (8 bytes). This is not a problem when setting + * an bit (as the computation will come with the same offset) - the + * issue is when clearing an bit. The aligment on ARM could affect + * other structures that are close in memory causing memory corruption. + * Hence our own macros that differ only in that we compute the + * size of the array elements (sizeof(*map)) as opposed to assuming + * it is unsigned long. + */ +#define BITS_PER_CPUMAP(map) (sizeof(*map) * 8) +#define CPUMAP_ENTRY(cpu, map) ((map))[(cpu) / BITS_PER_CPUMAP(map)] +#define CPUMAP_SHIFT(cpu, map) ((cpu) % BITS_PER_CPUMAP(map)) +void xc_cpumap_clearcpu(int cpu, xc_cpumap_t map) +{ +CPUMAP_ENTRY(cpu, map) &= ~(1U << CPUMAP_SHIFT(cpu, map)); +} + +void xc_cpumap_setcpu(int cpu, xc_cpumap_t map) +{ +CPUMAP_ENTRY(cpu, map) |= (1U << CPUMAP_SHIFT(cpu, map)); +} + +int xc_cpumap_testcpu(int cpu, xc_cpumap_t map) +{ +return (CPUMAP_ENTRY(cpu, map) >> CPUMAP_SHIFT(cpu, map)) & 1; +} + xc_nodemap_t xc_nodemap_alloc(xc_interface *xch) { int sz; -- 2.1.0 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] sysctl: Don't overwrite array size variable when it is set on error earlier
On 25/03/15 17:09, Boris Ostrovsky wrote: When querying CPU topology, if caller-provided array size is smaller than number of online CPUs then, in addition to returning -ENOBUFS, sysctl is expected to provide back this number. However, this value, stored in 'i', is overwritten in the subsequent loop's control statement. Make sure we don't do this by converting the loop to 'while'. Signed-off-by: Boris Ostrovsky Reported-by: Andrew Cooper --- xen/common/sysctl.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c index a8c629f..b83d230 100644 --- a/xen/common/sysctl.c +++ b/xen/common/sysctl.c @@ -338,8 +338,10 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) ret = -ENOBUFS; i = num_cpus; } +else +i = 0; -for ( i = 0; i < num_cpus; i++ ) +while ( i < num_cpus ) This would be fine to keep as "for ( ; i < num_cpus; i++)", and helps avoid an issue if someone introduces a continue; in the future. As for the fix itself, Reviewed-by: Andrew Cooper { xen_sysctl_cputopo_t cputopo; @@ -363,6 +365,8 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) ret = -EFAULT; break; } + +i++; } } else ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 1/2] iommu VT-d: separate rmrr addition function
>>> On 24.03.15 at 17:08, wrote: > +static int register_one_rmrr(struct acpi_rmrr_unit *rmrru) > +{ > +bool_t ignore = 0; > +unsigned int i = 0; > +int ret = 0; > + > +/* Skip checking if segment is not accessible yet. */ > +if ( !pci_known_segment(rmrru->segment) ) > +i = UINT_MAX; > + > +for ( ; i < rmrru->scope.devices_cnt; i++ ) > +{ > +u8 b, d, f; > + > +b = d = f = 0; ??? > +b = PCI_BUS(rmrru->scope.devices[i]); > +d = PCI_SLOT(rmrru->scope.devices[i]); > +f = PCI_FUNC(rmrru->scope.devices[i]); Quoting my reply on v1: "Please make this the declarations (with initializers) of these variables; they don't appear to be used outside this scope." I.e. u8 b = PCI_BUS(...); > +if ( !ret && (rmrru->scope.devices_cnt != 0) ) > +ret = register_one_rmrr(rmrru); > +if ( ret ) > xfree(rmrru); Indentation is screwed up here. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] sysctl: Don't overwrite array size variable when it is set on error earlier
When querying CPU topology, if caller-provided array size is smaller than number of online CPUs then, in addition to returning -ENOBUFS, sysctl is expected to provide back this number. However, this value, stored in 'i', is overwritten in the subsequent loop's control statement. Make sure we don't do this by converting the loop to 'while'. Signed-off-by: Boris Ostrovsky Reported-by: Andrew Cooper --- xen/common/sysctl.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c index a8c629f..b83d230 100644 --- a/xen/common/sysctl.c +++ b/xen/common/sysctl.c @@ -338,8 +338,10 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) ret = -ENOBUFS; i = num_cpus; } +else +i = 0; -for ( i = 0; i < num_cpus; i++ ) +while ( i < num_cpus ) { xen_sysctl_cputopo_t cputopo; @@ -363,6 +365,8 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) ret = -EFAULT; break; } + +i++; } } else -- 1.7.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v19 13/14] x86/VPMU: Add privileged PMU mode
>>> On 17.03.15 at 15:54, wrote: > Add support for privileged PMU mode (XENPMU_MODE_ALL) which allows privileged > domain (dom0) profile both itself (and the hypervisor) and the guests. While > this mode is on profiling in guests is disabled. > > Signed-off-by: Boris Ostrovsky Acked-by: Jan Beulich but ... > @@ -177,17 +183,18 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs) > sampling = sampled; > > vpmu = vcpu_vpmu(sampling); > -if ( !is_hvm_vcpu(sampling) ) > +if ( !is_hvm_vcpu(sampling) || (vpmu_mode & XENPMU_MODE_ALL) ) > { > /* PV(H) guest */ > const struct cpu_user_regs *cur_regs; > uint64_t *flags = &vpmu->xenpmu_data->pmu.pmu_flags; > -uint32_t domid = DOMID_SELF; > +uint32_t domid; ... why is this not domid_t? Probably needs adjustment in an earlier patch. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
>>> On 17.03.15 at 15:54, wrote: > --- > Changes in v19: > * const-ified arch_vpmu_ops in vpmu_do_wrmsr > * non-changes: >- kept 'current' as a non-initializer to avoid unnecessary initialization > in the (common) non-VPMU case Afaict there's nothing at all keeping the compiler to still schedule part or all of the involved insns ahead of that first goto. And if it was an initializer, there would equally be nothing keeping the compiler from scheduling part or all of those insns after the if(). So I don't see why not making it the initializer is better (other than increasing line count and patch size). If you really wanted to do something to help the compiler, flagging the if() condition with likely() would perhaps be the thing to do. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [ovmf test] 36719: regressions - FAIL
flight 36719 ovmf real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/36719/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemuu-win7-amd64 7 windows-installfail REGR. vs. 36590 test-amd64-amd64-xl-win7-amd64 7 windows-install fail REGR. vs. 36590 Regressions which are regarded as allowable (not blocking): test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 36590 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-i386-xl-xsm9 guest-start fail never pass test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 debian-hvm-install fail never pass test-amd64-amd64-libvirt-xsm 9 guest-start fail never pass test-amd64-amd64-xl-pvh-intel 9 guest-start fail never pass test-amd64-amd64-xl-xsm 9 guest-start fail never pass test-amd64-i386-libvirt-xsm 9 guest-start fail never pass test-amd64-amd64-xl-pvh-amd 9 guest-start fail never pass test-amd64-i386-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-libvirt 10 migrate-support-checkfail never pass test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass test-amd64-amd64-xl-qemuu-winxpsp3 7 windows-install fail never pass test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass version targeted for testing: ovmf e29fc5022042f7c86b07673741749cf3de5f3816 baseline version: ovmf f61762d0c86b42c7922ed2d4326c9647252752ae People who touched revisions under test: Cinnamon Shia Fu Siyuan jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl pass test-amd64-i386-xl pass test-amd64-amd64-xl-qemut-debianhvm-amd64-xsmfail test-amd64-i386-xl-qemut-debianhvm-amd64-xsm fail test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmfail test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm fail test-amd64-amd64-libvirt-xsm fail test-amd64-i386-libvirt-xsm fail test-amd64-amd64-xl-xsm fail test-amd64-i386-xl-xsm fail test-amd64-amd64-xl-pvh-amd fail test-amd64-i386-rhel6hvm-amd pass test-amd64-i386-qemut-rhel6hvm-amd pass test-amd64-i386-qemuu-rhel6hvm-amd pass test-amd64-amd64-xl-qemut-debianhvm-amd64pass test-amd64-i386-xl-qemut-debianhvm-amd64 pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-i386-xl-qemuu-debianhvm-amd64 pass test-amd64-i386-freebsd10-amd64 pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass test-amd64-amd64-xl-qemut-win7-amd64
[Xen-devel] [PATCH v2 1/4] x86/MSI-X: be more careful during teardown
When a device gets detached from a guest, pciback will clear its command register, thus disabling both memory and I/O decoding. The disabled memory decoding, however, has an effect on the MSI-X table accesses the hypervisor does: These won't have the intended effect anymore. Even worse, for PCIe devices (but not SR-IOV virtual functions) such accesses may (will?) be treated as Unsupported Requests, causing respective errors to be surfaced, potentially in the form of NMIs that may be fatal to the hypervisor or Dom0 is different ways. Hence rather than carrying out these accesses, we should avoid them where we can, and use alternative (e.g. PCI config space based) mechanisms to achieve at least the same effect. Signed-off-by: Jan Beulich --- v2: mask_msi_irq()'s BUG() needs to be conditional upon IRQ_DISABLED not already being set, as ->shutdown() may be called on error paths when the IRQ never got enabled. This in turn required the changes to irq.c. Backporting note (largely to myself): Depends on (not yet backported) commit 061eebe0e "x86/MSI: drop workaround for insecure Dom0 kernels" (due to re-use of struct arch_msix's warned field). --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -217,9 +217,9 @@ void destroy_irq(unsigned int irq) } spin_lock_irqsave(&desc->lock, flags); -desc->status |= IRQ_DISABLED; desc->status &= ~IRQ_GUEST; desc->handler->shutdown(desc); +desc->status |= IRQ_DISABLED; action = desc->action; desc->action = NULL; desc->msi_desc = NULL; @@ -995,8 +995,8 @@ void __init release_irq(unsigned int irq spin_lock_irqsave(&desc->lock,flags); action = desc->action; desc->action = NULL; -desc->status |= IRQ_DISABLED; desc->handler->shutdown(desc); +desc->status |= IRQ_DISABLED; spin_unlock_irqrestore(&desc->lock,flags); /* Wait to make sure it's not being used on another CPU */ @@ -1725,8 +1725,8 @@ static irq_guest_action_t *__pirq_guest_ BUG_ON(action->in_flight != 0); /* Disabling IRQ before releasing the desc_lock avoids an IRQ storm. */ -desc->status |= IRQ_DISABLED; desc->handler->disable(desc); +desc->status |= IRQ_DISABLED; /* * Mark any remaining pending EOIs as ready to flush. --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -121,6 +121,27 @@ static void msix_put_fixmap(struct arch_ spin_unlock(&msix->table_lock); } +static bool_t memory_decoded(const struct pci_dev *dev) +{ +u8 bus, slot, func; + +if ( !dev->info.is_virtfn ) +{ +bus = dev->bus; +slot = PCI_SLOT(dev->devfn); +func = PCI_FUNC(dev->devfn); +} +else +{ +bus = dev->info.physfn.bus; +slot = PCI_SLOT(dev->info.physfn.devfn); +func = PCI_FUNC(dev->info.physfn.devfn); +} + +return !!(pci_conf_read16(dev->seg, bus, slot, func, PCI_COMMAND) & + PCI_COMMAND_MEMORY); +} + /* * MSI message composition */ @@ -162,7 +183,7 @@ void msi_compose_msg(unsigned vector, co } } -static void read_msi_msg(struct msi_desc *entry, struct msi_msg *msg) +static bool_t read_msi_msg(struct msi_desc *entry, struct msi_msg *msg) { switch ( entry->msi_attrib.type ) { @@ -198,6 +219,8 @@ static void read_msi_msg(struct msi_desc void __iomem *base; base = entry->mask_base; +if ( unlikely(!memory_decoded(entry->dev)) ) +return 0; msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); msg->data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET); @@ -209,6 +232,8 @@ static void read_msi_msg(struct msi_desc if ( iommu_intremap ) iommu_read_msi_from_ire(entry, msg); + +return 1; } static int write_msi_msg(struct msi_desc *entry, struct msi_msg *msg) @@ -260,6 +285,8 @@ static int write_msi_msg(struct msi_desc void __iomem *base; base = entry->mask_base; +if ( unlikely(!memory_decoded(entry->dev)) ) +return -ENXIO; writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); writel(msg->address_hi, @@ -287,7 +314,8 @@ void set_msi_affinity(struct irq_desc *d ASSERT(spin_is_locked(&desc->lock)); memset(&msg, 0, sizeof(msg)); -read_msi_msg(msi_desc, &msg); +if ( !read_msi_msg(msi_desc, &msg) ) +return; msg.data &= ~MSI_DATA_VECTOR_MASK; msg.data |= MSI_DATA_VECTOR(desc->arch.vector); @@ -347,20 +375,24 @@ int msi_maskable_irq(const struct msi_de || entry->msi_attrib.maskbit; } -static void msi_set_mask_bit(struct irq_desc *desc, int flag) +static bool_t msi_set_mask_bit(struct irq_desc *desc, int flag) { struct msi_desc *entry = desc->msi_desc; +struct pci_dev *pdev; +u16 seg; +u8 bus, slot, func; ASSERT(spin_is_locked(&desc->lock)); BUG_ON(!entry || !entry->dev
Re: [Xen-devel] [PATCH] LZ4 : fix the data abort issue
On Wed, 2015-03-25 at 16:14 +, Jan Beulich wrote: > If the part of the compression data are corrupted, or the compression > data is totally fake, the memory access over the limit is possible. > > This is the log from my system usning lz4 decompression. >[6502]data abort, halting >[6503]r0 0x r1 0x r2 0xdcea0ffc r3 0xdcea0ffc >[6509]r4 0xb9ab0bfd r5 0xdcea0ffc r6 0xdcea0ff8 r7 0xdce8 >[6515]r8 0x r9 0x r10 0x r11 0xb9a98000 >[6522]r12 0xdcea1000 usp 0x ulr 0x pc 0x820149bc >[6528]spsr 0x41f3 > and the memory addresses of some variables at the moment are > ref:0xdcea0ffc, op:0xdcea0ffc, oend:0xdcea1000 > > As you can see, COPYLENGH is 8bytes, so @ref and @op can access the momory > over @oend. > > Signed-off-by: JeHyeon Yeon > Reviewed-by: David Sterba > [Linux commit d5e7cafd69da24e6d6cc988fab6ea313a2577efc] > Signed-off-by: Jan Beulich Acked-by: Ian Campbell > > --- a/xen/common/lz4/decompress.c > +++ b/xen/common/lz4/decompress.c > @@ -132,6 +132,9 @@ static int INIT lz4_uncompress(const uns > /* Error: request to write beyond destination buffer */ > if (cpy > oend) > goto _output_error; > + if ((ref + COPYLENGTH) > oend || > + (op + COPYLENGTH) > oend) > + goto _output_error; > LZ4_SECURECOPY(ref, op, (oend - COPYLENGTH)); > while (op < cpy) > *op++ = *ref++; > > > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 4/4] x86/MSI-X: cleanup
- __pci_enable_msix() now checks that an MSI-X capability was actually found - pass "pos" to msix_capability_init() as both callers already know it (and hence there's no need to re-obtain it) - call __pci_disable_msi{,x}() directly instead of via pci_disable_msi() from __pci_enable_msi{x,}() state validation paths - use msix_control_reg() instead of open coding it - log message adjustments - coding style corrections Signed-off-by: Jan Beulich Reviewed-by: Andrew Cooper --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -35,6 +35,8 @@ static s8 __read_mostly use_msi = -1; boolean_param("msi", use_msi); +static void __pci_disable_msix(struct msi_desc *); + /* bitmap indicate which fixed map is free */ static DEFINE_SPINLOCK(msix_fixmap_lock); static DECLARE_BITMAP(msix_fixmap_pages, FIX_MSIX_MAX_PAGES); @@ -167,12 +169,14 @@ void msi_compose_msg(unsigned vector, co unsigned dest; memset(msg, 0, sizeof(*msg)); -if ( !cpumask_intersects(cpu_mask, &cpu_online_map) ) { +if ( !cpumask_intersects(cpu_mask, &cpu_online_map) ) +{ dprintk(XENLOG_ERR,"%s, compose msi message error!!\n", __func__); return; } -if ( vector ) { +if ( vector ) +{ cpumask_t *mask = this_cpu(scratch_mask); cpumask_and(mask, cpu_mask, &cpu_online_map); @@ -233,8 +237,7 @@ static bool_t read_msi_msg(struct msi_de } case PCI_CAP_ID_MSIX: { -void __iomem *base; -base = entry->mask_base; +void __iomem *base = entry->mask_base; if ( unlikely(!msix_memory_decoded(entry->dev, entry->msi_attrib.pos)) ) @@ -300,8 +303,7 @@ static int write_msi_msg(struct msi_desc } case PCI_CAP_ID_MSIX: { -void __iomem *base; -base = entry->mask_base; +void __iomem *base = entry->mask_base; if ( unlikely(!msix_memory_decoded(entry->dev, entry->msi_attrib.pos)) ) @@ -327,7 +329,7 @@ void set_msi_affinity(struct irq_desc *d struct msi_desc *msi_desc = desc->msi_desc; dest = set_desc_affinity(desc, mask); -if (dest == BAD_APICID || !msi_desc) +if ( dest == BAD_APICID || !msi_desc ) return; ASSERT(spin_is_locked(&desc->lock)); @@ -379,11 +381,11 @@ static void msix_set_enable(struct pci_d pos = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSIX); if ( pos ) { -control = pci_conf_read16(seg, bus, slot, func, pos + PCI_MSIX_FLAGS); +control = pci_conf_read16(seg, bus, slot, func, msix_control_reg(pos)); control &= ~PCI_MSIX_FLAGS_ENABLE; if ( enable ) control |= PCI_MSIX_FLAGS_ENABLE; -pci_conf_write16(seg, bus, slot, func, pos + PCI_MSIX_FLAGS, control); +pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos), control); } } @@ -408,9 +410,11 @@ static bool_t msi_set_mask_bit(struct ir bus = pdev->bus; slot = PCI_SLOT(pdev->devfn); func = PCI_FUNC(pdev->devfn); -switch (entry->msi_attrib.type) { +switch ( entry->msi_attrib.type ) +{ case PCI_CAP_ID_MSI: -if (entry->msi_attrib.maskbit) { +if ( entry->msi_attrib.maskbit ) +{ u32 mask_bits; mask_bits = pci_conf_read32(seg, bus, slot, func, entry->msi.mpos); @@ -778,13 +782,14 @@ static u64 read_pci_mem_bar(u16 seg, u8 * requested MSI-X entries with allocated irqs or non-zero for otherwise. **/ static int msix_capability_init(struct pci_dev *dev, +unsigned int pos, struct msi_info *msi, struct msi_desc **desc, unsigned int nr_entries) { struct arch_msix *msix = dev->msix; struct msi_desc *entry = NULL; -int pos, vf; +int vf; u16 control; u64 table_paddr; u32 table_offset; @@ -796,7 +801,6 @@ static int msix_capability_init(struct p ASSERT(spin_is_locked(&pcidevs_lock)); -pos = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSIX); control = pci_conf_read16(seg, bus, slot, func, msix_control_reg(pos)); /* * Ensure MSI-X interrupts are masked during setup. Some devices require @@ -984,10 +988,9 @@ static int __pci_enable_msi(struct msi_i old_desc = find_msi_entry(pdev, msi->irq, PCI_CAP_ID_MSI); if ( old_desc ) { -dprintk(XENLOG_WARNING, "irq %d has already mapped to MSI on " -"device %04x:%02x:%02x.%01x\n", -msi->irq, msi->seg, msi->bus, -PCI_SLOT(msi->devfn), PCI_FUNC(msi->devfn)); +printk(XENLOG_WARNING "irq %d already mapped to MSI on %04x:%02x:%02x.%u\n", + msi->irq, msi->seg, msi->bus, + PCI_SLOT(msi->devfn), PCI_FUNC(msi->devfn)); *desc = old_desc; return 0; } @@ -995,10 +998,10 @@ static int __p
[Xen-devel] [PATCH v2 3/4] x86/MSI-X: reduce fiddling with control register during restore
Rather than disabling and enabling MSI-X once per vector, do it just once per device. Signed-off-by: Jan Beulich Reviewed-by: Andrew Cooper --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1217,6 +1217,9 @@ int pci_restore_msi_state(struct pci_dev struct msi_desc *entry, *tmp; struct irq_desc *desc; struct msi_msg msg; +u8 slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn); +unsigned int type = 0, pos = 0; +u16 control = 0; ASSERT(spin_is_locked(&pcidevs_lock)); @@ -1233,8 +1236,6 @@ int pci_restore_msi_state(struct pci_dev list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list ) { unsigned int i = 0, nr = 1; -u16 control = 0; -u8 slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn); irq = entry->irq; desc = &irq_desc[irq]; @@ -1251,31 +1252,38 @@ int pci_restore_msi_state(struct pci_dev pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn), i); spin_unlock_irqrestore(&desc->lock, flags); +if ( type == PCI_CAP_ID_MSIX ) +pci_conf_write16(pdev->seg, pdev->bus, slot, func, + msix_control_reg(pos), + control & ~PCI_MSIX_FLAGS_ENABLE); return -EINVAL; } +ASSERT(!type || type == entry->msi_attrib.type); +pos = entry->msi_attrib.pos; if ( entry->msi_attrib.type == PCI_CAP_ID_MSI ) { msi_set_enable(pdev, 0); nr = entry->msi.nvec; } -else if ( entry->msi_attrib.type == PCI_CAP_ID_MSIX ) +else if ( !type && entry->msi_attrib.type == PCI_CAP_ID_MSIX ) { control = pci_conf_read16(pdev->seg, pdev->bus, slot, func, - msix_control_reg(entry->msi_attrib.pos)); + msix_control_reg(pos)); pci_conf_write16(pdev->seg, pdev->bus, slot, func, - msix_control_reg(entry->msi_attrib.pos), + msix_control_reg(pos), control | (PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL)); if ( unlikely(!memory_decoded(pdev)) ) { spin_unlock_irqrestore(&desc->lock, flags); pci_conf_write16(pdev->seg, pdev->bus, slot, func, - msix_control_reg(entry->msi_attrib.pos), + msix_control_reg(pos), control & ~PCI_MSIX_FLAGS_ENABLE); return -ENXIO; } } +type = entry->msi_attrib.type; msg = entry->msg; write_msi_msg(entry, &msg); @@ -1298,9 +1306,9 @@ int pci_restore_msi_state(struct pci_dev spin_unlock_irqrestore(&desc->lock, flags); -if ( entry->msi_attrib.type == PCI_CAP_ID_MSI ) +if ( type == PCI_CAP_ID_MSI ) { -unsigned int cpos = msi_control_reg(entry->msi_attrib.pos); +unsigned int cpos = msi_control_reg(pos); control = pci_conf_read16(pdev->seg, pdev->bus, slot, func, cpos) & ~PCI_MSI_FLAGS_QSIZE; @@ -1310,12 +1318,13 @@ int pci_restore_msi_state(struct pci_dev msi_set_enable(pdev, 1); } -else if ( entry->msi_attrib.type == PCI_CAP_ID_MSIX ) -pci_conf_write16(pdev->seg, pdev->bus, slot, func, - msix_control_reg(entry->msi_attrib.pos), - control | PCI_MSIX_FLAGS_ENABLE); } +if ( type == PCI_CAP_ID_MSIX ) +pci_conf_write16(pdev->seg, pdev->bus, slot, func, + msix_control_reg(pos), + control | PCI_MSIX_FLAGS_ENABLE); + return 0; } x86/MSI-X: reduce fiddling with control register during restore Rather than disabling and enabling MSI-X once per vector, do it just once per device. Signed-off-by: Jan Beulich Reviewed-by: Andrew Cooper --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -1217,6 +1217,9 @@ int pci_restore_msi_state(struct pci_dev struct msi_desc *entry, *tmp; struct irq_desc *desc; struct msi_msg msg; +u8 slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn); +unsigned int type = 0, pos = 0; +u16 control = 0; ASSERT(spin_is_locked(&pcidevs_lock)); @@ -1233,8 +1236,6 @@ int pci_restore_msi_state(struct pci_dev list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list ) { unsigned int i = 0, nr = 1; -u16 control = 0; -u8 slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn); irq = entry->irq; desc = &irq_desc[irq]; @@ -1251,31 +1252,38 @@ int pci_restore_msi_state(struct pci_dev pdev->seg, pdev->bus, PCI_SLOT(pdev
Re: [Xen-devel] [PATCH] hvmloader: don't treat ROM BAR like other BARs
On 20/03/15 16:20, Jan Beulich wrote: Its low 11 bits have different meaning. Signed-off-by: Jan Beulich Reviewed-by: Andrew Cooper ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 2/4] x86/MSI-X: access MSI-X table only after having enabled MSI-X
As done in Linux by f598282f51 ("PCI: Fix the NIU MSI-X problem in a better way") and its broken predecessor, make sure we don't access the MSI-X table without having enabled MSI-X first, using the mask-all flag instead to prevent interrupts from occurring. Signed-off-by: Jan Beulich Reviewed-by: Andrew Cooper --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -142,6 +142,19 @@ static bool_t memory_decoded(const struc PCI_COMMAND_MEMORY); } +static bool_t msix_memory_decoded(const struct pci_dev *dev, unsigned int pos) +{ +u16 control = pci_conf_read16(dev->seg, dev->bus, + PCI_SLOT(dev->devfn), + PCI_FUNC(dev->devfn), + msix_control_reg(pos)); + +if ( !(control & PCI_MSIX_FLAGS_ENABLE) ) +return 0; + +return memory_decoded(dev); +} + /* * MSI message composition */ @@ -219,7 +236,8 @@ static bool_t read_msi_msg(struct msi_de void __iomem *base; base = entry->mask_base; -if ( unlikely(!memory_decoded(entry->dev)) ) +if ( unlikely(!msix_memory_decoded(entry->dev, + entry->msi_attrib.pos)) ) return 0; msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET); @@ -285,7 +303,8 @@ static int write_msi_msg(struct msi_desc void __iomem *base; base = entry->mask_base; -if ( unlikely(!memory_decoded(entry->dev)) ) +if ( unlikely(!msix_memory_decoded(entry->dev, + entry->msi_attrib.pos)) ) return -ENXIO; writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET); @@ -379,7 +398,7 @@ static bool_t msi_set_mask_bit(struct ir { struct msi_desc *entry = desc->msi_desc; struct pci_dev *pdev; -u16 seg; +u16 seg, control; u8 bus, slot, func; ASSERT(spin_is_locked(&desc->lock)); @@ -401,35 +420,38 @@ static bool_t msi_set_mask_bit(struct ir } break; case PCI_CAP_ID_MSIX: +control = pci_conf_read16(seg, bus, slot, func, + msix_control_reg(entry->msi_attrib.pos)); +if ( unlikely(!(control & PCI_MSIX_FLAGS_ENABLE)) ) +pci_conf_write16(seg, bus, slot, func, + msix_control_reg(entry->msi_attrib.pos), + control | (PCI_MSIX_FLAGS_ENABLE | +PCI_MSIX_FLAGS_MASKALL)); if ( likely(memory_decoded(pdev)) ) { writel(flag, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); readl(entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET); -break; +if ( likely(control & PCI_MSIX_FLAGS_ENABLE) ) +break; +flag = 1; } -if ( flag ) +else if ( flag && !(control & PCI_MSIX_FLAGS_MASKALL) ) { -u16 control; domid_t domid = pdev->domain->domain_id; -control = pci_conf_read16(seg, bus, slot, func, - msix_control_reg(entry->msi_attrib.pos)); -if ( control & PCI_MSIX_FLAGS_MASKALL ) -break; -pci_conf_write16(seg, bus, slot, func, - msix_control_reg(entry->msi_attrib.pos), - control | PCI_MSIX_FLAGS_MASKALL); +control |= PCI_MSIX_FLAGS_MASKALL; if ( pdev->msix->warned != domid ) { pdev->msix->warned = domid; printk(XENLOG_G_WARNING - "cannot mask IRQ %d: masked MSI-X on Dom%d's %04x:%02x:%02x.%u\n", + "cannot mask IRQ %d: masking MSI-X on Dom%d's %04x:%02x:%02x.%u\n", desc->irq, domid, pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn)); } -break; } -/* fall through */ +pci_conf_write16(seg, bus, slot, func, + msix_control_reg(entry->msi_attrib.pos), control); +return flag; default: return 0; } @@ -454,7 +476,8 @@ static int msi_get_mask_bit(const struct entry->msi.mpos) >> entry->msi_attrib.entry_nr) & 1; case PCI_CAP_ID_MSIX: -if ( unlikely(!memory_decoded(entry->dev)) ) +if ( unlikely(!msix_memory_decoded(entry->dev, + entry->msi_attrib.pos)) ) break; return readl(entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET) & 1; } @@ -775,16 +798,32 @@ static int msix_capability_init(struct p pos = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSIX); control = pci_conf_read16(s
Re: [Xen-devel] [PATCH OSSTEST] Arrange for core dumps to be placed in /var/core and collect them
On Wed, 2015-03-25 at 16:18 +, Ian Jackson wrote: > Ian Campbell writes ("Re: [PATCH OSSTEST] Arrange for core dumps to be placed > in /var/core and collect them"): > > Do you have any thoughts on how to best universally arrange for the > > rlimits to be increased? Probably inserting a ulimit call into the > > libvirt initscript would be an easy first step, and since that's the > > thing we most often see crashing it seems. I'll send such a patch > > shortly. > > Maybe pam ? Yes, I think that's what my subsequent patch to configure via /etc/security/limits.d has achived. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 0/4] x86/MSI-X: XSA-120 follow-up
The problem requiring the first patch here is actually what lead to XSA-120. 1: be more careful during teardown 2: access MSI-X table only after having enabled MSI-X 3: reduce fiddling with control register during restore 4: cleanup Signed-off-by: Jan Beulich (Only patch 1 really changed - see there for details.) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [OSSTEST PATCH 29/32] ts-kernel-build: enable CONFIG_SCSI_HPSA
Ian Campbell writes ("Re: [OSSTEST PATCH 29/32] ts-kernel-build: enable CONFIG_SCSI_HPSA"): > On Tue, 2015-03-24 at 19:02 +, Ian Jackson wrote: > > +setopt CONFIG_SCSI_HPSA n > > This is disabling the option, not enabling it per the commit message. Yes. I noticed that when I executed it :-/. I intended to enable it (setting it to `m'). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 3/8] sysctl: Make XEN_SYSCTL_topologyinfo sysctl a little more efficient
On 19/03/15 22:53, Boris Ostrovsky wrote: --- a/xen/common/sysctl.c +++ b/xen/common/sysctl.c @@ -324,39 +324,63 @@ long do_sysctl(XEN_GUEST_HANDLE_PARAM(xen_sysctl_t) u_sysctl) } break; -case XEN_SYSCTL_topologyinfo: +case XEN_SYSCTL_cputopoinfo: { -uint32_t i, max_cpu_index, last_online_cpu; -xen_sysctl_topologyinfo_t *ti = &op->u.topologyinfo; +uint32_t i, num_cpus; +xen_sysctl_cputopoinfo_t *ti = &op->u.cputopoinfo; -last_online_cpu = cpumask_last(&cpu_online_map); -max_cpu_index = min_t(uint32_t, ti->max_cpu_index, last_online_cpu); -ti->max_cpu_index = last_online_cpu; - -for ( i = 0; i <= max_cpu_index; i++ ) +num_cpus = cpumask_last(&cpu_online_map) + 1; +if ( !guest_handle_is_null(ti->cputopo) ) { -if ( !guest_handle_is_null(ti->cpu_to_core) ) +if ( ti->num_cpus < num_cpus ) { -uint32_t core = cpu_online(i) ? cpu_to_core(i) : ~0u; -if ( copy_to_guest_offset(ti->cpu_to_core, i, &core, 1) ) -break; +ret = -ENOBUFS; +i = num_cpus; } -if ( !guest_handle_is_null(ti->cpu_to_socket) ) + +for ( i = 0; i < num_cpus; i++ ) Observe that the "i = 0" clobbers the -ENOBUFS detection, meaning that Xen will always write num_cpus into the userspace array, writing past the end of the array if it is too short. As this patch has already been committed, please fix as a matter of priority (or I can if you are overly busy). ~Andrew (Also, you have introduced a mixed tab/space into tools/python/xen/lowlevel/xc/xc.c on the "goto out;" line) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] x86/MSI: fix error handling
__setup_msi_irq() needs to undo what it did before calling write_msi_msg() in case that returned an error. map_domain_pirq() needs to get rid of the MSI descriptor it (implicitly) allocated. The case of a setup_msi_irq() failure on a non-initial multi-vector-MSI interrupt needs special handling: While the initial IRQ will get freed by the caller (who also passed it to us), we need to undo the effect setup_msi_irq() had on it. (As a benefit from the added call to msi_free_irq() we no longer need to explicitly call destroy_irq() on the non-initial slots.) Signed-off-by: Jan Beulich --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -1974,6 +1974,8 @@ int map_domain_pirq( dprintk(XENLOG_G_ERR, "dom%d: irq %d in use\n", d->domain_id, irq); pci_disable_msi(msi_desc); +msi_desc->irq = -1; +msi_free_irq(msi_desc); ret = -EBUSY; goto done; } @@ -2028,22 +2030,29 @@ int map_domain_pirq( if ( ret ) { spin_unlock_irqrestore(&desc->lock, flags); +pci_disable_msi(msi_desc); +if ( nr ) +{ +ASSERT(msi_desc->irq >= 0); +desc = irq_to_desc(msi_desc->irq); +spin_lock_irqsave(&desc->lock, flags); +desc->handler = &no_irq_type; +desc->msi_desc = NULL; +spin_unlock_irqrestore(&desc->lock, flags); +} while ( nr-- ) { -if ( irq >= 0 ) -{ -if ( irq_deny_access(d, irq) ) -printk(XENLOG_G_ERR - "dom%d: could not revoke access to IRQ%d (pirq %d)\n", - d->domain_id, irq, pirq); -destroy_irq(irq); -} +if ( irq >= 0 && irq_deny_access(d, irq) ) +printk(XENLOG_G_ERR + "dom%d: could not revoke access to IRQ%d (pirq %d)\n", + d->domain_id, irq, pirq); if ( info ) cleanup_domain_irq_pirq(d, irq, info); info = pirq_info(d, pirq + nr); irq = info->arch.irq; } -pci_disable_msi(msi_desc); +msi_desc->irq = -1; +msi_free_irq(msi_desc); goto done; } --- a/xen/arch/x86/msi.c +++ b/xen/arch/x86/msi.c @@ -561,6 +561,7 @@ static struct msi_desc *alloc_msi_entry( while ( nr-- ) { entry[nr].dev = NULL; +entry[nr].irq = -1; entry[nr].remap_index = -1; } @@ -578,11 +579,19 @@ int __setup_msi_irq(struct irq_desc *des hw_irq_controller *handler) { struct msi_msg msg; +int ret; desc->msi_desc = msidesc; desc->handler = handler; msi_compose_msg(desc->arch.vector, desc->arch.cpu_mask, &msg); -return write_msi_msg(msidesc, &msg); +ret = write_msi_msg(msidesc, &msg); +if ( unlikely(ret) ) +{ +desc->handler = &no_irq_type; +desc->msi_desc = NULL; +} + +return ret; } int msi_free_irq(struct msi_desc *entry) @@ -592,7 +601,8 @@ int msi_free_irq(struct msi_desc *entry) while ( nr-- ) { -destroy_irq(entry[nr].irq); +if ( entry[nr].irq >= 0 ) +destroy_irq(entry[nr].irq); /* Free the unused IRTE if intr remap enabled */ if ( iommu_intremap ) x86/MSI: fix error handling __setup_msi_irq() needs to undo what it did before calling write_msi_msg() in case that returned an error. map_domain_pirq() needs to get rid of the MSI descriptor it (implicitly) allocated. The case of a setup_msi_irq() failure on a non-initial multi-vector-MSI interrupt needs special handling: While the initial IRQ will get freed by the caller (who also passed it to us), we need to undo the effect setup_msi_irq() had on it. (As a benefit from the added call to msi_free_irq() we no longer need to explicitly call destroy_irq() on the non-initial slots.) Signed-off-by: Jan Beulich --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -1974,6 +1974,8 @@ int map_domain_pirq( dprintk(XENLOG_G_ERR, "dom%d: irq %d in use\n", d->domain_id, irq); pci_disable_msi(msi_desc); +msi_desc->irq = -1; +msi_free_irq(msi_desc); ret = -EBUSY; goto done; } @@ -2028,22 +2030,29 @@ int map_domain_pirq( if ( ret ) { spin_unlock_irqrestore(&desc->lock, flags); +pci_disable_msi(msi_desc); +if ( nr ) +{ +ASSERT(msi_desc->irq >= 0); +desc = irq_to_desc(msi_desc->irq); +spin_lock_irqsave(&desc->lock, flags); +desc->handler = &no_irq_type; +desc->msi_desc = NULL; +
Re: [Xen-devel] [PATCH OSSTEST] Arrange for core dumps to be placed in /var/core and collect them
Ian Campbell writes ("Re: [PATCH OSSTEST] Arrange for core dumps to be placed in /var/core and collect them"): > Do you have any thoughts on how to best universally arrange for the > rlimits to be increased? Probably inserting a ulimit call into the > libvirt initscript would be an easy first step, and since that's the > thing we most often see crashing it seems. I'll send such a patch > shortly. Maybe pam ? Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] LZ4 : fix the data abort issue
If the part of the compression data are corrupted, or the compression data is totally fake, the memory access over the limit is possible. This is the log from my system usning lz4 decompression. [6502]data abort, halting [6503]r0 0x r1 0x r2 0xdcea0ffc r3 0xdcea0ffc [6509]r4 0xb9ab0bfd r5 0xdcea0ffc r6 0xdcea0ff8 r7 0xdce8 [6515]r8 0x r9 0x r10 0x r11 0xb9a98000 [6522]r12 0xdcea1000 usp 0x ulr 0x pc 0x820149bc [6528]spsr 0x41f3 and the memory addresses of some variables at the moment are ref:0xdcea0ffc, op:0xdcea0ffc, oend:0xdcea1000 As you can see, COPYLENGH is 8bytes, so @ref and @op can access the momory over @oend. Signed-off-by: JeHyeon Yeon Reviewed-by: David Sterba [Linux commit d5e7cafd69da24e6d6cc988fab6ea313a2577efc] Signed-off-by: Jan Beulich --- a/xen/common/lz4/decompress.c +++ b/xen/common/lz4/decompress.c @@ -132,6 +132,9 @@ static int INIT lz4_uncompress(const uns /* Error: request to write beyond destination buffer */ if (cpy > oend) goto _output_error; + if ((ref + COPYLENGTH) > oend || + (op + COPYLENGTH) > oend) + goto _output_error; LZ4_SECURECOPY(ref, op, (oend - COPYLENGTH)); while (op < cpy) *op++ = *ref++; LZ4 : fix the data abort issue If the part of the compression data are corrupted, or the compression data is totally fake, the memory access over the limit is possible. This is the log from my system usning lz4 decompression. [6502]data abort, halting [6503]r0 0x r1 0x r2 0xdcea0ffc r3 0xdcea0ffc [6509]r4 0xb9ab0bfd r5 0xdcea0ffc r6 0xdcea0ff8 r7 0xdce8 [6515]r8 0x r9 0x r10 0x r11 0xb9a98000 [6522]r12 0xdcea1000 usp 0x ulr 0x pc 0x820149bc [6528]spsr 0x41f3 and the memory addresses of some variables at the moment are ref:0xdcea0ffc, op:0xdcea0ffc, oend:0xdcea1000 As you can see, COPYLENGH is 8bytes, so @ref and @op can access the momory over @oend. Signed-off-by: JeHyeon Yeon Reviewed-by: David Sterba [Linux commit d5e7cafd69da24e6d6cc988fab6ea313a2577efc] Signed-off-by: Jan Beulich --- a/xen/common/lz4/decompress.c +++ b/xen/common/lz4/decompress.c @@ -132,6 +132,9 @@ static int INIT lz4_uncompress(const uns /* Error: request to write beyond destination buffer */ if (cpy > oend) goto _output_error; + if ((ref + COPYLENGTH) > oend || + (op + COPYLENGTH) > oend) + goto _output_error; LZ4_SECURECOPY(ref, op, (oend - COPYLENGTH)); while (op < cpy) *op++ = *ref++; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] VM Migration on ARM
On 20/03/15 13:17, Vijay Kilari wrote: Hi Andrew, On Wed, Mar 11, 2015 at 4:21 PM, Andrew Cooper wrote: On 11/03/15 05:32, Vijay Kilari wrote: Hi Andrew, From Ian & Stefano, I came to know that you have introduced Migration framework v2 under below patch series http://marc.info/?l=xen-devel&m=141036915311145 I have few queries: 1) Is there any plans/timeline to support for ARM32/ARM64?. I will not personally be doing any ARM migration support, not even as part of getting migration v2 accepted upstream. 2) Also does the Domain Image format specification for ARM is complete? The spec has certain sentinel values for ARM already where it made sense to just include them while writing. It has no specific thought to ARM migration, but an ARM virtual machine is exactly like an x86 HVM domain (from the toolstacks point of view), other than the Qemu record. With any luck, no new additions to the spec will be needed, but the spec is trivially extendible if required. IMO, ARM uses PVH domain. Can you please let me know how to use your patches, I mean xl/xc commands to do migration (non-live/live) using your patches. Any debugging and testing support information would be helpful. Hi - apologises for the late reply. I have been on holiday. The patches series as available does make `xl save/restore/migrate` work, although the error handling leaves a lot to be desired. We (XenServer) are in the process of attempting to get everything upstream. The libxc side of things is definitely working and already shipped in XenServer 6.5 (We absolutely needed to support migrating between 32 and 64bit toolstacks, which was the primary reason for the rewrite). What sort of timescale are you looking for on this? I hope to have a new series posted soon, now that the datacopier subseries has been accepted. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [xen-4.3-testing test] 36695: regressions - FAIL
>>> On 25.03.15 at 10:27, wrote: > On Wed, 2015-03-25 at 08:33 +, xen.org wrote: >> flight 36695 xen-4.3-testing real [real] >> http://www.chiark.greenend.org.uk/~xensrcts/logs/36695/ >> >> Regressions :-( >> >> Tests which did not succeed and are blocking, >> including tests which could not be run: >> test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. > 36483 > > This looks like the swiotlb thing. Given that the only commit under test > here is: > > commit c58b16ef1572176cf2f6a424b527b5ed4bb73f17 > Author: Jan Beulich > Date: Thu Mar 19 16:08:36 2015 +0100 > > update Xen version to 4.3.4 > > Shall we just force push it? Yes, I think so. Thanks, Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] xen: arm: correctly handle continuations for 64-bit guests
On Wed, 2015-03-25 at 15:41 +, Andrew Cooper wrote: > On 25/03/15 15:34, Ian Campbell wrote: > > The 64-bit ABI is different to 32-bit: > > > > - uses x16 as the op register rather than r12. > > - arguments in x0..x5 and not r0..r5. Using rN here potentially > > truncates. > > - return value goes in x0, not r0. > > > > Hypercalls can only be made directly from kernel space, so checking > > the domain's size is sufficient. > > > > The update of regs->pc is duplicated in both halves because the 32-bit > > case is going to need fixing to handle Thumb mode (next patch). > > > > Spotted due to spurious -EFAULT when destroying a domain, due to the > > hypercall's pointer argument being truncated. I'm unclear why I am > > only seeing this now. > > > > Signed-off-by: Ian Campbell > > Almost certainly 15e0aac6fe76be6a710a8e6d3da610d437903266 which changed > XEN_DOMCTL_destroydomain to use continuations rather than repeated > hypercalls. That sounds like why domaindestroy now exhibits it, but not why it hasn't been plaguing us on different hypercalls for ever. I suppose we don't actually hit preempt very often in practice. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/1] hvm.c: Prevent gcc uninitialised var warning
>>> On 25.03.15 at 16:02, wrote: > On 23/03/15 15:01, Don Slutz wrote: >> gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 reports: >> -- >> hvm.c: In function `hvm_create_ioreq_server': >> hvm.c:487:18: error: `bufioreq_pfn' may be used uninitialised in this > function >> [-Werror=uninitialized] >> hvm.c:718:30: note: `bufioreq_pfn' was declared here >> -- >> >> My code analysis says that gcc is wrong, but initilize the variable >> to prevent the gcc warning. >> >> Reported-by: Ian Murray >> Signed-off-by: Don Slutz > > GCC is correct and there is path through this function where > bufioreq_pfn is used while uninitialised (non-debug build, is_default = > 1, handle_bufioreq = 0). I'm not seeing it: When is_default is set, bufioreq_pfn gets initialized in the first if()'s body. > As an aside, the compiler is in a very easy position to spot this. The > error means that GCC has positively identified a basic block which does > use bufioreq_pfn before it has been initialised. > > I observe that the patch has been committed, but it merely hides the > problem and doesn't fix it; hvm_free_ioreq_gmfn(d, bufioreq_pfn) will > still clobber arbitrary memory. I committed the patch based on me not being able to find a path where the variable would actually be used uninitialized (and it is known that various gcc versions have varying levels of problems with correctly identifying such issues). If indeed there is a path that I'm not seeing, then I'd indeed favor reverting and putting in a proper, backportable fix. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] xen: arm: correctly handle continuations for 64-bit guests
On 25/03/15 15:34, Ian Campbell wrote: The 64-bit ABI is different to 32-bit: - uses x16 as the op register rather than r12. - arguments in x0..x5 and not r0..r5. Using rN here potentially truncates. - return value goes in x0, not r0. Hypercalls can only be made directly from kernel space, so checking the domain's size is sufficient. The update of regs->pc is duplicated in both halves because the 32-bit case is going to need fixing to handle Thumb mode (next patch). Spotted due to spurious -EFAULT when destroying a domain, due to the hypercall's pointer argument being truncated. I'm unclear why I am only seeing this now. Signed-off-by: Ian Campbell Almost certainly 15e0aac6fe76be6a710a8e6d3da610d437903266 which changed XEN_DOMCTL_destroydomain to use continuations rather than repeated hypercalls. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/3] xen: arm: Fix typo "Falltrough"
On Wed, 2015-03-25 at 15:34 +, Ian Campbell wrote: Sigh, I remembered just too late that without a cover letter git doesn't chain things, sorry. I should stop being so lazy... > Signed-off-by: Ian Campbell > --- > xen/arch/arm/domain.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index fdba081..939d8cd 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -742,7 +742,7 @@ int domain_relinquish_resources(struct domain *d) > { > case RELMEM_not_started: > d->arch.relmem = RELMEM_xen; > -/* Falltrough */ > +/* Fallthrough */ > > case RELMEM_xen: > ret = relinquish_memory(d, &d->xenpage_list); ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/3] xen: arm: correctly handle continuations for 64-bit guests
The 64-bit ABI is different to 32-bit: - uses x16 as the op register rather than r12. - arguments in x0..x5 and not r0..r5. Using rN here potentially truncates. - return value goes in x0, not r0. Hypercalls can only be made directly from kernel space, so checking the domain's size is sufficient. The update of regs->pc is duplicated in both halves because the 32-bit case is going to need fixing to handle Thumb mode (next patch). Spotted due to spurious -EFAULT when destroying a domain, due to the hypercall's pointer argument being truncated. I'm unclear why I am only seeing this now. Signed-off-by: Ian Campbell --- I imagine this needs backporting everywhere... For reference the git diff -b version of this is: @@ -357,6 +357,36 @@ unsigned long hypercall_create_continuation( else { regs = guest_cpu_user_regs(); + +#ifdef CONFIG_ARM_64 +if ( !is_32bit_domain(current->domain) ) +{ +regs->x16 = op; + +/* Ensure the hypercall trap instruction is re-executed. */ +regs->pc -= 4; /* re-execute 'hvc #XEN_HYPERCALL_TAG' */ + +for ( i = 0; *p != '\0'; i++ ) +{ +arg = next_arg(p, args); + +switch ( i ) +{ +case 0: regs->x0 = arg; break; +case 1: regs->x1 = arg; break; +case 2: regs->x2 = arg; break; +case 3: regs->x3 = arg; break; +case 4: regs->x4 = arg; break; +case 5: regs->x5 = arg; break; +} +} + +/* Return value gets written back to x0 */ +rc = regs->x0; +} +else +#endif +{ regs->r12 = op; /* Ensure the hypercall trap instruction is re-executed. * */ @@ -380,6 +410,7 @@ unsigned long hypercall_create_continuation( /* Return value gets written back to r0 */ rc = regs->r0; } +} va_end(args); --- xen/arch/arm/domain.c | 63 - 1 file changed, 47 insertions(+), 16 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 939d8cd..10f13e4 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -356,29 +356,60 @@ unsigned long hypercall_create_continuation( } else { -regs = guest_cpu_user_regs(); -regs->r12 = op; +regs = guest_cpu_user_regs(); -/* Ensure the hypercall trap instruction is re-executed. */ -regs->pc -= 4; /* re-execute 'hvc #XEN_HYPERCALL_TAG' */ - -for ( i = 0; *p != '\0'; i++ ) +#ifdef CONFIG_ARM_64 +if ( !is_32bit_domain(current->domain) ) { -arg = next_arg(p, args); +regs->x16 = op; -switch ( i ) +/* Ensure the hypercall trap instruction is re-executed. */ +regs->pc -= 4; /* re-execute 'hvc #XEN_HYPERCALL_TAG' */ + +for ( i = 0; *p != '\0'; i++ ) { -case 0: regs->r0 = arg; break; -case 1: regs->r1 = arg; break; -case 2: regs->r2 = arg; break; -case 3: regs->r3 = arg; break; -case 4: regs->r4 = arg; break; -case 5: regs->r5 = arg; break; +arg = next_arg(p, args); + +switch ( i ) +{ +case 0: regs->x0 = arg; break; +case 1: regs->x1 = arg; break; +case 2: regs->x2 = arg; break; +case 3: regs->x3 = arg; break; +case 4: regs->x4 = arg; break; +case 5: regs->x5 = arg; break; +} } + +/* Return value gets written back to x0 */ +rc = regs->x0; } +else +#endif +{ +regs->r12 = op; + +/* Ensure the hypercall trap instruction is re-executed. */ +regs->pc -= 4; /* re-execute 'hvc #XEN_HYPERCALL_TAG' */ + +for ( i = 0; *p != '\0'; i++ ) +{ +arg = next_arg(p, args); + +switch ( i ) +{ +case 0: regs->r0 = arg; break; +case 1: regs->r1 = arg; break; +case 2: regs->r2 = arg; break; +case 3: regs->r3 = arg; break; +case 4: regs->r4 = arg; break; +case 5: regs->r5 = arg; break; +} +} -/* Return value gets written back to r0 */ -rc = regs->r0; +/* Return value gets written back to r0 */ +rc = regs->r0; +
[Xen-devel] [PATCH 3/3] xen: arm: handle thumb mode guest in continuations
PC only needs adjusting by 2, otherwise we rerun the instruction prior to the hvc as well. hvc is unpredictable if used within a Thumb IT (conditional execution) block, so we don't need to worry about rewinding any of that state. Signed-off-by: Ian Campbell --- Should be backported --- xen/arch/arm/domain.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 10f13e4..2d2197e 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -387,10 +387,11 @@ unsigned long hypercall_create_continuation( else #endif { +int is_thumb = (regs->cpsr & PSR_THUMB); regs->r12 = op; /* Ensure the hypercall trap instruction is re-executed. */ -regs->pc -= 4; /* re-execute 'hvc #XEN_HYPERCALL_TAG' */ +regs->pc -= is_thumb?2:4; /* re-execute 'hvc #XEN_HYPERCALL_TAG' */ for ( i = 0; *p != '\0'; i++ ) { -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/3] xen: arm: Fix typo "Falltrough"
Signed-off-by: Ian Campbell --- xen/arch/arm/domain.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index fdba081..939d8cd 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -742,7 +742,7 @@ int domain_relinquish_resources(struct domain *d) { case RELMEM_not_started: d->arch.relmem = RELMEM_xen; -/* Falltrough */ +/* Fallthrough */ case RELMEM_xen: ret = relinquish_memory(d, &d->xenpage_list); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC] Add the panic info when disable VT-d
>>> On 25.03.15 at 03:32, wrote: >> >>> On 19.01.15 at 10:00, wrote: >> > --- a/xen/arch/x86/apic.c >> > +++ b/xen/arch/x86/apic.c >> > @@ -915,6 +915,11 @@ void __init x2apic_bsp_setup(void) >> > return; >> > } >> > printk("x2APIC: Already enabled by BIOS: Ignoring cmdline >> > disable.\n"); >> > +} else { >> > +if ( !iommu_enable) { >> > +panic("x2APIC should be disabled while IOMMU is disabled," >> > + "try to set x2apic=0 in cmdline and disable x2apic in BIOS"); >> > +} >> >> Putting aside the coding style violations (figure braces on their own lines, > no >> hard tabs), you tie this to the wrong thing: You need interrupt remapping to >> be enabled, whereas iommu_enable may only mean DMA remapping. And >> I'm afraid you'd run into an ordering problem (iommu_intremap getting set >> to its final value vs. the code above being run) if you tried to correct > this. > > I don't understand the ordering problem you referred to, the patch is just > change the panic info, > and tell the user what they should do to avoid the panic, I didn't change > the logic of the current code. > Without my patch, the current code will also print the panic info like this: > > (XEN) > (XEN) Panic on CPU 0: > (XEN) Couldn't enable IOMMU and iommu=required/force > (XEN) > > It's odd because the user have set ' iommu=0', anyway, it should be fixed. Not sure what you consider odd here - x2apic_bsp_setup() is where force_iommu gets set to 1, i.e. the user specifying "iommu=0" gets intentionally overridden in this case. > How about change the panic info to this. > +} else { > +if ( !iommu_enable) { > +panic("x2APIC should be disabled while iommu=0 is set," > + "try to set x2apic=0 option and disable x2apic in BIOS to > avoid this"); > +} > > or could you give some suggestion? You only altered the text, not the condition. As said before, I think this really should be keyed off of iommu_intremap. I can't exclude that the existing logic isn't really correct either, but if that's the case the solution is not to add another random panic() invocation, but to fix that logic. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/1] hvm.c: Prevent gcc uninitialised var warning
On 23/03/15 15:01, Don Slutz wrote: > gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 reports: > -- > hvm.c: In function `hvm_create_ioreq_server': > hvm.c:487:18: error: `bufioreq_pfn' may be used uninitialised in this function > [-Werror=uninitialized] > hvm.c:718:30: note: `bufioreq_pfn' was declared here > -- > > My code analysis says that gcc is wrong, but initilize the variable > to prevent the gcc warning. > > Reported-by: Ian Murray > Signed-off-by: Don Slutz GCC is correct and there is path through this function where bufioreq_pfn is used while uninitialised (non-debug build, is_default = 1, handle_bufioreq = 0). As an aside, the compiler is in a very easy position to spot this. The error means that GCC has positively identified a basic block which does use bufioreq_pfn before it has been initialised. I observe that the patch has been committed, but it merely hides the problem and doesn't fix it; hvm_free_ioreq_gmfn(d, bufioreq_pfn) will still clobber arbitrary memory. No combination of parameters should be able to cause UB like this. In this case, the error handling is incredibly fragile. All the unmapping should be based on information in s->{,buf}ioreq which is set upon successful map, rather than the booleans indicating whether the function should have set up the mappings. This allows all the fail$N labels to collapse into a single, more simple, exit path. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH OSSTEST] Arrange for core dumps to be placed in /var/core and collect them
On Wed, 2015-03-25 at 09:40 +, Ian Campbell wrote: > Do you have any thoughts on how to best universally arrange for the > rlimits to be increased? Probably inserting a ulimit call into the > libvirt initscript would be an easy first step, and since that's the > thing we most often see crashing it seems. I'll send such a patch > shortly. > > Adding it to xencommons would be OK too, which just leaves catching e.g. > daemonised xl processes launched via ssh. Hacking /etc/profile or > something in TestSupport::cmdex perhaps? I handled this by writing an /etc/security/limits.d/ file during host install, which seems to have done the trick. I just sent a patch which along with the libvirt one I sent earlier today covers everything I think we care about except our daemons which are launched from xencommons. The initscript in xen.git doesn't offer a useful hook for this, we could add something to the end of /etc/default/xencommons, which is sourced by the script, but that seems like a bit of an abuse. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 06/12] xen: arm: correctly handle vtimer traps from userspace
Previously 32-bit userspace on 32-bit kernel and 64-bit userspace on 64-bit kernel could access these registers irrespective of whether the kernel had configured them to be allowed to. To fix this: - Userspace access to CNTP_CTL_EL0 and CNTP_TVAL_EL0 should be gated on CNTKCTL_EL1.EL0PTEN. - Userspace access to CNTPCT_EL0 should be gated on CNTKCTL_EL1.EL0PCTEN. When we do not handle an access we now silently inject an undef even in debug mode since the debugging messages are not helpful (we have handled the access, by explicitly choosing not to). Since HSR_EC_CP15_64 cannot be taken from a guest in AArch64 mode except due to a hardware bug switch the associated check to a BUG_ON (which will be switched to something more appropriate in a subsequent patch) Fix a coding style issue in HSR_CPREG64(CNTPCT) while touching similar code. Signed-off-by: Ian Campbell --- v2: Replaces "xen: arm: turn vtimer traps for cp32/64 and sysreg into #undef" --- xen/arch/arm/traps.c| 28 +--- xen/arch/arm/vtimer.c | 38 +++--- xen/include/asm-arm/processor.h |6 ++ 3 files changed, 38 insertions(+), 34 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 4eda561..c5ffcc5 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1598,11 +1598,7 @@ static void do_cp15_32(struct cpu_user_regs *regs, case HSR_CPREG32(CNTP_CTL): case HSR_CPREG32(CNTP_TVAL): if ( !vtimer_emulate(regs, hsr) ) -{ -dprintk(XENLOG_ERR, -"failed emulation of 32-bit vtimer CP register access\n"); -domain_crash_synchronous(); -} +goto undef_cp15_32; break; case HSR_CPREG32(ACTLR): if ( cp32.read ) @@ -1645,6 +1641,7 @@ static void do_cp15_32(struct cpu_user_regs *regs, gdprintk(XENLOG_ERR, "unhandled 32-bit CP15 access %#x\n", hsr.bits & HSR_CP32_REGS_MASK); #endif + undef_cp15_32: inject_undef_exception(regs, hsr.len); return; } @@ -1665,11 +1662,7 @@ static void do_cp15_64(struct cpu_user_regs *regs, case HSR_CPREG64(CNTPCT): case HSR_CPREG64(CNTP_CVAL): if ( !vtimer_emulate(regs, hsr) ) -{ -dprintk(XENLOG_ERR, -"failed emulation of 64-bit vtimer CP register access\n"); -domain_crash_synchronous(); -} +goto undef_cp15_64; break; default: { @@ -1683,6 +1676,7 @@ static void do_cp15_64(struct cpu_user_regs *regs, gdprintk(XENLOG_ERR, "unhandled 64-bit CP15 access %#x\n", hsr.bits & HSR_CP64_REGS_MASK); #endif + undef_cp15_64: inject_undef_exception(regs, hsr.len); return; } @@ -1851,11 +1845,7 @@ static void do_sysreg(struct cpu_user_regs *regs, case HSR_SYSREG_CNTP_TVAL_EL0: case HSR_SYSREG_CNTP_CVAL_EL0: if ( !vtimer_emulate(regs, hsr) ) -{ -dprintk(XENLOG_ERR, -"failed emulation of 64-bit vtimer sysreg access\n"); -domain_crash_synchronous(); -} +goto undef_sysreg; break; case HSR_SYSREG_ICC_SGI1R_EL1: if ( !vgic_emulate(regs, hsr) ) @@ -1874,8 +1864,8 @@ static void do_sysreg(struct cpu_user_regs *regs, default: bad_sysreg: { -struct hsr_sysreg sysreg = hsr.sysreg; #ifndef NDEBUG +struct hsr_sysreg sysreg = hsr.sysreg; gdprintk(XENLOG_ERR, "%s %d, %d, c%d, c%d, %d %s x%d @ 0x%"PRIregister"\n", @@ -1888,7 +1878,8 @@ static void do_sysreg(struct cpu_user_regs *regs, gdprintk(XENLOG_ERR, "unhandled 64-bit sysreg access %#x\n", hsr.bits & HSR_SYSREG_REGS_MASK); #endif -inject_undef_exception(regs, sysreg.len); + undef_sysreg: +inject_undef_exception(regs, hsr.len); return; } } @@ -2064,8 +2055,7 @@ asmlinkage void do_trap_hypervisor(struct cpu_user_regs *regs) do_cp15_32(regs, hsr); break; case HSR_EC_CP15_64: -if ( !is_32bit_domain(current->domain) ) -goto bad_trap; +BUG_ON(!psr_mode_is_32bit(regs->cpsr)); perfc_incr(trap_cp15_64); do_cp15_64(regs, hsr); break; diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c index 0c67f20..5aca65e 100644 --- a/xen/arch/arm/vtimer.c +++ b/xen/arch/arm/vtimer.c @@ -154,9 +154,14 @@ int virt_timer_restore(struct vcpu *v) return 0; } -static void vtimer_cntp_ctl(struct cpu_user_regs *regs, uint32_t *r, int read) +static int vtimer_cntp_ctl(struct cpu_user_regs *regs, uint32_t *r, int read) { struct vcpu *v = current; + +if ( psr_mode_is_user(regs) && + !(READ_SYSREG(CNTKCTL_EL1) & CNTKCTL_EL1_EL0PTEN) ) +return 0; + if ( read ) { *r =
[Xen-devel] [PATCH 03/12] xen: arm: Use ARMv8 names for CNTHCTL_EL2 bits
Rather than using the v8 register names and the v7 bit names, which makes things needlessly difficult when reading. Signed-off-by: Ian Campbell --- v3: New patch. --- xen/arch/arm/time.c |2 +- xen/include/asm-arm/processor.h |4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c index 352e25e..ce6d3fd 100644 --- a/xen/arch/arm/time.c +++ b/xen/arch/arm/time.c @@ -230,7 +230,7 @@ void __cpuinit init_timer_interrupt(void) /* Sensible defaults */ WRITE_SYSREG64(0, CNTVOFF_EL2); /* No VM-specific offset */ /* Do not let the VMs program the physical timer, only read the physical counter */ -WRITE_SYSREG32(CNTHCTL_PA, CNTHCTL_EL2); +WRITE_SYSREG32(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2); WRITE_SYSREG32(0, CNTP_CTL_EL0);/* Physical timer disabled */ WRITE_SYSREG32(0, CNTHP_CTL_EL2); /* Hypervisor's timer disabled */ isb(); diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h index fcd26fb..570d2a1 100644 --- a/xen/include/asm-arm/processor.h +++ b/xen/include/asm-arm/processor.h @@ -555,8 +555,8 @@ union hsr { #define FSC_LL_MASK(_AC(0x03,U)<<0) /* Time counter hypervisor control register */ -#define CNTHCTL_PA (1u<<0) /* Kernel/user access to physical counter */ -#define CNTHCTL_TA (1u<<1) /* Kernel/user access to CNTP timer */ +#define CNTHCTL_EL2_EL1PCTEN (1u<<0) /* Kernel/user access to physical counter */ +#define CNTHCTL_EL2_EL1PCEN (1u<<1) /* Kernel/user access to CNTP timer regs */ /* Timer control registers */ #define CNTx_CTL_ENABLE (1u<<0) /* Enable timer */ -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 09/12] xen: arm: correctly handle sysreg accesses from userspace
Previously we implemented all registers as RAZ/WI even if they shouldn't be accessible to userspace. Accesses to the *_EL1 registers from EL0 are trapped to EL1 by the hardware, so add a BUG_ON. Likewise accesses from 32-bit EL1 cannot happen. PMUSERENR_EL0 and MDCCSR_EL0 are R/O to EL0. Other PM*_EL0 registers are accessible at EL0 only if PMUSERENR_EL0.EN is set, since we emulate that as RAZ/WI we know that bit cannot be set. Signed-off-by: Ian Campbell --- xen/arch/arm/traps.c | 54 +++-- xen/include/asm-arm/sysregs.h |1 + 2 files changed, 42 insertions(+), 13 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 909e880..a65b32a 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1852,11 +1852,40 @@ static void do_sysreg(struct cpu_user_regs *regs, switch ( hsr.bits & HSR_SYSREG_REGS_MASK ) { /* RAZ/WI registers: */ + /* - Debug */ case HSR_SYSREG_MDSCR_EL1: /* - Perf monitors */ case HSR_SYSREG_PMINTENSET_EL1: case HSR_SYSREG_PMINTENCLR_EL1: +/* - Breakpoints */ +HSR_SYSREG_DBG_CASES(DBGBVR): +HSR_SYSREG_DBG_CASES(DBGBCR): +/* - Watchpoints */ +HSR_SYSREG_DBG_CASES(DBGWVR): +HSR_SYSREG_DBG_CASES(DBGWCR): +/* - Double Lock Register */ +case HSR_SYSREG_OSDLR_EL1: +/* EL1 only */ +BUG_ON(psr_mode_is_user(regs)); +goto sysreg_raz_wi; + +case HSR_SYSREG_PMUSERENR_EL0: +/* RO at EL0. RAZ/WI at EL1 */ +if ( psr_mode_is_user(regs) && !hsr.sysreg.read ) +goto undef_sysreg; +goto sysreg_raz_wi; + +case HSR_SYSREG_MDCCSR_EL0: +/* + * Accessible at EL0 only if MDSCR_EL1.TDCC is set to 0. We emulate that + * register as RAZ/WI above. So RO at both EL0 and EL1. + */ +if ( !hsr.sysreg.read ) +goto undef_sysreg; + +*x = 0; +break; case HSR_SYSREG_PMCR_EL0: case HSR_SYSREG_PMCNTENSET_EL0: case HSR_SYSREG_PMCNTENCLR_EL0: @@ -1868,16 +1897,16 @@ static void do_sysreg(struct cpu_user_regs *regs, case HSR_SYSREG_PMCCNTR_EL0: case HSR_SYSREG_PMXEVTYPER_EL0: case HSR_SYSREG_PMXEVCNTR_EL0: -case HSR_SYSREG_PMUSERENR_EL0: case HSR_SYSREG_PMOVSSET_EL0: -/* - Breakpoints */ -HSR_SYSREG_DBG_CASES(DBGBVR): -HSR_SYSREG_DBG_CASES(DBGBCR): -/* - Watchpoints */ -HSR_SYSREG_DBG_CASES(DBGWVR): -HSR_SYSREG_DBG_CASES(DBGWCR): -/* - Double Lock Register */ -case HSR_SYSREG_OSDLR_EL1: +/* + * Accessible at EL0 only if PMUSERENR_EL0.EN is set. We + * emulate that register as 0 above. + */ +if ( psr_mode_is_user(regs) ) +goto undef_sysreg; +/* Fall thru */ + + sysreg_raz_wi: if ( hsr.sysreg.read ) *x = 0; /* else: write ignored */ @@ -1885,8 +1914,9 @@ static void do_sysreg(struct cpu_user_regs *regs, /* Write only, Write ignore registers: */ case HSR_SYSREG_OSLAR_EL1: +BUG_ON(psr_mode_is_user(regs)); if ( hsr.sysreg.read ) -goto bad_sysreg; +goto undef_sysreg; /* else: write ignored */ break; case HSR_SYSREG_CNTP_CTL_EL0: @@ -1910,7 +1940,6 @@ static void do_sysreg(struct cpu_user_regs *regs, "Emulation of sysreg ICC_SGI0R_EL1/ASGI1R_EL1 not supported\n"); inject_undef64_exception(regs, hsr.len); default: - bad_sysreg: { #ifndef NDEBUG struct hsr_sysreg sysreg = hsr.sysreg; @@ -2153,8 +2182,7 @@ asmlinkage void do_trap_hypervisor(struct cpu_user_regs *regs) inject_undef64_exception(regs, hsr.len); break; case HSR_EC_SYSREG: -if ( is_32bit_domain(current->domain) ) -goto bad_trap; +BUG_ON(psr_mode_is_32bit(regs->cpsr)); perfc_incr(trap_sysreg); do_sysreg(regs, hsr); break; diff --git a/xen/include/asm-arm/sysregs.h b/xen/include/asm-arm/sysregs.h index df8e070..3b9ddd3 100644 --- a/xen/include/asm-arm/sysregs.h +++ b/xen/include/asm-arm/sysregs.h @@ -43,6 +43,7 @@ #define HSR_SYSREG_MDSCR_EL1 HSR_SYSREG(2,0,c0,c2,2) #define HSR_SYSREG_OSLAR_EL1 HSR_SYSREG(2,0,c1,c0,4) #define HSR_SYSREG_OSDLR_EL1 HSR_SYSREG(2,0,c1,c3,4) +#define HSR_SYSREG_MDCCSR_EL0 HSR_SYSREG(2,3,c0,c1,0) #define HSR_SYSREG_DBGBVRn_EL1(n) HSR_SYSREG(2,0,c0,c##n,4) #define HSR_SYSREG_DBGBCRn_EL1(n) HSR_SYSREG(2,0,c0,c##n,5) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 01/12] xen: arm: Correct PMXEV cp register definitions
p15,0,c9,c13,1 is PMXEVTYPER not PMXEVCNTR. p15,0,c9,c13,2 is PMXEVCNTR not PMXEVCNR. Signed-off-by: Ian Campbell Reviewed-by: Julien Grall --- xen/arch/arm/traps.c |2 +- xen/include/asm-arm/cpregs.h |4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index ad046e8..50d67aa 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1633,8 +1633,8 @@ static void do_cp15_32(struct cpu_user_regs *regs, case HSR_CPREG32(PMCEID0): case HSR_CPREG32(PMCEID1): case HSR_CPREG32(PMCCNTR): +case HSR_CPREG32(PMXEVTYPER): case HSR_CPREG32(PMXEVCNTR): -case HSR_CPREG32(PMXEVCNR): case HSR_CPREG32(PMUSERENR): case HSR_CPREG32(PMINTENSET): case HSR_CPREG32(PMINTENCLR): diff --git a/xen/include/asm-arm/cpregs.h b/xen/include/asm-arm/cpregs.h index f1100c8..afe9148 100644 --- a/xen/include/asm-arm/cpregs.h +++ b/xen/include/asm-arm/cpregs.h @@ -222,8 +222,8 @@ #define PMCEID0 p15,0,c9,c12,6 /* Perf. Mon. Common Event Identification register 0 */ #define PMCEID1 p15,0,c9,c12,7 /* Perf. Mon. Common Event Identification register 1 */ #define PMCCNTR p15,0,c9,c13,0 /* Perf. Mon. Cycle Count Register */ -#define PMXEVCNTR p15,0,c9,c13,1 /* Perf. Mon. Event Type Select Register */ -#define PMXEVCNRp15,0,c9,c13,2 /* Perf. Mon. Event Count Register */ +#define PMXEVTYPER p15,0,c9,c13,1 /* Perf. Mon. Event Type Select Register */ +#define PMXEVCNTR p15,0,c9,c13,2 /* Perf. Mon. Event Count Register */ #define PMUSERENR p15,0,c9,c14,0 /* Perf. Mon. User Enable Register */ #define PMINTENSET p15,0,c9,c14,1 /* Perf. Mon. Interrupt Enable Set Register */ #define PMINTENCLR p15,0,c9,c14,2 /* Perf. Mon. Interrupt Enable Clear Register */ -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 05/12] xen: arm: Handle 32-bit EL0 on 64-bit EL1 when advancing PC after trap
Fix a coding style issue while in the area. Signed-off-by: Ian Campbell Reviewed-by: Julien Grall --- xen/arch/arm/traps.c |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 83abafa..4eda561 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1487,7 +1487,7 @@ static int check_conditional_instr(struct cpu_user_regs *regs, union hsr hsr) { unsigned long it; -BUG_ON( !is_32bit_domain(current->domain) || !(cpsr&PSR_THUMB) ); +BUG_ON( !psr_mode_is_32bit(regs->cpsr) || !(cpsr&PSR_THUMB) ); it = ( (cpsr >> (10-2)) & 0xfc) | ((cpsr >> 25) & 0x3 ); @@ -1496,7 +1496,7 @@ static int check_conditional_instr(struct cpu_user_regs *regs, union hsr hsr) return 1; /* The cond for this instruction works out as the top 4 bits. */ -cond = ( it >> 4 ); +cond = ( it >> 4 ); } cpsr_cond = cpsr >> 28; @@ -1514,10 +1514,10 @@ static void advance_pc(struct cpu_user_regs *regs, union hsr hsr) unsigned long itbits, cond, cpsr = regs->cpsr; /* PSR_IT_MASK bits can only be set for 32-bit processors in Thumb mode. */ -BUG_ON( (!is_32bit_domain(current->domain)||!(cpsr&PSR_THUMB)) +BUG_ON( (!psr_mode_is_32bit(cpsr)||!(cpsr&PSR_THUMB)) && (cpsr&PSR_IT_MASK) ); -if ( is_32bit_domain(current->domain) && (cpsr&PSR_IT_MASK) ) +if ( cpsr&PSR_IT_MASK ) { /* The ITSTATE[7:0] block is contained in CPSR[15:10],CPSR[26:25] * -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v3 0/12] xen: arm: reenable support for 32-bit userspace running in 64-bit guest.
XSA-102/CVE-2014-5147[0] concerned a crash when trapping from 32-bit userspace in a 64-bit guest. Part of that security patch was c0020e09970 "xen: arm: Handle traps from 32-bit userspace on 64-bit kernel as undef fix" which turned the exploitable crash into a #undef to the guest (so as to kill the process but not the host) as a workaround for the issue. However while this prevented the exploit it did not make 32-bit userspaces which were prone to triggering the issue actually work. This series consists of some patches which I originally wrote for XSA-102 to fix the issue properly before it was determined that those fixes were too invasive by far for a security update. At the end of the series is a new patch which removes the XSA-102 workaround since all problematic traps should now be handled. Since these were originally intended to be the security fix they have had a fair bit of scrutiny already in private . However since there is now a risk of reintroducing XSA-102 I would appreciate a pretty thorough second pair of eyes on it this time around. I've tested this with a local utility which tries to access the various cp and system registers from both 32- and 64-bit processes and checks that they either work or give the expected traps. Since this tool is effectively an exploit for XSA-102 I'm not sharing here but if you ask nicely and appear to be wearing the correct colour hat I might share it with you (it's not terribly impressive, so don't get too excited). Since last time I've implemented Julien's review feedback including: * added the GUEST_BUG_ON patch to the end to replace the BUG_ONs due to invalid h/w state, which gets more useful debug if that occurs. * handled CNTP_CVAL_EL0. Ian. [0] http://xenbits.xen.org/xsa/advisory-102.html ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 07/12] xen: arm: Handle CP15 register traps from userspace
Previously userspace access to PM* would have been incorrectly (but benignly) implemented as RAZ/WI when running on a 32-bit kernel and would cause a hypervisor exception (host crash) when running a 64-bit kernel (this was already solved via the fix to XSA-102). CLIDR, CCSIDR, DCCISW, ACTLR, PMINTENSET, PMINTENCLR are EL1 only, attempts to access from EL0 will trap to EL1 not to us, hence BUG_ON is appropriate now. PMUSERENR is R/O at EL0 and we implement as RAZ/WI at EL1 as before. The remaining PM* registers are accessible to EL0 only if PMUSERENR_EL0.EN is set, since we emulate this as RAZ/WI the bit is never set so we inject a trap on attempted access. We weren't previously handling PMCCNTR. Signed-off-by: Ian Campbell --- xen/arch/arm/traps.c | 32 +++- 1 file changed, 27 insertions(+), 5 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index c5ffcc5..e49ae79 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1565,6 +1565,7 @@ static void do_cp15_32(struct cpu_user_regs *regs, switch ( hsr.bits & HSR_CP32_REGS_MASK ) { case HSR_CPREG32(CLIDR): +BUG_ON(psr_mode_is_user(regs)); if ( !cp32.read ) { dprintk(XENLOG_ERR, @@ -1574,6 +1575,7 @@ static void do_cp15_32(struct cpu_user_regs *regs, *r = READ_SYSREG32(CLIDR_EL1); break; case HSR_CPREG32(CCSIDR): +BUG_ON(psr_mode_is_user(regs)); if ( !cp32.read ) { dprintk(XENLOG_ERR, @@ -1583,6 +1585,7 @@ static void do_cp15_32(struct cpu_user_regs *regs, *r = READ_SYSREG32(CCSIDR_EL1); break; case HSR_CPREG32(DCCISW): +BUG_ON(psr_mode_is_user(regs)); if ( cp32.read ) { dprintk(XENLOG_ERR, @@ -1601,6 +1604,7 @@ static void do_cp15_32(struct cpu_user_regs *regs, goto undef_cp15_32; break; case HSR_CPREG32(ACTLR): +BUG_ON(psr_mode_is_user(regs)); if ( cp32.read ) *r = v->arch.actlr; break; @@ -1613,6 +1617,18 @@ static void do_cp15_32(struct cpu_user_regs *regs, * always support PMCCNTR (the cyle counter): we just RAZ/WI for all * PM register, which doesn't crash the kernel at least */ +case HSR_CPREG32(PMUSERENR): +/* RO at EL0. RAZ/WI at EL1 */ +if ( psr_mode_is_user(regs) && !hsr.cp32.read ) +goto undef_cp15_32; +goto cp15_32_raz_wi; + +case HSR_CPREG32(PMINTENSET): +case HSR_CPREG32(PMINTENCLR): +/* EL1 only */ +BUG_ON(psr_mode_is_user(regs)); +goto cp15_32_raz_wi; + case HSR_CPREG32(PMCR): case HSR_CPREG32(PMCNTENSET): case HSR_CPREG32(PMCNTENCLR): @@ -1624,12 +1640,19 @@ static void do_cp15_32(struct cpu_user_regs *regs, case HSR_CPREG32(PMCCNTR): case HSR_CPREG32(PMXEVTYPER): case HSR_CPREG32(PMXEVCNTR): -case HSR_CPREG32(PMUSERENR): -case HSR_CPREG32(PMINTENSET): -case HSR_CPREG32(PMINTENCLR): case HSR_CPREG32(PMOVSSET): +/* + * Accessible at EL0 only if PMUSERENR_EL0.EN is set. We + * emulate that register as 0 above. + */ +if ( psr_mode_is_user(regs) ) +goto undef_cp15_32; +/* Fall thru */ + + cp15_32_raz_wi: if ( cp32.read ) *r = 0; +/* else: write ignored */ break; default: @@ -2049,8 +2072,7 @@ asmlinkage void do_trap_hypervisor(struct cpu_user_regs *regs) advance_pc(regs, hsr); break; case HSR_EC_CP15_32: -if ( !is_32bit_domain(current->domain) ) -goto bad_trap; +BUG_ON(!psr_mode_is_32bit(regs->cpsr)); perfc_incr(trap_cp15_32); do_cp15_32(regs, hsr); break; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 08/12] xen: arm: Handle CP14 32-bit register accesses from userspace
Accesses to these from 32-bit userspace would cause a hypervisor exception (host crash) when running a 64-bit kernel, which is worked around by the fix to XSA-102. On 32-bit kernels they would be implemented as RAZ/WI which is incorrect but harmless. Update as follows: - DBGDSCRINT should be R/O. - DBGDSCREXT should be EL1 only. - DBGOSLAR is RO and EL1 only. - DBGVCR, DBGB[VC]R*, DBGW[VC]R*, and DBGOSDLR are EL1 only. DBGDIDR and DBGDSCRINT are accessible from EL0 if DBGDSCRext.UDCCdis. Since we emulate that as RAZ/WI we allow access. When we do not allow an access we now silently inject an undef even in debug mode since the debugging messages are not helpful (we have handled the access, by explicitly choosing not to). Signed-off-by: Ian Campbell --- v3: Add comment to DBGDSCRINT case. --- xen/arch/arm/traps.c | 38 +++--- 1 file changed, 31 insertions(+), 7 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index e49ae79..909e880 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -1722,10 +1722,12 @@ static void do_cp14_32(struct cpu_user_regs *regs, union hsr hsr) switch ( hsr.bits & HSR_CP32_REGS_MASK ) { case HSR_CPREG32(DBGDIDR): - -/* Read-only register */ +/* + * Read-only register. Accessible by EL0 if DBGDSCRext.UDCCdis + * is set to 0, which we emulated below. + */ if ( !cp32.read ) -goto bad_cp; +goto undef_cp14_32; /* Implement the minimum requirements: * - Number of watchpoints: 1 @@ -1738,15 +1740,28 @@ static void do_cp14_32(struct cpu_user_regs *regs, union hsr hsr) break; case HSR_CPREG32(DBGDSCRINT): +/* + * Read-only register. Accessible by EL0 if DBGDSCRext.UDCCdis + * is set to 0, which we emulated below. + */ +if ( !cp32.read ) +goto undef_cp14_32; + +*r = 0; +break; + case HSR_CPREG32(DBGDSCREXT): +if ( usr_mode(regs) ) +goto undef_cp14_32; + /* Implement debug status and control register as RAZ/WI. * The OS won't use Hardware debug if MDBGen not set */ if ( cp32.read ) *r = 0; break; + case HSR_CPREG32(DBGVCR): -case HSR_CPREG32(DBGOSLAR): case HSR_CPREG32(DBGBVR0): case HSR_CPREG32(DBGBCR0): case HSR_CPREG32(DBGWVR0): @@ -1754,13 +1769,22 @@ static void do_cp14_32(struct cpu_user_regs *regs, union hsr hsr) case HSR_CPREG32(DBGBVR1): case HSR_CPREG32(DBGBCR1): case HSR_CPREG32(DBGOSDLR): +if ( usr_mode(regs) ) +goto undef_cp14_32; /* RAZ/WI */ if ( cp32.read ) *r = 0; break; +case HSR_CPREG32(DBGOSLAR): +if ( usr_mode(regs) ) +goto undef_cp14_32; +/* WO */ +if ( cp32.read ) +goto undef_cp14_32; +/* else: ignore */ +break; default: -bad_cp: #ifndef NDEBUG gdprintk(XENLOG_ERR, "%s p14, %d, r%d, cr%d, cr%d, %d @ 0x%"PRIregister"\n", @@ -1769,6 +1793,7 @@ bad_cp: gdprintk(XENLOG_ERR, "unhandled 32-bit cp14 access %#x\n", hsr.bits & HSR_CP32_REGS_MASK); #endif +undef_cp14_32: inject_undef_exception(regs, hsr.len); return; } @@ -2082,8 +2107,7 @@ asmlinkage void do_trap_hypervisor(struct cpu_user_regs *regs) do_cp15_64(regs, hsr); break; case HSR_EC_CP14_32: -if ( !is_32bit_domain(current->domain) ) -goto bad_trap; +BUG_ON(!psr_mode_is_32bit(regs->cpsr)); perfc_incr(trap_cp14_32); do_cp14_32(regs, hsr); break; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 10/12] xen: arm: handle remaining traps from userspace
CP14 dbg and general CP register access are both handled with unconditional injection of #undef from their respective handlers, so allow these even from 32-bit userspace on a 64-bit kernel. SMC32 and HVC32 should only come from a guest in AArch32 mode and SMC64 and HVC64 should only come from a guest in AArch64 mode. Add appropriate BUG_ONs to all cases. After this bad_trap is no longer used. Signed-off-by: Ian Campbell --- v3: Reintroduce accidentally dropped undef injection from smc64 case. --- xen/arch/arm/traps.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index a65b32a..9e7337f 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -2141,23 +2141,22 @@ asmlinkage void do_trap_hypervisor(struct cpu_user_regs *regs) do_cp14_32(regs, hsr); break; case HSR_EC_CP14_DBG: -if ( !is_32bit_domain(current->domain) ) -goto bad_trap; +BUG_ON(!psr_mode_is_32bit(regs->cpsr)); perfc_incr(trap_cp14_dbg); do_cp14_dbg(regs, hsr); break; case HSR_EC_CP: -if ( !is_32bit_domain(current->domain) ) -goto bad_trap; +BUG_ON(!psr_mode_is_32bit(regs->cpsr)); perfc_incr(trap_cp); do_cp(regs, hsr); break; case HSR_EC_SMC32: +BUG_ON(!psr_mode_is_32bit(regs->cpsr)); perfc_incr(trap_smc32); -inject_undef32_exception(regs); +inject_undef_exception(regs, hsr.len); break; case HSR_EC_HVC32: -perfc_incr(trap_hvc32); +BUG_ON(!psr_mode_is_32bit(regs->cpsr)); #ifndef NDEBUG if ( (hsr.iss & 0xff00) == 0xff00 ) return do_debug_trap(regs, hsr.iss & 0x00ff); @@ -2168,6 +2167,7 @@ asmlinkage void do_trap_hypervisor(struct cpu_user_regs *regs) break; #ifdef CONFIG_ARM_64 case HSR_EC_HVC64: +BUG_ON(psr_mode_is_32bit(regs->cpsr)); perfc_incr(trap_hvc64); #ifndef NDEBUG if ( (hsr.iss & 0xff00) == 0xff00 ) @@ -2178,6 +2178,7 @@ asmlinkage void do_trap_hypervisor(struct cpu_user_regs *regs) do_trap_hypercall(regs, ®s->x16, hsr.iss); break; case HSR_EC_SMC64: +BUG_ON(psr_mode_is_32bit(regs->cpsr)); perfc_incr(trap_smc64); inject_undef64_exception(regs, hsr.len); break; @@ -2204,7 +2205,6 @@ asmlinkage void do_trap_hypervisor(struct cpu_user_regs *regs) #endif default: - bad_trap: printk("Hypervisor Trap. HSR=0x%x EC=0x%x IL=%x Syndrome=0x%"PRIx32"\n", hsr.bits, hsr.ec, hsr.len, hsr.iss); do_unexpected_trap("Hypervisor", regs); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 04/12] xen: arm: Factor out psr_mode_is_user
This embodies the logic on arm64 that userspace can be either 32-bit or 64-bit. It will be used in other places shortly. Note that the logic differs slightly because the original (in inject_abt64_exception) knew that the kernel was 64-bit and could therefore assume that any 32-bit mode was userspace. Instead the refactored code explicitly checks for usr mode. Signed-off-by: Ian Campbell Reviewed-by: Julien Grall --- xen/arch/arm/traps.c |9 + xen/include/asm-arm/regs.h |8 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 9e4a60f..83abafa 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -455,14 +455,7 @@ static void inject_abt64_exception(struct cpu_user_regs *regs, .len = instr_len, }; -/* - * Trap may have been taken from EL0, which might be in AArch32 - * mode (PSR_MODE_BIT set), or in AArch64 mode (PSR_MODE_EL0t). - * - * Since we know the kernel must be 64-bit any trap from a 32-bit - * mode must have been from EL0. - */ -if ( psr_mode_is_32bit(regs->cpsr) || psr_mode(regs->cpsr,PSR_MODE_EL0t) ) +if ( psr_mode_is_user(regs) ) esr.ec = prefetch ? HSR_EC_INSTR_ABORT_LOWER_EL : HSR_EC_DATA_ABORT_LOWER_EL; else diff --git a/xen/include/asm-arm/regs.h b/xen/include/asm-arm/regs.h index 0951857..56d53d6 100644 --- a/xen/include/asm-arm/regs.h +++ b/xen/include/asm-arm/regs.h @@ -24,9 +24,17 @@ #ifdef CONFIG_ARM_32 #define hyp_mode(r) psr_mode((r)->cpsr,PSR_MODE_HYP) +#define psr_mode_is_user(r) usr_mode(r) #else #define hyp_mode(r) (psr_mode((r)->cpsr,PSR_MODE_EL2h) || \ psr_mode((r)->cpsr,PSR_MODE_EL2t)) + +/* + * Trap may have been taken from EL0, which might be in AArch32 usr + * mode, or in AArch64 mode (PSR_MODE_EL0t). + */ +#define psr_mode_is_user(r) \ +(psr_mode((r)->cpsr,PSR_MODE_EL0t) || usr_mode(r)) #endif #define guest_mode(r) \ -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel