Re: [Xen-devel] [PATCH v2] x86/stackframe/32: repair 32-bit Xen PV
On 04.11.2019 23:44, Thomas Gleixner wrote: > On Tue, 29 Oct 2019, Jan Beulich wrote: > >> Once again RPL checks have been introduced which don't account for a >> 32-bit kernel living in ring 1 when running in a PV Xen domain. >> >> The case in FIXUP_FRAME has been preventing boot; adjust BUG_IF_WRONG_CR3 >> as well just in case. > > Either it's required and correct or it's not. Just in case is not helpful > at all. _If_ this macro sits on any path reachable when running PV under Xen, then it's wrong. If any such use gets added down the road, then it's latently wrong, which is bad enough imo, and hence warrants the change even without analyzing whether there's already an affected path. >> Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs") >> Signed-off-by: Jan Beulich > >> --- a/arch/x86/entry/entry_32.S >> +++ b/arch/x86/entry/entry_32.S >> @@ -48,6 +48,13 @@ >> >> #include "calling.h" >> >> +/* >> + * When running on Xen PV, the actual %cs register's RPL in the kernel is 1, >> + * not 0. If we need to distinguish between a %cs from kernel mode and a %cs >> + * from user mode, we can do test $2 instead of test $3. >> + */ >> +#define USER_SEGMENT_RPL_MASK 2 > > No. The define want's to be right next to the SEGMENT_RPL_MASK define > including a less ASM focussed comment like this: > > /* > * When running on Xen PV, the actual priviledge level of the kernel is 1, > * not 0. Testing the Requested Priviledge Level in a segment selector to > * determine whether the context is user mode or kernel mode with > * SEGMENT_RPL_MASK is wrong because the PV kernels priviledge level > * matches the 0x03 mask. > * > * Testing with USER_SEGMENT_RPL_MASK is valid for both native and Xen PV > * kernels because Priviledge Level 2 is never used. > */ > > Hmm? I simply used almost exactly what Andy had suggested as a comment. He also didn't object to the definition sitting here (it's not needed after all outside of this file). Can the two of you please reach agreement, for me to act upon? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Commit moratorium for getting a push
Committers, Please don't push any new patch to staging because we want to have a push to master for 4.13-RC2. Another email will be sent once the moratorium is lifted. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1 03/10] KVM: Prepare kvm_is_reserved_pfn() for PG_reserved changes
On Thu, Oct 24, 2019 at 5:11 AM David Hildenbrand wrote: > > Right now, ZONE_DEVICE memory is always set PG_reserved. We want to > change that. > > KVM has this weird use case that you can map anything from /dev/mem > into the guest. pfn_valid() is not a reliable check whether the memmap > was initialized and can be touched. pfn_to_online_page() makes sure > that we have an initialized memmap (and don't have ZONE_DEVICE memory). > > Rewrite kvm_is_reserved_pfn() to make sure the function produces the > same result once we stop setting ZONE_DEVICE pages PG_reserved. > > Cc: Paolo Bonzini > Cc: "Radim Krčmář" > Cc: Michal Hocko > Cc: Dan Williams > Cc: KarimAllah Ahmed > Signed-off-by: David Hildenbrand > --- > virt/kvm/kvm_main.c | 10 -- > 1 file changed, 8 insertions(+), 2 deletions(-) > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index e9eb666eb6e8..9d18cc67d124 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -151,9 +151,15 @@ __weak int kvm_arch_mmu_notifier_invalidate_range(struct > kvm *kvm, > > bool kvm_is_reserved_pfn(kvm_pfn_t pfn) > { > - if (pfn_valid(pfn)) > - return PageReserved(pfn_to_page(pfn)); > + struct page *page = pfn_to_online_page(pfn); > > + /* > +* We treat any pages that are not online (not managed by the buddy) > +* as reserved - this includes ZONE_DEVICE pages and pages without > +* a memmap (e.g., mapped via /dev/mem). > +*/ > + if (page) > + return PageReserved(page); > return true; > } So after this all the pfn_valid() usage in kvm_main.c is replaced with pfn_to_online_page()? Looks correct to me. However, I'm worried that kvm is taking reference on ZONE_DEVICE pages through some other path resulting in this: https://lore.kernel.org/linux-nvdimm/20190919154708.ga24...@angband.pl/ I'll see if this patch set modulates or maintains that failure mode. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 09/15] xen/gntdev: use mmu_range_notifier_insert
On Mon, Nov 04, 2019 at 05:03:31PM -0500, Boris Ostrovsky wrote: > On 10/28/19 4:10 PM, Jason Gunthorpe wrote: > > @@ -445,17 +438,9 @@ static void gntdev_vma_close(struct vm_area_struct > > *vma) > > struct gntdev_priv *priv = file->private_data; > > > > pr_debug("gntdev_vma_close %p\n", vma); > > - if (use_ptemod) { > > - /* It is possible that an mmu notifier could be running > > -* concurrently, so take priv->lock to ensure that the vma won't > > -* vanishing during the unmap_grant_pages call, since we will > > -* spin here until that completes. Such a concurrent call will > > -* not do any unmapping, since that has been done prior to > > -* closing the vma, but it may still iterate the unmap_ops list. > > -*/ > > - mutex_lock(&priv->lock); > > + if (use_ptemod && map->vma == vma) { > > > Is it possible for map->vma not to be equal to vma? It could be NULL at least if use_ptemod is not set. Otherwise, I'm not sure, the confusing bit is that the map comes from here: map = gntdev_find_map_index(priv, index, count); It looks like the intent is that the map->vma is always set to the only vma that has the map as private_data. So, I suppose it can be relaxed to a null test and a WARN_ON that it hasn't changed? Jason ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1 02/10] KVM: x86/mmu: Prepare kvm_is_mmio_pfn() for PG_reserved changes
On Thu, Oct 24, 2019 at 5:10 AM David Hildenbrand wrote: > > Right now, ZONE_DEVICE memory is always set PG_reserved. We want to > change that. > > KVM has this weird use case that you can map anything from /dev/mem > into the guest. pfn_valid() is not a reliable check whether the memmap > was initialized and can be touched. pfn_to_online_page() makes sure > that we have an initialized memmap (and don't have ZONE_DEVICE memory). > > Rewrite kvm_is_mmio_pfn() to make sure the function produces the > same result once we stop setting ZONE_DEVICE pages PG_reserved. > > Cc: Paolo Bonzini > Cc: "Radim Krčmář" > Cc: Sean Christopherson > Cc: Vitaly Kuznetsov > Cc: Wanpeng Li > Cc: Jim Mattson > Cc: Joerg Roedel > Cc: Thomas Gleixner > Cc: Ingo Molnar > Cc: Borislav Petkov > Cc: "H. Peter Anvin" > Cc: KarimAllah Ahmed > Cc: Michal Hocko > Cc: Dan Williams > Signed-off-by: David Hildenbrand > --- > arch/x86/kvm/mmu.c | 29 + > 1 file changed, 17 insertions(+), 12 deletions(-) > > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c > index 24c23c66b226..f03089a336de 100644 > --- a/arch/x86/kvm/mmu.c > +++ b/arch/x86/kvm/mmu.c > @@ -2962,20 +2962,25 @@ static bool mmu_need_write_protect(struct kvm_vcpu > *vcpu, gfn_t gfn, > > static bool kvm_is_mmio_pfn(kvm_pfn_t pfn) > { > + struct page *page = pfn_to_online_page(pfn); > + > + /* > +* ZONE_DEVICE pages are never online. Online pages that are reserved > +* either indicate the zero page or MMIO pages. > +*/ > + if (page) > + return !is_zero_pfn(pfn) && PageReserved(pfn_to_page(pfn)); > + > + /* > +* Anything with a valid (but not online) memmap could be ZONE_DEVICE. > +* Treat only UC/UC-/WC pages as MMIO. You might clarify that ZONE_DEVICE pages that belong to WB cacheable pmem have the correct memtype established by devm_memremap_pages(). Other than that, feel free to add: Reviewed-by: Dan Williams ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [libvirt test] 143589: regressions - FAIL
flight 143589 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/143589/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-libvirt 12 guest-start fail REGR. vs. 143023 test-amd64-i386-libvirt 12 guest-start fail REGR. vs. 143023 test-amd64-i386-libvirt-xsm 12 guest-start fail REGR. vs. 143023 test-amd64-amd64-libvirt-xsm 12 guest-start fail REGR. vs. 143023 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 143023 test-amd64-amd64-libvirt-pair 21 guest-start/debian fail REGR. vs. 143023 test-amd64-i386-libvirt-pair 21 guest-start/debian fail REGR. vs. 143023 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 debian-hvm-install fail REGR. vs. 143023 test-arm64-arm64-libvirt-xsm 12 guest-start fail REGR. vs. 143023 test-arm64-arm64-libvirt 12 guest-start fail REGR. vs. 143023 test-arm64-arm64-libvirt-qcow2 10 debian-di-install fail REGR. vs. 143023 test-amd64-amd64-libvirt-vhd 10 debian-di-installfail REGR. vs. 143023 test-armhf-armhf-libvirt 12 guest-start fail REGR. vs. 143023 test-armhf-armhf-libvirt-raw 10 debian-di-installfail REGR. vs. 143023 version targeted for testing: libvirt 73f91d659b07df8ab267fed1ea4949245a7b57af baseline version: libvirt 2cff65e4c60ed7b3c0c6a97d526d1f8d52c0e919 Last test of basis 143023 2019-10-22 04:19:26 Z 13 days Failing since143051 2019-10-23 04:18:57 Z 12 days 10 attempts Testing same since 143589 2019-11-02 16:15:11 Z1 days1 attempts People who touched revisions under test: Andrea Bolognani Eric Blake Jim Fehlig Ján Tomko Maya Rashish Michal Privoznik Pavel Hrdina Peter Krempa jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-arm64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-arm64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm fail test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmfail test-amd64-amd64-libvirt-xsm fail test-arm64-arm64-libvirt-xsm fail test-amd64-i386-libvirt-xsm fail test-amd64-amd64-libvirt fail test-arm64-arm64-libvirt fail test-armhf-armhf-libvirt fail test-amd64-i386-libvirt fail test-amd64-amd64-libvirt-pairfail test-amd64-i386-libvirt-pair fail test-arm64-arm64-libvirt-qcow2 fail test-armhf-armhf-libvirt-raw fail test-amd64-amd64-libvirt-vhd fail sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Not pushing. (No revision log; it would be 1212 lines long.) ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1 01/10] mm/memory_hotplug: Don't allow to online/offline memory blocks with holes
On Thu, Oct 24, 2019 at 5:10 AM David Hildenbrand wrote: > > Our onlining/offlining code is unnecessarily complicated. Only memory > blocks added during boot can have holes (a range that is not > IORESOURCE_SYSTEM_RAM). Hotplugged memory never has holes (e.g., see > add_memory_resource()). All boot memory is alread online. s/alread/already/ ...also perhaps clarify "already online" by what point in time and why that is relevant. For example a description of the difference between the SetPageReserved() in the bootmem path and the one in the hotplug path. > Therefore, when we stop allowing to offline memory blocks with holes, we > implicitly no longer have to deal with onlining memory blocks with holes. Maybe an explicit reference of the code areas that deal with holes would help to back up that assertion. Certainly it would have saved me some time for the review. > This allows to simplify the code. For example, we no longer have to > worry about marking pages that fall into memory holes PG_reserved when > onlining memory. We can stop setting pages PG_reserved. ...but not for bootmem, right? > > Offlining memory blocks added during boot is usually not guranteed to work s/guranteed/guaranteed/ > either way (unmovable data might have easily ended up on that memory during > boot). So stopping to do that should not really hurt (+ people are not > even aware of a setup where that used to work Maybe put a "Link: https://lkml.kernel.org/r/$msg_id"; to that discussion? > and that the existing code > still works correctly with memory holes). For the use case of offlining > memory to unplug DIMMs, we should see no change. (holes on DIMMs would be > weird). However, less memory can be offlined than was theoretically allowed previously, so I don't understand the "we should see no change" comment. I still agree that's a price worth paying to get the code cleanups and if someone screams we can look at adding it back, but the fact that it was already fragile seems decent enough protection. > > Please note that hardware errors (PG_hwpoison) are not memory holes and > not affected by this change when offlining. > > Cc: Andrew Morton > Cc: Michal Hocko > Cc: Oscar Salvador > Cc: Pavel Tatashin > Cc: Dan Williams > Cc: Anshuman Khandual > Signed-off-by: David Hildenbrand > --- > mm/memory_hotplug.c | 26 -- > 1 file changed, 24 insertions(+), 2 deletions(-) > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 561371ead39a..8d81730cf036 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1447,10 +1447,19 @@ static void node_states_clear_node(int node, struct > memory_notify *arg) > node_clear_state(node, N_MEMORY); > } > > +static int count_system_ram_pages_cb(unsigned long start_pfn, > +unsigned long nr_pages, void *data) > +{ > + unsigned long *nr_system_ram_pages = data; > + > + *nr_system_ram_pages += nr_pages; > + return 0; > +} > + > static int __ref __offline_pages(unsigned long start_pfn, > unsigned long end_pfn) > { > - unsigned long pfn, nr_pages; > + unsigned long pfn, nr_pages = 0; > unsigned long offlined_pages = 0; > int ret, node, nr_isolate_pageblock; > unsigned long flags; > @@ -1461,6 +1470,20 @@ static int __ref __offline_pages(unsigned long > start_pfn, > > mem_hotplug_begin(); > > + /* > +* Don't allow to offline memory blocks that contain holes. > +* Consecuently, memory blocks with holes can never get onlined s/Consecuently/Consequently/ > +* (hotplugged memory has no holes and all boot memory is online). > +* This allows to simplify the onlining/offlining code quite a lot. > +*/ The last sentence of this comment makes sense in the context of this patch, but I don't think it stands by itself in the code base after the fact. The person reading the comment can't see the simplifications because the code is already gone. I'd clarify it to talk about why it is safe to not mess around with PG_Reserved in the hotplug path because of this check. After those clarifications you can add: Reviewed-by: Dan Williams ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [BUG] Invalid guest state in Xen master, dom0 linux-5.3.6, domU windows 10
On 05/11/2019 00:25, Andrew Cooper wrote: > On 04/11/2019 23:42, Andrew Cooper wrote: >> On 04/11/2019 23:20, Håkon Alstadheim wrote: >>> (XEN) RFLAGS=0x0193 (0x0193) DR7 = 0x0400 >>> >>> (XEN) *** Insn bytes from b8868f61d69a: 44 00 00 8c d0 9c 81 0c 24 >>> 00 01 00 00 9d 8e d0 9c 81 24 24 ff fe ff ff 9d c3 cc cc cc >>> cc cc >> Ok. One question answered, several more WTF. >> >> <.data>: >> 0: 44 00 00 add %r8b,(%rax) >> 3: 8c d0 mov %ss,%eax >> 5: 9c pushfq >> 6: 81 0c 24 00 01 00 00 orl $0x100,(%rsp) >> d: 9d popfq >> e: 8e d0 mov %eax,%ss >> 10: f1 icebp >> 11: 9c pushfq >> 12: 81 24 24 ff fe ff ff andl $0xfeff,(%rsp) >> 19: 9d popfq >> 1a: c3 retq >> 1b: cc int3 >> 1c: cc int3 >> 1d: cc int3 >> 1e: cc int3 >> 1f: cc int3 >> >> >> This is a serious exercising of architectural corner cases, by layering >> a single step exception on top of a MovSS-deferred ICEBP. >> >> Now I've looked closer, this isn't a CVE-2018-8897 exploit as no >> breakpoints are configured in %dr7, so I'm going to revise my guess some >> new debugger-detection in DRM software. > I've reproduced the VMEntry failure you were seeing. Now to figure out > if there is sufficient control available to provide architectural > behaviour to guest, because I'm not entirely certain there is, given > some of ICEBP's extra magic properties. So, this is just another case of an issue I've already tried fixing once and haven't had time to fix in a way which doesn't break other pieces of functionality. I clearly need to dust off that series and get it working properly. In the short term, this will work around your problem. diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h index f86af09898..10daaa6f33 100644 --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -522,8 +522,7 @@ static inline void hvm_invlpg(struct vcpu *v, unsigned long linear) (X86_CR4_VMXE | X86_CR4_PAE | X86_CR4_MCE)) /* These exceptions must always be intercepted. */ -#define HVM_TRAP_MASK ((1U << TRAP_debug) | \ - (1U << TRAP_alignment_check) | \ +#define HVM_TRAP_MASK ((1U << TRAP_alignment_check) | \ (1U << TRAP_machine_check)) static inline int hvm_cpu_up(void) However, be aware that it will reintroduce http://xenbits.xen.org/xsa/advisory-156.html so isn't recommended for general use. Seeing as this looks to be some DRM software, it isn't likely to mount an attack like that, as it would livelock a native system just as badly as it livelocks a virtualised system. (Sadly, it looks like CVE-2015-8104 is the gift which keeps on giving us new corner cases in VT-x when it comes to the handling of debug exceptions. I've already found several acknowledged by Intel, and one which they are still trying to figure out how to fix.) ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [BUG] Invalid guest state in Xen master, dom0 linux-5.3.6, domU windows 10
On 04/11/2019 23:42, Andrew Cooper wrote: > On 04/11/2019 23:20, Håkon Alstadheim wrote: >> (XEN) RFLAGS=0x0193 (0x0193) DR7 = 0x0400 >> >> (XEN) *** Insn bytes from b8868f61d69a: 44 00 00 8c d0 9c 81 0c 24 >> 00 01 00 00 9d 8e d0 9c 81 24 24 ff fe ff ff 9d c3 cc cc cc >> cc cc > Ok. One question answered, several more WTF. > > <.data>: > 0: 44 00 00 add %r8b,(%rax) > 3: 8c d0 mov %ss,%eax > 5: 9c pushfq > 6: 81 0c 24 00 01 00 00 orl $0x100,(%rsp) > d: 9d popfq > e: 8e d0 mov %eax,%ss > 10: f1 icebp > 11: 9c pushfq > 12: 81 24 24 ff fe ff ff andl $0xfeff,(%rsp) > 19: 9d popfq > 1a: c3 retq > 1b: cc int3 > 1c: cc int3 > 1d: cc int3 > 1e: cc int3 > 1f: cc int3 > > > This is a serious exercising of architectural corner cases, by layering > a single step exception on top of a MovSS-deferred ICEBP. > > Now I've looked closer, this isn't a CVE-2018-8897 exploit as no > breakpoints are configured in %dr7, so I'm going to revise my guess some > new debugger-detection in DRM software. I've reproduced the VMEntry failure you were seeing. Now to figure out if there is sufficient control available to provide architectural behaviour to guest, because I'm not entirely certain there is, given some of ICEBP's extra magic properties. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [BUG] Invalid guest state in Xen master, dom0 linux-5.3.6, domU windows 10
On 04/11/2019 23:20, Håkon Alstadheim wrote: > > (XEN) RFLAGS=0x0193 (0x0193) DR7 = 0x0400 > > (XEN) *** Insn bytes from b8868f61d69a: 44 00 00 8c d0 9c 81 0c 24 > 00 01 00 00 9d 8e d0 9c 81 24 24 ff fe ff ff 9d c3 cc cc cc > cc cc Ok. One question answered, several more WTF. <.data>: 0: 44 00 00 add %r8b,(%rax) 3: 8c d0 mov %ss,%eax 5: 9c pushfq 6: 81 0c 24 00 01 00 00 orl $0x100,(%rsp) d: 9d popfq e: 8e d0 mov %eax,%ss 10: f1 icebp 11: 9c pushfq 12: 81 24 24 ff fe ff ff andl $0xfeff,(%rsp) 19: 9d popfq 1a: c3 retq 1b: cc int3 1c: cc int3 1d: cc int3 1e: cc int3 1f: cc int3 This is a serious exercising of architectural corner cases, by layering a single step exception on top of a MovSS-deferred ICEBP. Now I've looked closer, this isn't a CVE-2018-8897 exploit as no breakpoints are configured in %dr7, so I'm going to revise my guess some new debugger-detection in DRM software. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-4.11-testing test] 143586: regressions - FAIL
flight 143586 xen-4.11-testing real [real] http://logs.test-lab.xenproject.org/osstest/logs/143586/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-qcow2 19 guest-start/debian.repeat fail REGR. vs. 143158 test-amd64-amd64-libvirt-vhd 17 guest-start/debian.repeat fail REGR. vs. 143158 test-amd64-i386-xl-raw 19 guest-start/debian.repeat fail REGR. vs. 143158 test-armhf-armhf-xl-vhd 15 guest-start/debian.repeat fail REGR. vs. 143158 test-armhf-armhf-libvirt-raw 15 guest-start/debian.repeat fail REGR. vs. 143158 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-libvirt 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-amd64-xl-qemut-win10-i386 10 window
Re: [Xen-devel] [PATCH v2] x86/stackframe/32: repair 32-bit Xen PV
On Tue, 29 Oct 2019, Jan Beulich wrote: > Once again RPL checks have been introduced which don't account for a > 32-bit kernel living in ring 1 when running in a PV Xen domain. > > The case in FIXUP_FRAME has been preventing boot; adjust BUG_IF_WRONG_CR3 > as well just in case. Either it's required and correct or it's not. Just in case is not helpful at all. > Fixes: 3c88c692c287 ("x86/stackframe/32: Provide consistent pt_regs") > Signed-off-by: Jan Beulich > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -48,6 +48,13 @@ > > #include "calling.h" > > +/* > + * When running on Xen PV, the actual %cs register's RPL in the kernel is 1, > + * not 0. If we need to distinguish between a %cs from kernel mode and a %cs > + * from user mode, we can do test $2 instead of test $3. > + */ > +#define USER_SEGMENT_RPL_MASK 2 No. The define want's to be right next to the SEGMENT_RPL_MASK define including a less ASM focussed comment like this: /* * When running on Xen PV, the actual priviledge level of the kernel is 1, * not 0. Testing the Requested Priviledge Level in a segment selector to * determine whether the context is user mode or kernel mode with * SEGMENT_RPL_MASK is wrong because the PV kernels priviledge level * matches the 0x03 mask. * * Testing with USER_SEGMENT_RPL_MASK is valid for both native and Xen PV * kernels because Priviledge Level 2 is never used. */ Hmm? Thanks, tglx ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1 09/10] mm/memory_hotplug: Don't mark pages PG_reserved when initializing the memmap
On 10/24/19 8:09 AM, David Hildenbrand wrote: > diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c > index 4f2e78a5e4db..af69f057913a 100644 > --- a/drivers/xen/balloon.c > +++ b/drivers/xen/balloon.c > @@ -374,6 +374,13 @@ static void xen_online_page(struct page *page, unsigned > int order) > mutex_lock(&balloon_mutex); > for (i = 0; i < size; i++) { > p = pfn_to_page(start_pfn + i); > + /* > + * TODO: The core used to mark the pages reserved. Most probably > + * we can stop doing that now. However, especially > + * alloc_xenballooned_pages() left PG_reserved set > + * on pages that can get mapped to user space. > + */ > + __SetPageReserved(p); I suspect this is not needed. Pages can get into balloon either from here or from non-hotplug path (e.g. decrease_reservation()) and so when we get a page from the balloon we would get a random page that may or may not have Reserved bit set. -boris > balloon_append(p); > } > mutex_unlock(&balloon_mutex); > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 09/15] xen/gntdev: use mmu_range_notifier_insert
On 10/28/19 4:10 PM, Jason Gunthorpe wrote: > @@ -445,17 +438,9 @@ static void gntdev_vma_close(struct vm_area_struct *vma) > struct gntdev_priv *priv = file->private_data; > > pr_debug("gntdev_vma_close %p\n", vma); > - if (use_ptemod) { > - /* It is possible that an mmu notifier could be running > - * concurrently, so take priv->lock to ensure that the vma won't > - * vanishing during the unmap_grant_pages call, since we will > - * spin here until that completes. Such a concurrent call will > - * not do any unmapping, since that has been done prior to > - * closing the vma, but it may still iterate the unmap_ops list. > - */ > - mutex_lock(&priv->lock); > + if (use_ptemod && map->vma == vma) { Is it possible for map->vma not to be equal to vma? -boris > + mmu_range_notifier_remove(&map->notifier); > map->vma = NULL; > - mutex_unlock(&priv->lock); > } > vma->vm_private_data = NULL; > gntdev_put_map(priv, map); > ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 00/15] Consolidate the mmu notifier interval_tree and locking
On Fri, Nov 01, 2019 at 01:54:45PM -0700, Ralph Campbell wrote: > You can add my Tested-by for the mm and nouveau changes. > IOW, patches 1-4, 10-11, and 15. > > Tested-by: Ralph Campbell Got it, thanks Jason ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [BUG] Invalid guest state in Xen master, dom0 linux-5.3.6, domU windows 10
On 04/11/2019 15:40, Andrew Cooper wrote: > On 04/11/2019 15:33, Håkon Alstadheim wrote: >> Den 04.11.2019 14:31, skrev Andrew Cooper: >>> On 03/11/2019 10:23, Håkon Alstadheim wrote: >>> (XEN) [2019-11-02 14:09:46] d2v0 vmentry failure (reason 0x8021): Invalid guest state (0) (XEN) [2019-11-02 14:09:46] * VMCS Area ** (XEN) [2019-11-02 14:09:46] *** Guest State *** (XEN) [2019-11-02 14:09:46] CR0: actual=0x80050031, shadow=0x80050031, gh_mask= (XEN) [2019-11-02 14:09:46] CR4: actual=0x00172678, shadow=0x00170678, gh_mask=ffe8f860 (XEN) [2019-11-02 14:09:46] CR3 = 0x001aa002 (XEN) [2019-11-02 14:09:46] RSP = 0x8c0f4dd71e98 (0x8c0f4dd71e98) RIP = 0xd18a040bb75e (0xd18a040bb75e) (XEN) [2019-11-02 14:09:46] RFLAGS=0x0187 (0x0187) DR7 = 0x0400 (XEN) [2019-11-02 14:09:46] Sysenter RSP= CS:RIP=: (XEN) [2019-11-02 14:09:46] sel attr limit base (XEN) [2019-11-02 14:09:46] CS: 0010 0209b (XEN) [2019-11-02 14:09:46] DS: 002b 0c0f3 (XEN) [2019-11-02 14:09:46] SS: 0018 04093 (XEN) [2019-11-02 14:09:46] ES: 002b 0c0f3 (XEN) [2019-11-02 14:09:46] FS: 0053 040f3 3c00 (XEN) [2019-11-02 14:09:46] GS: 002b 0c0f3 f8044ff8 (XEN) [2019-11-02 14:09:46] GDTR: 0057 f80459c61fb0 (XEN) [2019-11-02 14:09:46] LDTR: 1c000 (XEN) [2019-11-02 14:09:46] IDTR: 012f d18a014a0980 (XEN) [2019-11-02 14:09:46] TR: 0040 0008b 0067 f80459c6 (XEN) [2019-11-02 14:09:46] EFER(VMCS) = 0x0d01 PAT = 0x0007010600070106 (XEN) [2019-11-02 14:09:46] PreemptionTimer = 0x SM Base = 0x (XEN) [2019-11-02 14:09:46] DebugCtl = 0x DebugExceptions = 0x (XEN) [2019-11-02 14:09:46] Interruptibility = 0002 ActivityState = (XEN) [2019-11-02 14:09:46] InterruptStatus = (XEN) [2019-11-02 14:09:46] *** Host State *** (XEN) [2019-11-02 14:09:46] RIP = 0x82d080341950 (vmx_asm_vmexit_handler) RSP = 0x83083ff0ff70 (XEN) [2019-11-02 14:09:46] CS=e008 SS= DS= ES= FS= GS= TR=e040 (XEN) [2019-11-02 14:09:46] FSBase= GSBase= TRBase=83083ff14000 (XEN) [2019-11-02 14:09:46] GDTBase=83083ff03000 IDTBase=83083ff07000 (XEN) [2019-11-02 14:09:46] CR0=80050033 CR3=00054dbea000 CR4=001526e0 (XEN) [2019-11-02 14:09:46] Sysenter RSP=83083ff0ffa0 CS:RIP=e008:82d080395440 (XEN) [2019-11-02 14:09:46] EFER = 0x0d01 PAT = 0x050100070406 (XEN) [2019-11-02 14:09:46] *** Control State *** (XEN) [2019-11-02 14:09:46] PinBased=00bf CPUBased=b62065fa SecondaryExec=17eb (XEN) [2019-11-02 14:09:46] EntryControls=d3ff ExitControls=002fefff (XEN) [2019-11-02 14:09:46] ExceptionBitmap=00060002 PFECmask= PFECmatch= (XEN) [2019-11-02 14:09:46] VMEntry: intr_info=8501 errcode= ilen=0001 (XEN) [2019-11-02 14:09:46] VMExit: intr_info=8501 errcode= ilen=0001 (XEN) [2019-11-02 14:09:46] reason=8021 qualification= (XEN) [2019-11-02 14:09:46] IDTVectoring: info= errcode= (XEN) [2019-11-02 14:09:46] TSC Offset = 0xf45ded46dd57 TSC Multiplier = 0x (XEN) [2019-11-02 14:09:46] TPR Threshold = 0x00 PostedIntrVec = 0xf5 (XEN) [2019-11-02 14:09:46] EPT pointer = 0x00054e3a701e EPTP index = 0x (XEN) [2019-11-02 14:09:46] PLE Gap=0080 Window=1000 (XEN) [2019-11-02 14:09:46] Virtual processor ID = 0x5a02 VMfunc controls = (XEN) [2019-11-02 14:09:46] ** (XEN) [2019-11-02 14:09:46] domain_crash called from vmx.c:3335 (XEN) [2019-11-02 14:09:46] Domain 2 (vcpu#0) crashed on cpu#2: >>> Interruptibility = 0002 (Blocked by Mov SS) and VMEntry: >>> intr_info=8501 (ICEBP) >>> >>> Dare I ask what you're running in your windows guest? Unless it is a >>> vulnerability test suite, I'm rather concerned. >> Because I have pulled out all stops ? Well no particular reason. I've >> asked my kids nicely not to poke any /more/ holes in the security on >> the system. Probably should tighten up the ship. I have some conflict >> going on between the hardware pci USB cards in the machine, so I >> thought I'd see what would happen if I gave ASUS and whatever no-name >> Tai
[Xen-devel] [qemu-mainline test] 143566: regressions - FAIL
flight 143566 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/143566/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-pvshim 18 guest-localmigrate/x10 fail REGR. vs. 142915 test-amd64-amd64-xl-qcow2 19 guest-start/debian.repeat fail REGR. vs. 142915 test-amd64-amd64-libvirt-vhd 17 guest-start/debian.repeat fail REGR. vs. 142915 test-armhf-armhf-xl-credit1 7 xen-boot fail REGR. vs. 142915 test-amd64-i386-xl-raw 19 guest-start/debian.repeat fail REGR. vs. 142915 test-armhf-armhf-libvirt-raw 15 guest-start/debian.repeat fail REGR. vs. 142915 test-armhf-armhf-xl-vhd 15 guest-start/debian.repeat fail REGR. vs. 142915 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-rtds 16 guest-localmigrate fail REGR. vs. 142915 Tests which did not succeed, but are not blocking: test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail blocked in 142915 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 142915 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 142915 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 142915 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 142915 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 142915 test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 13 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass version targeted for testing: qemuub7c9a7f353c0e260519bf735ff0d4aa01e72784b baseline version: qemuue9d42461920f6f40f4d847a5ba18e90d095ed0b9
[Xen-devel] [OSSTEST PATCH] adhoc-revtuple-generator: Bisect over 5000 commits (really)
In e9b0653875b3 we changed one of the `1000' values to `5000'. But this magic number had been duplicated. Urgh! The result is that adhoc-revtuple-generator might generate a weirdly truncated output which causes cs-bisection-stop to fail with messages like this: *** not RelvUp at 3d40147282670d597b336be5599b5cc4c2ff7ddd at ./cs-bisection-step line 554. *** not RelvDown at 2fa3479cfadb0bb3fe694dbfd29f2350eb2570df at ./cs-bisection-step line 554. *** not RelvUp at 2fa3479cfadb0bb3fe694dbfd29f2350eb2570df at ./cs-bisection-step line 554. ... Use of uninitialized value in concatenation (.) or string at ./cs-bisection-step line 747. Should test . BROKEN see earlier errors. at ./cs-bisection-step line 1454, line 10089. Fix this by (i) plumbing the magic value we already edited properly back to the (command-line controlled) global variable (ii) changing the global variable from 1000 to 5000. git-grep '\b1000\b' still produces a fair amount of output but most of it is timeouts, which is fair enough. There is also a flight count limit in sg-report-flight, which limits how far back it is willing to look. We don't want to change that here. With this change, cs-bisection-step on the currently-failing freebsd build job does this: Searching for interesting versions Result found: flight 141420 (pass), for basis pass Result found: flight 143397 (fail), for basis failure Need to reproduce basis pass (pass); had 1 already. Should test 2fa3479cfadb0bb3fe694dbfd29f2350eb2570df. This looks plausible: it is picking up where it left off before the basis pass fell over its horizon. CC: Roger Pau Monné CC: Jürgen Groß Signed-off-by: Ian Jackson --- adhoc-revtuple-generator | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/adhoc-revtuple-generator b/adhoc-revtuple-generator index ac0f2463..c8d6f4ad 100755 --- a/adhoc-revtuple-generator +++ b/adhoc-revtuple-generator @@ -28,7 +28,7 @@ use Osstest; use Osstest::TestSupport; use Osstest::Executive; -our $num= 1000; +our $num= 5000; our $doupdate= 1; our $showrev= 0; @@ -553,7 +553,7 @@ sub main () { my @trees_continuous; foreach my $tree (@trees) { my $gen= tree_get_gen($tree); -my $count= 5000; +my $count= $num; my $found= 0; my $top= undef; while ($count-- > 0) { -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen/vcpu: Sanitise VCPUOP_initialise call hierachy
On 31.10.2019 20:28, Andrew Cooper wrote: > This code is especially tangled. VCPUOP_initialise calls into > arch_initialise_vcpu() which calls back into default_initialise_vcpu() which > is common code. > > This path is actually dead code on ARM, because VCPUOP_initialise is filtered > out by do_arm_vcpu_op(). > > The only valid way to start a secondary CPU on ARM is via the PSCI interface. > The same could in principle be said about INIT-SIPI-SIPI for x86 HVM, if HVM > guests hadn't already interited a paravirt way of starting CPUs. > > Either way, it is quite likely that no future architectures implemented in Xen > are going to want to use a PV interface, as some standardised (v)CPU bringup > mechanism will already exist. > > Arrange the code in do_vcpu_op() to allow arch_initialise_vcpu() to be > optional. Opt in for x86, and opt out for ARM. > > Deleting ARM's arch_initialise_vcpu() allows for default_initialise_vcpu() to > be folded into its (now) sole x86 caller, which reduces the compiled code > volume in all builds. > > No functional change. > > Signed-off-by: Andrew Cooper I can see the merits of this, but I can also understand Julien's reservations. Hence I guess whether to ack this will depend on the discussion with him getting settled. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen/events: remove event handling recursion detection
On 04.11.19 16:19, Jan Beulich wrote: On 04.11.2019 16:09, Jürgen Groß wrote: On 04.11.19 15:35, Jan Beulich wrote: On 04.11.2019 14:58, Juergen Gross wrote: __xen_evtchn_do_upcall() contains guards against being called recursively. This mechanism was introduced in the early pvops times (kernel 2.6.26) when there were still Xen versions around not honoring disabled interrupts for sending events to pv guests. This was changed in Xen 3.0, which is much older than any Xen version supported by the kernel, so the recursion detection can be removed. Would you mind pointing out which exact change(s) this was(were)? Linux kernel: 229664bee6126e01f8662976a5fe2e79813b77c8 Xen: d8263e8dbaf5ef1445bee0662143a0fcb6d43466 Are you sure about the latter, touching only header files underneath xen/, and there mostly public interface ones? No, you are right, this was a false interpretation of mine. It had always been my understanding that the recursion detection was mainly to guard against drivers re-enabling interrupts transiently in their handlers (which in turn may no longer be an issue in modern Linux kernels). This would have been doable with a simple bool. The more complex xchg based logic was IMO for recursion detection at any point. Well, the respective XenoLinux c/s 13098 has no mention of this, i.e. it simply leaves open what the actual reason was: "[LINUX] Disallow nested event delivery. This eliminates the risk of overflowing the kernel stack and is a reasonable policy given that we have no concept of priorities among event sources." For XenoLinux it makes at least a little bit sense, as interrupts were enabled during calls of some handlers AFAIK. The complexity is rather strange, though, as the bool would have been much easier to understand. I'll adapt the commit message. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot
On 04.11.2019 16:31, Andrew Cooper wrote: > On 04/11/2019 13:32, Jan Beulich wrote: >> On 01.11.2019 21:25, Andrew Cooper wrote: >>> --- a/xen/arch/x86/boot/head.S >>> +++ b/xen/arch/x86/boot/head.S >>> @@ -630,6 +630,10 @@ trampoline_setup: >>> >>> 1: >>> /* Interrogate CPU extended features via CPUID. */ >>> +mov $1, %eax >>> +cpuid >>> +mov %ecx, sym_fs(boot_cpu_data) + >>> CPUINFO_FEATURE_OFFSET(X86_FEATURE_HYPERVISOR) >> I understand the ECX output is all we need right now. But wouldn't >> it be better to then store EDX as well (and similarly ECX of >> 0x8001 output)? > > No - I don't think so. > > I did debate applying an and mask for the features we only intend to be > usable this early, to avoid getting buggy code which checks for > unexpected features this early. Indeed doing so would seem a good reason to not also store the EDX value here. >> Also, along the lines of a question back to Sergey on his >> standalone patch, wouldn't it be better to take the opportunity >> and check here that CPUID leaf 1 is actually valid? > > There is no such thing as a 64-bit capable CPU without leaf 1. About anything can be constructed under a hypervisor. But well, I guess I'll stop mumbling on this aspect. >> Of course the question on the (intended) effect of >> "cpuid=no-hypervisor" also remains. As said before, I think this >> should be honored by all of our code that possibly can (i.e. at >> least everything following command line parsing). > > There is no chance of making that work in a sane way - we use > cpu_has_hypervisor for quite a few things before the command line gets > parsed. You said so the other day, but iirc when checking I wasn't able to identify any such case, let alone "quite a few". Anyway, feel free to add Acked-by: Jan Beulich Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [BUG] Invalid guest state in Xen master, dom0 linux-5.3.6, domU windows 10
On 04/11/2019 15:33, Håkon Alstadheim wrote: > > Den 04.11.2019 14:31, skrev Andrew Cooper: >> On 03/11/2019 10:23, Håkon Alstadheim wrote: >> >>> (XEN) [2019-11-02 14:09:46] d2v0 vmentry failure (reason 0x8021): >>> Invalid guest state (0) >>> (XEN) [2019-11-02 14:09:46] * VMCS Area ** >>> (XEN) [2019-11-02 14:09:46] *** Guest State *** >>> (XEN) [2019-11-02 14:09:46] CR0: actual=0x80050031, >>> shadow=0x80050031, gh_mask= >>> (XEN) [2019-11-02 14:09:46] CR4: actual=0x00172678, >>> shadow=0x00170678, gh_mask=ffe8f860 >>> (XEN) [2019-11-02 14:09:46] CR3 = 0x001aa002 >>> (XEN) [2019-11-02 14:09:46] RSP = 0x8c0f4dd71e98 >>> (0x8c0f4dd71e98) RIP = 0xd18a040bb75e (0xd18a040bb75e) >>> (XEN) [2019-11-02 14:09:46] RFLAGS=0x0187 (0x0187) DR7 = >>> 0x0400 >>> (XEN) [2019-11-02 14:09:46] Sysenter RSP= >>> CS:RIP=: >>> (XEN) [2019-11-02 14:09:46] sel attr limit base >>> (XEN) [2019-11-02 14:09:46] CS: 0010 0209b >>> (XEN) [2019-11-02 14:09:46] DS: 002b 0c0f3 >>> (XEN) [2019-11-02 14:09:46] SS: 0018 04093 >>> (XEN) [2019-11-02 14:09:46] ES: 002b 0c0f3 >>> (XEN) [2019-11-02 14:09:46] FS: 0053 040f3 3c00 >>> (XEN) [2019-11-02 14:09:46] GS: 002b 0c0f3 f8044ff8 >>> (XEN) [2019-11-02 14:09:46] GDTR: 0057 f80459c61fb0 >>> (XEN) [2019-11-02 14:09:46] LDTR: 1c000 >>> (XEN) [2019-11-02 14:09:46] IDTR: 012f d18a014a0980 >>> (XEN) [2019-11-02 14:09:46] TR: 0040 0008b 0067 f80459c6 >>> (XEN) [2019-11-02 14:09:46] EFER(VMCS) = 0x0d01 PAT = >>> 0x0007010600070106 >>> (XEN) [2019-11-02 14:09:46] PreemptionTimer = 0x SM Base = >>> 0x >>> (XEN) [2019-11-02 14:09:46] DebugCtl = 0x >>> DebugExceptions = 0x >>> (XEN) [2019-11-02 14:09:46] Interruptibility = 0002 ActivityState >>> = >>> (XEN) [2019-11-02 14:09:46] InterruptStatus = >>> (XEN) [2019-11-02 14:09:46] *** Host State *** >>> (XEN) [2019-11-02 14:09:46] RIP = 0x82d080341950 >>> (vmx_asm_vmexit_handler) RSP = 0x83083ff0ff70 >>> (XEN) [2019-11-02 14:09:46] CS=e008 SS= DS= ES= FS= >>> GS= TR=e040 >>> (XEN) [2019-11-02 14:09:46] FSBase= >>> GSBase= TRBase=83083ff14000 >>> (XEN) [2019-11-02 14:09:46] GDTBase=83083ff03000 >>> IDTBase=83083ff07000 >>> (XEN) [2019-11-02 14:09:46] CR0=80050033 CR3=00054dbea000 >>> CR4=001526e0 >>> (XEN) [2019-11-02 14:09:46] Sysenter RSP=83083ff0ffa0 >>> CS:RIP=e008:82d080395440 >>> (XEN) [2019-11-02 14:09:46] EFER = 0x0d01 PAT = >>> 0x050100070406 >>> (XEN) [2019-11-02 14:09:46] *** Control State *** >>> (XEN) [2019-11-02 14:09:46] PinBased=00bf CPUBased=b62065fa >>> SecondaryExec=17eb >>> (XEN) [2019-11-02 14:09:46] EntryControls=d3ff >>> ExitControls=002fefff >>> (XEN) [2019-11-02 14:09:46] ExceptionBitmap=00060002 PFECmask= >>> PFECmatch= >>> (XEN) [2019-11-02 14:09:46] VMEntry: intr_info=8501 >>> errcode= ilen=0001 >>> (XEN) [2019-11-02 14:09:46] VMExit: intr_info=8501 >>> errcode= ilen=0001 >>> (XEN) [2019-11-02 14:09:46] reason=8021 >>> qualification= >>> (XEN) [2019-11-02 14:09:46] IDTVectoring: info= >>> errcode= >>> (XEN) [2019-11-02 14:09:46] TSC Offset = 0xf45ded46dd57 TSC >>> Multiplier = 0x >>> (XEN) [2019-11-02 14:09:46] TPR Threshold = 0x00 PostedIntrVec = 0xf5 >>> (XEN) [2019-11-02 14:09:46] EPT pointer = 0x00054e3a701e EPTP >>> index = 0x >>> (XEN) [2019-11-02 14:09:46] PLE Gap=0080 Window=1000 >>> (XEN) [2019-11-02 14:09:46] Virtual processor ID = 0x5a02 VMfunc >>> controls = >>> (XEN) [2019-11-02 14:09:46] ** >>> (XEN) [2019-11-02 14:09:46] domain_crash called from vmx.c:3335 >>> (XEN) [2019-11-02 14:09:46] Domain 2 (vcpu#0) crashed on cpu#2: >> Interruptibility = 0002 (Blocked by Mov SS) and VMEntry: >> intr_info=8501 (ICEBP) >> >> Dare I ask what you're running in your windows guest? Unless it is a >> vulnerability test suite, I'm rather concerned. > > Because I have pulled out all stops ? Well no particular reason. I've > asked my kids nicely not to poke any /more/ holes in the security on > the system. Probably should tighten up the ship. I have some conflict > going on between the hardware pci USB cards in the machine, so I > thought I'd see what would happen if I gave ASUS and whatever no-name > Taiwanese I have in there free rein. Nothing gained as far as I can > see, so I'll see about closing some of the more gaping holes. A
Re: [Xen-devel] [BUG] Invalid guest state in Xen master, dom0 linux-5.3.6, domU windows 10
Den 04.11.2019 14:31, skrev Andrew Cooper: On 03/11/2019 10:23, Håkon Alstadheim wrote: (XEN) [2019-11-02 14:09:46] d2v0 vmentry failure (reason 0x8021): Invalid guest state (0) (XEN) [2019-11-02 14:09:46] * VMCS Area ** (XEN) [2019-11-02 14:09:46] *** Guest State *** (XEN) [2019-11-02 14:09:46] CR0: actual=0x80050031, shadow=0x80050031, gh_mask= (XEN) [2019-11-02 14:09:46] CR4: actual=0x00172678, shadow=0x00170678, gh_mask=ffe8f860 (XEN) [2019-11-02 14:09:46] CR3 = 0x001aa002 (XEN) [2019-11-02 14:09:46] RSP = 0x8c0f4dd71e98 (0x8c0f4dd71e98) RIP = 0xd18a040bb75e (0xd18a040bb75e) (XEN) [2019-11-02 14:09:46] RFLAGS=0x0187 (0x0187) DR7 = 0x0400 (XEN) [2019-11-02 14:09:46] Sysenter RSP= CS:RIP=: (XEN) [2019-11-02 14:09:46] sel attr limit base (XEN) [2019-11-02 14:09:46] CS: 0010 0209b (XEN) [2019-11-02 14:09:46] DS: 002b 0c0f3 (XEN) [2019-11-02 14:09:46] SS: 0018 04093 (XEN) [2019-11-02 14:09:46] ES: 002b 0c0f3 (XEN) [2019-11-02 14:09:46] FS: 0053 040f3 3c00 (XEN) [2019-11-02 14:09:46] GS: 002b 0c0f3 f8044ff8 (XEN) [2019-11-02 14:09:46] GDTR: 0057 f80459c61fb0 (XEN) [2019-11-02 14:09:46] LDTR: 1c000 (XEN) [2019-11-02 14:09:46] IDTR: 012f d18a014a0980 (XEN) [2019-11-02 14:09:46] TR: 0040 0008b 0067 f80459c6 (XEN) [2019-11-02 14:09:46] EFER(VMCS) = 0x0d01 PAT = 0x0007010600070106 (XEN) [2019-11-02 14:09:46] PreemptionTimer = 0x SM Base = 0x (XEN) [2019-11-02 14:09:46] DebugCtl = 0x DebugExceptions = 0x (XEN) [2019-11-02 14:09:46] Interruptibility = 0002 ActivityState = (XEN) [2019-11-02 14:09:46] InterruptStatus = (XEN) [2019-11-02 14:09:46] *** Host State *** (XEN) [2019-11-02 14:09:46] RIP = 0x82d080341950 (vmx_asm_vmexit_handler) RSP = 0x83083ff0ff70 (XEN) [2019-11-02 14:09:46] CS=e008 SS= DS= ES= FS= GS= TR=e040 (XEN) [2019-11-02 14:09:46] FSBase= GSBase= TRBase=83083ff14000 (XEN) [2019-11-02 14:09:46] GDTBase=83083ff03000 IDTBase=83083ff07000 (XEN) [2019-11-02 14:09:46] CR0=80050033 CR3=00054dbea000 CR4=001526e0 (XEN) [2019-11-02 14:09:46] Sysenter RSP=83083ff0ffa0 CS:RIP=e008:82d080395440 (XEN) [2019-11-02 14:09:46] EFER = 0x0d01 PAT = 0x050100070406 (XEN) [2019-11-02 14:09:46] *** Control State *** (XEN) [2019-11-02 14:09:46] PinBased=00bf CPUBased=b62065fa SecondaryExec=17eb (XEN) [2019-11-02 14:09:46] EntryControls=d3ff ExitControls=002fefff (XEN) [2019-11-02 14:09:46] ExceptionBitmap=00060002 PFECmask= PFECmatch= (XEN) [2019-11-02 14:09:46] VMEntry: intr_info=8501 errcode= ilen=0001 (XEN) [2019-11-02 14:09:46] VMExit: intr_info=8501 errcode= ilen=0001 (XEN) [2019-11-02 14:09:46] reason=8021 qualification= (XEN) [2019-11-02 14:09:46] IDTVectoring: info= errcode= (XEN) [2019-11-02 14:09:46] TSC Offset = 0xf45ded46dd57 TSC Multiplier = 0x (XEN) [2019-11-02 14:09:46] TPR Threshold = 0x00 PostedIntrVec = 0xf5 (XEN) [2019-11-02 14:09:46] EPT pointer = 0x00054e3a701e EPTP index = 0x (XEN) [2019-11-02 14:09:46] PLE Gap=0080 Window=1000 (XEN) [2019-11-02 14:09:46] Virtual processor ID = 0x5a02 VMfunc controls = (XEN) [2019-11-02 14:09:46] ** (XEN) [2019-11-02 14:09:46] domain_crash called from vmx.c:3335 (XEN) [2019-11-02 14:09:46] Domain 2 (vcpu#0) crashed on cpu#2: Interruptibility = 0002 (Blocked by Mov SS) and VMEntry: intr_info=8501 (ICEBP) Dare I ask what you're running in your windows guest? Unless it is a vulnerability test suite, I'm rather concerned. Because I have pulled out all stops ? Well no particular reason. I've asked my kids nicely not to poke any /more/ holes in the security on the system. Probably should tighten up the ship. I have some conflict going on between the hardware pci USB cards in the machine, so I thought I'd see what would happen if I gave ASUS and whatever no-name Taiwanese I have in there free rein. Nothing gained as far as I can see, so I'll see about closing some of the more gaping holes. At least as far as getting rid of deprecation warnings go :-) . I hope "they" never get serious about requiring a license to own a computer with Internet access. :-) ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
On 04.11.2019 16:22, Andrew Cooper wrote: > On 04/11/2019 15:03, Jan Beulich wrote: >> On 04.11.2019 15:59, Andrew Cooper wrote: >>> On 04/11/2019 13:25, Jan Beulich wrote: On 01.11.2019 21:25, Andrew Cooper wrote: > --- a/xen/arch/x86/cpu/intel.c > +++ b/xen/arch/x86/cpu/intel.c > @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c) > if (disable) { > wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable); > bootsym(trampoline_misc_enable_off) |= disable; > + bootsym(trampoline_efer) |= EFER_NX; > } I'm fine with all other changes here, just this one concerns me: Before your change we latch a value into trampoline_misc_enable_off just to be used for subsequent IA32_MISC_ENABLE writes we do. This means that, on a hypervisor (like Xen itself) simply discarding guest writes to the MSR (which is "fine" with the described usage of ours up to now), with your change we'd now end up trying to set EFER.NX when the feature may not actually be enabled in IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)? I.e. don't we need to read back the MSR value here, and verify we actually managed to clear the bit before actually OR-ing in EFER_NX? >>> If this is a problem in practice, execution won't continue beyond the >>> next if() condition just out of context, which set EFER.NX on the BSP >>> with an unguarded WRMSR. >> And how is this any good? This kind of crash is exactly what I'm >> asking to avoid. > > What is the point of working around a theoretical edge case of broken > nesting under Xen which demonstrably doesn't exist in practice? Well, so far nothing was said about this not being an actual problem. I simply don't know whether hardware would refuse such an EFER write. If it does, it would be appropriate for hypervisors to also refuse it. I.e. if we don't do so right now, correcting the behavior would trip the code here. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [XEN PATCH for-4.13] libxl: Fix setting vncpasswd to empty string
Before 93dcc22, error from setting the vnc password to an empty string, when QEMU wasn't expected a password, never prevented the creation of a guest, and only logged an error message. Reported-by: Roger Pau Monné Fixes: 93dcc22fe798c9fa5ce117f1ed6db0d8bd779020 Signed-off-by: Anthony PERARD --- tools/libxl/libxl_dm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 7e52f0973172..8e0fb78bd2f3 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -2936,7 +2936,7 @@ static void device_model_postconfig_vnc(libxl__egc *egc, if (rc) goto out; } -if (vnc && vnc->passwd) { +if (vnc && vnc->passwd && vnc->passwd[0]) { qmp->callback = device_model_postconfig_vnc_passwd; libxl__qmp_param_add_string(gc, &args, "password", vnc->passwd); rc = libxl__ev_qmp_send(gc, qmp, "change-vnc-password", args); -- Anthony PERARD ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot
On 04/11/2019 13:32, Jan Beulich wrote: > On 01.11.2019 21:25, Andrew Cooper wrote: >> --- a/xen/arch/x86/boot/head.S >> +++ b/xen/arch/x86/boot/head.S >> @@ -630,6 +630,10 @@ trampoline_setup: >> >> 1: >> /* Interrogate CPU extended features via CPUID. */ >> +mov $1, %eax >> +cpuid >> +mov %ecx, sym_fs(boot_cpu_data) + >> CPUINFO_FEATURE_OFFSET(X86_FEATURE_HYPERVISOR) > I understand the ECX output is all we need right now. But wouldn't > it be better to then store EDX as well (and similarly ECX of > 0x8001 output)? No - I don't think so. I did debate applying an and mask for the features we only intend to be usable this early, to avoid getting buggy code which checks for unexpected features this early. > Also, along the lines of a question back to Sergey on his > standalone patch, wouldn't it be better to take the opportunity > and check here that CPUID leaf 1 is actually valid? There is no such thing as a 64-bit capable CPU without leaf 1. > Of course the question on the (intended) effect of > "cpuid=no-hypervisor" also remains. As said before, I think this > should be honored by all of our code that possibly can (i.e. at > least everything following command line parsing). There is no chance of making that work in a sane way - we use cpu_has_hypervisor for quite a few things before the command line gets parsed. If you're booting under a hypervisor, your hypervisor ought to have a way to turn that off. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
On 04/11/2019 15:03, Jan Beulich wrote: > On 04.11.2019 15:59, Andrew Cooper wrote: >> On 04/11/2019 13:25, Jan Beulich wrote: >>> On 01.11.2019 21:25, Andrew Cooper wrote: --- a/xen/arch/x86/cpu/intel.c +++ b/xen/arch/x86/cpu/intel.c @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c) if (disable) { wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable); bootsym(trampoline_misc_enable_off) |= disable; + bootsym(trampoline_efer) |= EFER_NX; } >>> I'm fine with all other changes here, just this one concerns me: >>> Before your change we latch a value into trampoline_misc_enable_off >>> just to be used for subsequent IA32_MISC_ENABLE writes we do. This >>> means that, on a hypervisor (like Xen itself) simply discarding >>> guest writes to the MSR (which is "fine" with the described usage >>> of ours up to now), with your change we'd now end up trying to set >>> EFER.NX when the feature may not actually be enabled in >>> IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)? >>> I.e. don't we need to read back the MSR value here, and verify >>> we actually managed to clear the bit before actually OR-ing in >>> EFER_NX? >> If this is a problem in practice, execution won't continue beyond the >> next if() condition just out of context, which set EFER.NX on the BSP >> with an unguarded WRMSR. > And how is this any good? This kind of crash is exactly what I'm > asking to avoid. What is the point of working around a theoretical edge case of broken nesting under Xen which demonstrably doesn't exist in practice? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen/events: remove event handling recursion detection
On 04.11.2019 16:09, Jürgen Groß wrote: > On 04.11.19 15:35, Jan Beulich wrote: >> On 04.11.2019 14:58, Juergen Gross wrote: >>> __xen_evtchn_do_upcall() contains guards against being called >>> recursively. This mechanism was introduced in the early pvops times >>> (kernel 2.6.26) when there were still Xen versions around not honoring >>> disabled interrupts for sending events to pv guests. >>> >>> This was changed in Xen 3.0, which is much older than any Xen version >>> supported by the kernel, so the recursion detection can be removed. >> >> Would you mind pointing out which exact change(s) this was(were)? > > Linux kernel: 229664bee6126e01f8662976a5fe2e79813b77c8 > Xen: d8263e8dbaf5ef1445bee0662143a0fcb6d43466 Are you sure about the latter, touching only header files underneath xen/, and there mostly public interface ones? >> It had always been my understanding that the recursion detection >> was mainly to guard against drivers re-enabling interrupts >> transiently in their handlers (which in turn may no longer be an >> issue in modern Linux kernels). > > This would have been doable with a simple bool. The more complex > xchg based logic was IMO for recursion detection at any point. Well, the respective XenoLinux c/s 13098 has no mention of this, i.e. it simply leaves open what the actual reason was: "[LINUX] Disallow nested event delivery. This eliminates the risk of overflowing the kernel stack and is a reasonable policy given that we have no concept of priorities among event sources." Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4 2/3] x86/boot: Introduce the kernel_info.setup_type_max
On Fri, Nov 01, 2019 at 01:55:57PM -0700, H. Peter Anvin wrote: > On 2019-10-24 04:48, Daniel Kiper wrote: > > This field contains maximal allowed type for setup_data. > > > > Now bump the setup_header version in arch/x86/boot/header.S. > > Please don't bump the protocol revision here, otherwise we would create > a very odd pseudo-revision of the protocol: 2.15 without SETUP_INDIRECT > support, should patch 3/3 end up getting reverted. > > (It is possible to detect, of course, but I feel pretty sure in saying > that bootloaders won't get it right.) > > Other than that: > > Reviewed-by: H. Peter Anvin (Intel) I have just sent v5. Please take a look. Daniel ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v5 0/3] x86/boot: Introduce the kernel_info et consortes
Hi, Due to very limited space in the setup_header this patch series introduces new kernel_info struct which will be used to convey information from the kernel to the bootloader. This way the boot protocol can be extended regardless of the setup_header limitations. Additionally, the patch series introduces some convenience features like the setup_indirect struct and the kernel_info.setup_type_max field. Daniel Documentation/x86/boot.rst | 174 ++ arch/x86/boot/Makefile | 2 +- arch/x86/boot/compressed/Makefile | 4 +- arch/x86/boot/compressed/kaslr.c | 12 ++ arch/x86/boot/compressed/kernel_info.S | 22 ++ arch/x86/boot/header.S | 3 +- arch/x86/boot/tools/build.c| 5 +++ arch/x86/include/uapi/asm/bootparam.h | 16 +++- arch/x86/kernel/e820.c | 11 + arch/x86/kernel/kdebugfs.c | 20 +++-- arch/x86/kernel/ksysfs.c | 30 ++ arch/x86/kernel/setup.c| 4 ++ arch/x86/mm/ioremap.c | 11 + 13 files changed, 298 insertions(+), 16 deletions(-) Daniel Kiper (3): x86/boot: Introduce the kernel_info x86/boot: Introduce the kernel_info.setup_type_max x86/boot: Introduce the setup_indirect ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v5 2/3] x86/boot: Introduce the kernel_info.setup_type_max
This field contains maximal allowed type for setup_data. This patch does not bump setup_header version in arch/x86/boot/header.S because it will be followed by additional changes coming into the Linux/x86 boot protocol. Suggested-by: H. Peter Anvin (Intel) Signed-off-by: Daniel Kiper Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Ross Philipson Reviewed-by: H. Peter Anvin (Intel) --- v5 - suggestions/fixes: - move incorrect references to the setup_indirect to the patch introducing it, - do not bump setup_header version in arch/x86/boot/header.S (suggested by H. Peter Anvin). --- Documentation/x86/boot.rst | 9 - arch/x86/boot/compressed/kernel_info.S | 5 + arch/x86/include/uapi/asm/bootparam.h | 3 +++ 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst index c60fafda9427..1dad6eee8a5c 100644 --- a/Documentation/x86/boot.rst +++ b/Documentation/x86/boot.rst @@ -73,7 +73,7 @@ Protocol 2.14:BURNT BY INCORRECT COMMIT ae7e1238e68f2a472a125673ab506d49158c188 (x86/boot: Add ACPI RSDP address to setup_header) DO NOT USE!!! ASSUME SAME AS 2.13. -Protocol 2.15: (Kernel 5.5) Added the kernel_info. +Protocol 2.15: (Kernel 5.5) Added the kernel_info and kernel_info.setup_type_max. = .. note:: @@ -981,6 +981,13 @@ Offset/size: 0x0008/4 This field contains the size of the kernel_info including kernel_info.header and kernel_info.kernel_info_var_len_data. + == +Field name:setup_type_max +Offset/size: 0x0008/4 + == + + This field contains maximal allowed type for setup_data. + The Image Checksum == diff --git a/arch/x86/boot/compressed/kernel_info.S b/arch/x86/boot/compressed/kernel_info.S index 8ea6f6e3feef..018dacbd753e 100644 --- a/arch/x86/boot/compressed/kernel_info.S +++ b/arch/x86/boot/compressed/kernel_info.S @@ -1,5 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 */ +#include + .section ".rodata.kernel_info", "a" .global kernel_info @@ -12,6 +14,9 @@ kernel_info: /* Size total. */ .long kernel_info_end - kernel_info + /* Maximal allowed type for setup_data. */ + .long SETUP_TYPE_MAX + kernel_info_var_len_data: /* Empty for time being... */ kernel_info_end: diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index a1ebcd7a991c..dbb41128e5a0 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -11,6 +11,9 @@ #define SETUP_APPLE_PROPERTIES 5 #define SETUP_JAILHOUSE6 +/* max(SETUP_*) */ +#define SETUP_TYPE_MAX SETUP_JAILHOUSE + /* ram_size flags */ #define RAMDISK_IMAGE_START_MASK 0x07FF #define RAMDISK_PROMPT_FLAG0x8000 -- 2.11.0 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v5 1/3] x86/boot: Introduce the kernel_info
The relationships between the headers are analogous to the various data sections: setup_header = .data boot_params/setup_data = .bss What is missing from the above list? That's right: kernel_info = .rodata We have been (ab)using .data for things that could go into .rodata or .bss for a long time, for lack of alternatives and -- especially early on -- inertia. Also, the BIOS stub is responsible for creating boot_params, so it isn't available to a BIOS-based loader (setup_data is, though). setup_header is permanently limited to 144 bytes due to the reach of the 2-byte jump field, which doubles as a length field for the structure, combined with the size of the "hole" in struct boot_params that a protected-mode loader or the BIOS stub has to copy it into. It is currently 119 bytes long, which leaves us with 25 very precious bytes. This isn't something that can be fixed without revising the boot protocol entirely, breaking backwards compatibility. boot_params proper is limited to 4096 bytes, but can be arbitrarily extended by adding setup_data entries. It cannot be used to communicate properties of the kernel image, because it is .bss and has no image-provided content. kernel_info solves this by providing an extensible place for information about the kernel image. It is readonly, because the kernel cannot rely on a bootloader copying its contents anywhere, but that is OK; if it becomes necessary it can still contain data items that an enabled bootloader would be expected to copy into a setup_data chunk. This patch does not bump setup_header version in arch/x86/boot/header.S because it will be followed by additional changes coming into the Linux/x86 boot protocol. Suggested-by: H. Peter Anvin (Intel) Signed-off-by: Daniel Kiper Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Ross Philipson Reviewed-by: H. Peter Anvin (Intel) --- v4 - suggestions/fixes: - improve the documentation (suggested by Randy Dunlap and Konrad Rzeszutek Wilk). v3 - suggestions/fixes: - split kernel_info data into fixed and variable sized regions, (suggested by H. Peter Anvin), - change kernel_info.header value to "LToP" (0x506f544c), (suggested by H. Peter Anvin), - improve the comments, - improve the documentation. v2 - suggestions/fixes: - rename setup_header2 to kernel_info, (suggested by H. Peter Anvin), - change kernel_info.header value to "InfO" (0x4f666e49), - new kernel_info description in Documentation/x86/boot.rst, (suggested by H. Peter Anvin), - drop kernel_info_offset_update() as an overkill and update kernel_info offset directly from main(), (suggested by Eric Snowberg), - new commit message (suggested by H. Peter Anvin), - fix some commit message misspellings (suggested by Eric Snowberg). --- Documentation/x86/boot.rst | 126 + arch/x86/boot/Makefile | 2 +- arch/x86/boot/compressed/Makefile | 4 +- arch/x86/boot/compressed/kernel_info.S | 17 + arch/x86/boot/header.S | 1 + arch/x86/boot/tools/build.c| 5 ++ arch/x86/include/uapi/asm/bootparam.h | 1 + 7 files changed, 153 insertions(+), 3 deletions(-) create mode 100644 arch/x86/boot/compressed/kernel_info.S diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst index 08a2f100c0e6..c60fafda9427 100644 --- a/Documentation/x86/boot.rst +++ b/Documentation/x86/boot.rst @@ -68,8 +68,25 @@ Protocol 2.12(Kernel 3.8) Added the xloadflags field and extension fields Protocol 2.13 (Kernel 3.14) Support 32- and 64-bit flags being set in xloadflags to support booting a 64-bit kernel from 32-bit EFI + +Protocol 2.14: BURNT BY INCORRECT COMMIT ae7e1238e68f2a472a125673ab506d49158c1889 + (x86/boot: Add ACPI RSDP address to setup_header) + DO NOT USE!!! ASSUME SAME AS 2.13. + +Protocol 2.15: (Kernel 5.5) Added the kernel_info. = +.. note:: + The protocol version number should be changed only if the setup header + is changed. There is no need to update the version number if boot_params + or kernel_info are changed. Additionally, it is recommended to use + xloadflags (in this case the protocol version number should not be + updated either) or kernel_info to communicate supported Linux kernel + features to the boot loader. Due to very limited space available in + the original setup header every update to it should be considered + with great care. Starting from the protocol 2.15 the primary way to + communicate things to the boot loader is the kernel_info. + Memory Layout = @@ -207,6 +224,7 @@ Offset/Size Proto NameMeaning 0258/8 2.10+ pref_addressPreferred loading address 0260/4 2.10+ init_siz
[Xen-devel] [PATCH v5 3/3] x86/boot: Introduce the setup_indirect
The setup_data is a bit awkward to use for extremely large data objects, both because the setup_data header has to be adjacent to the data object and because it has a 32-bit length field. However, it is important that intermediate stages of the boot process have a way to identify which chunks of memory are occupied by kernel data. Thus we introduce an uniform way to specify such indirect data as setup_indirect struct and SETUP_INDIRECT type. And finally bump setup_header version in arch/x86/boot/header.S. Suggested-by: H. Peter Anvin (Intel) Signed-off-by: Daniel Kiper Acked-by: Konrad Rzeszutek Wilk Reviewed-by: Ross Philipson Reviewed-by: H. Peter Anvin (Intel) --- v5 - suggestions/fixes: - bump setup_header version in arch/x86/boot/header.S (suggested by H. Peter Anvin). v4 - suggestions/fixes: - change "Note:" to ".. note::". v3 - suggestions/fixes: - add setup_indirect mapping/KASLR avoidance/etc. code (suggested by H. Peter Anvin), - the SETUP_INDIRECT sets most significant bit right now; this way it is possible to differentiate regular setup_data and setup_indirect objects in the debugfs filesystem. v2 - suggestions/fixes: - add setup_indirect usage example (suggested by Eric Snowberg and Ross Philipson). --- Documentation/x86/boot.rst | 43 +- arch/x86/boot/compressed/kaslr.c | 12 ++ arch/x86/boot/compressed/kernel_info.S | 2 +- arch/x86/boot/header.S | 2 +- arch/x86/include/uapi/asm/bootparam.h | 16 ++--- arch/x86/kernel/e820.c | 11 + arch/x86/kernel/kdebugfs.c | 20 arch/x86/kernel/ksysfs.c | 30 ++-- arch/x86/kernel/setup.c| 4 arch/x86/mm/ioremap.c | 11 + 10 files changed, 134 insertions(+), 17 deletions(-) diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst index 1dad6eee8a5c..38155ba8740f 100644 --- a/Documentation/x86/boot.rst +++ b/Documentation/x86/boot.rst @@ -827,6 +827,47 @@ Protocol: 2.09+ sure to consider the case where the linked list already contains entries. + The setup_data is a bit awkward to use for extremely large data objects, + both because the setup_data header has to be adjacent to the data object + and because it has a 32-bit length field. However, it is important that + intermediate stages of the boot process have a way to identify which + chunks of memory are occupied by kernel data. + + Thus setup_indirect struct and SETUP_INDIRECT type were introduced in + protocol 2.15. + + struct setup_indirect { +__u32 type; +__u32 reserved; /* Reserved, must be set to zero. */ +__u64 len; +__u64 addr; + }; + + The type member is a SETUP_INDIRECT | SETUP_* type. However, it cannot be + SETUP_INDIRECT itself since making the setup_indirect a tree structure + could require a lot of stack space in something that needs to parse it + and stack space can be limited in boot contexts. + + Let's give an example how to point to SETUP_E820_EXT data using setup_indirect. + In this case setup_data and setup_indirect will look like this: + + struct setup_data { +__u64 next = 0 or ; +__u32 type = SETUP_INDIRECT; +__u32 len = sizeof(setup_data); +__u8 data[sizeof(setup_indirect)] = struct setup_indirect { + __u32 type = SETUP_INDIRECT | SETUP_E820_EXT; + __u32 reserved = 0; + __u64 len = ; + __u64 addr = ; +} + } + +.. note:: + SETUP_INDIRECT | SETUP_NONE objects cannot be properly distinguished + from SETUP_INDIRECT itself. So, this kind of objects cannot be provided + by the bootloaders. + Field name:pref_address Type: read (reloc) @@ -986,7 +1027,7 @@ Field name:setup_type_max Offset/size: 0x0008/4 == - This field contains maximal allowed type for setup_data. + This field contains maximal allowed type for setup_data and setup_indirect structs. The Image Checksum diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c index 2e53c056ba20..bb9bfef174ae 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -459,6 +459,18 @@ static bool mem_avoid_overlap(struct mem_vector *img, is_overlapping = true; } + if (ptr->type == SETUP_INDIRECT && + ((struct setup_indirect *)ptr->data)->type != SETUP_INDIRECT) { + avoid.start = ((struct setup_indirect *)ptr->data)->addr; + avoid.size = ((struct setup_indirect *)ptr->data)->len; + + if (mem_overlaps(img, &avoid) && (avoid.start < earliest)) { + *overlap = avoid; + earliest = overlap->start; + is_over
Re: [Xen-devel] [PATCH] xen/events: remove event handling recursion detection
On 04.11.19 15:35, Jan Beulich wrote: On 04.11.2019 14:58, Juergen Gross wrote: __xen_evtchn_do_upcall() contains guards against being called recursively. This mechanism was introduced in the early pvops times (kernel 2.6.26) when there were still Xen versions around not honoring disabled interrupts for sending events to pv guests. This was changed in Xen 3.0, which is much older than any Xen version supported by the kernel, so the recursion detection can be removed. Would you mind pointing out which exact change(s) this was(were)? Linux kernel: 229664bee6126e01f8662976a5fe2e79813b77c8 Xen: d8263e8dbaf5ef1445bee0662143a0fcb6d43466 It had always been my understanding that the recursion detection was mainly to guard against drivers re-enabling interrupts transiently in their handlers (which in turn may no longer be an issue in modern Linux kernels). This would have been doable with a simple bool. The more complex xchg based logic was IMO for recursion detection at any point. Juergen ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
On 04.11.2019 15:59, Andrew Cooper wrote: > On 04/11/2019 13:25, Jan Beulich wrote: >> On 01.11.2019 21:25, Andrew Cooper wrote: >>> --- a/xen/arch/x86/cpu/intel.c >>> +++ b/xen/arch/x86/cpu/intel.c >>> @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c) >>> if (disable) { >>> wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable); >>> bootsym(trampoline_misc_enable_off) |= disable; >>> + bootsym(trampoline_efer) |= EFER_NX; >>> } >> I'm fine with all other changes here, just this one concerns me: >> Before your change we latch a value into trampoline_misc_enable_off >> just to be used for subsequent IA32_MISC_ENABLE writes we do. This >> means that, on a hypervisor (like Xen itself) simply discarding >> guest writes to the MSR (which is "fine" with the described usage >> of ours up to now), with your change we'd now end up trying to set >> EFER.NX when the feature may not actually be enabled in >> IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)? >> I.e. don't we need to read back the MSR value here, and verify >> we actually managed to clear the bit before actually OR-ing in >> EFER_NX? > > If this is a problem in practice, execution won't continue beyond the > next if() condition just out of context, which set EFER.NX on the BSP > with an unguarded WRMSR. And how is this any good? This kind of crash is exactly what I'm asking to avoid. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
On 04/11/2019 13:25, Jan Beulich wrote: > On 01.11.2019 21:25, Andrew Cooper wrote: >> --- a/xen/arch/x86/cpu/intel.c >> +++ b/xen/arch/x86/cpu/intel.c >> @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c) >> if (disable) { >> wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable); >> bootsym(trampoline_misc_enable_off) |= disable; >> +bootsym(trampoline_efer) |= EFER_NX; >> } > I'm fine with all other changes here, just this one concerns me: > Before your change we latch a value into trampoline_misc_enable_off > just to be used for subsequent IA32_MISC_ENABLE writes we do. This > means that, on a hypervisor (like Xen itself) simply discarding > guest writes to the MSR (which is "fine" with the described usage > of ours up to now), with your change we'd now end up trying to set > EFER.NX when the feature may not actually be enabled in > IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)? > I.e. don't we need to read back the MSR value here, and verify > we actually managed to clear the bit before actually OR-ing in > EFER_NX? If this is a problem in practice, execution won't continue beyond the next if() condition just out of context, which set EFER.NX on the BSP with an unguarded WRMSR. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v1 1/2] libxl: introduce new backend type VINPUT
On Mon, Oct 28, 2019 at 4:06 PM Oleksandr Grytsov wrote: > > On Wed, Oct 16, 2019 at 4:26 PM Oleksandr Grytsov wrote: > > > > On Fri, Oct 11, 2019 at 8:04 PM Ian Jackson wrote: > > > > > > Oleksandr Grytsov writes ("Re: [PATCH v1 1/2] libxl: introduce new > > > backend type VINPUT"): > > > > On Fri, Oct 11, 2019 at 5:58 PM Ian Jackson > > > > wrote: > > > > > I think it was a48e00f14a2d "libxl: add backend type and id to vkb" > > > > > which introduced what you are calling "user space" backends. It > > > > > appears that the vkb backend type enum was introduced there > > > > > specifically to distinguish between these two situations. For reasons > > > > > > > > > > Am I wrong ? If not, why is this not working or not suitable ? > > > > > > > > You are right. It is not working in some cases due to QEMU_BACKEND > > > > macro. > > > > QEMU_BACKEND treats both backends as QEMU. As result xl performs device > > > > hotplug on add/remove device. Which is not expected in case "user > > > > space" backend. > > > > > > Then perhaps this macro needs to be adjusted or called only > > > conditionally or something ? > > > > I had an idea to move this macro to libxl__device_type and let device > > itself decides > > if it is qemu backend. But in case of VKBD it will read XS to determine > > backend > > type. I guess it is ok. > > > > > > > > > In this patch I propose solution similar to VUSB device. Where VUSB > > > > used for frontend and depends on backend (kernel or QEMU) either > > > > VUSB or QUSB used for backend device type e.g. use different backend > > > > device type for different backends. > > > > > > I confess I don't quite follow all the VUSB stuff but I don't think it > > > is necessarily a good model. > > > > If you don't mind to move QEMU_BACKEND macrto to libxl__device_type then > > no need to add new device type at all. > > > > > > > > > Introducing new backend device type for VKBD also allows to have > > > > both backends (QEMU and non QEMU) run in same domain. Not sure if it > > > > is useful scenario but with this proposal it is possible from > > > > technical point of view. > > > > > > I don't understand why this is not possible simply by having a > > > different backend type. > > > > > > > > I also don't understand why the "user space" kbd backend seems to be > > > > > "linux" in the enum. > > > > > > > > I agree this is not so good name. But I don't know how to call > > > > backends which are not running > > > > inside QEMU in general. > > > > > > I think this would be useable on freebsd ? "linux" definitely seems > > > wrong. I see it hasn't been in a release so it is not too late to > > > rename it, subject to discussion with Juergen as RM. > > > > > > > > Where is the implementation of this user space > > > > > backend ? > > > > > > > > We have extended kbd interface (kbdif.h) to support multi-touch events > > > > as well. And we have > > > > implemented own kbd backend https://github.com/xen-troops/displ_be/ > > > > It is integrated with display backend as both use wayland API. > > > > > > Great. > > > > > > > > Is it be controlled entirely through xenstore ? > > > > > > > > Yes it is controlled entirely through xenstore: lib xl creates > > > > frontend/backend entries with > > > > initial states and configuration. > > > > > > And your display backend in "troops" (is that the name of your > > > project) checks whether the backend type is set to "linux", so that it > > > knows to ignore ones that say "qemu" ? > > > > > > Maybe "linux" should be "troops"... > > > > > > > It doesn't look as generic solution. If some user implements own backend > > it should add new entry into backend type enum. > > What about to have just string value instead of enum? In case QEMU > > we don't have such entry at all but in case custom backend the user > > can't put any string value here to be recognized by his backend. > > > > > Ian. > > ping ping Ian, I'm waiting for your comments about following questions: 1. Move QEMU_BACKEND macro to libxl__device_type structure as function and let the device to decide it has QEMU backend: struct libxl__device_type { ... device_qemu_backend_fn_t qemu_backend } In this case, introducing new device type for VKBD is not needed. The VKBD device will check backend type XS entry to define which backend is running. 2. Use string type for VKBD backend_type field instead of enum. It will be empty for QEMU and generic for "user space" backends. Thanks. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] xen/events: remove event handling recursion detection
On 04.11.2019 14:58, Juergen Gross wrote: > __xen_evtchn_do_upcall() contains guards against being called > recursively. This mechanism was introduced in the early pvops times > (kernel 2.6.26) when there were still Xen versions around not honoring > disabled interrupts for sending events to pv guests. > > This was changed in Xen 3.0, which is much older than any Xen version > supported by the kernel, so the recursion detection can be removed. Would you mind pointing out which exact change(s) this was(were)? It had always been my understanding that the recursion detection was mainly to guard against drivers re-enabling interrupts transiently in their handlers (which in turn may no longer be an issue in modern Linux kernels). Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH] xen/events: remove event handling recursion detection
__xen_evtchn_do_upcall() contains guards against being called recursively. This mechanism was introduced in the early pvops times (kernel 2.6.26) when there were still Xen versions around not honoring disabled interrupts for sending events to pv guests. This was changed in Xen 3.0, which is much older than any Xen version supported by the kernel, so the recursion detection can be removed. Signed-off-by: Juergen Gross --- drivers/xen/events/events_base.c | 16 +++- 1 file changed, 3 insertions(+), 13 deletions(-) diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c index 6c8843968a52..33212c494afd 100644 --- a/drivers/xen/events/events_base.c +++ b/drivers/xen/events/events_base.c @@ -1213,31 +1213,21 @@ void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector) notify_remote_via_irq(irq); } -static DEFINE_PER_CPU(unsigned, xed_nesting_count); - static void __xen_evtchn_do_upcall(void) { struct vcpu_info *vcpu_info = __this_cpu_read(xen_vcpu); - int cpu = get_cpu(); - unsigned count; + int cpu = smp_processor_id(); do { vcpu_info->evtchn_upcall_pending = 0; - if (__this_cpu_inc_return(xed_nesting_count) - 1) - goto out; - xen_evtchn_handle_events(cpu); BUG_ON(!irqs_disabled()); - count = __this_cpu_read(xed_nesting_count); - __this_cpu_write(xed_nesting_count, 0); - } while (count != 1 || vcpu_info->evtchn_upcall_pending); - -out: + rmb(); /* Hypervisor can set upcall pending. */ - put_cpu(); + } while (vcpu_info->evtchn_upcall_pending); } void xen_evtchn_do_upcall(struct pt_regs *regs) -- 2.16.4 ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 3/3] x86/e820: fix 640k - 1M region reservation logic
On 01.11.2019 21:25, Andrew Cooper wrote: > From: Sergey Dyasli > > Converting a guest from PV to PV-in-PVH makes the guest to have 384k > less memory, which may confuse guest's balloon driver. This happens > because Xen unconditionally reserves 640k - 1M region in E820 despite > the fact that it's really a usable RAM in PVH boot mode. > > Fix this by skipping region type change in virtualised environments, > trusting whatever memory map our hypervisor has provided. > > Signed-off-by: Sergey Dyasli Reviewed-by: Jan Beulich ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/3] x86/boot: Cache cpu_has_hypervisor very early on boot
On 01.11.2019 21:25, Andrew Cooper wrote: > --- a/xen/arch/x86/boot/head.S > +++ b/xen/arch/x86/boot/head.S > @@ -630,6 +630,10 @@ trampoline_setup: > > 1: > /* Interrogate CPU extended features via CPUID. */ > +mov $1, %eax > +cpuid > +mov %ecx, sym_fs(boot_cpu_data) + > CPUINFO_FEATURE_OFFSET(X86_FEATURE_HYPERVISOR) I understand the ECX output is all we need right now. But wouldn't it be better to then store EDX as well (and similarly ECX of 0x8001 output)? Also, along the lines of a question back to Sergey on his standalone patch, wouldn't it be better to take the opportunity and check here that CPUID leaf 1 is actually valid? Of course the question on the (intended) effect of "cpuid=no-hypervisor" also remains. As said before, I think this should be honored by all of our code that possibly can (i.e. at least everything following command line parsing). Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [BUG] Invalid guest state in Xen master, dom0 linux-5.3.6, domU windows 10
On 03/11/2019 10:23, Håkon Alstadheim wrote: > (XEN) [2019-11-02 14:09:46] d2v0 vmentry failure (reason 0x8021): > Invalid guest state (0) > (XEN) [2019-11-02 14:09:46] * VMCS Area ** > (XEN) [2019-11-02 14:09:46] *** Guest State *** > (XEN) [2019-11-02 14:09:46] CR0: actual=0x80050031, > shadow=0x80050031, gh_mask= > (XEN) [2019-11-02 14:09:46] CR4: actual=0x00172678, > shadow=0x00170678, gh_mask=ffe8f860 > (XEN) [2019-11-02 14:09:46] CR3 = 0x001aa002 > (XEN) [2019-11-02 14:09:46] RSP = 0x8c0f4dd71e98 > (0x8c0f4dd71e98) RIP = 0xd18a040bb75e (0xd18a040bb75e) > (XEN) [2019-11-02 14:09:46] RFLAGS=0x0187 (0x0187) DR7 = > 0x0400 > (XEN) [2019-11-02 14:09:46] Sysenter RSP= > CS:RIP=: > (XEN) [2019-11-02 14:09:46] sel attr limit base > (XEN) [2019-11-02 14:09:46] CS: 0010 0209b > (XEN) [2019-11-02 14:09:46] DS: 002b 0c0f3 > (XEN) [2019-11-02 14:09:46] SS: 0018 04093 > (XEN) [2019-11-02 14:09:46] ES: 002b 0c0f3 > (XEN) [2019-11-02 14:09:46] FS: 0053 040f3 3c00 > (XEN) [2019-11-02 14:09:46] GS: 002b 0c0f3 f8044ff8 > (XEN) [2019-11-02 14:09:46] GDTR: 0057 f80459c61fb0 > (XEN) [2019-11-02 14:09:46] LDTR: 1c000 > (XEN) [2019-11-02 14:09:46] IDTR: 012f d18a014a0980 > (XEN) [2019-11-02 14:09:46] TR: 0040 0008b 0067 f80459c6 > (XEN) [2019-11-02 14:09:46] EFER(VMCS) = 0x0d01 PAT = > 0x0007010600070106 > (XEN) [2019-11-02 14:09:46] PreemptionTimer = 0x SM Base = > 0x > (XEN) [2019-11-02 14:09:46] DebugCtl = 0x > DebugExceptions = 0x > (XEN) [2019-11-02 14:09:46] Interruptibility = 0002 ActivityState > = > (XEN) [2019-11-02 14:09:46] InterruptStatus = > (XEN) [2019-11-02 14:09:46] *** Host State *** > (XEN) [2019-11-02 14:09:46] RIP = 0x82d080341950 > (vmx_asm_vmexit_handler) RSP = 0x83083ff0ff70 > (XEN) [2019-11-02 14:09:46] CS=e008 SS= DS= ES= FS= > GS= TR=e040 > (XEN) [2019-11-02 14:09:46] FSBase= > GSBase= TRBase=83083ff14000 > (XEN) [2019-11-02 14:09:46] GDTBase=83083ff03000 > IDTBase=83083ff07000 > (XEN) [2019-11-02 14:09:46] CR0=80050033 CR3=00054dbea000 > CR4=001526e0 > (XEN) [2019-11-02 14:09:46] Sysenter RSP=83083ff0ffa0 > CS:RIP=e008:82d080395440 > (XEN) [2019-11-02 14:09:46] EFER = 0x0d01 PAT = > 0x050100070406 > (XEN) [2019-11-02 14:09:46] *** Control State *** > (XEN) [2019-11-02 14:09:46] PinBased=00bf CPUBased=b62065fa > SecondaryExec=17eb > (XEN) [2019-11-02 14:09:46] EntryControls=d3ff ExitControls=002fefff > (XEN) [2019-11-02 14:09:46] ExceptionBitmap=00060002 PFECmask= > PFECmatch= > (XEN) [2019-11-02 14:09:46] VMEntry: intr_info=8501 > errcode= ilen=0001 > (XEN) [2019-11-02 14:09:46] VMExit: intr_info=8501 > errcode= ilen=0001 > (XEN) [2019-11-02 14:09:46] reason=8021 > qualification= > (XEN) [2019-11-02 14:09:46] IDTVectoring: info= errcode= > (XEN) [2019-11-02 14:09:46] TSC Offset = 0xf45ded46dd57 TSC > Multiplier = 0x > (XEN) [2019-11-02 14:09:46] TPR Threshold = 0x00 PostedIntrVec = 0xf5 > (XEN) [2019-11-02 14:09:46] EPT pointer = 0x00054e3a701e EPTP > index = 0x > (XEN) [2019-11-02 14:09:46] PLE Gap=0080 Window=1000 > (XEN) [2019-11-02 14:09:46] Virtual processor ID = 0x5a02 VMfunc > controls = > (XEN) [2019-11-02 14:09:46] ** > (XEN) [2019-11-02 14:09:46] domain_crash called from vmx.c:3335 > (XEN) [2019-11-02 14:09:46] Domain 2 (vcpu#0) crashed on cpu#2: Interruptibility = 0002 (Blocked by Mov SS) and VMEntry: intr_info=8501 (ICEBP) Dare I ask what you're running in your windows guest? Unless it is a vulnerability test suite, I'm rather concerned. Anyway - there are plenty of #DB injection corner cases, some of which I'm still working through with Intel. This is a new one I wasn't aware of before, but its not surprising in hindsight. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 1/3] x86/boot: Remove cached CPUID data from the trampoline
On 01.11.2019 21:25, Andrew Cooper wrote: > --- a/xen/arch/x86/cpu/intel.c > +++ b/xen/arch/x86/cpu/intel.c > @@ -270,6 +270,7 @@ static void early_init_intel(struct cpuinfo_x86 *c) > if (disable) { > wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable & ~disable); > bootsym(trampoline_misc_enable_off) |= disable; > + bootsym(trampoline_efer) |= EFER_NX; > } I'm fine with all other changes here, just this one concerns me: Before your change we latch a value into trampoline_misc_enable_off just to be used for subsequent IA32_MISC_ENABLE writes we do. This means that, on a hypervisor (like Xen itself) simply discarding guest writes to the MSR (which is "fine" with the described usage of ours up to now), with your change we'd now end up trying to set EFER.NX when the feature may not actually be enabled in IA32_MISC_ENABLE. Wouldn't such an EFER write be liable to #GP(0)? I.e. don't we need to read back the MSR value here, and verify we actually managed to clear the bit before actually OR-ing in EFER_NX? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] x86: explicitly disallow guest access to PPIN
On 01.11.2019 19:49, Andrew Cooper wrote: > On 01/11/2019 14:29, Andrew Cooper wrote: >> On 01/11/2019 14:00, Eslam Elnikety wrote: >>> Thanks for this series, Jan. >>> >>> On 30.10.19 11:39, Jan Beulich wrote: To fulfill the "protected" in its name, don't let the real hardware values "shine through". Report a control register value expressing this. Signed-off-by: Jan Beulich --- TBD: Do we want to permit Dom0 access? >>> It would be nice to give an administrator a way to get PPIN outside >>> the context of an MCE when needed. >> I suppose this is a reasonable request. We should expose it to the >> hardware domain. > > Actually on further thoughts, I'm going to backtrack slightly. > > It is reasonable to give to dom0, but it is not reasonable to provide it > by emulating the MSR interface. The problem is that dom0's result is > sensitive to where it happens to be scheduled. > > The only sane way of letting dom0 have access is via a hypercall which > returns "no PPIN" or all sockets information in one go, irrespective of > which socket the current vcpu happens to be executing on. > > This leaves us back in the (substantially easier) position of not having > to virtualise the MSR interface to begin with. Right, hence my question whether to make this a new sysctl subop, or whether to permit reading of this one MSR via XENPF_resource_op (or yet some other mechanism). Personally I'd go the XENPF_resource_op route. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking and logging
> -Original Message- > From: Anthony PERARD > Sent: 04 November 2019 12:14 > To: Durrant, Paul > Cc: Andrew Cooper ; xen- > de...@lists.xenproject.org; jgr...@suse.com; Igor Druzhinin > ; jbeul...@suse.com > Subject: Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking > and logging > > On Mon, Nov 04, 2019 at 11:13:48AM +, Durrant, Paul wrote: > > > -Original Message- > > > From: Andrew Cooper > > > Sent: 04 November 2019 11:06 > > > To: Durrant, Paul ; xen- > de...@lists.xenproject.org > > > Cc: Igor Druzhinin ; jgr...@suse.com; > > > jbeul...@suse.com > > > Subject: Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify > locking > > > and logging > > > > > > On 04/11/2019 08:31, Durrant, Paul wrote: > > > >> -Original Message- > > > >> From: Igor Druzhinin > > > >> Sent: 01 November 2019 19:28 > > > >> To: xen-devel@lists.xenproject.org > > > >> Cc: Durrant, Paul ; jbeul...@suse.com; > > > >> jgr...@suse.com > > > >> Subject: [PATCH for-4.13 v2] passthrough: simplify locking and > logging > > > >> > > > >> From: Paul Durrant > > > >> > > > >> > > > >> Signed-off-by: Paul Durrant > > > >> --- > > > >> > > > >> > > > >> v2: updated Paul's email address > > > > > > This was work you did at Citrix, yes? > > > > > > > Reviewed-by: Paul Durrant > > > > > > SoB and R-by? > > > > I did do the work while I was at Citrix, but surely the SoB must be > updated since the patch is only now being posted? > > I don't think it matters when a patch is publicly posted, the SoB > shouldn't change. > Also, Igor, I think you need to add your own SoB to the patch. This would > be because of (b) or (c) of the "Developer's Certificate of Origin 1.1" > [1]. > > > As for the R-b, why should that be historic? > > I think he meant that reviewing your own work is a bit weird. On the > other hand, it is possible to have both a SoB and a R-b from the same > persone, if the original patch has been modified. I was merely reviewing the change of email address and verifying that it was the patch I wrote :-) Paul > > > [1]: > Developer's Certificate of Origin 1.1 > > By making a contribution to this project, I certify that: > > (a) The contribution was created in whole or in part by me and I > have the right to submit it under the open source license > indicated in the file; or > > (b) The contribution is based upon previous work that, to the best > of my knowledge, is covered under an appropriate open source > license and I have the right under that license to submit that > work with modifications, whether created in whole or in part > by me, under the same open source license (unless I am > permitted to submit under a different license), as indicated > in the file; or > > (c) The contribution was provided directly to me by some other > person who certified (a), (b) or (c) and I have not modified > it. > > (d) I understand and agree that this project and the contribution > are public and that a record of the contribution (including all > personal information I submit with it, including my sign-off) is > maintained indefinitely and may be redistributed consistent with > this project or the open source license(s) involved. > > > Cheers, > > -- > Anthony PERARD ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking and logging
On Mon, Nov 04, 2019 at 11:13:48AM +, Durrant, Paul wrote: > > -Original Message- > > From: Andrew Cooper > > Sent: 04 November 2019 11:06 > > To: Durrant, Paul ; xen-devel@lists.xenproject.org > > Cc: Igor Druzhinin ; jgr...@suse.com; > > jbeul...@suse.com > > Subject: Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking > > and logging > > > > On 04/11/2019 08:31, Durrant, Paul wrote: > > >> -Original Message- > > >> From: Igor Druzhinin > > >> Sent: 01 November 2019 19:28 > > >> To: xen-devel@lists.xenproject.org > > >> Cc: Durrant, Paul ; jbeul...@suse.com; > > >> jgr...@suse.com > > >> Subject: [PATCH for-4.13 v2] passthrough: simplify locking and logging > > >> > > >> From: Paul Durrant > > >> > > >> > > >> Signed-off-by: Paul Durrant > > >> --- > > >> > > >> > > >> v2: updated Paul's email address > > > > This was work you did at Citrix, yes? > > > > > Reviewed-by: Paul Durrant > > > > SoB and R-by? > > I did do the work while I was at Citrix, but surely the SoB must be updated > since the patch is only now being posted? I don't think it matters when a patch is publicly posted, the SoB shouldn't change. Also, Igor, I think you need to add your own SoB to the patch. This would be because of (b) or (c) of the "Developer's Certificate of Origin 1.1" [1]. > As for the R-b, why should that be historic? I think he meant that reviewing your own work is a bit weird. On the other hand, it is possible to have both a SoB and a R-b from the same persone, if the original patch has been modified. [1]: Developer's Certificate of Origin 1.1 By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved. Cheers, -- Anthony PERARD ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [xen-unstable test] 143563: regressions - FAIL
flight 143563 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/143563/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-shadow18 guest-localmigrate/x10 fail REGR. vs. 142750 test-amd64-amd64-xl-qcow2 19 guest-start/debian.repeat fail REGR. vs. 142750 test-amd64-amd64-libvirt-vhd 17 guest-start/debian.repeat fail REGR. vs. 142750 test-amd64-i386-xl-raw 19 guest-start/debian.repeat fail REGR. vs. 142750 test-armhf-armhf-xl-vhd 15 guest-start/debian.repeat fail REGR. vs. 142750 test-armhf-armhf-libvirt-raw 15 guest-start/debian.repeat fail REGR. vs. 142750 Regressions which are regarded as allowable (not blocking): test-armhf-armhf-xl-rtds 12 guest-start fail REGR. vs. 142750 Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 142750 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 142750 test-armhf-armhf-libvirt 14 saverestore-support-checkfail like 142750 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 142750 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 142750 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail like 142750 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 142750 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 142750 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 142750 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-i386-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-xl-pvshim12 guest-start fail never pass test-amd64-amd64-libvirt 13 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 13 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2 fail never pass test-arm64-arm64-xl 13 migrate-support-checkfail never pass test-arm64-arm64-xl 14 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 13 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 13 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 14 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 13 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 13 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 14 saverestore-support-checkfail never pass test-arm64-arm64-xl-seattle 13 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 14 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 12 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 13 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail never pass test-armhf-armhf-xl 13 migrate-support-checkfail never pass test-armhf-armhf-xl 14 saverestore-support-checkfail never pass test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop fail never pass test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass test-amd64-amd64
Re: [Xen-devel] [osstest test] 143493: regressions - FAIL
osstest service owner writes ("[osstest test] 143493: regressions - FAIL"): > flight 143493 osstest real [real] > http://logs.test-lab.xenproject.org/osstest/logs/143493/ > > Regressions :-( > > Tests which did not succeed and are blocking, > including tests which could not be run: > test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. > 143392 I don't know what this is. Linux fails to boot under Xen. The last message is random: crng init done But it doesn't seem at all probable that this is anything to do with osstest. > test-amd64-amd64-xl-pvshim 18 guest-localmigrate/x10 fail REGR. vs. > 143392 This is the known pvshim timekeeping problem. there's a time jump in the pvshim, which likely triggers the watchdog: http://logs.test-lab.xenproject.org/osstest/logs/143493/test-amd64-amd64-xl-pvshim/pinot1---var-log-xen-console-guest-debian.guest.osstest--incoming.log So following irc discussion I have force pushed this. I have also pushed the bootloader console change (for the benefit of arm) to osstest pretest, but in the meantime when we have flights where that's the only problem we can force push the relevant branch. Ian. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] Community Call: Call for Agenda Items and call details for Nov 7, 15:00 - 16:00 UTC
Dear community members, please send me agenda items for next week’s community call. A draft agenda is at https://cryptpad.fr/pad/#/2/pad/edit/SkeU+Z5J9WIIU9ZsXlojiXcQ/ Please add agenda items to the document or reply to this e-mail Note that I am on PTO today and tomorrow Last month’s minutes are at https://cryptpad.fr/pad/#/2/pad/edit/4FGEw81flPUiivkjkuvQJ-CK/ Best Regards Lars ## Meeting time (please double check the times 15:00 - 16:00 UTC 07:00 - 08:00 PST (San Francisco) - sorry for the early time slot. If this is a problem, let's discuss at the call 10:00 - 11:00 EST (New York) 15:00 - 16:00 FMT (London) 16:00 - 17:00 CET (Berlin) 23:00 - 22:00 CST (Beijing) Further International meeting times: https://www.timeanddate.com/worldclock/meetingdetails.html?year=2018&month=11&day=7&hour=15&min=0&sec=0&p1=224&p2=24&p3=179&p4=136&p5=37&p6=33 ## Dial in details Web: https://www.gotomeet.me/larskurth You can also dial in using your phone. Access Code: 906-886-965 China (Toll Free): 4008 811084 Germany: +49 692 5736 7317 Poland (Toll Free): 00 800 1124759 United Kingdom: +44 330 221 0088 United States: +1 (571) 317-3129 More phone numbers Australia: +61 2 9087 3604 Austria: +43 7 2081 5427 Argentina (Toll Free): 0 800 444 3375 Bahrain (Toll Free): 800 81 111 Belarus (Toll Free): 8 820 0011 0400 Belgium: +32 28 93 7018 Brazil (Toll Free): 0 800 047 4906 Bulgaria (Toll Free): 00800 120 4417 Canada: +1 (647) 497-9391 Chile (Toll Free): 800 395 150 Colombia (Toll Free): 01 800 518 4483 Czech Republic (Toll Free): 800 500448 Denmark: +45 32 72 03 82 Finland: +358 923 17 0568 France: +33 170 950 594 Greece (Toll Free): 00 800 4414 3838 Hong Kong (Toll Free): 30713169 Hungary (Toll Free): (06) 80 986 255 Iceland (Toll Free): 800 7204 India (Toll Free): 18002669272 Indonesia (Toll Free): 007 803 020 5375 Ireland: +353 15 360 728 Israel (Toll Free): 1 809 454 830 Italy: +39 0 247 92 13 01 Japan (Toll Free): 0 120 663 800 Korea, Republic of (Toll Free): 00798 14 207 4914 Luxembourg (Toll Free): 800 85158 Malaysia (Toll Free): 1 800 81 6854 Mexico (Toll Free): 01 800 522 1133 Netherlands: +31 207 941 377 New Zealand: +64 9 280 6302 Norway: +47 21 93 37 51 Panama (Toll Free): 00 800 226 7928 Peru (Toll Free): 0 800 77023 Philippines (Toll Free): 1 800 1110 1661 Portugal (Toll Free): 800 819 575 Romania (Toll Free): 0 800 410 029 Russian Federation (Toll Free): 8 800 100 6203 Saudi Arabia (Toll Free): 800 844 3633 Singapore (Toll Free): 18007231323 South Africa (Toll Free): 0 800 555 447 Spain: +34 932 75 2004 Sweden: +46 853 527 827 Switzerland: +41 225 4599 78 Taiwan (Toll Free): 0 800 666 854 Thailand (Toll Free): 001 800 011 023 Turkey (Toll Free): 00 800 4488 23683 Ukraine (Toll Free): 0 800 50 1733 United Arab Emirates (Toll Free): 800 044 40439 Uruguay (Toll Free): 0004 019 1018 Viet Nam (Toll Free): 122 80 481 First GoToMeeting? Let's do a quick system check: https://link.gotomeeting.com/system-check ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking and logging
> -Original Message- > From: Andrew Cooper > Sent: 04 November 2019 11:06 > To: Durrant, Paul ; xen-devel@lists.xenproject.org > Cc: Igor Druzhinin ; jgr...@suse.com; > jbeul...@suse.com > Subject: Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking > and logging > > On 04/11/2019 08:31, Durrant, Paul wrote: > >> -Original Message- > >> From: Igor Druzhinin > >> Sent: 01 November 2019 19:28 > >> To: xen-devel@lists.xenproject.org > >> Cc: Durrant, Paul ; jbeul...@suse.com; > >> jgr...@suse.com > >> Subject: [PATCH for-4.13 v2] passthrough: simplify locking and logging > >> > >> From: Paul Durrant > >> > >> Dropping the pcidevs lock between calling device_assigned() and > >> assign_device() means that the latter has to do the same check as the > >> former for no obvious gain. Also, since long running operations under > >> pcidevs lock already drop the lock and return -ERESTART periodically > there > >> is little point in immediately failing an assignment operation with > >> -ERESTART just because the pcidevs lock could not be acquired (for the > >> second time, having already blocked on acquiring the lock in > >> device_assigned()). > >> > >> This patch instead acquires the lock once for assignment (or test > assign) > >> operations directly in iommu_do_pci_domctl() and thus can remove the > >> duplicate domain ownership check in assign_device(). Whilst in the > >> neighbourhood, the patch also removes some debug logging from > >> assign_device() and deassign_device() and replaces it with proper error > >> logging, which allows error logging in iommu_do_pci_domctl() to be > >> removed. Also, since device_assigned() can tell the difference between > a > >> guest assigned device and a non-existent one, log the actual error > >> condition rather then being ambiguous for the sake a few extra lines of > >> code. > >> > >> Signed-off-by: Paul Durrant > >> --- > >> > >> This is XSA-302 followup and contains some changes important for > >> XenServer. > >> Juergen, could this be considered for 4.13 inclusion? > >> > >> v2: updated Paul's email address > > This was work you did at Citrix, yes? > > > Reviewed-by: Paul Durrant > > SoB and R-by? I did do the work while I was at Citrix, but surely the SoB must be updated since the patch is only now being posted? As for the R-b, why should that be historic? Paul > > ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking and logging
On 04/11/2019 08:31, Durrant, Paul wrote: >> -Original Message- >> From: Igor Druzhinin >> Sent: 01 November 2019 19:28 >> To: xen-devel@lists.xenproject.org >> Cc: Durrant, Paul ; jbeul...@suse.com; >> jgr...@suse.com >> Subject: [PATCH for-4.13 v2] passthrough: simplify locking and logging >> >> From: Paul Durrant >> >> Dropping the pcidevs lock between calling device_assigned() and >> assign_device() means that the latter has to do the same check as the >> former for no obvious gain. Also, since long running operations under >> pcidevs lock already drop the lock and return -ERESTART periodically there >> is little point in immediately failing an assignment operation with >> -ERESTART just because the pcidevs lock could not be acquired (for the >> second time, having already blocked on acquiring the lock in >> device_assigned()). >> >> This patch instead acquires the lock once for assignment (or test assign) >> operations directly in iommu_do_pci_domctl() and thus can remove the >> duplicate domain ownership check in assign_device(). Whilst in the >> neighbourhood, the patch also removes some debug logging from >> assign_device() and deassign_device() and replaces it with proper error >> logging, which allows error logging in iommu_do_pci_domctl() to be >> removed. Also, since device_assigned() can tell the difference between a >> guest assigned device and a non-existent one, log the actual error >> condition rather then being ambiguous for the sake a few extra lines of >> code. >> >> Signed-off-by: Paul Durrant >> --- >> >> This is XSA-302 followup and contains some changes important for >> XenServer. >> Juergen, could this be considered for 4.13 inclusion? >> >> v2: updated Paul's email address This was work you did at Citrix, yes? > Reviewed-by: Paul Durrant SoB and R-by? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] x86: explicitly disallow guest access to PPIN
On 01.11.2019 19:35, Andrew Cooper wrote: > On 30/10/2019 12:02, Jan Beulich wrote: >> On 30.10.2019 12:43, Andrew Cooper wrote: >>> On 30/10/2019 10:39, Jan Beulich wrote: To fulfill the "protected" in its name, don't let the real hardware values "shine through". Report a control register value expressing this. Signed-off-by: Jan Beulich --- TBD: Do we want to permit Dom0 access? >>> I would recommend reordering the two patches and putting this one first >>> (along with the enumeration details, along with a pair of feature >>> strings in xen-cpuid). This patch at least wants backporting. >> Well, the reason for this ordering is because this way Dom0 >> doesn't transiently lose all access. > > Nothing pre-existing can be used reliably by dom0 because of the > raz/write-discard behaviour. Why "raz" - default behavior for "un-enumerated" MSRs is to hand out the raw hardware value. I agree Dom0 can't _enable_ the PPIN MSR (due to the write-discard default behavior), but on systems where the firmware does the enabling it could still have read the values. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2] x86/passthrough: fix migration of MSI when using posted interrupts
On Sat, Nov 02, 2019 at 07:48:06AM +, Tian, Kevin wrote: > > From: Roger Pau Monné [mailto:roger@citrix.com] > > Sent: Thursday, October 31, 2019 11:23 PM > > > > On Thu, Oct 31, 2019 at 07:52:23AM -0700, Joe Jin wrote: > > > On 10/31/19 1:01 AM, Jan Beulich wrote: > > > > On 30.10.2019 19:01, Joe Jin wrote: > > > >> On 10/30/19 10:23 AM, Roger Pau Monné wrote: > > > >>> On Wed, Oct 30, 2019 at 09:38:16AM -0700, Joe Jin wrote: > > > On 10/30/19 1:24 AM, Roger Pau Monné wrote: > > > > Can you try to add the following debug patch on top of the existing > > > > one and report the output that you get on the Xen console? > > > > > > Applied debug patch and run the test again, not of any log printed, > > > attached Xen log on serial console, seems pi_update_irte() not been > > > called for iommu_intpost was false. > > > >>> > > > >>> I have to admit I'm lost at this point. Does it mean the original > > > >>> issue had nothing to do with posted interrupts? > > > >> > > > >> Looks when inject irq by vlapic_set_irq(), it checked by > > > >> hvm_funcs.deliver_posted_intr rather than iommu_intpost: > > > >> > > > >> 176 if ( hvm_funcs.deliver_posted_intr ) > > > >> 177 hvm_funcs.deliver_posted_intr(target, vec); > > > >> > > > >> And deliver_posted_intr() would be there, when vmx enabled: > > > >> > > > >> (XEN) HVM: VMX enabled > > > >> (XEN) HVM: Hardware Assisted Paging (HAP) detected > > > >> (XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB > > > > > > > > I can't see the connection. start_vmx() has > > > > > > > > if ( cpu_has_vmx_posted_intr_processing ) > > > > { > > > > alloc_direct_apic_vector(&posted_intr_vector, > > pi_notification_interrupt); > > > > if ( iommu_intpost ) > > > > alloc_direct_apic_vector(&pi_wakeup_vector, > > pi_wakeup_interrupt); > > > > > > > > vmx_function_table.deliver_posted_intr = > > > > vmx_deliver_posted_intr; > > > > vmx_function_table.sync_pir_to_irr = vmx_sync_pir_to_irr; > > > > vmx_function_table.test_pir= vmx_test_pir; > > > > } > > > > > > > > i.e. the hook is present only when posted interrupts are > > > > available in general. I.e. also with just CPU-side posted > > > > interrupts, yes, which gets confirmed by your "apicv=0" > > > > test. Yet with just CPU-side posted interrupts I'm > > > > struggling again to understand your original problem > > > > description, and the need to fiddle with IOMMU side code. > > > > > > Yes, on my test env, cpu_has_vmx_posted_intr_processing == true && > > iommu_intpost == false, > > > with this, posted interrupts been enabled. > > > > I'm still quite lost. My reading of the Intel VT-d spec is that the > > posted interrupt descriptor (which contains the PIRR) is used in > > conjunction with a posted interrupt remapping entry in the iommu, so > > that interrupts get recorded in the PIRR and later synced by the > > hypervisor into the vlapic IRR when resuming the virtual CPU. > > there are two parts. Intel first implements CPU posted interrupt, > which allows one CPU to post IPI into non-root context in another > CPU through posted interrupt descriptor. Later VT-d posted > interrupt comes, which use interrupt remapping entry and the > same posted interrupt descriptor (using more fields) to convert > a device interrupt into a posted interrupt. The posting process is > same on the dest CPU, regardless of whether it's from another CPU > or a device. Thanks for the description. So the problem reported by Jin happens when using CPU posted interrupts but not VT-d posted interrupts, in which case there shouldn't be a need to sync PIRR with IRR when interrupts from a passthrough device are reconfigured, because interrupts from that device shouldn't end up signaled in PIRR because VT-d posted interrupts is not being used. Do interrupts from passthrough devices end up signaled in the posted interrupt descriptor PIRR field when not using VT-d posted interrupts but using CPU posted interrupts? From my reading of your description above when using CPU posted interrupts only the vectors signaled in the PIRR field should belong to IPIs from other vCPUs? Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH for-4.13 v2] passthrough: simplify locking and logging
> -Original Message- > From: Igor Druzhinin > Sent: 01 November 2019 19:28 > To: xen-devel@lists.xenproject.org > Cc: Durrant, Paul ; jbeul...@suse.com; > jgr...@suse.com > Subject: [PATCH for-4.13 v2] passthrough: simplify locking and logging > > From: Paul Durrant > > Dropping the pcidevs lock between calling device_assigned() and > assign_device() means that the latter has to do the same check as the > former for no obvious gain. Also, since long running operations under > pcidevs lock already drop the lock and return -ERESTART periodically there > is little point in immediately failing an assignment operation with > -ERESTART just because the pcidevs lock could not be acquired (for the > second time, having already blocked on acquiring the lock in > device_assigned()). > > This patch instead acquires the lock once for assignment (or test assign) > operations directly in iommu_do_pci_domctl() and thus can remove the > duplicate domain ownership check in assign_device(). Whilst in the > neighbourhood, the patch also removes some debug logging from > assign_device() and deassign_device() and replaces it with proper error > logging, which allows error logging in iommu_do_pci_domctl() to be > removed. Also, since device_assigned() can tell the difference between a > guest assigned device and a non-existent one, log the actual error > condition rather then being ambiguous for the sake a few extra lines of > code. > > Signed-off-by: Paul Durrant > --- > > This is XSA-302 followup and contains some changes important for > XenServer. > Juergen, could this be considered for 4.13 inclusion? > > v2: updated Paul's email address Reviewed-by: Paul Durrant > > --- > xen/drivers/passthrough/pci.c | 101 ++--- > - > 1 file changed, 42 insertions(+), 59 deletions(-) > > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c > index e64666d..ea0770d 100644 > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -932,30 +932,27 @@ static int deassign_device(struct domain *d, > uint16_t seg, uint8_t bus, > break; > ret = hd->platform_ops->reassign_device(d, target, devfn, > pci_to_dev(pdev)); > -if ( !ret ) > -continue; > - > -printk(XENLOG_G_ERR "%pd: deassign %04x:%02x:%02x.%u failed > (%d)\n", > - d, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ret); > -return ret; > +if ( ret ) > +goto out; > } > > devfn = pdev->devfn; > ret = hd->platform_ops->reassign_device(d, target, devfn, > pci_to_dev(pdev)); > if ( ret ) > -{ > -dprintk(XENLOG_G_ERR, > -"%pd: deassign device (%04x:%02x:%02x.%u) failed\n", > -d, seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn)); > -return ret; > -} > +goto out; > > if ( pdev->domain == hardware_domain ) > pdev->quarantine = false; > > pdev->fault.count = 0; > > +out: > +if ( ret ) > +printk(XENLOG_G_ERR > + "%pd: deassign device (%04x:%02x:%02x.%u) failed (%d)\n", > d, > + seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn), ret); > + > return ret; > } > > @@ -976,10 +973,7 @@ int pci_release_devices(struct domain *d) > { > bus = pdev->bus; > devfn = pdev->devfn; > -if ( deassign_device(d, pdev->seg, bus, devfn) ) > -printk("domain %d: deassign device (%04x:%02x:%02x.%u) > failed!\n", > - d->domain_id, pdev->seg, bus, > - PCI_SLOT(devfn), PCI_FUNC(devfn)); > +deassign_device(d, pdev->seg, bus, devfn); > } > pcidevs_unlock(); > > @@ -1534,8 +1528,7 @@ static int device_assigned(u16 seg, u8 bus, u8 > devfn) > struct pci_dev *pdev; > int rc = 0; > > -pcidevs_lock(); > - > +ASSERT(pcidevs_locked()); > pdev = pci_get_pdev(seg, bus, devfn); > > if ( !pdev ) > @@ -1549,11 +1542,10 @@ static int device_assigned(u16 seg, u8 bus, u8 > devfn) >pdev->domain != dom_io ) > rc = -EBUSY; > > -pcidevs_unlock(); > - > return rc; > } > > +/* caller should hold the pcidevs_lock */ > static int assign_device(struct domain *d, u16 seg, u8 bus, u8 devfn, u32 > flag) > { > const struct domain_iommu *hd = dom_iommu(d); > @@ -1571,23 +1563,11 @@ static int assign_device(struct domain *d, u16 > seg, u8 bus, u8 devfn, u32 flag) >vm_event_check_ring(d->vm_event_paging)) ) > return -EXDEV; > > -if ( !pcidevs_trylock() ) > -return -ERESTART; > - > +/* device_assigned() should already have cleared the device for > assignment */ > +ASSERT(pcidevs_locked()); > pdev = pci_get_pdev(seg, bus, devfn); > - > -rc = -ENODEV; > -if ( !pdev ) > -goto done; > - > -rc = 0; > -
Re: [Xen-devel] [PATCH 2/2] x86: explicitly disallow guest access to PPIN
On 01.11.2019 15:00, Eslam Elnikety wrote: > On 30.10.19 11:39, Jan Beulich wrote: >> @@ -237,6 +239,18 @@ int guest_rdmsr(struct vcpu *v, uint32_t >> ARRAY_SIZE(msrs->dr_mask))]; >> break; >> >> +case MSR_PPIN_CTL: >> +if ( d->arch.cpuid->x86_vendor != X86_VENDOR_INTEL ) >> +goto gp_fault; >> +*val = PPIN_LOCKOUT; >> +break; >> + >> +case MSR_AMD_PPIN_CTL: >> +if ( !cp->extd.amd_ppin ) >> +goto gp_fault; >> +*val = PPIN_LOCKOUT; >> +break; >> + > > nit: It is not clear to me why you use "d->arch.cpuid->.." (and not > "cp->..") in the first if condition. Simple oversight; corrected for v2. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH 2/2] x86: explicitly disallow guest access to PPIN
On 01.11.2019 15:29, Andrew Cooper wrote: > On 01/11/2019 14:00, Eslam Elnikety wrote: >> Thanks for this series, Jan. >> >> On 30.10.19 11:39, Jan Beulich wrote: >>> To fulfill the "protected" in its name, don't let the real hardware >>> values "shine through". Report a control register value expressing this. >>> >>> Signed-off-by: Jan Beulich >>> --- >>> TBD: Do we want to permit Dom0 access? >> >> It would be nice to give an administrator a way to get PPIN outside >> the context of an MCE when needed. > > I suppose this is a reasonable request. We should expose it to the > hardware domain. Via (new) sysctl (or platform op) or by allowing direct MSR read access? (If the former, I'd want to make this addition a separate patch.) Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel