Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
>>> On 04.08.16 at 17:08, wrote:
> crash> x /130x 0x830bd0da1000
> 0x830bd0da1000: 0x000e 0x
> 0x830bd0da1010: 0x 0x
> 0x830bd0da1020: 0x 0x
> 0x830bd0da1030: 0x 0x
> 0x830bd0da1040: 0x 0x
> 0x830bd0da1050: 0x 0x
> 0x830bd0da1060: 0x 0x
> 0x830bd0da1070: 0x 0x000bd0da3000
> 0x830bd0da1080: 0x000c17e36000 0x
> 0x830bd0da1090: 0x 0x
> 0x830bd0da10a0: 0xe7512000 0xe7513000
> 0x830bd0da10b0: 0x000bd0da 0x
> 0x830bd0da10c0: 0x 0x
> 0x830bd0da10d0: 0x 0x006fedea809b
> 0x830bd0da10e0: 0x0001a379e000 0x000610f9101e
> 0x830bd0da10f0: 0x 0x
> 0x830bd0da1100: 0x 0x0007010600070106
> 0x830bd0da1110: 0x 0x
> 0x830bd0da1120: 0x006bb6a075fa 0x00060042003f
> 0x830bd0da1130: 0x 0x000fefff
> 0x830bd0da1140: 0x 0x51ff
> 0x830bd0da1150: 0x0041 0x
> 0x830bd0da1160: 0x 0x000c
> 0x830bd0da1170: 0x 0x
> 0x830bd0da1180: 0x0001 0x
> 0x830bd0da1190: 0x0008 0x
> 0x830bd0da11a0: 0x0001 0x0096
> 0x830bd0da11b0: 0x82d0802bc208 0x806f6dbc
> 0x830bd0da11c0: 0x 0x0400
> 0x830bd0da11d0: 0x80550f34 0xf0e48161
> 0x830bd0da11e0: 0x0246 0x
> 0x830bd0da11f0: 0xf79c3000 0x804de6f0
> 0x830bd0da1200: 0x0023 0x
> 0x830bd0da1210: 0x00c0f300 0x0008
> 0x830bd0da1220: 0x 0x00c09b00
> 0x830bd0da1230: 0x0010 0x
> 0x830bd0da1240: 0x00c09300 0x0023
> 0x830bd0da1250: 0x 0x00c0f300
> 0x830bd0da1260: 0x0030 0xffdff000
> 0x830bd0da1270: 0x00c093001fff 0x
> 0x830bd0da1280: 0x 0x01c0
> 0x830bd0da1290: 0x 0x
> 0x830bd0da12a0: 0x01c0 0x0028
> 0x830bd0da12b0: 0x80042000 0x8b0020ab
> 0x830bd0da12c0: 0x8003f000 0x8003f400
> 0x830bd0da12d0: 0x07ff03ff 0x8001003b
> 0x830bd0da12e0: 0x00039000 0x26d9
> 0x830bd0da12f0: 0xdc3c 0x
> 0x830bd0da1300: 0xe008 0x
> 0x830bd0da1310: 0x 0xe040
> 0x830bd0da1320: 0x050100070406 0x
> 0x830bd0da1330: 0x 0x80050033
> 0x830bd0da1340: 0x0001bd665000 0x26e0
> 0x830bd0da1350: 0x 0x
> 0x830bd0da1360: 0x830c17e38c80 0x830617fd3000
> 0x830bd0da1370: 0x830617fcf000 0x830617fd7fc0
> 0x830bd0da1380: 0x82d08024e150 0x830617fd7f90
> 0x830bd0da1390: 0x82d080201bb0 0xe008
> 0x830bd0da13a0: 0x0060 0x
> 0x830bd0da13b0: 0x 0x
> 0x830bd0da13c0: 0x 0x
> 0x830bd0da13d0: 0x8001003b 0x06d9
> 0x830bd0da13e0: 0x 0x
> 0x830bd0da13f0: 0x 0x
> 0x830bd0da1400: 0x 0x
>
> I don't quite understand the Intel developer manual at this point. How do I
> have to read this data?

I don't think this is formally specified anywhere (publicly). After all
that's why one has to use vmread/vmwrite.

> Since if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) ) must be true I assume
> the __vmwrite tries to | 0x8 into the host_cr0 leading to the 0x80050033
> for the current host_cr0 ( or better the 0x80050033
Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
d0da1230: 0x0010 0x
0x830bd0da1240: 0x00c09300 0x0023
0x830bd0da1250: 0x 0x00c0f300
0x830bd0da1260: 0x0030 0xffdff000
0x830bd0da1270: 0x00c093001fff 0x
0x830bd0da1280: 0x 0x01c0
0x830bd0da1290: 0x 0x
0x830bd0da12a0: 0x01c0 0x0028
0x830bd0da12b0: 0x80042000 0x8b0020ab
0x830bd0da12c0: 0x8003f000 0x8003f400
0x830bd0da12d0: 0x07ff03ff 0x8001003b
0x830bd0da12e0: 0x00039000 0x26d9
0x830bd0da12f0: 0xdc3c 0x
0x830bd0da1300: 0xe008 0x
0x830bd0da1310: 0x 0xe040
0x830bd0da1320: 0x050100070406 0x
0x830bd0da1330: 0x 0x80050033
0x830bd0da1340: 0x0001bd665000 0x26e0
0x830bd0da1350: 0x 0x
0x830bd0da1360: 0x830c17e38c80 0x830617fd3000
0x830bd0da1370: 0x830617fcf000 0x830617fd7fc0
0x830bd0da1380: 0x82d08024e150 0x830617fd7f90
0x830bd0da1390: 0x82d080201bb0 0xe008
0x830bd0da13a0: 0x0060 0x
0x830bd0da13b0: 0x 0x
0x830bd0da13c0: 0x 0x
0x830bd0da13d0: 0x8001003b 0x06d9
0x830bd0da13e0: 0x 0x
0x830bd0da13f0: 0x 0x
0x830bd0da1400: 0x 0x

I don't quite understand the Intel developer manual at this point. How do I
have to read this data?

Since if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) ) must be true, I assume
the __vmwrite tries to | 0x8 into the host_cr0, leading to the 0x80050033 for
the current host_cr0 ( or better the 0x80050033 ). Or at least this is what I
think was intended to happen.

> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, 3 August 2016 15:54
> To: Mayer, Kevin
> Cc: andrew.coop...@citrix.com; xen-devel@lists.xen.org
> Subject: Re: AW: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
>
> >>> On 03.08.16 at 15:24, wrote:
> > I got around to take a closer look at the crash dump today.
> >
> > tl;dr:
> > You were right, vmx_vmenter_helper is not called at all in the call stack.
> > The real reason behind the [] vmx_vmenter_helper+0x27e/0x30a should be a
> > failed __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); in
> > static void vmx_fpu_leave(struct vcpu *v).
>
> Ah - that's what you get for not using most recent code, and what I get for
> not considering the effect of you being on 4.6.x. In any event - the call
> stack is then fine, and you'll want to figure out which bit(s) of the new
> CR0 value are in conflict with the rest of the active VMCS.
>
> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
>>> On 03.08.16 at 15:24, wrote:
> I got around to take a closer look at the crash dump today.
>
> tl;dr:
> You were right, vmx_vmenter_helper is not called at all in the call stack.
> The real reason behind the [] vmx_vmenter_helper+0x27e/0x30a should be a
> failed __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); in
> static void vmx_fpu_leave(struct vcpu *v).

Ah - that's what you get for not using most recent code, and what I get for
not considering the effect of you being on 4.6.x. In any event - the call
stack is then fine, and you'll want to figure out which bit(s) of the new
CR0 value are in conflict with the rest of the active VMCS.

Jan
Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
3 : or (%rdi),%ecx
0x82d0801fa0a5 : and %al,%al
0x82d0801fa0a7 : test $0x8,%al
0x82d0801fa0a9 : jne 0x82d0801fa0ad
0x82d0801fa0ab : ud2
0x82d0801fa0ac : or -0x75(%rax),%ecx
0x82d0801fa0af : xchg %eax,(%rax)
0x82d0801fa0b1 : (bad)
0x82d0801fa0b2 : add %al,(%rax)
0x82d0801fa0b4 : test $0x8,%al
0x82d0801fa0b6 : jne 0x82d0801fa0d1
0x82d0801fa0b8 : or $0x8,%rax
0x82d0801fa0bc : mov %rax,0x700(%rdi)
0x82d0801fa0c3 : mov $0x6c00,%edx
0x82d0801fa0c8 : vmwrite %rax,%rdx
0x82d0801fa0cb : jbe 0x82d0801fd23a

The two test/jne pairs should be the
if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) ) and
if ( !(v->arch.hvm_vcpu.guest_cr[0] & X86_CR0_TS) ) conditions.

mov $0x6c00,%edx
vmwrite %rax,%rdx

should be the __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);

So the real reason the hypervisor panics should be a failing __vmwrite in
static void vmx_fpu_leave(struct vcpu *v), which simply jumps to a location
behind vmx_vmenter_helper, thereby creating a slightly confusing stack trace.

Chapter 2:
This does not seem to have anything to do with altp2m, so I looked at the
stray vmx_vcpu_update_eptp which can be seen in the bt, but not in the xen
dmesg.

#10 [830617fd7d28] destroy_perdomain_mapping at 82d080196152
#11 [830617fd7d38] vmx_vcpu_update_eptp at 82d0801f7c6b
#12 [830617fd7d78] free_compat_arg_xlat at 82d080244a62

This function is not called by free_compat_arg_xlat:

void free_compat_arg_xlat(struct vcpu *v)
{
    destroy_perdomain_mapping(v->domain, ARG_XLAT_START(v),
                              PFN_UP(COMPAT_ARG_XLAT_SIZE));
}

Instead it is called BEFORE free_compat_arg_xlat, in
hvm_vcpu_destroy->altp2m_vcpu_destroy->altp2m_vcpu_update_p2m->hvm_funcs.altp2m_vcpu_update_p2m
(.altp2m_vcpu_update_p2m = vmx_vcpu_update_eptp,):

void hvm_vcpu_destroy(struct vcpu *v)
{
    hvm_all_ioreq_servers_remove_vcpu(v->domain, v);

    if ( hvm_altp2m_supported() )
        altp2m_vcpu_destroy(v);

    nestedhvm_vcpu_destroy(v);

    free_compat_arg_xlat(v);
    [...]

Here the enabling of altp2m has an effect, but I have no idea how it could
lead to a failing __vmwrite.
Any ideas where in the altp2m code the error could be, or how I could help
in finding it?

Cheers

Kevin

> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Tuesday, 2 August 2016 14:34
> To: Mayer, Kevin
> Cc: andrew.coop...@citrix.com; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
>
> >>> On 02.08.16 at 13:45, wrote:
> > (XEN) [ Xen-4.6.1 x86_64 debug=y Not tainted ]
> > (XEN) CPU:6
> > (XEN) RIP:e008:[] vmx_vmenter_helper+0x27e/0x30a
> > (XEN) RFLAGS: 00010003 CONTEXT: hypervisor
> > (XEN) rax: 8005003b rbx: 8300e72fc000 rcx:
> > (XEN) rdx: 6c00 rsi: 830617fd7fc0 rdi: 8300e6fc
> > (XEN) rbp: 830617fd7c40 rsp: 830617fd7c30 r8:
> > (XEN) r9: 830be8dc9310 r10: r11: 3475e9cf85d0
> > (XEN) r12: 0006 r13: 830c14ee1000 r14: 8300e6fc
> > (XEN) r15: 830617fd cr0: 8005003b cr4: 26e0
> > (XEN) cr3: 0001bd665000 cr2: 0451
> > (XEN) ds: es: fs: gs: ss: cs: e008
> > (XEN) Xen stack trace from rsp=830617fd7c30:
> > (XEN)    830617fd7c40 8300e72fc000 830617fd7ca0 82d080174f91
> > (XEN)    830617fd7f18 830be8dc9000 0286 830617fd7c90
> > (XEN)    0206 0246 0001 830617e91250
> > (XEN)    8300e72fc000 830be8dc9000 830617fd7cc0 82d080178c19
> > (XEN)    00bdeeae 8300e72fc000 830617fd7cd0 82d080178c3e
> > (XEN)    830617fd7d20 82d080179740 8300e6fc2000 830c17e38e80
> > (XEN)    830617e91250 82008000 0002 830617e91250
> > (XEN)    830617e91240 830be8dc9000 830617fd7d70 82d080196152
> > (XEN)    830617fd7d50 82d0801f7c6b 8300e6fc2000 830617e91250
> > (XEN)    8300e6fc2000 830617e91250 830617e91240 830be8dc9000
> > (XEN)    830617fd7d80 82d080244a62 830617fd7db0 82d0801d3fe2
> > (XEN)    8300e6fc2000 830617e91f28 830617e91000
> > (XEN)    830617fd7dd0 82d080175c2c 8300e6fc2000 8300e6fc2000
> > (XEN)    830617fd7e00
Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
>>> On 02.08.16 at 13:45, wrote:
> (XEN) [ Xen-4.6.1 x86_64 debug=y Not tainted ]
> (XEN) CPU:6
> (XEN) RIP:e008:[] vmx_vmenter_helper+0x27e/0x30a
> (XEN) RFLAGS: 00010003 CONTEXT: hypervisor
> (XEN) rax: 8005003b rbx: 8300e72fc000 rcx:
> (XEN) rdx: 6c00 rsi: 830617fd7fc0 rdi: 8300e6fc
> (XEN) rbp: 830617fd7c40 rsp: 830617fd7c30 r8:
> (XEN) r9: 830be8dc9310 r10: r11: 3475e9cf85d0
> (XEN) r12: 0006 r13: 830c14ee1000 r14: 8300e6fc
> (XEN) r15: 830617fd cr0: 8005003b cr4: 26e0
> (XEN) cr3: 0001bd665000 cr2: 0451
> (XEN) ds: es: fs: gs: ss: cs: e008
> (XEN) Xen stack trace from rsp=830617fd7c30:
> (XEN)    830617fd7c40 8300e72fc000 830617fd7ca0 82d080174f91
> (XEN)    830617fd7f18 830be8dc9000 0286 830617fd7c90
> (XEN)    0206 0246 0001 830617e91250
> (XEN)    8300e72fc000 830be8dc9000 830617fd7cc0 82d080178c19
> (XEN)    00bdeeae 8300e72fc000 830617fd7cd0 82d080178c3e
> (XEN)    830617fd7d20 82d080179740 8300e6fc2000 830c17e38e80
> (XEN)    830617e91250 82008000 0002 830617e91250
> (XEN)    830617e91240 830be8dc9000 830617fd7d70 82d080196152
> (XEN)    830617fd7d50 82d0801f7c6b 8300e6fc2000 830617e91250
> (XEN)    8300e6fc2000 830617e91250 830617e91240 830be8dc9000
> (XEN)    830617fd7d80 82d080244a62 830617fd7db0 82d0801d3fe2
> (XEN)    8300e6fc2000 830617e91f28 830617e91000
> (XEN)    830617fd7dd0 82d080175c2c 8300e6fc2000 8300e6fc2000
> (XEN)    830617fd7e00 82d080105dd4 830c17e38040
> (XEN)    830617fd 830617fd7e30 82d0801215fd
> (XEN)    8300e6fc 82d080329280 82d080328f80 fffd
> (XEN)    830617fd7e60 82d08012caf8 0006 830c17e3bc60
> (XEN)    0002 830c17e3bbe0 830617fd7e70 82d08012cb3b
> (XEN)    830617fd7ef0 82d0801c23a8 8300e72fc000
> (XEN)    82d0801f3200 830617fd7f08 82d080329280
> (XEN) Xen call trace:
> (XEN)    [] vmx_vmenter_helper+0x27e/0x30a
> (XEN)    [] __context_switch+0xdb/0x3b5
> (XEN)    [] __sync_local_execstate+0x5e/0x7a
> (XEN)    [] sync_local_execstate+0x9/0xb
> (XEN)    [] map_domain_page+0xa0/0x5d4
> (XEN)    [] destroy_perdomain_mapping+0x8f/0x1e8
> (XEN)    [] free_compat_arg_xlat+0x26/0x28
> (XEN)    [] hvm_vcpu_destroy+0x73/0xb0
> (XEN)    [] vcpu_destroy+0x5d/0x72
> (XEN)    [] complete_domain_destroy+0x49/0x192
> (XEN)    [] rcu_process_callbacks+0x19a/0x1fb
> (XEN)    [] __do_softirq+0x82/0x8d
> (XEN)    [] process_pending_softirqs+0x38/0x3a
> (XEN)    [] mwait_idle+0x10c/0x315
> (XEN)    [] idle_loop+0x51/0x6b

On this deep a stack execution can't validly end up in
vmx_vmenter_helper: That's a function called only when the stack is
almost empty. Nor is the caller of it the context switch code. Hence
your problem starts quite a bit earlier - perhaps memory corruption?

Jan
Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
6 8300e6fc BL U 164 830be8dc9000
0 0 8300e6fc6000 BL U 165 830bd0cc

So in contrast to the last dump, the crashing CPU is running DOMID 32767
(the Dom-0), if I understand the output correctly.

Kevin

From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
Sent: Friday, 29 July 2016 12:05
To: Mayer, Kevin; xen-devel@lists.xen.org
Subject: Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

On 29/07/16 08:33, kevin.ma...@gdata.de wrote:
> Hi guys
>
> We are using Xen 4.6.1 to manage our virtual machines on x86-64-servers.
> We start dozens of VMs and destroy them again after 60 seconds, which
> works fine as it is, but the next step in our approach requires the use
> of the altp2m functionality.
> Since libvirt does not pass the altp2m-enable flag to the hypervisor we
> enabled altp2m unconditionally by patching the hvm.c . Since all of our
> machines support the altp2m this seemed to be ok.

altp2m is emulated in software when hardware support isn't available, so it
should work on all hardware (albeit with rather higher overhead).

>  d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1;
>  d->arch.hvm_domain.params[HVM_PARAM_TRIPLE_FAULT_REASON] = SHUTDOWN_reboot;
> +d->arch.hvm_domain.params[HVM_PARAM_ALTP2M] = 1;
> +

This looks to be ok, given your situation.

>  vpic_init(d);
>  rc = vioapic_init(d);
>
> Since applying this patch the hypervisor crashes after several hundred
> restarted VMs (without any altp2m-functionality used by us) with the
> following dmesg:
>
> (XEN) [ Xen-4.6.1 x86_64 debug=n Not tainted ]

As a start, please always use a debug hypervisor for investigating issues
like this.

> (XEN) CPU:7
> (XEN) RIP:e008:[] vmx_vmenter_helper+0x2b5/0x340
> (XEN) RFLAGS: 00010003 CONTEXT: hypervisor (d0v3)
> (XEN) rax: 8005003b rbx: 8300e7038000 rcx: 0008
> (XEN) rdx: 6c00 rsi: 83062eb5e000 rdi: 8300e7038000
> (XEN) rbp: 830c17e3f000 rsp: 830617fc7d70 r8:
> (XEN) r9: 83014f8d7028 r10: 02700f858000 r11: 2201be6861f0
> (XEN) r12: 83062eb5e000 r13: 8300e752f000 r14: 82d08030ea40
> (XEN) r15: 0007 cr0: 8005003b cr4: 26e0
> (XEN) cr3: 0001bf4da000 cr2: dd840c00
> (XEN) ds: es: fs: gs: ss: cs: e008
> (XEN) Xen stack trace from rsp=830617fc7d70:
> (XEN)    8300e7038000 82d080170c04 000780109f6a
> (XEN)    830617fc7f18 831e 8300e752f19c
> (XEN)    0286 8300e752f000 8300e72fc000 0007
> (XEN)    830c17e3f000 830c14ee1000 82d08030ea40 82d080173d6a
> (XEN)
> (XEN)    82d08030ea40 8300e72fc000 02700f481091 0001
> (XEN)    82d080324560 82d08030ea40 8300e752f000 82d080128004
> (XEN)    0001 01c9c380 830c14ef60e8 17fce600
> (XEN)    0001 82d0801bd18b 82d0801d9e88 8300e752f000
> (XEN)    01c9c380 82d08012e700 006e0171
> (XEN)    830617fc 82d0802f8f80 83062eb5e000
> (XEN)    82d08030ea40 82d08012b040 8300e7038000 830617fc
> (XEN)    8300e7038000 830c14ee1000 82d080170970
> (XEN)    8300e72fc000
> (XEN)    80550f50 ffdffc70
> (XEN)    2fcffe19
> (XEN)    ffdffc70 ffdffc50 853b0918
> (XEN)    00fa f0e48162 0246
> (XEN)    80550f34
> (XEN)    0007 8300e752f000
> (XEN) Xen call trace:
> (XEN)    [] vmx_vmenter_helper+0x2b5/0x340
> (XEN)    [] __context_switch+0xb4/0x350
> (XEN)    [] context_switch+0xca/0xef0
> (XEN)    [] schedule+0x264/0x5f0
> (XEN)    [] mwait_idle+0x25b/0x3a0
> (XEN)    [] hvm_vcpu_has_pending_irq+0x58/0xc0
> (XEN)    [] timer_softirq_action+0x80/0x250
> (XEN)    [] __do_softirq+0x60/0x90
> (XEN)    [] idle_loop+0x20/0x50
> (XEN)
> (XEN)
> (XEN)
> (XEN) Panic on CPU 7:
> (XEN) FATAL TRAP: vector = 6 (invalid opcode)
> (XEN)
> (XEN)
> (XEN) Reboot in five seconds...
> (XEN) Executing kexec image on cpu7
> (XEN) Shot down all CPUs
>
> The RIP points to ud2
> 0x82d0801f5a55: ud2
> From the RFLAGS we concluded that the vmwrite failed due to an invalid
> vmcs-pointer (CF = 1), but this is where we are stuck since we have no
> idea how the pointer could have gotten corrupted.
> crash> vcpu
> gives vmcs = 0x817cbc20 for vcpu_id = 7,
> and vcpus gives
> VC
Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
On 29/07/16 08:33, kevin.ma...@gdata.de wrote: > > Hi guys > > > > We are using Xen 4.6.1 to manage our virtual machines on x86-64-servers. > > We start dozens of VMs and destroy them again after 60 seconds, which > works fine as it is, but the next step in our approach requires the > use of the altp2m functionality. > > Since libvirt does not pass the altp2m-enable flag to the hypervisor > we enabled altp2m unconditionally by patching the hvm.c . Since all of > our machines support the altp2m this seemed to be ok. > altp2m is emulated in software when hardware support isn't available, so it should work on all hardware (albeit with rather higher overhead). > > > d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1; > > d->arch.hvm_domain.params[HVM_PARAM_TRIPLE_FAULT_REASON] = > SHUTDOWN_reboot; > > +d->arch.hvm_domain.params[HVM_PARAM_ALTP2M] = 1; > > + > This looks to be ok, given your situation. > vpic_init(d); > > rc = vioapic_init(d); > > > > Since applying this patch the hypervisor crashes after several hundred > restarted VMs (without any altp2m-functionality used by us) with the > following dmesg: > > > > (XEN) [ Xen-4.6.1 x86_64 debug=n Not tainted ] > As a start, please always use a debug hypervisor for investigating issues like this. 
> (XEN) CPU:7 > > (XEN) RIP:e008:[] vmx_vmenter_helper+0x2b5/0x340 > > (XEN) RFLAGS: 00010003 CONTEXT: hypervisor (d0v3) > > (XEN) rax: 8005003b rbx: 8300e7038000 rcx: > 0008 > > (XEN) rdx: 6c00 rsi: 83062eb5e000 rdi: > 8300e7038000 > > (XEN) rbp: 830c17e3f000 rsp: 830617fc7d70 r8: > > > (XEN) r9: 83014f8d7028 r10: 02700f858000 r11: > 2201be6861f0 > > (XEN) r12: 83062eb5e000 r13: 8300e752f000 r14: > 82d08030ea40 > > (XEN) r15: 0007 cr0: 8005003b cr4: > 26e0 > > (XEN) cr3: 0001bf4da000 cr2: dd840c00 > > (XEN) ds: es: fs: gs: ss: cs: e008 > > (XEN) Xen stack trace from rsp=830617fc7d70: > > (XEN)8300e7038000 82d080170c04 > 000780109f6a > > (XEN)830617fc7f18 831e > 8300e752f19c > > (XEN)0286 8300e752f000 8300e72fc000 > 0007 > > (XEN)830c17e3f000 830c14ee1000 82d08030ea40 > 82d080173d6a > > (XEN) > > > (XEN)82d08030ea40 8300e72fc000 02700f481091 > 0001 > > (XEN)82d080324560 82d08030ea40 8300e752f000 > 82d080128004 > > (XEN)0001 01c9c380 830c14ef60e8 > 17fce600 > > (XEN)0001 82d0801bd18b 82d0801d9e88 > 8300e752f000 > > (XEN)01c9c380 82d08012e700 006e0171 > > > (XEN)830617fc 82d0802f8f80 > 83062eb5e000 > > (XEN)82d08030ea40 82d08012b040 8300e7038000 > 830617fc > > (XEN)8300e7038000 830c14ee1000 > 82d080170970 > > (XEN)8300e72fc000 > > > (XEN) 80550f50 ffdffc70 > > > (XEN) > 2fcffe19 > > (XEN)ffdffc70 ffdffc50 > 853b0918 > > (XEN)00fa f0e48162 > 0246 > > (XEN)80550f34 > > > (XEN) 0007 > 8300e752f000 > > (XEN) Xen call trace: > > (XEN)[] vmx_vmenter_helper+0x2b5/0x340 > > (XEN)[] __context_switch+0xb4/0x350 > > (XEN)[] context_switch+0xca/0xef0 > > (XEN)[] schedule+0x264/0x5f0 > > (XEN)[] mwait_idle+0x25b/0x3a0 > > (XEN)[] hvm_vcpu_has_pending_irq+0x58/0xc0 > > (XEN)[] timer_softirq_action+0x80/0x250 > > (XEN)[] __do_softirq+0x60/0x90 > > (XEN)[] idle_loop+0x20/0x50 > > (XEN) > > (XEN) > > (XEN) > > (XEN) Panic on CPU 7: > > (XEN) FATAL TRAP: vector = 6 (invalid opcode) > > (XEN) > > (XEN) > > (XEN) Reboot in five seconds... 
> > (XEN) Executing kexec image on cpu7 > > (XEN) Shot down all CPUs > > > > The RIP points to ud2 > > 0x82d0801f5a55: ud2 > > From the RFLAGS we concluded that the vmwrite failed due to an invalid > vmcs-pointer (CF = 1), but this is where we are stuck since we have no > idea how the pointer could have gotten corrupted. > > crash> vcpu > > gives vmcs = 0x817cbc20 for vcpu_id = 7, > > > > and vcpus gives > > > >VCID PCID VCPU ST T DOMID DOMAIN > > 0 0 8300e75f2000 RU I 32767 830c14ee1000 > >
Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
On Fri, 2016-07-29 at 07:33 +, kevin.ma...@gdata.de wrote:
> Hi guys
>
Hi,

I'm pretty much just Cc-ing maintainers/key people, to see if they have
ideas.

Only one thing. Since you are rebuilding Xen anyway, I think it could be
helpful to try a debug build, and post the dump it will produce.

> (XEN) [ Xen-4.6.1 x86_64 debug=n Not tainted ]
>
I.e., this would need to become debug=y. Since you said you're using
4.6.x, I think putting "debug=y" in a .config file (and then rebuilding
and reinstalling, of course) should be enough.

Regards,
Dario
--
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
[Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
Hi guys

We are using Xen 4.6.1 to manage our virtual machines on x86-64 servers.
We start dozens of VMs and destroy them again after 60 seconds, which works
fine as it is, but the next step in our approach requires the use of the
altp2m functionality.
Since libvirt does not pass the altp2m-enable flag to the hypervisor, we
enabled altp2m unconditionally by patching hvm.c. Since all of our machines
support altp2m, this seemed to be ok.

 d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1;
 d->arch.hvm_domain.params[HVM_PARAM_TRIPLE_FAULT_REASON] = SHUTDOWN_reboot;
+d->arch.hvm_domain.params[HVM_PARAM_ALTP2M] = 1;
+
 vpic_init(d);
 rc = vioapic_init(d);

Since applying this patch, the hypervisor crashes after several hundred
restarted VMs (without any altp2m functionality used by us) with the
following dmesg:

(XEN) [ Xen-4.6.1 x86_64 debug=n Not tainted ]
(XEN) CPU:7
(XEN) RIP:e008:[] vmx_vmenter_helper+0x2b5/0x340
(XEN) RFLAGS: 00010003 CONTEXT: hypervisor (d0v3)
(XEN) rax: 8005003b rbx: 8300e7038000 rcx: 0008
(XEN) rdx: 6c00 rsi: 83062eb5e000 rdi: 8300e7038000
(XEN) rbp: 830c17e3f000 rsp: 830617fc7d70 r8:
(XEN) r9: 83014f8d7028 r10: 02700f858000 r11: 2201be6861f0
(XEN) r12: 83062eb5e000 r13: 8300e752f000 r14: 82d08030ea40
(XEN) r15: 0007 cr0: 8005003b cr4: 26e0
(XEN) cr3: 0001bf4da000 cr2: dd840c00
(XEN) ds: es: fs: gs: ss: cs: e008
(XEN) Xen stack trace from rsp=830617fc7d70:
(XEN)    8300e7038000 82d080170c04 000780109f6a
(XEN)    830617fc7f18 831e 8300e752f19c
(XEN)    0286 8300e752f000 8300e72fc000 0007
(XEN)    830c17e3f000 830c14ee1000 82d08030ea40 82d080173d6a
(XEN)
(XEN)    82d08030ea40 8300e72fc000 02700f481091 0001
(XEN)    82d080324560 82d08030ea40 8300e752f000 82d080128004
(XEN)    0001 01c9c380 830c14ef60e8 17fce600
(XEN)    0001 82d0801bd18b 82d0801d9e88 8300e752f000
(XEN)    01c9c380 82d08012e700 006e0171
(XEN)    830617fc 82d0802f8f80 83062eb5e000
(XEN)    82d08030ea40 82d08012b040 8300e7038000 830617fc
(XEN)    8300e7038000 830c14ee1000 82d080170970
(XEN)    8300e72fc000
(XEN)    80550f50 ffdffc70
(XEN)    2fcffe19
(XEN)    ffdffc70 ffdffc50 853b0918
(XEN)    00fa f0e48162 0246
(XEN)    80550f34
(XEN)    0007 8300e752f000
(XEN) Xen call trace:
(XEN)    [] vmx_vmenter_helper+0x2b5/0x340
(XEN)    [] __context_switch+0xb4/0x350
(XEN)    [] context_switch+0xca/0xef0
(XEN)    [] schedule+0x264/0x5f0
(XEN)    [] mwait_idle+0x25b/0x3a0
(XEN)    [] hvm_vcpu_has_pending_irq+0x58/0xc0
(XEN)    [] timer_softirq_action+0x80/0x250
(XEN)    [] __do_softirq+0x60/0x90
(XEN)    [] idle_loop+0x20/0x50
(XEN)
(XEN)
(XEN)
(XEN) Panic on CPU 7:
(XEN) FATAL TRAP: vector = 6 (invalid opcode)
(XEN)
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing kexec image on cpu7
(XEN) Shot down all CPUs

The RIP points to ud2:
0x82d0801f5a55: ud2

From the RFLAGS we concluded that the vmwrite failed due to an invalid
vmcs-pointer (CF = 1), but this is where we are stuck, since we have no idea
how the pointer could have gotten corrupted.

crash> vcpu
gives vmcs = 0x817cbc20 for vcpu_id = 7,
and vcpus gives

   VCID  PCID  VCPU          ST  T  DOMID  DOMAIN
      0     0  8300e75f2000  RU  I  32767  830c14ee1000
      1     1  8300e72fe000  RU  I  32767  830c14ee1000
      2     2  8300e7527000  RU  I  32767  830c14ee1000
>     3     3  8300e7526000  RU  I  32767  830c14ee1000
      4     4  8300e75f1000  RU  I  32767  830c14ee1000
>     5     5  8300e75f      RU  I  32767  830c14ee1000
>     6     6  8300e72fd000  RU  I  32767  830c14ee1000
      7     7  8300e72fc000  RU  I  32767  830c14ee1000
      0     0  8300e72fa000  BL  0      0  830c17e3f000
      1     6  8300e72f9000  BL  0      0  830c17e3f000
      2     3  8300e72f8000  BL  0      0  830c17e3f000
>     3     7  8300e752f000  RU  0      0  830c17e3f000
      4     5  8300e752e000  RU  0      0  830c17e3f000
>