Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

2016-08-04 Thread Jan Beulich
>>> On 04.08.16 at 17:08, <kevin.ma...@gdata.de> wrote:
> crash> x /130x 0x830bd0da1000
> 0x830bd0da1000: 0x000e  0x
> 0x830bd0da1010: 0x  0x
> 0x830bd0da1020: 0x  0x
> 0x830bd0da1030: 0x  0x
> 0x830bd0da1040: 0x  0x
> 0x830bd0da1050: 0x  0x
> 0x830bd0da1060: 0x  0x
> 0x830bd0da1070: 0x  0x000bd0da3000
> 0x830bd0da1080: 0x000c17e36000  0x
> 0x830bd0da1090: 0x  0x
> 0x830bd0da10a0: 0xe7512000  0xe7513000
> 0x830bd0da10b0: 0x000bd0da  0x
> 0x830bd0da10c0: 0x  0x
> 0x830bd0da10d0: 0x  0x006fedea809b
> 0x830bd0da10e0: 0x0001a379e000  0x000610f9101e
> 0x830bd0da10f0: 0x  0x
> 0x830bd0da1100: 0x  0x0007010600070106
> 0x830bd0da1110: 0x  0x
> 0x830bd0da1120: 0x006bb6a075fa  0x00060042003f
> 0x830bd0da1130: 0x  0x000fefff
> 0x830bd0da1140: 0x  0x51ff
> 0x830bd0da1150: 0x0041  0x
> 0x830bd0da1160: 0x  0x000c
> 0x830bd0da1170: 0x  0x
> 0x830bd0da1180: 0x0001  0x
> 0x830bd0da1190: 0x0008  0x
> 0x830bd0da11a0: 0x0001  0x0096
> 0x830bd0da11b0: 0x82d0802bc208  0x806f6dbc
> 0x830bd0da11c0: 0x  0x0400
> 0x830bd0da11d0: 0x80550f34  0xf0e48161
> 0x830bd0da11e0: 0x0246  0x
> 0x830bd0da11f0: 0xf79c3000  0x804de6f0
> 0x830bd0da1200: 0x0023  0x
> 0x830bd0da1210: 0x00c0f300  0x0008
> 0x830bd0da1220: 0x  0x00c09b00
> 0x830bd0da1230: 0x0010  0x
> 0x830bd0da1240: 0x00c09300  0x0023
> 0x830bd0da1250: 0x  0x00c0f300
> 0x830bd0da1260: 0x0030  0xffdff000
> 0x830bd0da1270: 0x00c093001fff  0x
> 0x830bd0da1280: 0x  0x01c0
> 0x830bd0da1290: 0x  0x
> 0x830bd0da12a0: 0x01c0  0x0028
> 0x830bd0da12b0: 0x80042000  0x8b0020ab
> 0x830bd0da12c0: 0x8003f000  0x8003f400
> 0x830bd0da12d0: 0x07ff03ff  0x8001003b
> 0x830bd0da12e0: 0x00039000  0x26d9
> 0x830bd0da12f0: 0xdc3c  0x
> 0x830bd0da1300: 0xe008  0x
> 0x830bd0da1310: 0x  0xe040
> 0x830bd0da1320: 0x050100070406  0x
> 0x830bd0da1330: 0x  0x80050033
> 0x830bd0da1340: 0x0001bd665000  0x26e0
> 0x830bd0da1350: 0x  0x
> 0x830bd0da1360: 0x830c17e38c80  0x830617fd3000
> 0x830bd0da1370: 0x830617fcf000  0x830617fd7fc0
> 0x830bd0da1380: 0x82d08024e150  0x830617fd7f90
> 0x830bd0da1390: 0x82d080201bb0  0xe008
> 0x830bd0da13a0: 0x0060  0x
> 0x830bd0da13b0: 0x  0x
> 0x830bd0da13c0: 0x  0x
> 0x830bd0da13d0: 0x8001003b  0x06d9
> 0x830bd0da13e0: 0x  0x
> 0x830bd0da13f0: 0x  0x
> 0x830bd0da1400: 0x  0x
> 
> I don't quite understand the Intel developer manual at this point. How do I 
> have to read this data?

I don't think this is formally specified anywhere (publicly). After all that's
why one has to use vmread/vmwrite. 

> Since if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) ) must be true I assume 
> the 
> __vmwrite tries to | 0x8 into the host_cr0 leading to the 0x80050033 
> for the current host_cr0 ( 

Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

2016-08-04 Thread Kevin.Mayer
d0da1230: 0x0010  0x
0x830bd0da1240: 0x00c09300  0x0023
0x830bd0da1250: 0x  0x00c0f300
0x830bd0da1260: 0x0030  0xffdff000
0x830bd0da1270: 0x00c093001fff  0x
0x830bd0da1280: 0x  0x01c0
0x830bd0da1290: 0x  0x
0x830bd0da12a0: 0x01c0  0x0028
0x830bd0da12b0: 0x80042000  0x8b0020ab
0x830bd0da12c0: 0x8003f000  0x8003f400
0x830bd0da12d0: 0x07ff03ff  0x8001003b
0x830bd0da12e0: 0x00039000  0x26d9
0x830bd0da12f0: 0xdc3c  0x
0x830bd0da1300: 0xe008  0x
0x830bd0da1310: 0x  0xe040
0x830bd0da1320: 0x050100070406  0x
0x830bd0da1330: 0x  0x80050033
0x830bd0da1340: 0x0001bd665000  0x26e0
0x830bd0da1350: 0x  0x
0x830bd0da1360: 0x830c17e38c80  0x830617fd3000
0x830bd0da1370: 0x830617fcf000  0x830617fd7fc0
0x830bd0da1380: 0x82d08024e150  0x830617fd7f90
0x830bd0da1390: 0x82d080201bb0  0xe008
0x830bd0da13a0: 0x0060  0x
0x830bd0da13b0: 0x  0x
0x830bd0da13c0: 0x  0x
0x830bd0da13d0: 0x8001003b  0x06d9
0x830bd0da13e0: 0x  0x
0x830bd0da13f0: 0x  0x
0x830bd0da1400: 0x  0x

I don't quite understand the Intel developer manual at this point. How do I 
have to read this data?

Since the condition if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) ) must have been 
true, I assume the __vmwrite tries to OR 0x8 (X86_CR0_TS) into host_cr0, leading 
to the 0x80050033 for the current host_cr0 ( or better the 0x80050033 ).
Or at least this is what I think was intended to happen.
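
As a quick sanity check of the arithmetic above, here is a minimal sketch in C. It only mirrors the bit manipulation implied by the quoted condition; the helper name host_cr0_with_ts is made up for illustration and is not a Xen function. The assumption is that host_cr0 held 0x80050033 before the write.

```c
/* X86_CR0_TS is bit 3 of CR0 (Task Switched). */
#define X86_CR0_TS 0x00000008UL

/*
 * Hypothetical helper, not part of Xen: returns the value that would be
 * handed to __vmwrite(HOST_CR0, ...) if TS is set only when it was clear.
 */
static unsigned long host_cr0_with_ts(unsigned long host_cr0)
{
    if ( !(host_cr0 & X86_CR0_TS) )
        host_cr0 |= X86_CR0_TS;

    return host_cr0;
}
```

With an input of 0x80050033 this yields 0x8005003b, which is the value shown in rax and cr0 in the register dump, so the value being written looks like an ordinary CR0 with TS set.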

> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, 3 August 2016 15:54
> To: Mayer, Kevin <kevin.ma...@gdata.de>
> Cc: andrew.coop...@citrix.com; xen-devel@lists.xen.org
> Subject: Re: AW: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
> 
> >>> On 03.08.16 at 15:24, <kevin.ma...@gdata.de> wrote:
> > I got around to take a closer look at the crash dump today.
> >
> > tl;dr:
> > You were right, vmx_vmenter_helper is not called at all in the call stack.
> > The real reason behind the [] vmx_vmenter_helper+0x27e/0x30a should be a
> > failed __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); in static void
> > vmx_fpu_leave(struct vcpu *v).
> 
> Ah - that's what you get for not using most recent code, and what I get for
> not considering the effect of you being on 4.6.x. In any event - the call
> stack is then fine, and you'll want to figure out which bit(s) of the new
> CR0 value are in conflict with the rest of the active VMCS.
> 
> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

2016-08-03 Thread Jan Beulich
>>> On 03.08.16 at 15:24, <kevin.ma...@gdata.de> wrote:
> I got around to take a closer look at the crash dump today.
> 
> tl;dr:
> You were right, vmx_vmenter_helper is not called at all in the call stack.
> The real reason behind the [] vmx_vmenter_helper+0x27e/0x30a should be a
> failed __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); in static void
> vmx_fpu_leave(struct vcpu *v).

Ah - that's what you get for not using most recent code, and what
I get for not considering the effect of you being on 4.6.x. In any
event - the call stack is then fine, and you'll want to figure out
which bit(s) of the new CR0 value are in conflict with the rest of the
active VMCS.

Jan



Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

2016-08-03 Thread Kevin.Mayer
from+26>: mov    0x7ff0(%rdx),%rdx
0x82d0801fa08f <vmx_ctxt_switch_from+33>: cmpb   $0x0,(%rdx,%rax,1)
0x82d0801fa093 <vmx_ctxt_switch_from+37>: je     0x82d0801fa1d9 <vmx_ctxt_switch_from+363>
0x82d0801fa099 <vmx_ctxt_switch_from+43>: cmpb   $0x0,0x109(%rdi)
0x82d0801fa0a0 <vmx_ctxt_switch_from+50>: je     0x82d0801fa0a4 <vmx_ctxt_switch_from+54>
0x82d0801fa0a2 <vmx_ctxt_switch_from+52>: ud2
0x82d0801fa0a3 <vmx_ctxt_switch_from+53>: or     (%rdi),%ecx
0x82d0801fa0a5 <vmx_ctxt_switch_from+55>: and    %al,%al
0x82d0801fa0a7 <vmx_ctxt_switch_from+57>: test   $0x8,%al
0x82d0801fa0a9 <vmx_ctxt_switch_from+59>: jne    0x82d0801fa0ad <vmx_ctxt_switch_from+63>
0x82d0801fa0ab <vmx_ctxt_switch_from+61>: ud2
0x82d0801fa0ac <vmx_ctxt_switch_from+62>: or     -0x75(%rax),%ecx
0x82d0801fa0af <vmx_ctxt_switch_from+65>: xchg   %eax,(%rax)
0x82d0801fa0b1 <vmx_ctxt_switch_from+67>: (bad)
0x82d0801fa0b2 <vmx_ctxt_switch_from+68>: add    %al,(%rax)
0x82d0801fa0b4 <vmx_ctxt_switch_from+70>: test   $0x8,%al
0x82d0801fa0b6 <vmx_ctxt_switch_from+72>: jne    0x82d0801fa0d1 <vmx_ctxt_switch_from+99>
0x82d0801fa0b8 <vmx_ctxt_switch_from+74>: or     $0x8,%rax
0x82d0801fa0bc <vmx_ctxt_switch_from+78>: mov    %rax,0x700(%rdi)
0x82d0801fa0c3 <vmx_ctxt_switch_from+85>: mov    $0x6c00,%edx
0x82d0801fa0c8 <vmx_ctxt_switch_from+90>: vmwrite %rax,%rdx
0x82d0801fa0cb <vmx_ctxt_switch_from+93>: jbe    0x82d0801fd23a

The two test/jne pairs should be the if ( !(v->arch.hvm_vmx.host_cr0 & X86_CR0_TS) ) 
and if ( !(v->arch.hvm_vcpu.guest_cr[0] & X86_CR0_TS) ) conditions.
mov    $0x6c00,%edx
vmwrite %rax,%rdx
should be the
__vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);

So the real reason the hypervisor panics should be a failing __vmwrite in 
static void vmx_fpu_leave(struct vcpu *v),
which simply jumps to a location behind vmx_vmenter_helper, thereby creating a 
slightly confusing stack trace.
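
To back up the reading of 0x6c00 as HOST_CR0, here is a minimal sketch in C that splits a VMCS field encoding into its components, following the layout given in the Intel SDM (bit 0 access type, bits 9:1 index, bits 11:10 type, bits 14:13 width). The struct and function names are made up for illustration and do not come from the Xen tree.

```c
#include <stdint.h>

/* Components of a VMCS field encoding as laid out in the Intel SDM. */
struct vmcs_field {
    unsigned int access_high; /* bit 0: 0 = full, 1 = high 32 bits */
    unsigned int index;       /* bits 9:1 */
    unsigned int type;        /* bits 11:10: 0 control, 1 read-only data,
                                 2 guest state, 3 host state */
    unsigned int width;       /* bits 14:13: 0 = 16-bit, 1 = 64-bit,
                                 2 = 32-bit, 3 = natural width */
};

/* Hypothetical helper: decode the encoding passed to vmread/vmwrite. */
static struct vmcs_field decode_vmcs_field(uint32_t enc)
{
    struct vmcs_field f;

    f.access_high = enc & 0x1;
    f.index       = (enc >> 1) & 0x1ff;
    f.type        = (enc >> 10) & 0x3;
    f.width       = (enc >> 13) & 0x3;

    return f;
}
```

For 0x6c00 this gives width 3 (natural width), type 3 (host state) and index 0, which matches the HOST_CR0 field, so the vmwrite in the disassembly is indeed the host CR0 update.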

Chapter 2:
This does not seem to have anything to do with the altp2m so I looked at the 
stray vmx_vcpu_update_eptp which can be seen in the bt, but not in the xen 
dmesg.
#10 [830617fd7d28] destroy_perdomain_mapping at 82d080196152
#11 [830617fd7d38] vmx_vcpu_update_eptp at 82d0801f7c6b
#12 [830617fd7d78] free_compat_arg_xlat at 82d080244a62

This function is not called by free_compat_arg_xlat:
void free_compat_arg_xlat(struct vcpu *v)
{
    destroy_perdomain_mapping(v->domain, ARG_XLAT_START(v),
                              PFN_UP(COMPAT_ARG_XLAT_SIZE));
}

Instead it is called BEFORE free_compat_arg_xlat in 
hvm_vcpu_destroy->altp2m_vcpu_destroy->altp2m_vcpu_update_p2m->hvm_funcs.altp2m_vcpu_update_p2m
 (.altp2m_vcpu_update_p2m = vmx_vcpu_update_eptp,) :
void hvm_vcpu_destroy(struct vcpu *v)
{
    hvm_all_ioreq_servers_remove_vcpu(v->domain, v);

    if ( hvm_altp2m_supported() )
        altp2m_vcpu_destroy(v);

    nestedhvm_vcpu_destroy(v);

    free_compat_arg_xlat(v);
    [...]

Here the enabling of the altp2m has an effect, but I have no idea how it could 
lead to a failing __vmwrite.
Any ideas where in the altp2m-code the error could be, or how I could help in 
finding it?
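
To make the call order described above concrete, here is a minimal, heavily simplified sketch in C. The function names follow the ones quoted in this mail, but the bodies, the struct layouts, the call_log buffer and the trace_destroy helper are stand-ins invented for illustration; the real Xen code goes through an additional altp2m_vcpu_update_p2m wrapper and does far more work.

```c
#include <string.h>

/* Simplified stand-ins, not the real Xen definitions. */
struct vcpu { int id; };

struct hvm_function_table {
    void (*altp2m_vcpu_update_p2m)(struct vcpu *v);
};

/* Records the order in which the stand-in functions run. */
static char call_log[128];

static void log_call(const char *name)
{
    strcat(call_log, name);
    strcat(call_log, ";");
}

static void vmx_vcpu_update_eptp(struct vcpu *v)
{
    (void)v;
    log_call("vmx_vcpu_update_eptp");
}

/* The VMX backend installs its handler in the hook table. */
static struct hvm_function_table hvm_funcs = {
    .altp2m_vcpu_update_p2m = vmx_vcpu_update_eptp,
};

static void altp2m_vcpu_destroy(struct vcpu *v)
{
    /* Indirect call: this is how vmx_vcpu_update_eptp gets reached. */
    hvm_funcs.altp2m_vcpu_update_p2m(v);
}

static void free_compat_arg_xlat(struct vcpu *v)
{
    (void)v;
    log_call("free_compat_arg_xlat");
}

static void hvm_vcpu_destroy(struct vcpu *v)
{
    altp2m_vcpu_destroy(v);   /* has returned before the next call starts */
    free_compat_arg_xlat(v);
}

/* Hypothetical driver: runs the destroy path and returns the call order. */
static const char *trace_destroy(void)
{
    struct vcpu v = { 0 };

    call_log[0] = '\0';
    hvm_vcpu_destroy(&v);

    return call_log;
}
```

In this sketch vmx_vcpu_update_eptp has already returned by the time free_compat_arg_xlat runs. That would be consistent with its address in the bt being a leftover on the stack rather than a live frame, though this is only an illustration and not a confirmed diagnosis.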

Cheers

Kevin

> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Tuesday, 2 August 2016 14:34
> To: Mayer, Kevin <kevin.ma...@gdata.de>
> Cc: andrew.coop...@citrix.com; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default
> 
> >>> On 02.08.16 at 13:45, <kevin.ma...@gdata.de> wrote:
> > (XEN) [ Xen-4.6.1  x86_64  debug=y  Not tainted ]
> > (XEN) CPU:6
> > (XEN) RIP:e008:[] vmx_vmenter_helper+0x27e/0x30a
> > (XEN) RFLAGS: 00010003   CONTEXT: hypervisor
> > (XEN) rax: 8005003b   rbx: 8300e72fc000   rcx: 
> > (XEN) rdx: 6c00   rsi: 830617fd7fc0   rdi: 8300e6fc
> > (XEN) rbp: 830617fd7c40   rsp: 830617fd7c30   r8:  
> > (XEN) r9:  830be8dc9310   r10:    r11: 3475e9cf85d0
> > (XEN) r12: 0006   r13: 830c14ee1000   r14: 8300e6fc
> > (XEN) r15: 830617fd   cr0: 8005003b   cr4: 26e0
> > (XEN) cr3: 0001bd665000   cr2: 0451
> > (XEN) ds:    es:    fs:    gs:    ss:    cs: e008
> > (XEN) Xen stack trace from rsp=830617fd7c30:
> > (XEN)830617fd7c40 8300e72fc000 830617fd7ca0 82d080174f91
> > (XEN)

Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

2016-08-02 Thread Jan Beulich
>>> On 02.08.16 at 13:45, <kevin.ma...@gdata.de> wrote:
> (XEN) [ Xen-4.6.1  x86_64  debug=y  Not tainted ]
> (XEN) CPU:6
> (XEN) RIP:e008:[] vmx_vmenter_helper+0x27e/0x30a
> (XEN) RFLAGS: 00010003   CONTEXT: hypervisor
> (XEN) rax: 8005003b   rbx: 8300e72fc000   rcx: 
> (XEN) rdx: 6c00   rsi: 830617fd7fc0   rdi: 8300e6fc
> (XEN) rbp: 830617fd7c40   rsp: 830617fd7c30   r8:  
> (XEN) r9:  830be8dc9310   r10:    r11: 3475e9cf85d0
> (XEN) r12: 0006   r13: 830c14ee1000   r14: 8300e6fc
> (XEN) r15: 830617fd   cr0: 8005003b   cr4: 26e0
> (XEN) cr3: 0001bd665000   cr2: 0451
> (XEN) ds:    es:    fs:    gs:    ss:    cs: e008
> (XEN) Xen stack trace from rsp=830617fd7c30:
> (XEN)830617fd7c40 8300e72fc000 830617fd7ca0 82d080174f91
> (XEN)830617fd7f18 830be8dc9000 0286 830617fd7c90
> (XEN)0206 0246 0001 830617e91250
> (XEN)8300e72fc000 830be8dc9000 830617fd7cc0 82d080178c19
> (XEN)00bdeeae 8300e72fc000 830617fd7cd0 82d080178c3e
> (XEN)830617fd7d20 82d080179740 8300e6fc2000 830c17e38e80
> (XEN)830617e91250 82008000 0002 830617e91250
> (XEN)830617e91240 830be8dc9000 830617fd7d70 82d080196152
> (XEN)830617fd7d50 82d0801f7c6b 8300e6fc2000 830617e91250
> (XEN)8300e6fc2000 830617e91250 830617e91240 830be8dc9000
> (XEN)830617fd7d80 82d080244a62 830617fd7db0 82d0801d3fe2
> (XEN)8300e6fc2000  830617e91f28 830617e91000
> (XEN)830617fd7dd0 82d080175c2c 8300e6fc2000 8300e6fc2000
> (XEN)830617fd7e00 82d080105dd4 830c17e38040 
> (XEN) 830617fd 830617fd7e30 82d0801215fd
> (XEN)8300e6fc 82d080329280 82d080328f80 fffd
> (XEN)830617fd7e60 82d08012caf8 0006 830c17e3bc60
> (XEN)0002 830c17e3bbe0 830617fd7e70 82d08012cb3b
> (XEN)830617fd7ef0 82d0801c23a8 8300e72fc000 
> (XEN)82d0801f3200 830617fd7f08 82d080329280 
> (XEN) Xen call trace:
> (XEN)[] vmx_vmenter_helper+0x27e/0x30a
> (XEN)[] __context_switch+0xdb/0x3b5
> (XEN)[] __sync_local_execstate+0x5e/0x7a
> (XEN)[] sync_local_execstate+0x9/0xb
> (XEN)[] map_domain_page+0xa0/0x5d4
> (XEN)[] destroy_perdomain_mapping+0x8f/0x1e8
> (XEN)[] free_compat_arg_xlat+0x26/0x28
> (XEN)[] hvm_vcpu_destroy+0x73/0xb0
> (XEN)[] vcpu_destroy+0x5d/0x72
> (XEN)[] complete_domain_destroy+0x49/0x192
> (XEN)[] rcu_process_callbacks+0x19a/0x1fb
> (XEN)[] __do_softirq+0x82/0x8d
> (XEN)[] process_pending_softirqs+0x38/0x3a
> (XEN)[] mwait_idle+0x10c/0x315
> (XEN)[] idle_loop+0x51/0x6b

On this deep a stack execution can't validly end up in
vmx_vmenter_helper: That's a function called only when the stack
is almost empty. Nor is the caller of it the context switch code.
Hence your problem starts quite a bit earlier - perhaps memory
corruption?

Jan




Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

2016-08-02 Thread Kevin.Mayer
6 8300e6fc BL U   164 830be8dc9000
  0 0 8300e6fc6000 BL U   165 830bd0cc

So in contrast to the last dump the crashing CPU is running DOMID 32767 (the 
Dom-0) if I understand the output correctly.

Kevin


From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
Sent: Friday, 29 July 2016 12:05
To: Mayer, Kevin <kevin.ma...@gdata.de>; xen-devel@lists.xen.org
Subject: Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

On 29/07/16 08:33, kevin.ma...@gdata.de wrote:
Hi guys

We are using Xen 4.6.1 to manage our virtual machines on x86-64-servers.
We start dozens of VMs and destroy them again after 60 seconds, which works 
fine as it is, but the next step in our approach requires the use of the altp2m 
functionality.
Since libvirt does not pass the altp2m-enable flag to the hypervisor we enabled 
altp2m unconditionally by patching the hvm.c . Since all of our machines 
support the altp2m this seemed to be ok.

altp2m is emulated in software when hardware support isn't available, so it 
should work on all hardware (albeit with rather higher overhead).



 d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1;
 d->arch.hvm_domain.params[HVM_PARAM_TRIPLE_FAULT_REASON] = SHUTDOWN_reboot;
+d->arch.hvm_domain.params[HVM_PARAM_ALTP2M] = 1;
+

This looks to be ok, given your situation.


 vpic_init(d);
 rc = vioapic_init(d);

Since applying this patch the hypervisor crashes after several hundred 
restarted VMs (without any altp2m-functionality used by us) with the following 
dmesg:

(XEN) [ Xen-4.6.1  x86_64  debug=n  Not tainted ]

As a start, please always use a debug hypervisor for investigating issues like 
this.


(XEN) CPU:7
(XEN) RIP:e008:[] vmx_vmenter_helper+0x2b5/0x340
(XEN) RFLAGS: 00010003   CONTEXT: hypervisor (d0v3)
(XEN) rax: 8005003b   rbx: 8300e7038000   rcx: 0008
(XEN) rdx: 6c00   rsi: 83062eb5e000   rdi: 8300e7038000
(XEN) rbp: 830c17e3f000   rsp: 830617fc7d70   r8:  
(XEN) r9:  83014f8d7028   r10: 02700f858000   r11: 2201be6861f0
(XEN) r12: 83062eb5e000   r13: 8300e752f000   r14: 82d08030ea40
(XEN) r15: 0007   cr0: 8005003b   cr4: 26e0
(XEN) cr3: 0001bf4da000   cr2: dd840c00
(XEN) ds:    es:    fs:    gs:    ss:    cs: e008
(XEN) Xen stack trace from rsp=830617fc7d70:
(XEN)8300e7038000 82d080170c04  000780109f6a
(XEN)830617fc7f18 831e  8300e752f19c
(XEN)0286 8300e752f000 8300e72fc000 0007
(XEN)830c17e3f000 830c14ee1000 82d08030ea40 82d080173d6a
(XEN)   
(XEN)82d08030ea40 8300e72fc000 02700f481091 0001
(XEN)82d080324560 82d08030ea40 8300e752f000 82d080128004
(XEN)0001 01c9c380 830c14ef60e8 17fce600
(XEN)0001 82d0801bd18b 82d0801d9e88 8300e752f000
(XEN)01c9c380 82d08012e700 006e0171 
(XEN)830617fc 82d0802f8f80  83062eb5e000
(XEN)82d08030ea40 82d08012b040 8300e7038000 830617fc
(XEN)8300e7038000  830c14ee1000 82d080170970
(XEN)8300e72fc000   
(XEN) 80550f50 ffdffc70 
(XEN)   2fcffe19
(XEN)ffdffc70  ffdffc50 853b0918
(XEN)00fa f0e48162  0246
(XEN)80550f34   
(XEN)  0007 8300e752f000
(XEN) Xen call trace:
(XEN)[] vmx_vmenter_helper+0x2b5/0x340
(XEN)[] __context_switch+0xb4/0x350
(XEN)[] context_switch+0xca/0xef0
(XEN)[] schedule+0x264/0x5f0
(XEN)[] mwait_idle+0x25b/0x3a0
(XEN)[] hvm_vcpu_has_pending_irq+0x58/0xc0
(XEN)[] timer_softirq_action+0x80/0x250
(XEN)[] __do_softirq+0x60/0x90
(XEN)[] idle_loop+0x20/0x50
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 7:
(XEN) FATAL TRAP: vector = 6 (invalid opcode)
(XEN) 
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing kexec image on cpu7
(XEN) Shot down all CPUs

The RIP points to ud2
0x82d0801f5a55:  ud2
From the RFLAGS we concluded that the vmwrite failed due to an invalid 
vmcs-pointer (CF = 1), but this is where we are stuck since we have no idea how 
the pointer could have gotten corrupted.
crash> vcpu
gives vmcs = 0x817cbc20 for vcpu_id = 7,

a

Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

2016-07-29 Thread Andrew Cooper
On 29/07/16 08:33, kevin.ma...@gdata.de wrote:
>
> Hi guys
>
> We are using Xen 4.6.1 to manage our virtual machines on x86-64-servers.
> We start dozens of VMs and destroy them again after 60 seconds, which
> works fine as it is, but the next step in our approach requires the
> use of the altp2m functionality.
> Since libvirt does not pass the altp2m-enable flag to the hypervisor
> we enabled altp2m unconditionally by patching the hvm.c . Since all of
> our machines support the altp2m this seemed to be ok.
>

altp2m is emulated in software when hardware support isn't available, so
it should work on all hardware (albeit with rather higher overhead).

>  d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1;
>  d->arch.hvm_domain.params[HVM_PARAM_TRIPLE_FAULT_REASON] = SHUTDOWN_reboot;
> +d->arch.hvm_domain.params[HVM_PARAM_ALTP2M] = 1;
> +

This looks to be ok, given your situation.

>  vpic_init(d);
>  rc = vioapic_init(d);
>
> Since applying this patch the hypervisor crashes after several hundred
> restarted VMs (without any altp2m-functionality used by us) with the
> following dmesg:
>
> (XEN) [ Xen-4.6.1  x86_64  debug=n  Not tainted ]

As a start, please always use a debug hypervisor for investigating
issues like this.

> (XEN) CPU:7
> (XEN) RIP:e008:[] vmx_vmenter_helper+0x2b5/0x340
> (XEN) RFLAGS: 00010003   CONTEXT: hypervisor (d0v3)
> (XEN) rax: 8005003b   rbx: 8300e7038000   rcx: 0008
> (XEN) rdx: 6c00   rsi: 83062eb5e000   rdi: 8300e7038000
> (XEN) rbp: 830c17e3f000   rsp: 830617fc7d70   r8:  
> (XEN) r9:  83014f8d7028   r10: 02700f858000   r11: 2201be6861f0
> (XEN) r12: 83062eb5e000   r13: 8300e752f000   r14: 82d08030ea40
> (XEN) r15: 0007   cr0: 8005003b   cr4: 26e0
> (XEN) cr3: 0001bf4da000   cr2: dd840c00
> (XEN) ds:    es:    fs:    gs:    ss:    cs: e008
> (XEN) Xen stack trace from rsp=830617fc7d70:
> (XEN)8300e7038000 82d080170c04  000780109f6a
> (XEN)830617fc7f18 831e  8300e752f19c
> (XEN)0286 8300e752f000 8300e72fc000 0007
> (XEN)830c17e3f000 830c14ee1000 82d08030ea40 82d080173d6a
> (XEN)   
> (XEN)82d08030ea40 8300e72fc000 02700f481091 0001
> (XEN)82d080324560 82d08030ea40 8300e752f000 82d080128004
> (XEN)0001 01c9c380 830c14ef60e8 17fce600
> (XEN)0001 82d0801bd18b 82d0801d9e88 8300e752f000
> (XEN)01c9c380 82d08012e700 006e0171 
> (XEN)830617fc 82d0802f8f80  83062eb5e000
> (XEN)82d08030ea40 82d08012b040 8300e7038000 830617fc
> (XEN)8300e7038000  830c14ee1000 82d080170970
> (XEN)8300e72fc000   
> (XEN) 80550f50 ffdffc70 
> (XEN)   2fcffe19
> (XEN)ffdffc70  ffdffc50 853b0918
> (XEN)00fa f0e48162  0246
> (XEN)80550f34   
> (XEN)  0007 8300e752f000
> (XEN) Xen call trace:
> (XEN)[] vmx_vmenter_helper+0x2b5/0x340
> (XEN)[] __context_switch+0xb4/0x350
> (XEN)[] context_switch+0xca/0xef0
> (XEN)[] schedule+0x264/0x5f0
> (XEN)[] mwait_idle+0x25b/0x3a0
> (XEN)[] hvm_vcpu_has_pending_irq+0x58/0xc0
> (XEN)[] timer_softirq_action+0x80/0x250
> (XEN)[] __do_softirq+0x60/0x90
> (XEN)[] idle_loop+0x20/0x50
> (XEN)
> (XEN)
> (XEN) 
> (XEN) Panic on CPU 7:
> (XEN) FATAL TRAP: vector = 6 (invalid opcode)
> (XEN) 
> (XEN)
> (XEN) Reboot in five seconds...
> (XEN) Executing kexec image on cpu7
> (XEN) Shot down all CPUs
>
> The RIP points to ud2
> 0x82d0801f5a55:  ud2
> From the RFLAGS we concluded that the vmwrite failed due to an invalid
> vmcs-pointer (CF = 1), but this is where we are stuck since we have no
> idea how the pointer could have gotten corrupted.
> crash> vcpu
> gives vmcs = 0x817cbc20 for vcpu_id = 7,
>
> and vcpus gives
>
>VCID  PCID   VCPU   ST T DOMID  DOMAIN
>   0 0 8300e75f2000 RU I 32767 830c14ee1000
>

Re: [Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

2016-07-29 Thread Dario Faggioli
On Fri, 2016-07-29 at 07:33 +0000, kevin.ma...@gdata.de wrote:
> Hi guys
>  
Hi,

I'm pretty much just Cc-ing maintainers/key people, to see if they have
ideas.

Only one thing. Since you are rebuilding Xen anyway, I think it could
be helpful to try a debug build, and post the dump it will produce.

> (XEN) [ Xen-4.6.1  x86_64  debug=n  Not tainted ]
>
I.e., this would need to become debug=y.

Since you said you're using 4.6.x, I think putting "debug=y" in a
.config file (and then rebuilding and reinstalling, of course) should
be enough.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)





[Xen-devel] Xen 4.6.1 crash with altp2m enabled by default

2016-07-29 Thread Kevin.Mayer
Hi guys

We are using Xen 4.6.1 to manage our virtual machines on x86-64-servers.
We start dozens of VMs and destroy them again after 60 seconds, which works 
fine as it is, but the next step in our approach requires the use of the altp2m 
functionality.
Since libvirt does not pass the altp2m-enable flag to the hypervisor we enabled 
altp2m unconditionally by patching the hvm.c . Since all of our machines 
support the altp2m this seemed to be ok.

 d->arch.hvm_domain.params[HVM_PARAM_HPET_ENABLED] = 1;
 d->arch.hvm_domain.params[HVM_PARAM_TRIPLE_FAULT_REASON] = SHUTDOWN_reboot;
+d->arch.hvm_domain.params[HVM_PARAM_ALTP2M] = 1;
+
 vpic_init(d);
 rc = vioapic_init(d);

Since applying this patch the hypervisor crashes after several hundred 
restarted VMs (without any altp2m-functionality used by us) with the following 
dmesg:

(XEN) [ Xen-4.6.1  x86_64  debug=n  Not tainted ]
(XEN) CPU:7
(XEN) RIP:e008:[] vmx_vmenter_helper+0x2b5/0x340
(XEN) RFLAGS: 00010003   CONTEXT: hypervisor (d0v3)
(XEN) rax: 8005003b   rbx: 8300e7038000   rcx: 0008
(XEN) rdx: 6c00   rsi: 83062eb5e000   rdi: 8300e7038000
(XEN) rbp: 830c17e3f000   rsp: 830617fc7d70   r8:  
(XEN) r9:  83014f8d7028   r10: 02700f858000   r11: 2201be6861f0
(XEN) r12: 83062eb5e000   r13: 8300e752f000   r14: 82d08030ea40
(XEN) r15: 0007   cr0: 8005003b   cr4: 26e0
(XEN) cr3: 0001bf4da000   cr2: dd840c00
(XEN) ds:    es:    fs:    gs:    ss:    cs: e008
(XEN) Xen stack trace from rsp=830617fc7d70:
(XEN)8300e7038000 82d080170c04  000780109f6a
(XEN)830617fc7f18 831e  8300e752f19c
(XEN)0286 8300e752f000 8300e72fc000 0007
(XEN)830c17e3f000 830c14ee1000 82d08030ea40 82d080173d6a
(XEN)   
(XEN)82d08030ea40 8300e72fc000 02700f481091 0001
(XEN)82d080324560 82d08030ea40 8300e752f000 82d080128004
(XEN)0001 01c9c380 830c14ef60e8 17fce600
(XEN)0001 82d0801bd18b 82d0801d9e88 8300e752f000
(XEN)01c9c380 82d08012e700 006e0171 
(XEN)830617fc 82d0802f8f80  83062eb5e000
(XEN)82d08030ea40 82d08012b040 8300e7038000 830617fc
(XEN)8300e7038000  830c14ee1000 82d080170970
(XEN)8300e72fc000   
(XEN) 80550f50 ffdffc70 
(XEN)   2fcffe19
(XEN)ffdffc70  ffdffc50 853b0918
(XEN)00fa f0e48162  0246
(XEN)80550f34   
(XEN)  0007 8300e752f000
(XEN) Xen call trace:
(XEN)[] vmx_vmenter_helper+0x2b5/0x340
(XEN)[] __context_switch+0xb4/0x350
(XEN)[] context_switch+0xca/0xef0
(XEN)[] schedule+0x264/0x5f0
(XEN)[] mwait_idle+0x25b/0x3a0
(XEN)[] hvm_vcpu_has_pending_irq+0x58/0xc0
(XEN)[] timer_softirq_action+0x80/0x250
(XEN)[] __do_softirq+0x60/0x90
(XEN)[] idle_loop+0x20/0x50
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 7:
(XEN) FATAL TRAP: vector = 6 (invalid opcode)
(XEN) 
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing kexec image on cpu7
(XEN) Shot down all CPUs

The RIP points to ud2
0x82d0801f5a55:  ud2
From the RFLAGS we concluded that the vmwrite failed due to an invalid 
vmcs-pointer (CF = 1), but this is where we are stuck since we have no idea 
how the pointer could have gotten corrupted.
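
For reference, the way the VMX instruction result is read off RFLAGS can be sketched in C as below. This follows the convention documented in the Intel SDM (CF set means VMfailInvalid, ZF set means VMfailValid, neither set means success); the enum and function names are made up for illustration and are not Xen identifiers.

```c
#define X86_EFLAGS_CF (1UL << 0)  /* carry flag */
#define X86_EFLAGS_ZF (1UL << 6)  /* zero flag */

enum vmx_result {
    VMX_SUCCEED,       /* CF = 0 and ZF = 0 */
    VMX_FAIL_INVALID,  /* CF = 1: no valid current VMCS pointer */
    VMX_FAIL_VALID     /* ZF = 1: error code in the VM-instruction error field */
};

/* Hypothetical helper: classify a vmread/vmwrite outcome from RFLAGS. */
static enum vmx_result vmx_result_from_rflags(unsigned long rflags)
{
    if ( rflags & X86_EFLAGS_CF )
        return VMX_FAIL_INVALID;

    if ( rflags & X86_EFLAGS_ZF )
        return VMX_FAIL_VALID;

    return VMX_SUCCEED;
}
```

Applied to the RFLAGS value 00010003 from the dump, CF is set and ZF is clear, so the sketch classifies the vmwrite as VMfailInvalid, in line with the conclusion above.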
crash> vcpu
gives vmcs = 0x817cbc20 for vcpu_id = 7,

and vcpus gives

   VCID  PCID   VCPU   ST T DOMID  DOMAIN
  0 0 8300e75f2000 RU I 32767 830c14ee1000
  1 1 8300e72fe000 RU I 32767 830c14ee1000
  2 2 8300e7527000 RU I 32767 830c14ee1000
> 3 3 8300e7526000 RU I 32767 830c14ee1000
  4 4 8300e75f1000 RU I 32767 830c14ee1000
> 5 5 8300e75f RU I 32767 830c14ee1000
> 6 6 8300e72fd000 RU I 32767 830c14ee1000
  7 7 8300e72fc000 RU I 32767 830c14ee1000
  0 0 8300e72fa000 BL 0 0 830c17e3f000
  1 6 8300e72f9000 BL 0 0 830c17e3f000
  2 3 8300e72f8000 BL 0 0 830c17e3f000
> 3 7 8300e752f000 RU 0 0 830c17e3f000
  4 5 8300e752e000 RU 0 0 830c17e3f000
>