Re: [Xen-devel] Xen 4.11.1 panic

2018-12-19 Thread Manuel Bouyer
On Thu, Dec 20, 2018 at 12:36:35AM -0700, Jan Beulich wrote:
> >>> On 19.12.18 at 13:54,  wrote:
>  On 19.12.18 at 12:55,  wrote:
> >> On Wed, Dec 19, 2018 at 04:05:57AM -0700, Jan Beulich wrote:
> >>> In any event, both Andrew and I must have overlooked the one
> >>> crucial place due to which the assertion is indeed wrong from
> >>> put_page_from_l2e():
> >>> 
> >>> int rc = _put_page_type(pg, false, mfn_to_page(_mfn(pfn)));
> >>> 
> >>> Not allowing for preemption there is fine if the L2E is pointing to
> >>> an L1 table, but is now wrong if the L2E points to another L2,
> >>> which surely is the case when you see the assertion trigger.
> >> 
> >> Should we just change false to true here, or should the cases above be 
> >> handled differently ?
> > 
> > Switching from false to true here is just the initial part of the
> > necessary change - if you did just this, you'd end up hitting
> > the ASSERT() right after the line above. There's quite a bit
> > more to it, and it needs to be done pretty carefully.
> 
> Actually there was no reason to alter the free_l2_table() paths
> in the XSA-273 fixes: A switch to shadow mode can only occur
> when validating page tables. Therefore I think you could safely
> revert the respective hunks, which includes deleting the
> ASSERT() you found triggering.

You mean, Xen is not going to fix this ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen 4.11.1 panic

2018-12-19 Thread Jan Beulich
>>> On 19.12.18 at 13:54,  wrote:
 On 19.12.18 at 12:55,  wrote:
>> On Wed, Dec 19, 2018 at 04:05:57AM -0700, Jan Beulich wrote:
>>> In any event, both Andrew and I must have overlooked the one
>>> crucial place due to which the assertion is indeed wrong from
>>> put_page_from_l2e():
>>> 
>>> int rc = _put_page_type(pg, false, mfn_to_page(_mfn(pfn)));
>>> 
>>> Not allowing for preemption there is fine if the L2E is pointing to
>>> an L1 table, but is now wrong if the L2E points to another L2,
>>> which surely is the case when you see the assertion trigger.
>> 
>> Should we just change false to true here, or should the cases above be 
>> handled differently ?
> 
> Switching from false to true here is just the initial part of the
> necessary change - if you did just this, you'd end up hitting
> the ASSERT() right after the line above. There's quite a bit
> more to it, and it needs to be done pretty carefully.

Actually there was no reason to alter the free_l2_table() paths
in the XSA-273 fixes: A switch to shadow mode can only occur
when validating page tables. Therefore I think you could safely
revert the respective hunks, which includes deleting the
ASSERT() you found triggering.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen 4.11.1 panic

2018-12-19 Thread Manuel Bouyer
On Wed, Dec 19, 2018 at 05:54:42AM -0700, Jan Beulich wrote:
> >>> On 19.12.18 at 12:55,  wrote:
> > On Wed, Dec 19, 2018 at 04:05:57AM -0700, Jan Beulich wrote:
> >> In any event, both Andrew and I must have overlooked the one
> >> crucial place due to which the assertion is indeed wrong from
> >> put_page_from_l2e():
> >> 
> >> int rc = _put_page_type(pg, false, mfn_to_page(_mfn(pfn)));
> >> 
> >> Not allowing for preemption there is fine if the L2E is pointing to
> >> an L1 table, but is now wrong if the L2E points to another L2,
> >> which surely is the case when you see the assertion trigger.
> > 
> > Should we just change false to true here, or should the cases above be 
> > handled differently ?
> 
> Switching from false to true here is just the initial part of the
> necessary change - if you did just this, you'd end up hitting
> the ASSERT() right after the line above. There's quite a bit
> more to it, and it needs to be done pretty carefully.

OK, so I'll wait for a more complete patch :)

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen 4.11.1 panic

2018-12-19 Thread Jan Beulich
>>> On 19.12.18 at 12:55,  wrote:
> On Wed, Dec 19, 2018 at 04:05:57AM -0700, Jan Beulich wrote:
>> In any event, both Andrew and I must have overlooked the one
>> crucial place due to which the assertion is indeed wrong from
>> put_page_from_l2e():
>> 
>> int rc = _put_page_type(pg, false, mfn_to_page(_mfn(pfn)));
>> 
>> Not allowing for preemption there is fine if the L2E is pointing to
>> an L1 table, but is now wrong if the L2E points to another L2,
>> which surely is the case when you see the assertion trigger.
> 
> Should we just change false to true here, or should the cases above be 
> handled differently ?

Switching from false to true here is just the initial part of the
necessary change - if you did just this, you'd end up hitting
the ASSERT() right after the line above. There's quite a bit
more to it, and it needs to be done pretty carefully.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen 4.11.1 panic

2018-12-19 Thread Manuel Bouyer
On Wed, Dec 19, 2018 at 04:05:57AM -0700, Jan Beulich wrote:
> >>> On 18.12.18 at 23:19,  wrote:
> > I tried updating my NetBSD dom0 to 4.11.1 (from 4.11.0 with security 
> > patches),
> 
> Hmm, the issue stems from the XSA-273 changes, so did you perhaps
> mean "with some security patches", and you didn't have those ones
> applied?

Yes, for some reason XSA-273 isn't in the list of patches I had for
Xen 4.11.0. I guess I forgot it ...

> 
> > and on a 32bits PV domU shutdown I get (100% reproductible):
> > (XEN) Assertion 'preemptible' failed at mm.c:2493   
> [...]
> 
> The line number above doesn't match any one with a respective
> ASSERT() in plain 4.11.1. There are a few nearby ones, and hence
> I can only guess that it's the one that was recently added (in
> PGT_l2_page_table handling of free_page_type()). Can you confirm
> this please with the exact sources you've used for your build?

Here's what I have in xen/arch/x86/mm.c:
  2486  switch ( type & PGT_type_mask )  
  2487  {
  2488  case PGT_l1_page_table:
  2489  free_l1_table(page);
  2490  rc = 0; 
  2491  break;
  2492  case PGT_l2_page_table:
  2493  ASSERT(preemptible);
  2494  rc = free_l2_table(page); 
  2495  break; 
  2496  case PGT_l3_page_table:
  2497  ASSERT(preemptible);
  2498  rc = free_l3_table(page);
  2499  break;
  2500  case PGT_l4_page_table: 
  2501  ASSERT(preemptible);
  2502  rc = free_l4_table(page);
  2503  break;
  2504  default:

This is in free_page_type()

> 
> In any event, both Andrew and I must have overlooked the one
> crucial place due to which the assertion is indeed wrong from
> put_page_from_l2e():
> 
> int rc = _put_page_type(pg, false, mfn_to_page(_mfn(pfn)));
> 
> Not allowing for preemption there is fine if the L2E is pointing to
> an L1 table, but is now wrong if the L2E points to another L2,
> which surely is the case when you see the assertion trigger.

Should we just change false to true here, or should the cases above be handled
differently ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen 4.11.1 panic

2018-12-19 Thread Jan Beulich
>>> On 18.12.18 at 23:19,  wrote:
> I tried updating my NetBSD dom0 to 4.11.1 (from 4.11.0 with security patches),

Hmm, the issue stems from the XSA-273 changes, so did you perhaps
mean "with some security patches", and you didn't have those ones
applied?

> and on a 32bits PV domU shutdown I get (100% reproductible):
> (XEN) Assertion 'preemptible' failed at mm.c:2493   
> (XEN) [ Xen-4.11.1nb0  x86_64  debug=y   Tainted:  C   ]
> (XEN) CPU:1 
> (XEN) RIP:e008:[] free_page_type+0x232/0x790  
> (XEN) RFLAGS: 00010246   CONTEXT: hypervisor (d0v0) 
> (XEN) rax: 4000   rbx: 4401   rcx: 4000 
> (XEN) rdx: 8300   rsi: 4401   rdi: 82e004215260 
> (XEN) rbp: 82e004215260   rsp: 83023704fab8   r8:   
> (XEN) r9:     r10: 82e0   r11: 82e004226000 
> (XEN) r12:    r13: 8302135d9000   r14: 10ff 
> (XEN) r15: 1000   cr0: 8005003b   cr4: 2660 
> (XEN) cr3: 00022f0f6000   cr2: 7f7ff60ce7a0
> (XEN) fsb: 7f7ff7ff36c0   gsb: 80ca42c0   gss: 
> (XEN) ds: 003f   es: 003f   fs:    gs:    ss: e010   cs: e008
> (XEN) Xen code around  (free_page_type+0x232/0x790):
> (XEN)  05 00 00 45 85 e4 75 02 <0f> 0b 8b 45 18 85 c0 74 18 89 c0 48 c1 e0 0c 
> 49
> (XEN) Xen stack trace from rsp=83023704fab8:
> (XEN)83023704fe38 00ec 4401 82e004215260
> (XEN)82e004215240 00ff 10ff 1000
> (XEN)82d08028b83d 00ff8300bedfc000 83023704 82d0
> (XEN)82e004215260 82e004215240 8302135d9000 00210a92
> (XEN)820040019000 0200 82d08028bedf 01ff
> (XEN)82e004215240 82d08028b25e 8302 83023704
> (XEN)4401 82e004215240 82e004206c60 00ff
> (XEN)10ff 1000 82d08028b83d 0102
> (XEN)83023704 83020001 82e004215240 82e004206c60
> (XEN) 820040015010  820040015000
> (XEN)82d08028af30 0002 82e004206c60 
> (XEN)820040015010 82d08028b3d1 00210363 8302135d9000
> (XEN)6401 82e004206c60  00ff
> (XEN)10ff 1000 82d08028b83d 01ff82d08022697f
> (XEN)83023704 83020001 82e004206c60 83023704fd10
> (XEN)8302135d9028 8302135d9000 8302135d9020 82e004206c70
> (XEN)82d08028bf1f 82d080274e6b 83023701ec00 e401
> (XEN)83023704 8000 8302135d9000 
> (XEN)8302135d9018 deadbeefdeadf00d 0001 7f7ff7b32004
> (XEN)82d080278f83 8302135d9000 7f7ff7b32004 82d080208b2d
> (XEN) Xen call trace:
> (XEN)[] free_page_type+0x232/0x790
> (XEN)[] mm.c#_put_page_type+0x14d/0x380
> (XEN)[] mm.c#put_page_from_l2e+0xdf/0x110
> (XEN)[] free_page_type+0x2fe/0x790
> (XEN)[] mm.c#_put_page_type+0x14d/0x380
> (XEN)[] mm.c#put_page_from_l3e+0x1a0/0x1d0

The line number above doesn't match any one with a respective
ASSERT() in plain 4.11.1. There are a few nearby ones, and hence
I can only guess that it's the one that was recently added (in
PGT_l2_page_table handling of free_page_type()). Can you confirm
this please with the exact sources you've used for your build?

In any event, both Andrew and I must have overlooked the one
crucial place due to which the assertion is indeed wrong from
put_page_from_l2e():

int rc = _put_page_type(pg, false, mfn_to_page(_mfn(pfn)));

Not allowing for preemption there is fine if the L2E is pointing to
an L1 table, but is now wrong if the L2E points to another L2,
which surely is the case when you see the assertion trigger.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] Xen 4.11.1 panic

2018-12-18 Thread Manuel Bouyer
Hello,
I tried updating my NetBSD dom0 to 4.11.1 (from 4.11.0 with security patches),
and on a 32bits PV domU shutdown I get (100% reproductible):
(XEN) Assertion 'preemptible' failed at mm.c:2493   
(XEN) [ Xen-4.11.1nb0  x86_64  debug=y   Tainted:  C   ]
(XEN) CPU:1 
(XEN) RIP:e008:[] free_page_type+0x232/0x790  
(XEN) RFLAGS: 00010246   CONTEXT: hypervisor (d0v0) 
(XEN) rax: 4000   rbx: 4401   rcx: 4000 
(XEN) rdx: 8300   rsi: 4401   rdi: 82e004215260 
(XEN) rbp: 82e004215260   rsp: 83023704fab8   r8:   
(XEN) r9:     r10: 82e0   r11: 82e004226000 
(XEN) r12:    r13: 8302135d9000   r14: 10ff 
(XEN) r15: 1000   cr0: 8005003b   cr4: 2660 
(XEN) cr3: 00022f0f6000   cr2: 7f7ff60ce7a0
(XEN) fsb: 7f7ff7ff36c0   gsb: 80ca42c0   gss: 
(XEN) ds: 003f   es: 003f   fs:    gs:    ss: e010   cs: e008
(XEN) Xen code around  (free_page_type+0x232/0x790):
(XEN)  05 00 00 45 85 e4 75 02 <0f> 0b 8b 45 18 85 c0 74 18 89 c0 48 c1 e0 0c 49
(XEN) Xen stack trace from rsp=83023704fab8:
(XEN)83023704fe38 00ec 4401 82e004215260
(XEN)82e004215240 00ff 10ff 1000
(XEN)82d08028b83d 00ff8300bedfc000 83023704 82d0
(XEN)82e004215260 82e004215240 8302135d9000 00210a92
(XEN)820040019000 0200 82d08028bedf 01ff
(XEN)82e004215240 82d08028b25e 8302 83023704
(XEN)4401 82e004215240 82e004206c60 00ff
(XEN)10ff 1000 82d08028b83d 0102
(XEN)83023704 83020001 82e004215240 82e004206c60
(XEN) 820040015010  820040015000
(XEN)82d08028af30 0002 82e004206c60 
(XEN)820040015010 82d08028b3d1 00210363 8302135d9000
(XEN)6401 82e004206c60  00ff
(XEN)10ff 1000 82d08028b83d 01ff82d08022697f
(XEN)83023704 83020001 82e004206c60 83023704fd10
(XEN)8302135d9028 8302135d9000 8302135d9020 82e004206c70
(XEN)82d08028bf1f 82d080274e6b 83023701ec00 e401
(XEN)83023704 8000 8302135d9000 
(XEN)8302135d9018 deadbeefdeadf00d 0001 7f7ff7b32004
(XEN)82d080278f83 8302135d9000 7f7ff7b32004 82d080208b2d
(XEN) Xen call trace:
(XEN)[] free_page_type+0x232/0x790
(XEN)[] mm.c#_put_page_type+0x14d/0x380
(XEN)[] mm.c#put_page_from_l2e+0xdf/0x110
(XEN)[] free_page_type+0x2fe/0x790
(XEN)[] mm.c#_put_page_type+0x14d/0x380
(XEN)[] mm.c#put_page_from_l3e+0x1a0/0x1d0
(XEN)[] free_page_type+0x471/0x790
(XEN)[] mm.c#_put_page_type+0x14d/0x380
(XEN)[] put_page_type_preemptible+0xf/0x10
(XEN)[] domain.c#relinquish_memory+0xab/0x460
(XEN)[] domain_relinquish_resources+0x203/0x290
(XEN)[] domain_kill+0xbd/0x150
(XEN)[] do_domctl+0x7d3/0x1a90
(XEN)[] do_physdev_op_compat+0/0x70
(XEN)[] do_domctl+0/0x1a90
(XEN)[] pv_hypercall+0x20b/0x440
(XEN)[] lstar_enter+0xa2/0x120
(XEN)[] lstar_enter+0xae/0x120
(XEN)[] lstar_enter+0xa2/0x120
(XEN)[] lstar_enter+0xae/0x120
(XEN)[] lstar_enter+0xa2/0x120
(XEN)[] lstar_enter+0xae/0x120
(XEN)[] lstar_enter+0x110/0x120
(XEN)
(XEN)
(XEN) 
(XEN) Panic on CPU 1:
(XEN) Assertion 'preemptible' failed at mm.c:2493
(XEN) 
(XEN)
(XEN) Reboot in five seconds...

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel