Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-06-02 Thread Martin Cerveny



On Thu, 2 Jun 2016, Martin Cerveny wrote:




On Wed, 1 Jun 2016, Boris Ostrovsky wrote:


On 06/01/2016 05:01 PM, Martin Cerveny wrote:

Hello.

On Wed, 1 Jun 2016, Boris Ostrovsky wrote:

On 06/01/2016 12:23 PM, Martin Cerveny wrote:

:-(

On Wed, 1 Jun 2016, Martin Cerveny wrote:

I probably hit the same error with the released "XenServer 7.0".
- I have Xen 4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
update Xen version to 4.6.1)
- XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) works OK
- XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crashes
- the patch does not work, arch/x86/xen/mmu.c is very old in 3.10
- Can someone verify the error?

Thanks, Martin Cerveny

Crash (kernel-3.10.96-479.383024.x86_64.rpm):

 ^^^
correction: kernel-3.10.96-484.383030.x86_64.rpm

If you can provide vmlinux (better) or System.map we can probably see
whether it's the same signature.


http://xenserver.org/open-source-virtualization-download.html
->
XenServer-7.0.0-main.iso or XenServer-7.0.0-binpkg.iso
->
kernel-3.10.96-484.383030.x86_64.rpm
->
System.map-3.10.0+10  vmlinuz-3.10.0+10
->
http://s000.tinyupload.com/index.php?file_id=30528714656973136220

Thanks for analyzing, Martin



This looks like a different problem, the stack is
...
start_kernel
   cleanup_highmap
   xen_set_pmd_hyper
   arbitrary_virt_to_machine

Can you reproduce this with a newer kernel?


Thanks for analysing.

But there is no newer kernel.

XenServer 7 has a specially crafted CentOS 7 kernel
( https://github.com/xenserver/linux-3.x +
https://github.com/xenserver/linux-3.x.pg ) and will not move
to a newer kernel. I must stay on this kernel because the NVIDIA vGPU
binary blob does not support newer kernels and NVIDIA refuses
to share the sources of the kernel bridge for vGPU
( https://gridforums.nvidia.com/default/topic/231/?comment=1920 )

I will stay on the working XS7 beta3 kernel.

Thanks, Martin Cerveny



Now I have found the cause of my XS7 crash - surprisingly, the "crashkernel"
Xen parameter :-)
I have logged the error with XS: https://bugs.xenserver.org/browse/XSO-554

Thanks for the help, Martin Cerveny

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-06-02 Thread David Vrabel
On 01/06/16 17:12, Martin Cerveny wrote:
> Hello.
> 
> I probably hit the same error with the released "XenServer 7.0".
> - I have Xen 4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
> update Xen version to 4.6.1)
> - XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) works OK
> - XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crashes
> - the patch does not work, arch/x86/xen/mmu.c is very old in 3.10
> - Can someone verify the error?

This list is not the correct place for XenServer support.

See http://xenserver.org/discuss-virtualization.html for available options.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-06-01 Thread Martin Cerveny



On Wed, 1 Jun 2016, Boris Ostrovsky wrote:


On 06/01/2016 05:01 PM, Martin Cerveny wrote:

Hello.

On Wed, 1 Jun 2016, Boris Ostrovsky wrote:

On 06/01/2016 12:23 PM, Martin Cerveny wrote:

:-(

On Wed, 1 Jun 2016, Martin Cerveny wrote:

I probably hit the same error with the released "XenServer 7.0".
- I have Xen 4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
update Xen version to 4.6.1)
- XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) works OK
- XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crashes
- the patch does not work, arch/x86/xen/mmu.c is very old in 3.10
- Can someone verify the error?

Thanks, Martin Cerveny

Crash (kernel-3.10.96-479.383024.x86_64.rpm):

 ^^^
correction: kernel-3.10.96-484.383030.x86_64.rpm

If you can provide vmlinux (better) or System.map we can probably see
whether it's the same signature.


http://xenserver.org/open-source-virtualization-download.html
->
XenServer-7.0.0-main.iso or XenServer-7.0.0-binpkg.iso
->
kernel-3.10.96-484.383030.x86_64.rpm
->
System.map-3.10.0+10  vmlinuz-3.10.0+10
->
http://s000.tinyupload.com/index.php?file_id=30528714656973136220

Thanks for analyzing, Martin



This looks like a different problem, the stack is
...
start_kernel
   cleanup_highmap
   xen_set_pmd_hyper
   arbitrary_virt_to_machine

Can you reproduce this with a newer kernel?


Thanks for analysing.

But there is no newer kernel.

XenServer 7 has a specially crafted CentOS 7 kernel
( https://github.com/xenserver/linux-3.x +
https://github.com/xenserver/linux-3.x.pg ) and will not move
to a newer kernel. I must stay on this kernel because the NVIDIA vGPU
binary blob does not support newer kernels and NVIDIA refuses
to share the sources of the kernel bridge for vGPU
( https://gridforums.nvidia.com/default/topic/231/?comment=1920 )

I will stay on the working XS7 beta3 kernel.

Thanks, Martin Cerveny

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-06-01 Thread Boris Ostrovsky
On 06/01/2016 05:01 PM, Martin Cerveny wrote:
> Hello.
>
> On Wed, 1 Jun 2016, Boris Ostrovsky wrote:
>> On 06/01/2016 12:23 PM, Martin Cerveny wrote:
>>> :-(
>>>
>>> On Wed, 1 Jun 2016, Martin Cerveny wrote:
 I probably hit the same error with the released "XenServer 7.0".
 - I have Xen 4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
 update Xen version to 4.6.1)
 - XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) works OK
 - XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crashes
 - the patch does not work, arch/x86/xen/mmu.c is very old in 3.10
 - Can someone verify the error?

 Thanks, Martin Cerveny

 Crash (kernel-3.10.96-479.383024.x86_64.rpm):
>>>  ^^^
>>> correction: kernel-3.10.96-484.383030.x86_64.rpm
>> If you can provide vmlinux (better) or System.map we can probably see
>> whether it's the same signature.
>
> http://xenserver.org/open-source-virtualization-download.html
> ->
> XenServer-7.0.0-main.iso or XenServer-7.0.0-binpkg.iso
> ->
> kernel-3.10.96-484.383030.x86_64.rpm
> ->
> System.map-3.10.0+10  vmlinuz-3.10.0+10
> ->
> http://s000.tinyupload.com/index.php?file_id=30528714656973136220
>
> Thanks for analyzing, Martin


This looks like a different problem, the stack is
...
start_kernel
cleanup_highmap
xen_set_pmd_hyper
arbitrary_virt_to_machine

Can you reproduce this with a newer kernel?
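
Addresses in a guest stack trace like the one above can be matched to
symbols with a plain System.map lookup: take the highest symbol address
that does not exceed the address in question. A minimal standalone sketch
of such a resolver - purely illustrative, not a tool from this thread, and
assuming the usual "address type name" System.map line format:

/* resolve.c - look up an address in a System.map file (illustrative). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
	unsigned long target, addr, best_addr = 0;
	char type, sym[256], best_sym[256] = "?";
	FILE *f;

	if (argc != 3) {
		fprintf(stderr, "usage: %s System.map address\n", argv[0]);
		return 1;
	}
	target = strtoul(argv[2], NULL, 16);

	f = fopen(argv[1], "r");
	if (!f) {
		perror(argv[1]);
		return 1;
	}

	/* Keep the highest symbol address that is still <= target. */
	while (fscanf(f, "%lx %c %255s", &addr, &type, sym) == 3) {
		if (addr <= target && addr >= best_addr) {
			best_addr = addr;
			strcpy(best_sym, sym);
		}
	}
	fclose(f);

	printf("%#lx = %s+%#lx (%c)\n", target, best_sym,
	       target - best_addr, type);
	return 0;
}

Running it against the System.map from the kernel RPM with the return
addresses from the dump gives the symbol+offset form used in the summary
above.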

-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-06-01 Thread Martin Cerveny

Hello.

On Wed, 1 Jun 2016, Boris Ostrovsky wrote:

On 06/01/2016 12:23 PM, Martin Cerveny wrote:

:-(

On Wed, 1 Jun 2016, Martin Cerveny wrote:

I probably hit the same error with the released "XenServer 7.0".
- I have Xen 4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
update Xen version to 4.6.1)
- XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) works OK
- XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crashes
- the patch does not work, arch/x86/xen/mmu.c is very old in 3.10
- Can someone verify the error?

Thanks, Martin Cerveny

Crash (kernel-3.10.96-479.383024.x86_64.rpm):

 ^^^
correction: kernel-3.10.96-484.383030.x86_64.rpm

If you can provide vmlinux (better) or System.map we can probably see
whether it's the same signature.


http://xenserver.org/open-source-virtualization-download.html
->
XenServer-7.0.0-main.iso or XenServer-7.0.0-binpkg.iso
->
kernel-3.10.96-484.383030.x86_64.rpm
->
System.map-3.10.0+10  vmlinuz-3.10.0+10
->
http://s000.tinyupload.com/index.php?file_id=30528714656973136220

Thanks for analyzing, Martin


-boris





___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-06-01 Thread Boris Ostrovsky
On 06/01/2016 12:23 PM, Martin Cerveny wrote:
> :-(
>
> On Wed, 1 Jun 2016, Martin Cerveny wrote:
>> I probably hit the same error with the released "XenServer 7.0".
>> - I have Xen 4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf -
>> update Xen version to 4.6.1)
>> - XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) works OK
>> - XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crashes
>> - the patch does not work, arch/x86/xen/mmu.c is very old in 3.10
>> - Can someone verify the error?
>>
>> Thanks, Martin Cerveny
>>
>> Crash (kernel-3.10.96-479.383024.x86_64.rpm):
>  ^^^
> correction: kernel-3.10.96-484.383030.x86_64.rpm

If you can provide vmlinux (better) or System.map we can probably see
whether it's the same signature.

-boris



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-06-01 Thread Martin Cerveny

:-(

On Wed, 1 Jun 2016, Martin Cerveny wrote:

I probably hit the same error with the released "XenServer 7.0".
- I have Xen 4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf - update
Xen version to 4.6.1)
- XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) works OK
- XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crashes
- the patch does not work, arch/x86/xen/mmu.c is very old in 3.10
- Can someone verify the error?

Thanks, Martin Cerveny

Crash (kernel-3.10.96-479.383024.x86_64.rpm):

 ^^^
correction: kernel-3.10.96-484.383030.x86_64.rpm


about to get started...
(XEN) d0v0: unhandled page fault (ec=)
(XEN) Pagetable walk from 88010278b080:
(XEN)  L4[0x110] = 000439a0d067 1a0d
(XEN)  L3[0x004] =  
(XEN) domain_crash_sync called from entry.S: fault at 82d08022b2c3 
create_bounce_frame+0x12b/0x13a

(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) [ Xen-4.6.1-vgpu  x86_64  debug=n  Not tainted ]
(XEN) CPU:0
(XEN) RIP:e033:[]
(XEN) RFLAGS: 0282   EM: 1   CONTEXT: pv guest (d0v0)
(XEN) rax: 88010278b080   rbx: 81a1   rcx: 8880
(XEN) rdx: 3000   rsi: 81a01de4   rdi: 00043a95c067
(XEN) rbp: 81a01df8   rsp: 81a01da0   r8:  3000
(XEN) r9:  8800   r10: 0001   r11: 0001
(XEN) r12: 8000   r13: 81a1   r14: 
(XEN) r15: 0082   cr0: 8005003b   cr4: 001526e0
(XEN) cr3: 000439a0c000   cr2: 88010278b080
(XEN) ds:    es:    fs:    gs:    ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=81a01da0:
(XEN)8880 0001  81005dea
(XEN)0001e030 00010082 81a01de0 e02b
(XEN)000181a1 81a1 8000 81a01e40
(XEN)810067f6 0001 0001 81a1
(XEN)8000 83d7a000  81df
(XEN)81a01e78 81aedf2d 0114b000 0100
(XEN)   81a01ef0
(XEN)81add76b   81a01ef0
(XEN)81a01f08 0010 81a01f00 81a01ec0
(XEN)  81b69900 
(XEN)  81a01f30 81ad5bb9
(XEN) 81b732c0 81a01f60 
(XEN)  81a01f40 81ad55ee
(XEN)81a01ff8 81ad8b48 000306e4 000100200800
(XEN)03010032 0005 0020 
(XEN)   
(XEN)   
(XEN)   
(XEN)   
(XEN)0f0060c0c748 c305  
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.


On Thu, 26 May 2016, David Vrabel wrote:


On 17/05/16 16:11, David Vrabel wrote:

On 11/05/16 11:16, David Vrabel wrote:


Why don't we get the RW bits correct when making the pteval when we
already have the pfn, instead trying to fix it up afterwards.


Kevin, can you try this patch.

David

8<-
x86/xen: avoid m2p lookup when setting early page table entries

When page table entries are set using xen_set_pte_init() during early
boot there is no page fault handler that could handle a fault when
performing an M2P lookup.

In a 64-bit guest (usually dom0) early_ioremap() would fault in
xen_set_pte_init() because the M2P lookup faults: the MFN is in
MMIO space and not mapped in the M2P.  This lookup is done to see if
the PFN is in the range used for the initial page table pages, so that
the PTE may be set as read-only.

The M2P lookup can be avoided by moving the check (and clear of RW)
earlier when the PFN is still available.

[ Not entirely happy with this as the 32/64 bit paths diverge even
  more. Is there some way to unify them instead? ]


Boris, Juergen, any opinion on this patch?

David


--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1562,7 +1562,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t
pte)
return pte;
 }
 #else /* CONFIG_X86_64 */
-static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
+static pteval_t __init mask_rw_pte(pteval_t pte)
 {
unsigned long pfn;

@@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
pte_t pte)
 * page tables for mapping the p2m list, too, and page t

Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-06-01 Thread Martin Cerveny

Hello.

I probably hit the same error with the released "XenServer 7.0".
- I have Xen 4.6.1 (commit d77bac5c064ffb9dbb5b89b55b89853f1b784ebf - update Xen
version to 4.6.1)
- XS7 (Dundee) beta3 (kernel-3.10.96-479.383024.x86_64.rpm) works OK
- XS7 release (kernel-3.10.96-484.383030.x86_64.rpm) crashes
- the patch does not work, arch/x86/xen/mmu.c is very old in 3.10
- Can someone verify the error?

Thanks, Martin Cerveny

Crash (kernel-3.10.96-479.383024.x86_64.rpm):

about to get started...
(XEN) d0v0: unhandled page fault (ec=)
(XEN) Pagetable walk from 88010278b080:
(XEN)  L4[0x110] = 000439a0d067 1a0d
(XEN)  L3[0x004] =  
(XEN) domain_crash_sync called from entry.S: fault at 82d08022b2c3 
create_bounce_frame+0x12b/0x13a
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) [ Xen-4.6.1-vgpu  x86_64  debug=n  Not tainted ]
(XEN) CPU:0
(XEN) RIP:e033:[]
(XEN) RFLAGS: 0282   EM: 1   CONTEXT: pv guest (d0v0)
(XEN) rax: 88010278b080   rbx: 81a1   rcx: 8880
(XEN) rdx: 3000   rsi: 81a01de4   rdi: 00043a95c067
(XEN) rbp: 81a01df8   rsp: 81a01da0   r8:  3000
(XEN) r9:  8800   r10: 0001   r11: 0001
(XEN) r12: 8000   r13: 81a1   r14: 
(XEN) r15: 0082   cr0: 8005003b   cr4: 001526e0
(XEN) cr3: 000439a0c000   cr2: 88010278b080
(XEN) ds:    es:    fs:    gs:    ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=81a01da0:
(XEN)8880 0001  81005dea
(XEN)0001e030 00010082 81a01de0 e02b
(XEN)000181a1 81a1 8000 81a01e40
(XEN)810067f6 0001 0001 81a1
(XEN)8000 83d7a000  81df
(XEN)81a01e78 81aedf2d 0114b000 0100
(XEN)   81a01ef0
(XEN)81add76b   81a01ef0
(XEN)81a01f08 0010 81a01f00 81a01ec0
(XEN)  81b69900 
(XEN)  81a01f30 81ad5bb9
(XEN) 81b732c0 81a01f60 
(XEN)  81a01f40 81ad55ee
(XEN)81a01ff8 81ad8b48 000306e4 000100200800
(XEN)03010032 0005 0020 
(XEN)   
(XEN)   
(XEN)   
(XEN)   
(XEN)0f0060c0c748 c305  
(XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.


On Thu, 26 May 2016, David Vrabel wrote:


On 17/05/16 16:11, David Vrabel wrote:

On 11/05/16 11:16, David Vrabel wrote:


Why don't we get the RW bits correct when making the pteval when we
already have the pfn, instead trying to fix it up afterwards.


Kevin, can you try this patch.

David

8<-
x86/xen: avoid m2p lookup when setting early page table entries

When page table entries are set using xen_set_pte_init() during early
boot there is no page fault handler that could handle a fault when
performing an M2P lookup.

In a 64-bit guest (usually dom0) early_ioremap() would fault in
xen_set_pte_init() because the M2P lookup faults: the MFN is in
MMIO space and not mapped in the M2P.  This lookup is done to see if
the PFN is in the range used for the initial page table pages, so that
the PTE may be set as read-only.

The M2P lookup can be avoided by moving the check (and clear of RW)
earlier when the PFN is still available.

[ Not entirely happy with this as the 32/64 bit paths diverge even
  more. Is there some way to unify them instead? ]


Boris, Juergen, any opinion on this patch?

David


--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1562,7 +1562,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t
pte)
return pte;
 }
 #else /* CONFIG_X86_64 */
-static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
+static pteval_t __init mask_rw_pte(pteval_t pte)
 {
unsigned long pfn;

@@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
pte_t pte)
 * page tables for mapping the p2m list, too, and page tables MUST be
 * mapped read-only.
 */
-   pfn = pte_pfn(pte);
+   pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
 

Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-26 Thread David Vrabel
On 26/05/16 15:05, Boris Ostrovsky wrote:
> On 05/26/2016 06:24 AM, David Vrabel wrote:
>>> @@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>>> pte_t pte)
>>>  * page tables for mapping the p2m list, too, and page tables MUST be
>>>  * mapped read-only.
>>>  */
>>> -   pfn = pte_pfn(pte);
>>> +   pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
>>> if (pfn >= xen_start_info->first_p2m_pfn &&
>>> pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
>>> -   pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
>>> +   pte &= ~_PAGE_RW;
>>>
>>> return pte;
>>>  }
>>> @@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>>> pte_t pte)
>>>   * so always write the PTE directly and rely on Xen trapping and
>>>   * emulating any updates as necessary.
>>>   */
>>> +__visible __init pte_t xen_make_pte_init(pteval_t pte)
>>> +{
>>> +#ifdef CONFIG_X86_64
>>> +   pte = mask_rw_pte(pte);
>>> +#endif
> 
> 
> Won't make_pte() be called on 32-bit as well? (And if yes then we can
> get rid of xen_set_pte_init())

Yes, but the 32-bit check needs the pointer to the PTE to see if it is
currently read-only; that information isn't available in make_pte().
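
For reference, the shape of that 32-bit check is roughly the following -
an illustrative sketch of the idea, not the exact code from
arch/x86/xen/mmu.c: it has to look at the entry already installed at
*ptep, which is exactly what a make_pte()-style hook never sees.

static pte_t __init mask_rw_pte_sketch(pte_t *ptep, pte_t pte)
{
	/* If a present, read-only entry is already installed, do not let
	 * the new value make the mapping writable. */
	if ((pte_val(*ptep) & _PAGE_PRESENT) && !(pte_val(*ptep) & _PAGE_RW))
		pte = __pte(pte_val(pte) & ~_PAGE_RW);

	return pte;
}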

> (Also there were build warnings about xen_make_pte_init() being in wrong
> section because PV_CALLEE_SAVE is not __init).

I intend to fix this up before posting a v2.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-26 Thread Boris Ostrovsky
On 05/26/2016 06:24 AM, David Vrabel wrote:
>> @@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>> pte_t pte)
>>   * page tables for mapping the p2m list, too, and page tables MUST be
>>   * mapped read-only.
>>   */
>> -pfn = pte_pfn(pte);
>> +pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
>>  if (pfn >= xen_start_info->first_p2m_pfn &&
>>  pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
>> -pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
>> +pte &= ~_PAGE_RW;
>>
>>  return pte;
>>  }
>> @@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
>> pte_t pte)
>>   * so always write the PTE directly and rely on Xen trapping and
>>   * emulating any updates as necessary.
>>   */
>> +__visible __init pte_t xen_make_pte_init(pteval_t pte)
>> +{
>> +#ifdef CONFIG_X86_64
>> +pte = mask_rw_pte(pte);
>> +#endif


Won't make_pte() be called on 32-bit as well? (And if yes then we can
get rid of xen_set_pte_init())

(Also there were build warnings about xen_make_pte_init() being in wrong
section because PV_CALLEE_SAVE is not __init).

-boris



>> +pte = pte_pfn_to_mfn(pte);
>> +
>> +if ((pte & PTE_PFN_MASK) >> PAGE_SHIFT == INVALID_P2M_ENTRY)
>> +pte = 0;
>> +
>> +return native_make_pte(pte);
>> +}
>> +PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
>> +
>>  static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
>>  {
>> +#ifdef CONFIG_X86_32
>>  if (pte_mfn(pte) != INVALID_P2M_ENTRY)
>>  pte = mask_rw_pte(ptep, pte);
>> -else
>> -pte = __pte_ma(0);
>> -
>> +#endif
>>  native_set_pte(ptep, pte);
>>  }
>>
>> @@ -2407,6 +2420,7 @@ static void __init xen_post_allocator_init(void)
>>  pv_mmu_ops.alloc_pud = xen_alloc_pud;
>>  pv_mmu_ops.release_pud = xen_release_pud;
>>  #endif
>> +pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);
>>
>>  #ifdef CONFIG_X86_64
>>  pv_mmu_ops.write_cr3 = &xen_write_cr3;
>> @@ -2455,7 +2469,7 @@ static const struct pv_mmu_ops xen_mmu_ops
>> __initconst = {
>>  .pte_val = PV_CALLEE_SAVE(xen_pte_val),
>>  .pgd_val = PV_CALLEE_SAVE(xen_pgd_val),
>>
>> -.make_pte = PV_CALLEE_SAVE(xen_make_pte),
>> +.make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
>>  .make_pgd = PV_CALLEE_SAVE(xen_make_pgd),
>>
>>  #ifdef CONFIG_X86_PAE
>>
>
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-26 Thread David Vrabel
On 17/05/16 16:11, David Vrabel wrote:
> On 11/05/16 11:16, David Vrabel wrote:
>>
>> Why don't we get the RW bits correct when making the pteval when we
>> already have the pfn, instead trying to fix it up afterwards.
> 
> Kevin, can you try this patch.
> 
> David
> 
> 8<-
> x86/xen: avoid m2p lookup when setting early page table entries
> 
> When page table entries are set using xen_set_pte_init() during early
> boot there is no page fault handler that could handle a fault when
> performing an M2P lookup.
> 
> In a 64-bit guest (usually dom0) early_ioremap() would fault in
> xen_set_pte_init() because the M2P lookup faults: the MFN is in
> MMIO space and not mapped in the M2P.  This lookup is done to see if
> the PFN is in the range used for the initial page table pages, so that
> the PTE may be set as read-only.
> 
> The M2P lookup can be avoided by moving the check (and clear of RW)
> earlier when the PFN is still available.
> 
> [ Not entirely happy with this as the 32/64 bit paths diverge even
>   more. Is there some way to unify them instead? ]

Boris, Juergen, any opinion on this patch?

David

> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1562,7 +1562,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t
> pte)
>   return pte;
>  }
>  #else /* CONFIG_X86_64 */
> -static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
> +static pteval_t __init mask_rw_pte(pteval_t pte)
>  {
>   unsigned long pfn;
> 
> @@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
> pte_t pte)
>* page tables for mapping the p2m list, too, and page tables MUST be
>* mapped read-only.
>*/
> - pfn = pte_pfn(pte);
> + pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
>   if (pfn >= xen_start_info->first_p2m_pfn &&
>   pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
> - pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
> + pte &= ~_PAGE_RW;
> 
>   return pte;
>  }
> @@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
> pte_t pte)
>   * so always write the PTE directly and rely on Xen trapping and
>   * emulating any updates as necessary.
>   */
> +__visible __init pte_t xen_make_pte_init(pteval_t pte)
> +{
> +#ifdef CONFIG_X86_64
> + pte = mask_rw_pte(pte);
> +#endif
> + pte = pte_pfn_to_mfn(pte);
> +
> + if ((pte & PTE_PFN_MASK) >> PAGE_SHIFT == INVALID_P2M_ENTRY)
> + pte = 0;
> +
> + return native_make_pte(pte);
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
> +
>  static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
>  {
> +#ifdef CONFIG_X86_32
>   if (pte_mfn(pte) != INVALID_P2M_ENTRY)
>   pte = mask_rw_pte(ptep, pte);
> - else
> - pte = __pte_ma(0);
> -
> +#endif
>   native_set_pte(ptep, pte);
>  }
> 
> @@ -2407,6 +2420,7 @@ static void __init xen_post_allocator_init(void)
>   pv_mmu_ops.alloc_pud = xen_alloc_pud;
>   pv_mmu_ops.release_pud = xen_release_pud;
>  #endif
> + pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);
> 
>  #ifdef CONFIG_X86_64
>   pv_mmu_ops.write_cr3 = &xen_write_cr3;
> @@ -2455,7 +2469,7 @@ static const struct pv_mmu_ops xen_mmu_ops
> __initconst = {
>   .pte_val = PV_CALLEE_SAVE(xen_pte_val),
>   .pgd_val = PV_CALLEE_SAVE(xen_pgd_val),
> 
> - .make_pte = PV_CALLEE_SAVE(xen_make_pte),
> + .make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
>   .make_pgd = PV_CALLEE_SAVE(xen_make_pgd),
> 
>  #ifdef CONFIG_X86_PAE
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-17 Thread Kevin Moraga

On 05/17/2016 09:11 AM, David Vrabel wrote:
> On 11/05/16 11:16, David Vrabel wrote:
>> Why don't we get the RW bits correct when making the pteval when we
>> already have the pfn, instead trying to fix it up afterwards.
> Kevin, can you try this patch.
Yes :D. The patch is working fine.

I only got this warning while compiling:

WARNING: arch/x86/xen/built-in.o(.text+0x257d): Section mismatch in
reference from the variable __raw_callee_save_xen_make_pte_init to the
function .init.text:xen_make_pte_init()
The function __raw_callee_save_xen_make_pte_init() references
the function __init xen_make_pte_init().
This is often because __raw_callee_save_xen_make_pte_init lacks a __init
annotation or the annotation of xen_make_pte_init is wrong.
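
The warning itself is the generic modpost pattern: code that stays
resident (here the callee-save thunk) references a function placed in
.init.text, which is discarded after boot. A minimal standalone
illustration of that shape - the names below are made up, this is not the
Xen code:

/* section-mismatch.c - resident code referencing an .init.text function. */
#define __init __attribute__((__section__(".init.text")))

static int __init init_only_helper(void)	/* discarded after boot */
{
	return 42;
}

int resident_thunk(void)			/* stays in .text */
{
	return init_only_helper();		/* .text -> .init.text reference */
}

The usual ways out are to drop the __init annotation from the referenced
function or to annotate the referencing side to match, which fits David's
reply later in the thread that he intends to fix this up before posting
a v2.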


>
> David
>
> 8<-
> x86/xen: avoid m2p lookup when setting early page table entries
>
> When page table entries are set using xen_set_pte_init() during early
> boot there is no page fault handler that could handle a fault when
> performing an M2P lookup.
>
> In a 64-bit guest (usually dom0) early_ioremap() would fault in
> xen_set_pte_init() because the M2P lookup faults: the MFN is in
> MMIO space and not mapped in the M2P.  This lookup is done to see if
> the PFN is in the range used for the initial page table pages, so that
> the PTE may be set as read-only.
>
> The M2P lookup can be avoided by moving the check (and clear of RW)
> earlier when the PFN is still available.
>
> [ Not entirely happy with this as the 32/64 bit paths diverge even
>   more. Is there some way to unify them instead? ]
>
> Signed-off-by: David Vrabel 
> ---
>  arch/x86/xen/mmu.c | 28 +---
>  1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 478a2de..897fad4 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -1562,7 +1562,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t
> pte)
>   return pte;
>  }
>  #else /* CONFIG_X86_64 */
> -static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
> +static pteval_t __init mask_rw_pte(pteval_t pte)
>  {
>   unsigned long pfn;
>
> @@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
> pte_t pte)
>* page tables for mapping the p2m list, too, and page tables MUST be
>* mapped read-only.
>*/
> - pfn = pte_pfn(pte);
> + pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
>   if (pfn >= xen_start_info->first_p2m_pfn &&
>   pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
> - pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
> + pte &= ~_PAGE_RW;
>
>   return pte;
>  }
> @@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
> pte_t pte)
>   * so always write the PTE directly and rely on Xen trapping and
>   * emulating any updates as necessary.
>   */
> +__visible __init pte_t xen_make_pte_init(pteval_t pte)
> +{
> +#ifdef CONFIG_X86_64
> + pte = mask_rw_pte(pte);
> +#endif
> + pte = pte_pfn_to_mfn(pte);
> +
> + if ((pte & PTE_PFN_MASK) >> PAGE_SHIFT == INVALID_P2M_ENTRY)
> + pte = 0;
> +
> + return native_make_pte(pte);
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
> +
>  static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
>  {
> +#ifdef CONFIG_X86_32
>   if (pte_mfn(pte) != INVALID_P2M_ENTRY)
>   pte = mask_rw_pte(ptep, pte);
> - else
> - pte = __pte_ma(0);
> -
> +#endif
>   native_set_pte(ptep, pte);
>  }
>
> @@ -2407,6 +2420,7 @@ static void __init xen_post_allocator_init(void)
>   pv_mmu_ops.alloc_pud = xen_alloc_pud;
>   pv_mmu_ops.release_pud = xen_release_pud;
>  #endif
> + pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);
>
>  #ifdef CONFIG_X86_64
>   pv_mmu_ops.write_cr3 = &xen_write_cr3;
> @@ -2455,7 +2469,7 @@ static const struct pv_mmu_ops xen_mmu_ops
> __initconst = {
>   .pte_val = PV_CALLEE_SAVE(xen_pte_val),
>   .pgd_val = PV_CALLEE_SAVE(xen_pgd_val),
>
> - .make_pte = PV_CALLEE_SAVE(xen_make_pte),
> + .make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
>   .make_pgd = PV_CALLEE_SAVE(xen_make_pgd),
>
>  #ifdef CONFIG_X86_PAE

-- 
Sincerely,
Kevin Moraga
PGP: F258EDCB
Fingerprint: 3915 A5A9 959C D18F 0A89 B47E FB4B 55F5 F258 EDCB




signature.asc
Description: OpenPGP digital signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-17 Thread David Vrabel
On 11/05/16 11:16, David Vrabel wrote:
> 
> Why don't we get the RW bits correct when making the pteval when we
> already have the pfn, instead trying to fix it up afterwards.

Kevin, can you try this patch.

David

8<-
x86/xen: avoid m2p lookup when setting early page table entries

When page table entries are set using xen_set_pte_init() during early
boot there is no page fault handler that could handle a fault when
performing an M2P lookup.

In a 64-bit guest (usually dom0) early_ioremap() would fault in
xen_set_pte_init() because the M2P lookup faults: the MFN is in
MMIO space and not mapped in the M2P.  This lookup is done to see if
the PFN is in the range used for the initial page table pages, so that
the PTE may be set as read-only.

The M2P lookup can be avoided by moving the check (and clear of RW)
earlier when the PFN is still available.

[ Not entirely happy with this as the 32/64 bit paths diverge even
  more. Is there some way to unify them instead? ]

Signed-off-by: David Vrabel 
---
 arch/x86/xen/mmu.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 478a2de..897fad4 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1562,7 +1562,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t
pte)
return pte;
 }
 #else /* CONFIG_X86_64 */
-static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
+static pteval_t __init mask_rw_pte(pteval_t pte)
 {
unsigned long pfn;

@@ -1577,10 +1577,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
pte_t pte)
 * page tables for mapping the p2m list, too, and page tables MUST be
 * mapped read-only.
 */
-   pfn = pte_pfn(pte);
+   pfn = (pte & PTE_PFN_MASK) >> PAGE_SHIFT;
if (pfn >= xen_start_info->first_p2m_pfn &&
pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
-   pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
+   pte &= ~_PAGE_RW;

return pte;
 }
@@ -1600,13 +1600,26 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
pte_t pte)
  * so always write the PTE directly and rely on Xen trapping and
  * emulating any updates as necessary.
  */
+__visible __init pte_t xen_make_pte_init(pteval_t pte)
+{
+#ifdef CONFIG_X86_64
+   pte = mask_rw_pte(pte);
+#endif
+   pte = pte_pfn_to_mfn(pte);
+
+   if ((pte & PTE_PFN_MASK) >> PAGE_SHIFT == INVALID_P2M_ENTRY)
+   pte = 0;
+
+   return native_make_pte(pte);
+}
+PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte_init);
+
 static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
 {
+#ifdef CONFIG_X86_32
if (pte_mfn(pte) != INVALID_P2M_ENTRY)
pte = mask_rw_pte(ptep, pte);
-   else
-   pte = __pte_ma(0);
-
+#endif
native_set_pte(ptep, pte);
 }

@@ -2407,6 +2420,7 @@ static void __init xen_post_allocator_init(void)
pv_mmu_ops.alloc_pud = xen_alloc_pud;
pv_mmu_ops.release_pud = xen_release_pud;
 #endif
+   pv_mmu_ops.make_pte = PV_CALLEE_SAVE(xen_make_pte);

 #ifdef CONFIG_X86_64
pv_mmu_ops.write_cr3 = &xen_write_cr3;
@@ -2455,7 +2469,7 @@ static const struct pv_mmu_ops xen_mmu_ops
__initconst = {
.pte_val = PV_CALLEE_SAVE(xen_pte_val),
.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),

-   .make_pte = PV_CALLEE_SAVE(xen_make_pte),
+   .make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
.make_pgd = PV_CALLEE_SAVE(xen_make_pgd),

 #ifdef CONFIG_X86_PAE
-- 
2.1.4




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread Kevin Moraga
Hi Boris,

On 05/10/2016 02:11 PM, Boris Ostrovsky wrote:
> On 05/10/2016 12:11 PM, Kevin Moraga wrote:
> Can you boot your system bare-metal and post output of 'biosdecode' command?
>
> -boris
Sure, it's attached.

-- 
Sincerely,
Kevin Moraga
PGP: F258EDCB
Fingerprint: 3915 A5A9 959C D18F 0A89 B47E FB4B 55F5 F258 EDCB

# biosdecode 2.12
SMBIOS 2.8 present.
Structure Table Length: 3297 bytes
Structure Table Address: 0xD7BDC000
Number Of Structures: 66
Maximum Structure Size: 287 bytes
ACPI 2.0 present.
OEM Identifier: LENOVO
RSD Table 32-bit Address: 0xD7FD10C4
XSD Table 64-bit Address: 0xD7FD1188
PNP BIOS 1.0 present.
Event Notification: Not Supported
Real Mode 16-bit Code Address: F000:0A6D
Real Mode 16-bit Data Address: F000:
16-bit Protected Mode Code Address: 0x000F0A48
16-bit Protected Mode Data Address: 0x000F
BIOS32 Service Directory present.
Revision: 0
Calling Interface Address: 0x000FD000
PCI Interrupt Routing 1.0 present.
Router ID: 00:1f.0
Exclusive IRQs: None
Compatible Router: 8086:9d48
Slot Entry 1: ID 00:02, on-board
Slot Entry 2: ID 00:14, on-board
Slot Entry 3: ID 00:16, on-board
Slot Entry 4: ID 00:17, on-board
Slot Entry 5: ID 00:1c, on-board
Slot Entry 6: ID 02:00, slot number 33
Slot Entry 7: ID 04:00, slot number 8
Slot Entry 8: ID 00:1f, on-board
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread Juergen Gross
On 11/05/16 14:48, David Vrabel wrote:
> On 11/05/16 13:21, Jan Beulich wrote:
> On 11.05.16 at 12:16,  wrote:
>>> On 11/05/16 08:00, Juergen Gross wrote:
 Adding David as he removed _PAGE_IOMAP in kernel 3.18.
>>>
>>> Why don't we get the RW bits correct when making the pteval when we
>>> already have the pfn, instead trying to fix it up afterwards.
>>
>> While it looks like this would help in this specific situation, the next
>> time something is found to access the M2P early, that would need
>> another fix then. I.e. dealing with the underlying more general
>> issue would seem preferable to me.
> 
> I'm more concerned with future regression caused by changes to the
> generic x86 code to (for example) install a different early page fault
> handler.
> 
> Can we fix this specific issue in the way I suggested (avoiding the
> unnecessary m2p lookup entirely) and then discuss the merits of the page
> fault handler approach as a separate topic?

Sure.


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread Jan Beulich
>>> On 11.05.16 at 14:48,  wrote:
> On 11/05/16 13:21, Jan Beulich wrote:
> On 11.05.16 at 12:16,  wrote:
>>> On 11/05/16 08:00, Juergen Gross wrote:
 Adding David as he removed _PAGE_IOMAP in kernel 3.18.
>>>
>>> Why don't we get the RW bits correct when making the pteval when we
>>> already have the pfn, instead trying to fix it up afterwards.
>> 
>> While it looks like this would help in this specific situation, the next
>> time something is found to access the M2P early, that would need
>> another fix then. I.e. dealing with the underlying more general
>> issue would seem preferable to me.
> 
> I'm more concerned with future regression caused by changes to the
> generic x86 code to (for example) install a different early page fault
> handler.
> 
> Can we fix this specific issue in the way I suggested (avoiding the
> unnecessary m2p lookup entirely) and then discuss the merits of the page
> fault handler approach as a separate topic?

That's fine with me.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread David Vrabel
On 11/05/16 13:21, Jan Beulich wrote:
 On 11.05.16 at 12:16,  wrote:
>> On 11/05/16 08:00, Juergen Gross wrote:
>>> Adding David as he removed _PAGE_IOMAP in kernel 3.18.
>>
>> Why don't we get the RW bits correct when making the pteval when we
>> already have the pfn, instead trying to fix it up afterwards.
> 
> While it looks like this would help in this specific situation, the next
> time something is found to access the M2P early, that would need
> another fix then. I.e. dealing with the underlying more general
> issue would seem preferable to me.

I'm more concerned with future regression caused by changes to the
generic x86 code to (for example) install a different early page fault
handler.

Can we fix this specific issue in the way I suggested (avoiding the
unnecessary m2p lookup entirely) and then discuss the merits of the page
fault handler approach as a separate topic?

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread Jan Beulich
>>> On 11.05.16 at 12:16,  wrote:
> On 11/05/16 08:00, Juergen Gross wrote:
>> Adding David as he removed _PAGE_IOMAP in kernel 3.18.
> 
> Why don't we get the RW bits correct when making the pteval when we
> already have the pfn, instead trying to fix it up afterwards.

While it looks like this would help in this specific situation, the next
time something is found to access the M2P early, that would need
another fix then. I.e. dealing with the underlying more general
issue would seem preferable to me.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread Jan Beulich
>>> On 11.05.16 at 12:10,  wrote:
> On 11/05/16 12:03, Jan Beulich wrote:
> On 11.05.16 at 11:57,  wrote:
>>> On 11/05/16 09:15, Jan Beulich wrote:
>>> On 11.05.16 at 09:00,  wrote:
> Having a Xen specific pte flag seems to be much more intrusive than
> having an early boot page fault handler consisting of just one line
> being capable to mimic the default handler in just one aspect (see
> attached patch - only compile tested).

 Well, this simple handler may serve the purpose here, but what's
 the effect of having it in place on actual #PF (resulting e.g. from
 a bug somewhere)? I.e. what diagnostic information will be
 available to the developer in that case, now that the hypervisor
 won't help out anymore?
>>>
>>> Good point. As fixup_exception() is returning 0 in this case we can
>>> set the #PF handler to NULL again and retry the failing instruction.
>>> This will then lead to the same hypervisor handled case as today.
>> 
>> And how would you mean to set the #PF handler to this tiny one
>> again for the next M2P access? You simply can't have both, I'm afraid.
> 
> Why would I need another #PF handler after a crash? I meant something
> like:
> 
> +dotraplinkage void notrace
> +xen_do_page_fault(struct pt_regs *regs, unsigned long error_code)
> +{
> +   if (!fixup_exception(regs, X86_TRAP_PF))
> +   set_intr_gate_notrace(X86_TRAP_PF, NULL);
> +}

Ah, right, that should work (albeit looks a bit, well, odd).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread David Vrabel
On 11/05/16 08:00, Juergen Gross wrote:
> On 11/05/16 08:35, Jan Beulich wrote:
> On 11.05.16 at 07:49,  wrote:
>>> On 10/05/16 18:35, Boris Ostrovsky wrote:
 On 05/10/2016 11:43 AM, Juergen Gross wrote:
> On 10/05/16 17:35, Jan Beulich wrote:
> On 10.05.16 at 17:19,  wrote:
>>> On 10/05/16 15:57, Jan Beulich wrote:
>>> On 10.05.16 at 15:39,  wrote:
> I didn't finish unwrapping the stack yesterday. Here it is:
>
> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
 Ah, that makes sense. Yet why would early_ioremap() involve an
 M2P lookup? As said, MMIO addresses shouldn't be subject to such
 lookups.
>>> early_ioremap()->
>>>   __early_ioremap()->
>>> __early_set_fixmap()->
>>>   set_pte()->
>>> xen_set_pte_init()->
>>>   mask_rw_pte()->
>>> pte_pfn()->
>>>   pte_val()->
>>> xen_pte_val()->
>>>   pte_mfn_to_pfn()
>> Well, I understand (also from Boris' first reply) that's how it is,
>> but not why it is so. I.e. the call flow above doesn't answer my
>> question.
> On x86 early_ioremap() and early_memremap() share a common sub-function
> __early_ioremap(). This together with pvops requires a common set_pte()
> implementation leading to the mfn validation in the end.

 Do we make any assumptions about where DMI data lives?
>>>
>>> I don't think so.
>>>
>>> So the basic problem is the page fault due to the sparse m2p map before
>>> the #PF handler is registered.
>>>
>>> What do you think about registering a minimal #PF handler in
>>> xen_arch_setup() being capable to handle this problem? This should be
>>> doable without major problems. I can do a patch.
>>
>> To me that would feel like working around the issue instead of
>> admitting that the removal of _PAGE_IOMAP was a mistake.
> 
> Hmm, I don't think so.
> 
> Having a Xen specific pte flag seems to be much more intrusive than
> having an early boot page fault handler consisting of just one line
> being capable to mimic the default handler in just one aspect (see
> attached patch - only compile tested).
> 
> Adding David as he removed _PAGE_IOMAP in kernel 3.18.

Why don't we get the RW bits correct when making the pteval when we
already have the pfn, instead trying to fix it up afterwards.

Something like this:

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 478a2de..d187368 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -430,6 +430,22 @@ __visible pte_t xen_make_pte(pteval_t pte)
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte);

+__visible __init pte_t xen_make_pte_init(pteval_t pte)
+{
+   unsigned long pfn = pte_mfn(pte);
+
+#ifdef CONFIG_X86_64
+   pte = mask_rw_pte(pte);
+#endif
+   pte = pte_pfn_to_mfn(pte);
+
+   if (pte_mfn(pte) == INVALID_P2M_ENTRY)
+   pte = __pte_ma(0);
+
+   return native_make_pte(pte);
+}
+PV_CALLEE_SAVE_REGS_THUNK(xen_make_pte);
+
 __visible pgd_t xen_make_pgd(pgdval_t pgd)
 {
pgd = pte_pfn_to_mfn(pgd);
@@ -1562,7 +1578,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t
pte)
return pte;
 }
 #else /* CONFIG_X86_64 */
-static pte_t __init mask_rw_pte(pte_t *ptep, pte_t pte)
+static pte_t __init mask_rw_pte(pte_t pte)
 {
unsigned long pfn;

@@ -1577,7 +1593,7 @@ static pte_t __init mask_rw_pte(pte_t *ptep, pte_t
pte)
 * page tables for mapping the p2m list, too, and page tables MUST be
 * mapped read-only.
 */
-   pfn = pte_pfn(pte);
+   pfn = pte_mfn(pte);
if (pfn >= xen_start_info->first_p2m_pfn &&
pfn < xen_start_info->first_p2m_pfn + xen_start_info->nr_p2m_frames)
pte = __pte_ma(pte_val_ma(pte) & ~_PAGE_RW);
@@ -1602,11 +1618,10 @@ static pte_t __init mask_rw_pte(pte_t *ptep,
pte_t pte)
  */
 static void __init xen_set_pte_init(pte_t *ptep, pte_t pte)
 {
+#ifdef CONFIG_X86_32
if (pte_mfn(pte) != INVALID_P2M_ENTRY)
pte = mask_rw_pte(ptep, pte);
-   else
-   pte = __pte_ma(0);
-
+#endif
native_set_pte(ptep, pte);
 }

@@ -2407,6 +2422,7 @@ static void __init xen_post_allocator_init(void)
pv_mmu_ops.alloc_pud = xen_alloc_pud;
pv_mmu_ops.release_pud = xen_release_pud;
 #endif
+   pv_mmu_ops.make_pte = xen_make_pte;

 #ifdef CONFIG_X86_64
pv_mmu_ops.write_cr3 = &xen_write_cr3;
@@ -2455,7 +2471,7 @@ static const struct pv_mmu_ops xen_mmu_ops
__initconst = {
.pte_val = PV_CALLEE_SAVE(xen_pte_val),
.pgd_val = PV_CALLEE_SAVE(xen_pgd_val),

-   .make_pte = PV_CALLEE_SAVE(xen_make_pte),
+   .make_pte = PV_CALLEE_SAVE(xen_make_pte_init),
.make_pgd = PV_CALLEE_SAVE(xen_make_pgd),

 #ifdef CONFIG_X86_PAE


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread Juergen Gross
On 11/05/16 12:03, Jan Beulich wrote:
 On 11.05.16 at 11:57,  wrote:
>> On 11/05/16 09:15, Jan Beulich wrote:
>> On 11.05.16 at 09:00,  wrote:
 Having a Xen specific pte flag seems to be much more intrusive than
 having an early boot page fault handler consisting of just one line
 being capable to mimic the default handler in just one aspect (see
 attached patch - only compile tested).
>>>
>>> Well, this simple handler may serve the purpose here, but what's
>>> the effect of having it in place on actual #PF (resulting e.g. from
>>> a bug somewhere)? I.e. what diagnostic information will be
>>> available to the developer in that case, now that the hypervisor
>>> won't help out anymore?
>>
>> Good point. As fixup_exception() is returning 0 in this case we can
>> set the #PF handler to NULL again and retry the failing instruction.
>> This will then lead to the same hypervisor handled case as today.
> 
> And how would you mean to set the #PF handler to this tiny one
> again for the next M2P access? You simply can't have both, I'm afraid.

Why would I need another #PF handler after a crash? I meant something
like:

+dotraplinkage void notrace
+xen_do_page_fault(struct pt_regs *regs, unsigned long error_code)
+{
+   if (!fixup_exception(regs, X86_TRAP_PF))
+   set_intr_gate_notrace(X86_TRAP_PF, NULL);
+}


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread Jan Beulich
>>> On 11.05.16 at 11:57,  wrote:
> On 11/05/16 09:15, Jan Beulich wrote:
> On 11.05.16 at 09:00,  wrote:
>>> Having a Xen specific pte flag seems to be much more intrusive than
>>> having an early boot page fault handler consisting of just one line
>>> being capable to mimic the default handler in just one aspect (see
>>> attached patch - only compile tested).
>> 
>> Well, this simple handler may serve the purpose here, but what's
>> the effect of having it in place on actual #PF (resulting e.g. from
>> a bug somewhere)? I.e. what diagnostic information will be
>> available to the developer in that case, now that the hypervisor
>> won't help out anymore?
> 
> Good point. As fixup_exception() is returning 0 in this case we can
> set the #PF handler to NULL again and retry the failing instruction.
> This will then lead to the same hypervisor handled case as today.

And how would you mean to set the #PF handler to this tiny one
again for the next M2P access? You simply can't have both, I'm afraid.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread Juergen Gross
On 11/05/16 09:15, Jan Beulich wrote:
 On 11.05.16 at 09:00,  wrote:
>> Having a Xen specific pte flag seems to be much more intrusive than
>> having an early boot page fault handler consisting of just one line
>> being capable to mimic the default handler in just one aspect (see
>> attached patch - only compile tested).
> 
> Well, this simple handler may serve the purpose here, but what's
> the effect of having it in place on actual #PF (resulting e.g. from
> a bug somewhere)? I.e. what diagnostic information will be
> available to the developer in that case, now that the hypervisor
> won't help out anymore?

Good point. As fixup_exception() is returning 0 in this case we can
set the #PF handler to NULL again and retry the failing instruction.
This will then lead to the same hypervisor handled case as today.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread Jan Beulich
>>> On 11.05.16 at 09:00,  wrote:
> Having a Xen specific pte flag seems to be much more intrusive than
> having an early boot page fault handler consisting of just one line
> being capable to mimic the default handler in just one aspect (see
> attached patch - only compile tested).

Well, this simple handler may serve the purpose here, but what's
the effect of having it in place on actual #PF (resulting e.g. from
a bug somewhere)? I.e. what diagnostic information will be
available to the developer in that case, now that the hypervisor
won't help out anymore?

As to the Xen-specific-ness of such a flag: ARM also has a
distinct FIXMAP_PAGE_IO, and in all reality that's what we care
about here. Whether that translates to a separate flag on x86 is
a secondary aspect. That said, I certainly understand that
re-introduction of the flag wouldn't be liked by the x86 maintainers
(and likely also not by David and others), but the question to me is
what the downsides are of not having it, not so much whether it
is "nice".

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-11 Thread Juergen Gross
On 11/05/16 08:35, Jan Beulich wrote:
 On 11.05.16 at 07:49,  wrote:
>> On 10/05/16 18:35, Boris Ostrovsky wrote:
>>> On 05/10/2016 11:43 AM, Juergen Gross wrote:
 On 10/05/16 17:35, Jan Beulich wrote:
 On 10.05.16 at 17:19,  wrote:
>> On 10/05/16 15:57, Jan Beulich wrote:
>> On 10.05.16 at 15:39,  wrote:
 I didn't finish unwrapping the stack yesterday. Here it is:

 setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
>>> Ah, that makes sense. Yet why would early_ioremap() involve an
>>> M2P lookup? As said, MMIO addresses shouldn't be subject to such
>>> lookups.
>> early_ioremap()->
>>   __early_ioremap()->
>> __early_set_fixmap()->
>>   set_pte()->
>> xen_set_pte_init()->
>>   mask_rw_pte()->
>> pte_pfn()->
>>   pte_val()->
>> xen_pte_val()->
>>   pte_mfn_to_pfn()
> Well, I understand (also from Boris' first reply) that's how it is,
> but not why it is so. I.e. the call flow above doesn't answer my
> question.
 On x86 early_ioremap() and early_memremap() share a common sub-function
 __early_ioremap(). This together with pvops requires a common set_pte()
 implementation leading to the mfn validation in the end.
>>>
>>> Do we make any assumptions about where DMI data lives?
>>
>> I don't think so.
>>
>> So the basic problem is the page fault due to the sparse m2p map before
>> the #PF handler is registered.
>>
>> What do you think about registering a minimal #PF handler in
>> xen_arch_setup() being capable to handle this problem? This should be
>> doable without major problems. I can do a patch.
> 
> To me that would feel like working around the issue instead of
> admitting that the removal of _PAGE_IOMAP was a mistake.

Hmm, I don't think so.

Having a Xen-specific pte flag seems to be much more intrusive than
having an early boot page fault handler consisting of just one line
that is capable of mimicking the default handler in just one aspect (see
the attached patch - only compile tested).

Adding David as he removed _PAGE_IOMAP in kernel 3.18.


Juergen
commit 272793dcb989fc1ff2caaa9519f8f1ea5434b578
Author: Juergen Gross 
Date:   Wed May 11 07:53:54 2016 +0200

xen: register early page fault handler

In early boot of dom0 accesses to the sparse m2p list of the hypervisor
can result in unhandled page faults as the #PF handler handling this
case via exception table isn't yet registered.

Install a primitive early page fault handler for this case.

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 858b555..a20ea98 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -911,6 +911,7 @@ idtentry stack_segment		do_stack_segment	has_error_code=1
 idtentry xen_debug		do_debug		has_error_code=0
 idtentry xen_int3		do_int3			has_error_code=0
 idtentry xen_stack_segment	do_stack_segment	has_error_code=1
+idtentry xen_page_fault		xen_do_page_fault	has_error_code=1
 #endif
 
 idtentry general_protection	do_general_protection	has_error_code=1
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index c3496619..f91cb3f 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -16,6 +16,7 @@ asmlinkage void int3(void);
 asmlinkage void xen_debug(void);
 asmlinkage void xen_int3(void);
 asmlinkage void xen_stack_segment(void);
+asmlinkage void xen_page_fault(void);
 asmlinkage void overflow(void);
 asmlinkage void bounds(void);
 asmlinkage void invalid_op(void);
@@ -54,6 +55,7 @@ asmlinkage void trace_page_fault(void);
 #define trace_alignment_check alignment_check
 #define trace_simd_coprocessor_error simd_coprocessor_error
 #define trace_async_page_fault async_page_fault
+#define trace_xen_page_fault xen_page_fault
 #endif
 
 dotraplinkage void do_divide_error(struct pt_regs *, long);
@@ -74,6 +76,7 @@ asmlinkage struct pt_regs *sync_regs(struct pt_regs *);
 #endif
 dotraplinkage void do_general_protection(struct pt_regs *, long);
 dotraplinkage void do_page_fault(struct pt_regs *, unsigned long);
+dotraplinkage void xen_do_page_fault(struct pt_regs *, unsigned long);
 #ifdef CONFIG_TRACING
 dotraplinkage void trace_do_page_fault(struct pt_regs *, unsigned long);
 #else
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 7ab2951..eaee9d3 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -17,7 +17,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -1067,4 +1070,19 @@ void __init xen_arch_setup(void)
 #ifdef CONFIG_NUMA
 	numa_off = 1;
 #endif
+
+	sort_main_extable();
+	set_intr_gate(X86_TRAP_PF, xen_page_fault);
+}
+
+/*
+ * Early page fault handler being capable to handle page faults resulting
+ * from accesses via xen_safe_read_ulong().
+ * This page fault handler will be active in early boot only. It is being
+ * 

Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-10 Thread Jan Beulich
>>> On 11.05.16 at 07:49,  wrote:
> On 10/05/16 18:35, Boris Ostrovsky wrote:
>> On 05/10/2016 11:43 AM, Juergen Gross wrote:
>>> On 10/05/16 17:35, Jan Beulich wrote:
>>> On 10.05.16 at 17:19,  wrote:
> On 10/05/16 15:57, Jan Beulich wrote:
> On 10.05.16 at 15:39,  wrote:
>>> I didn't finish unwrapping the stack yesterday. Here it is:
>>>
>>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
>> Ah, that makes sense. Yet why would early_ioremap() involve an
>> M2P lookup? As said, MMIO addresses shouldn't be subject to such
>> lookups.
> early_ioremap()->
>   __early_ioremap()->
> __early_set_fixmap()->
>   set_pte()->
> xen_set_pte_init()->
>   mask_rw_pte()->
> pte_pfn()->
>   pte_val()->
> xen_pte_val()->
>   pte_mfn_to_pfn()
 Well, I understand (also from Boris' first reply) that's how it is,
 but not why it is so. I.e. the call flow above doesn't answer my
 question.
>>> On x86 early_ioremap() and early_memremap() share a common sub-function
>>> __early_ioremap(). This together with pvops requires a common set_pte()
>>> implementation leading to the mfn validation in the end.
>> 
>> Do we make any assumptions about where DMI data lives?
> 
> I don't think so.
> 
> So the basic problem is the page fault due to the sparse m2p map before
> the #PF handler is registered.
> 
> What do you think about registering a minimal #PF handler in
> xen_arch_setup() being capable to handle this problem? This should be
> doable without major problems. I can do a patch.

To me that would feel like working around the issue instead of
admitting that the removal of _PAGE_IOMAP was a mistake.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-10 Thread Juergen Gross
On 10/05/16 18:35, Boris Ostrovsky wrote:
> On 05/10/2016 11:43 AM, Juergen Gross wrote:
>> On 10/05/16 17:35, Jan Beulich wrote:
>> On 10.05.16 at 17:19,  wrote:
 On 10/05/16 15:57, Jan Beulich wrote:
 On 10.05.16 at 15:39,  wrote:
>> I didn't finish unwrapping the stack yesterday. Here it is:
>>
>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
> Ah, that makes sense. Yet why would early_ioremap() involve an
> M2P lookup? As said, MMIO addresses shouldn't be subject to such
> lookups.
 early_ioremap()->
   __early_ioremap()->
 __early_set_fixmap()->
   set_pte()->
 xen_set_pte_init()->
   mask_rw_pte()->
 pte_pfn()->
   pte_val()->
 xen_pte_val()->
   pte_mfn_to_pfn()
>>> Well, I understand (also from Boris' first reply) that's how it is,
>>> but not why it is so. I.e. the call flow above doesn't answer my
>>> question.
>> On x86 early_ioremap() and early_memremap() share a common sub-function
>> __early_ioremap(). This together with pvops requires a common set_pte()
>> implementation leading to the mfn validation in the end.
> 
> Do we make any assumptions about where DMI data lives?

I don't think so.

So the basic problem is the page fault due to the sparse m2p map before
the #PF handler is registered.

What do you think about registering a minimal #PF handler in
xen_arch_setup() that is capable of handling this problem? This should be
doable without major problems. I can do a patch.
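
The "sparse m2p" situation can be pictured with a small userspace model -
purely illustrative, it only mimics the shape of the problem, not the
hypervisor's real M2P layout: address space is reserved for the whole
table but only part of it is backed, so a plain table[index] read either
needs a fault handler that turns the fault into "no translation" or a
check that avoids touching unbacked slots in the first place.

/* sparse-table.c - userspace model of a sparsely backed lookup table. */
#include <stdio.h>
#include <sys/mman.h>

#define SLOTS (1UL << 20)

int main(void)
{
	size_t size = SLOTS * sizeof(unsigned long);

	/* Reserve address space for the whole table... */
	unsigned long *table = mmap(NULL, size, PROT_NONE,
				    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (table == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* ...but back only the first page with usable memory. */
	if (mprotect(table, 4096, PROT_READ | PROT_WRITE)) {
		perror("mprotect");
		return 1;
	}
	table[0] = 123;

	printf("slot 0 -> %lu\n", table[0]);	/* fine: backed slot */
	/* Reading table[SLOTS - 1] here would fault (SIGSEGV): the slot is
	 * reserved but not backed, much like an M2P slot for an MMIO MFN. */

	munmap(table, size);
	return 0;
}

The minimal #PF handler proposed above corresponds to the first option;
David's suggestion of fixing up the pteval while the PFN is still known
corresponds to the second.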


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-10 Thread Boris Ostrovsky
On 05/10/2016 12:11 PM, Kevin Moraga wrote:
>

Can you boot your system bare-metal and post output of 'biosdecode' command?

-boris

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-10 Thread Boris Ostrovsky
On 05/10/2016 11:43 AM, Juergen Gross wrote:
> On 10/05/16 17:35, Jan Beulich wrote:
> On 10.05.16 at 17:19,  wrote:
>>> On 10/05/16 15:57, Jan Beulich wrote:
>>> On 10.05.16 at 15:39,  wrote:
> I didn't finish unwrapping the stack yesterday. Here it is:
>
> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
 Ah, that makes sense. Yet why would early_ioremap() involve an
 M2P lookup? As said, MMIO addresses shouldn't be subject to such
 lookups.
>>> early_ioremap()->
>>>   __early_ioremap()->
>>> __early_set_fixmap()->
>>>   set_pte()->
>>> xen_set_pte_init()->
>>>   mask_rw_pte()->
>>> pte_pfn()->
>>>   pte_val()->
>>> xen_pte_val()->
>>>   pte_mfn_to_pfn()
>> Well, I understand (also from Boris' first reply) that's how it is,
>> but not why it is so. I.e. the call flow above doesn't answer my
>> question.
> On x86 early_ioremap() and early_memremap() share a common sub-function
> __early_ioremap(). This together with pvops requires a common set_pte()
> implementation leading to the mfn validation in the end.

Do we make any assumptions about where DMI data lives?

-boris



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-10 Thread Kevin Moraga


On 05/10/2016 01:23 AM, Jan Beulich wrote:
 On 09.05.16 at 20:40,  wrote:
>> On 05/09/2016 01:22 PM, Kevin Moraga wrote:
>>> On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
 On 05/09/2016 12:40 PM, Kevin Moraga wrote:
> On 05/09/2016 09:53 AM, Jan Beulich wrote:
> On 09.05.16 at 16:52,  wrote:
>>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>> On 09.05.16 at 00:51,  wrote:
> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 
> 4.6.0
> and Intel Skylake processor (Intel Core i7-6600U)
>
> This kernel is crashing almost in the same way as explained in this
> thread... But my problem is mainly with Skylake. Because the same
> configuration works within another machine but with another processor
> (Intel Core i5-3340M). Attached are the boot logs.
 The address the fault occurs on (806bdee0) is bogus, so
 from the register and stack dump alone I don't think we can derive
 much. What we'd need is access to the kernel binary used (or
 really the vmlinux accompanying the vmlinuz that was used), in
 order to see where exactly the kernel died, and hence where this
 bogus address originates from. As I understand it this is a kernel
 you built yourself - can you make said binary from exactly that
 build available somewhere? 
>>> Yes I have it. But I get the same crash on various 4.4.X and also with
>>> 4.5.3.
>>>
>>> **https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E 
>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>> apart vmlinuz would be quite cumbersome.
>>
>> Jan
>>
> Oh sorry, here is the link to vmlinux
>
>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing
>>  
 This is still vmlinuz but the failure is at

 81007ef3:   48 3b 1d 4e 2e ec 00cmp   
 0xec2e4e(%rip),%rbx# 0x81ecad48
 81007efa:   73 51   jae0x81007f4d
 81007efc:   31 c0   xor%eax,%eax
 81007efe:   48 8b 15 03 d2 c0 00mov   
 0xc0d203(%rip),%rdx# 0x81c15108
 81007f05:   90  nop
 81007f06:   90  nop
 81007f07:   90  nop
 81007f08:   4c 8b 2c da mov   
 (%rdx,%rbx,8),%r13<==
 81007f0c:   90  nop
 81007f0d:   90  nop
 81007f0e:   90  nop
 81007f0f:   85 c0   test   %eax,%eax
 81007f11:   78 3a   js 0x81007f4d
 81007f13:   48 8b 05 ee 11 d2 00mov   
 0xd211ee(%rip),%rax# 0x81d29108
 81007f1a:   49 39 c5cmp%rax,%r13
 81007f1d:   73 6f   jae0x81007f8e
 81007f1f:   48 8b 05 ea 11 d2 00mov   
 0xd211ea(%rip),%rax# 0x81d29110
 81007f26:   4a 8b 04 e8 mov(%rax,%r13,8),%rax

 Any chance you could provide an un-stripped binary or System.map?
>>> Here is the link for System.map
>>>
>>>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing
>>  
>>
>> So my semi-educated guess at your stack is
>> __early_ioremap
>>   -> __early_set_fixmap
>> -> set_pte
>>   -> xen_set_pte_init
>> -> mask_rw_pte
>>   -> pte_pfn
>> -> pte_val
>>-> xen_pte_val
>>  -> pte_mfn_to_pfn
>>-> mfn_to_pfn_no_overrides
>>  -> ret =
>> xen_safe_read_ulong(&machine_to_phys_mapping[mfn], &pfn)
>>
>>
>> With 81007f08 being the faulted address the last one looks
>> plausible:
>>
>>
>> 81007efe:   48 8b 15 03 d2 c0 00mov   
>> 0xc0d203(%rip),%rdx# 0x81c15108
>> 81007f05:   90  nop
>> 81007f06:   90  nop
>> 81007f07:   90  nop
>> 81007f08:   4c 8b 2c da   mov(%rdx,%rbx,8),%r13
>>
>> since
>>
>> ostr@workbase> grep  81c15108
>> /tmp/System.map-4.4.8-9.pvops.qubes.x86_64
>> 81c15108 D machine_to_phys_mapping
>> ostr@workbase>
>>
>> But %rdx is not 81c15108, it is 8000:
>>
>> (XEN) rax:    rbx: 000d7bdc   rcx: 880002059000
>> (XEN) rdx: 8000   rsi: 8000d7bdc063   rdi: 8000d7bdc063
> But that's a MOV above, i.e. %rdx = [0x81c15108], which
> sensibly is MACH2PHYS_VIRT_START. And the MFN in %rbx
> would then match with the value in %cr2. Question is -

Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-10 Thread Juergen Gross
On 10/05/16 17:35, Jan Beulich wrote:
 On 10.05.16 at 17:19,  wrote:
>> On 10/05/16 15:57, Jan Beulich wrote:
>> On 10.05.16 at 15:39,  wrote:
 I didn't finish unwrapping the stack yesterday. Here it is:

 setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
>>>
>>> Ah, that makes sense. Yet why would early_ioremap() involve an
>>> M2P lookup? As said, MMIO addresses shouldn't be subject to such
>>> lookups.
>>
>> early_ioremap()->
>>   __early_ioremap()->
>> __early_set_fixmap()->
>>   set_pte()->
>> xen_set_pte_init()->
>>   mask_rw_pte()->
>> pte_pfn()->
>>   pte_val()->
>> xen_pte_val()->
>>   pte_mfn_to_pfn()
> 
> Well, I understand (also from Boris' first reply) that's how it is,
> but not why it is so. I.e. the call flow above doesn't answer my
> question.

On x86 early_ioremap() and early_memremap() share a common sub-function
__early_ioremap(). This together with pvops requires a common set_pte()
implementation leading to the mfn validation in the end.
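
Condensed, the path that ends up dereferencing the sparse M2P array looks
roughly like this (paraphrased from the 4.4-era arch/x86/xen/mmu.c, not
quoted verbatim):

/*
 * Paraphrased, not verbatim: every pvops set_pte() funnels through the
 * pte conversion helpers, and the mfn->pfn direction has to consult
 * machine_to_phys_mapping[].
 */
static pteval_t pte_mfn_to_pfn(pteval_t val)
{
	if (val & _PAGE_PRESENT) {
		unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
		pteval_t flags = val & PTE_FLAGS_MASK;
		unsigned long pfn = mfn_to_pfn(mfn);	/* reads machine_to_phys_mapping[mfn] */

		if (unlikely(pfn == ~0))
			val = flags & ~_PAGE_PRESENT;	/* no M2P entry: drop the mapping */
		else
			val = ((pteval_t)pfn << PAGE_SHIFT) | flags;
	}
	return val;
}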


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-10 Thread Jan Beulich
>>> On 10.05.16 at 17:19,  wrote:
> On 10/05/16 15:57, Jan Beulich wrote:
> On 10.05.16 at 15:39,  wrote:
>>> I didn't finish unwrapping the stack yesterday. Here it is:
>>>
>>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
>> 
>> Ah, that makes sense. Yet why would early_ioremap() involve an
>> M2P lookup? As said, MMIO addresses shouldn't be subject to such
>> lookups.
> 
> early_ioremap()->
>   __early_ioremap()->
> __early_set_fixmap()->
>   set_pte()->
> xen_set_pte_init()->
>   mask_rw_pte()->
> pte_pfn()->
>   pte_val()->
> xen_pte_val()->
>   pte_mfn_to_pfn()

Well, I understand (also from Boris' first reply) that's how it is,
but not why it is so. I.e. the call flow above doesn't answer my
question.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-10 Thread Juergen Gross
On 10/05/16 15:57, Jan Beulich wrote:
 On 10.05.16 at 15:39,  wrote:
>> I didn't finish unwrapping the stack yesterday. Here it is:
>>
>> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap
> 
> Ah, that makes sense. Yet why would early_ioremap() involve an
> M2P lookup? As said, MMIO addresses shouldn't be subject to such
> lookups.

early_ioremap()->
  __early_ioremap()->
__early_set_fixmap()->
  set_pte()->
xen_set_pte_init()->
  mask_rw_pte()->
pte_pfn()->
  pte_val()->
xen_pte_val()->
  pte_mfn_to_pfn()


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-10 Thread Jan Beulich
>>> On 10.05.16 at 15:39,  wrote:
> I didn't finish unwrapping the stack yesterday. Here it is:
> 
> setup_arch -> dmi_scan_machine -> dmi_walk_early -> early_ioremap

Ah, that makes sense. Yet why would early_ioremap() involve an
M2P lookup? As said, MMIO addresses shouldn't be subject to such
lookups.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-10 Thread Boris Ostrovsky
On 05/10/2016 03:23 AM, Jan Beulich wrote:
 On 09.05.16 at 20:40,  wrote:
>> On 05/09/2016 01:22 PM, Kevin Moraga wrote:
>>> On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
 On 05/09/2016 12:40 PM, Kevin Moraga wrote:
> On 05/09/2016 09:53 AM, Jan Beulich wrote:
> On 09.05.16 at 16:52,  wrote:
>>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>> On 09.05.16 at 00:51,  wrote:
> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 
> 4.6.0
> and Intel Skylake processor (Intel Core i7-6600U)
>
> This kernel is crashing almost in the same way as explained in this
> thread... But my problem is mainly with Skylake. Because the same
> configuration works within another machine but with another processor
> (Intel Core i5-3340M). Attached are the boot logs.
 The address the fault occurs on (806bdee0) is bogus, so
 from the register and stack dump alone I don't think we can derive
 much. What we'd need is access to the kernel binary used (or
 really the vmlinux accompanying the vmlinuz that was used), in
 order to see where exactly the kernel died, and hence where this
 bogus address originates from. As I understand it this is a kernel
 you built yourself - can you make said binary from exactly that
 build available somewhere? 
>>> Yes I have it. But I get the same crash on various 4.4.X and also with
>>> 4.5.3.
>>>
>>> **https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E 
>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>> apart vmlinuz would be quite cumbersome.
>>
>> Jan
>>
> Oh sorry, here is the link to vmlinux
>
>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing
>>  
 This is still vmlinuz but the failure is at

 81007ef3:   48 3b 1d 4e 2e ec 00cmp   
 0xec2e4e(%rip),%rbx# 0x81ecad48
 81007efa:   73 51   jae0x81007f4d
 81007efc:   31 c0   xor%eax,%eax
 81007efe:   48 8b 15 03 d2 c0 00mov   
 0xc0d203(%rip),%rdx# 0x81c15108
 81007f05:   90  nop
 81007f06:   90  nop
 81007f07:   90  nop
 81007f08:   4c 8b 2c da mov   
 (%rdx,%rbx,8),%r13<==
 81007f0c:   90  nop
 81007f0d:   90  nop
 81007f0e:   90  nop
 81007f0f:   85 c0   test   %eax,%eax
 81007f11:   78 3a   js 0x81007f4d
 81007f13:   48 8b 05 ee 11 d2 00mov   
 0xd211ee(%rip),%rax# 0x81d29108
 81007f1a:   49 39 c5cmp%rax,%r13
 81007f1d:   73 6f   jae0x81007f8e
 81007f1f:   48 8b 05 ea 11 d2 00mov   
 0xd211ea(%rip),%rax# 0x81d29110
 81007f26:   4a 8b 04 e8 mov(%rax,%r13,8),%rax

 Any chance you could provide an un-stripped binary or System.map?
>>> Here is the link for System.map
>>>
>>>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing
>>  
>>
>> So my semi-educated guess at your stack is
>> __early_ioremap
>>   -> __early_set_fixmap
>> -> set_pte
>>   -> xen_set_pte_init
>> -> mask_rw_pte
>>   -> pte_pfn
>> -> pte_val
>>-> xen_pte_val
>>  -> pte_mfn_to_pfn
>>-> mfn_to_pfn_no_overrides
>>  -> ret =
>> xen_safe_read_ulong(&machine_to_phys_mapping[mfn], &pfn)
>>
>>
>> With 81007f08 being the faulted address the last one looks
>> plausible:
>>
>>
>> 81007efe:   48 8b 15 03 d2 c0 00mov   
>> 0xc0d203(%rip),%rdx# 0x81c15108
>> 81007f05:   90  nop
>> 81007f06:   90  nop
>> 81007f07:   90  nop
>> 81007f08:   4c 8b 2c da   mov(%rdx,%rbx,8),%r13
>>
>> since
>>
>> ostr@workbase> grep  81c15108
>> /tmp/System.map-4.4.8-9.pvops.qubes.x86_64
>> 81c15108 D machine_to_phys_mapping
>> ostr@workbase>
>>
>> But %rdx is not 81c15108, it is 8000:
>>
>> (XEN) rax:    rbx: 000d7bdc   rcx: 880002059000
>> (XEN) rdx: 8000   rsi: 8000d7bdc063   rdi: 8000d7bdc063
> But that's a MOV above, i.e. %rdx = [0x81c15108], which
> sensibly is MACH2PHYS_VIRT_START. 

 of course!

> And the MFN in %rbx
> would then match with the value in %cr2
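
As a back-of-the-envelope check of that match (assuming the usual 64-bit PV
layout with machine_to_phys_mapping at MACH2PHYS_VIRT_START =
0xffff800000000000; the value below is computed, not taken from the crash
log):

#include <stdio.h>

int main(void)
{
	/* Assumed 64-bit PV constant; not taken from the crash log. */
	unsigned long m2p_base = 0xffff800000000000UL;	/* MACH2PHYS_VIRT_START */
	unsigned long mfn = 0xd7bdcUL;			/* MFN seen in %rbx */

	/* &machine_to_phys_mapping[mfn] = base + mfn * sizeof(unsigned long)
	 *   = 0xffff800000000000 + 0x6bdee0 = 0xffff8000006bdee0 */
	printf("expected %%cr2: %#lx\n", m2p_base + mfn * sizeof(unsigned long));
	return 0;
}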

Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-10 Thread Jan Beulich
>>> On 09.05.16 at 20:40,  wrote:
> On 05/09/2016 01:22 PM, Kevin Moraga wrote:
>>
>> On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
>>> On 05/09/2016 12:40 PM, Kevin Moraga wrote:
 On 05/09/2016 09:53 AM, Jan Beulich wrote:
 On 09.05.16 at 16:52,  wrote:
>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>> On 09.05.16 at 00:51,  wrote:
 I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
 and Intel Skylake processor (Intel Core i7-6600U)

 This kernel is crashing almost in the same way as explained in this
 thread... But my problem is mainly with Skylake. Because the same
 configuration works within another machine but with another processor
 (Intel Core i5-3340M). Attached are the boot logs.
>>> The address the fault occurs on (806bdee0) is bogus, so
>>> from the register and stack dump alone I don't think we can derive
>>> much. What we'd need is access to the kernel binary used (or
>>> really the vmlinux accompanying the vmlinuz that was used), in
>>> order to see where exactly the kernel died, and hence where this
>>> bogus address originates from. As I understand it this is a kernel
>>> you built yourself - can you make said binary from exactly that
>>> build available somewhere? 
>> Yes I have it. But I get the same crash on various 4.4.X and also with
>> 4.5.3.
>>
>> **https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E 
> Well, this doesn't contain the file I'm after (vmlinux), and taking
> apart vmlinuz would be quite cumbersome.
>
> Jan
>
 Oh sorry, here is the link to vmlinux

 
> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing 
>>> This is still vmlinuz but the failure is at
>>>
>>> 81007ef3:   48 3b 1d 4e 2e ec 00cmp   
>>> 0xec2e4e(%rip),%rbx# 0x81ecad48
>>> 81007efa:   73 51   jae0x81007f4d
>>> 81007efc:   31 c0   xor%eax,%eax
>>> 81007efe:   48 8b 15 03 d2 c0 00mov   
>>> 0xc0d203(%rip),%rdx# 0x81c15108
>>> 81007f05:   90  nop
>>> 81007f06:   90  nop
>>> 81007f07:   90  nop
>>> 81007f08:   4c 8b 2c da mov   
>>> (%rdx,%rbx,8),%r13<==
>>> 81007f0c:   90  nop
>>> 81007f0d:   90  nop
>>> 81007f0e:   90  nop
>>> 81007f0f:   85 c0   test   %eax,%eax
>>> 81007f11:   78 3a   js 0x81007f4d
>>> 81007f13:   48 8b 05 ee 11 d2 00mov   
>>> 0xd211ee(%rip),%rax# 0x81d29108
>>> 81007f1a:   49 39 c5cmp%rax,%r13
>>> 81007f1d:   73 6f   jae0x81007f8e
>>> 81007f1f:   48 8b 05 ea 11 d2 00mov   
>>> 0xd211ea(%rip),%rax# 0x81d29110
>>> 81007f26:   4a 8b 04 e8 mov(%rax,%r13,8),%rax
>>>
>>> Any chance you could provide an un-stripped binary or System.map?
>> Here is the link for System.map
>>
>> 
> https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing 
>>
> 
> 
> So my semi-educated guess at your stack is
> __early_ioremap
>   -> __early_set_fixmap
> -> set_pte
>   -> xen_set_pte_init
> -> mask_rw_pte
>   -> pte_pfn
> -> pte_val
>-> xen_pte_val
>  -> pte_mfn_to_pfn
>-> mfn_to_pfn_no_overrides
>  -> ret =
> xen_safe_read_ulong(&machine_to_phys_mapping[mfn], &pfn)
> 
> 
> With 81007f08 being the faulted address the last one looks
> plausible:
> 
> 
> 81007efe:   48 8b 15 03 d2 c0 00mov   
> 0xc0d203(%rip),%rdx# 0x81c15108
> 81007f05:   90  nop
> 81007f06:   90  nop
> 81007f07:   90  nop
> 81007f08:   4c 8b 2c da   mov(%rdx,%rbx,8),%r13
> 
> since
> 
> ostr@workbase> grep  81c15108
> /tmp/System.map-4.4.8-9.pvops.qubes.x86_64
> 81c15108 D machine_to_phys_mapping
> ostr@workbase>
> 
> But %rdx is not 81c15108, it is 8000:
> 
> (XEN) rax:    rbx: 000d7bdc   rcx: 880002059000
> (XEN) rdx: 8000   rsi: 8000d7bdc063   rdi: 8000d7bdc063

But that's a MOV above, i.e. %rdx = [0x81c15108], which
sensibly is MACH2PHYS_VIRT_START. And the MFN in %rbx
would then match with the value in %cr2. Question is - where
does MFN 0xd7bdc come from (it's in a reserved range, and hence
can only be MMIO, which shouldn't be subject to M2P translation),
and why is this a

Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-09 Thread Boris Ostrovsky
On 05/09/2016 01:22 PM, Kevin Moraga wrote:
>
> On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
>> On 05/09/2016 12:40 PM, Kevin Moraga wrote:
>>> On 05/09/2016 09:53 AM, Jan Beulich wrote:
>>> On 09.05.16 at 16:52,  wrote:
> On 05/09/2016 04:08 AM, Jan Beulich wrote:
> On 09.05.16 at 00:51,  wrote:
>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>
>>> This kernel is crashing almost in the same way as explained in this
>>> thread... But my problem is mainly with Skylake. Because the same
>>> configuration works within another machine but with another processor
>>> (Intel Core i5-3340M). Attached are the boot logs.
>> The address the fault occurs on (806bdee0) is bogus, so
>> from the register and stack dump alone I don't think we can derive
>> much. What we'd need is access to the kernel binary used (or
>> really the vmlinux accompanying the vmlinuz that was used), in
>> order to see where exactly the kernel died, and hence where this
>> bogus address originates from. As I understand it this is a kernel
>> you built yourself - can you make said binary from exactly that
>> build available somewhere? 
> Yes I have it. But I get the same crash on various 4.4.X and also with
> 4.5.3.
>
> **https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E 
 Well, this doesn't contain the file I'm after (vmlinux), and taking
 apart vmlinuz would be quite cumbersome.

 Jan

>>> Oh sorry, here is the link to vmlinux
>>>
>>> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing
>> This is still vmlinuz but the failure is at
>>
>> 81007ef3:   48 3b 1d 4e 2e ec 00cmp   
>> 0xec2e4e(%rip),%rbx# 0x81ecad48
>> 81007efa:   73 51   jae0x81007f4d
>> 81007efc:   31 c0   xor%eax,%eax
>> 81007efe:   48 8b 15 03 d2 c0 00mov   
>> 0xc0d203(%rip),%rdx# 0x81c15108
>> 81007f05:   90  nop
>> 81007f06:   90  nop
>> 81007f07:   90  nop
>> 81007f08:   4c 8b 2c da mov   
>> (%rdx,%rbx,8),%r13<==
>> 81007f0c:   90  nop
>> 81007f0d:   90  nop
>> 81007f0e:   90  nop
>> 81007f0f:   85 c0   test   %eax,%eax
>> 81007f11:   78 3a   js 0x81007f4d
>> 81007f13:   48 8b 05 ee 11 d2 00mov   
>> 0xd211ee(%rip),%rax# 0x81d29108
>> 81007f1a:   49 39 c5cmp%rax,%r13
>> 81007f1d:   73 6f   jae0x81007f8e
>> 81007f1f:   48 8b 05 ea 11 d2 00mov   
>> 0xd211ea(%rip),%rax# 0x81d29110
>> 81007f26:   4a 8b 04 e8 mov(%rax,%r13,8),%rax
>>
>> Any chance you could provide an un-stripped binary or System.map?
> Here is the link for System.map
>
> https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing
>


So my semi-educated guess at your stack is
__early_ioremap
  -> __early_set_fixmap
-> set_pte
  -> xen_set_pte_init
-> mask_rw_pte
  -> pte_pfn
-> pte_val
   -> xen_pte_val
 -> pte_mfn_to_pfn
   -> mfn_to_pfn_no_overrides
 -> ret =
xen_safe_read_ulong(&machine_to_phys_mapping[mfn], &pfn)


With 81007f08 being the faulting address, the last one looks
plausible:


81007efe:   48 8b 15 03 d2 c0 00    mov    0xc0d203(%rip),%rdx        # 0x81c15108
81007f05:   90                      nop
81007f06:   90                      nop
81007f07:   90                      nop
81007f08:   4c 8b 2c da             mov    (%rdx,%rbx,8),%r13

since

ostr@workbase> grep  81c15108 /tmp/System.map-4.4.8-9.pvops.qubes.x86_64
81c15108 D machine_to_phys_mapping
ostr@workbase>

But %rdx is not 81c15108, it is 8000:

(XEN) rax:    rbx: 000d7bdc   rcx: 880002059000
(XEN) rdx: 8000   rsi: 8000d7bdc063   rdi: 8000d7bdc063

Perhaps we jumped to 81007f08 from somewhere, but I can't find
81007f0* as a target anywhere.
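
For reference, the access that ends up at that instruction is the guarded
M2P read (paraphrased from the 4.4-era sources, not verbatim); its
exception-table fixup only helps once a page-fault handler is actually
installed:

/*
 * Paraphrased, not verbatim.  The M2P slot for a device MFN may simply
 * not be mapped, so the read goes through an exception-table-protected
 * helper and is turned into ~0 on a fault -- but that fixup can only
 * run once a page-fault handler is in place, which is not yet the case
 * this early in setup_arch().
 */
static unsigned long mfn_to_pfn_no_overrides(unsigned long mfn)
{
	unsigned long pfn;

	if (unlikely(mfn >= machine_to_phys_nr))
		return ~0;

	if (xen_safe_read_ulong(&machine_to_phys_mapping[mfn], &pfn) < 0)
		return ~0;

	return pfn;
}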


-boris
  


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-09 Thread Kevin Moraga


On 05/09/2016 11:15 AM, Boris Ostrovsky wrote:
> On 05/09/2016 12:40 PM, Kevin Moraga wrote:
>> On 05/09/2016 09:53 AM, Jan Beulich wrote:
>> On 09.05.16 at 16:52,  wrote:
 On 05/09/2016 04:08 AM, Jan Beulich wrote:
 On 09.05.16 at 00:51,  wrote:
>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>> and Intel Skylake processor (Intel Core i7-6600U)
>>
>> This kernel is crashing almost in the same way as explained in this
>> thread... But my problem is mainly with Skylake. Because the same
>> configuration works within another machine but with another processor
>> (Intel Core i5-3340M). Attached are the boot logs.
> The address the fault occurs on (806bdee0) is bogus, so
> from the register and stack dump alone I don't think we can derive
> much. What we'd need is access to the kernel binary used (or
> really the vmlinux accompanying the vmlinuz that was used), in
> order to see where exactly the kernel died, and hence where this
> bogus address originates from. As I understand it this is a kernel
> you built yourself - can you make said binary from exactly that
> build available somewhere? 
 Yes I have it. But I get the same crash on various 4.4.X and also with
 4.5.3.

 **https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E 
>>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>>> apart vmlinuz would be quite cumbersome.
>>>
>>> Jan
>>>
>> Oh sorry, here is the link to vmlinux
>>
>> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing
>
> This is still vmlinuz but the failure is at
>
> 81007ef3:   48 3b 1d 4e 2e ec 00cmp   
> 0xec2e4e(%rip),%rbx# 0x81ecad48
> 81007efa:   73 51   jae0x81007f4d
> 81007efc:   31 c0   xor%eax,%eax
> 81007efe:   48 8b 15 03 d2 c0 00mov   
> 0xc0d203(%rip),%rdx# 0x81c15108
> 81007f05:   90  nop
> 81007f06:   90  nop
> 81007f07:   90  nop
> 81007f08:   4c 8b 2c da mov   
> (%rdx,%rbx,8),%r13<==
> 81007f0c:   90  nop
> 81007f0d:   90  nop
> 81007f0e:   90  nop
> 81007f0f:   85 c0   test   %eax,%eax
> 81007f11:   78 3a   js 0x81007f4d
> 81007f13:   48 8b 05 ee 11 d2 00mov   
> 0xd211ee(%rip),%rax# 0x81d29108
> 81007f1a:   49 39 c5cmp%rax,%r13
> 81007f1d:   73 6f   jae0x81007f8e
> 81007f1f:   48 8b 05 ea 11 d2 00mov   
> 0xd211ea(%rip),%rax# 0x81d29110
> 81007f26:   4a 8b 04 e8 mov(%rax,%r13,8),%rax
>
> Any chance you could provide an un-stripped binary or System.map?
Here is the link for System.map

https://drive.google.com/file/d/0B6Ol0ob95UxXYVE4SzdMcENsWWs/view?usp=sharing

-- 
Sincerely,
Kevin Moraga
PGP: F258EDCB
Fingerprint: 3915 A5A9 959C D18F 0A89 B47E FB4B 55F5 F258 EDCB




signature.asc
Description: OpenPGP digital signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-09 Thread Boris Ostrovsky
On 05/09/2016 12:40 PM, Kevin Moraga wrote:
>
> On 05/09/2016 09:53 AM, Jan Beulich wrote:
> On 09.05.16 at 16:52,  wrote:
>>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>>> On 09.05.16 at 00:51,  wrote:
> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
> and Intel Skylake processor (Intel Core i7-6600U)
>
> This kernel is crashing almost in the same way as explained in this
> thread... But my problem is mainly with Skylake. Because the same
> configuration works within another machine but with another processor
> (Intel Core i5-3340M). Attached are the boot logs.
 The address the fault occurs on (806bdee0) is bogus, so
 from the register and stack dump alone I don't think we can derive
 much. What we'd need is access to the kernel binary used (or
 really the vmlinux accompanying the vmlinuz that was used), in
 order to see where exactly the kernel died, and hence where this
 bogus address originates from. As I understand it this is a kernel
 you built yourself - can you make said binary from exactly that
 build available somewhere? 
>>> Yes I have it. But I get the same crash on various 4.4.X and also with
>>> 4.5.3.
>>>
>>> **https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E 
>> Well, this doesn't contain the file I'm after (vmlinux), and taking
>> apart vmlinuz would be quite cumbersome.
>>
>> Jan
>>
> Oh sorry, here is the link to vmlinux
>
> https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing


This is still vmlinuz but the failure is at

81007ef3:   48 3b 1d 4e 2e ec 00    cmp    0xec2e4e(%rip),%rbx        # 0x81ecad48
81007efa:   73 51                   jae    0x81007f4d
81007efc:   31 c0                   xor    %eax,%eax
81007efe:   48 8b 15 03 d2 c0 00    mov    0xc0d203(%rip),%rdx        # 0x81c15108
81007f05:   90                      nop
81007f06:   90                      nop
81007f07:   90                      nop
81007f08:   4c 8b 2c da             mov    (%rdx,%rbx,8),%r13    <==
81007f0c:   90                      nop
81007f0d:   90                      nop
81007f0e:   90                      nop
81007f0f:   85 c0                   test   %eax,%eax
81007f11:   78 3a                   js     0x81007f4d
81007f13:   48 8b 05 ee 11 d2 00    mov    0xd211ee(%rip),%rax        # 0x81d29108
81007f1a:   49 39 c5                cmp    %rax,%r13
81007f1d:   73 6f                   jae    0x81007f8e
81007f1f:   48 8b 05 ea 11 d2 00    mov    0xd211ea(%rip),%rax        # 0x81d29110
81007f26:   4a 8b 04 e8             mov    (%rax,%r13,8),%rax

Any chance you could provide an un-stripped binary or System.map?

-boris


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-09 Thread Kevin Moraga


On 05/09/2016 09:53 AM, Jan Beulich wrote:
 On 09.05.16 at 16:52,  wrote:
>> On 05/09/2016 04:08 AM, Jan Beulich wrote:
>> On 09.05.16 at 00:51,  wrote:
 I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
 and Intel Skylake processor (Intel Core i7-6600U)

 This kernel is crashing almost in the same way as explained in this
 thread... But my problem is mainly with Skylake. Because the same
 configuration works within another machine but with another processor
 (Intel Core i5-3340M). Attached are the boot logs.
>>> The address the fault occurs on (806bdee0) is bogus, so
>>> from the register and stack dump alone I don't think we can derive
>>> much. What we'd need is access to the kernel binary used (or
>>> really the vmlinux accompanying the vmlinuz that was used), in
>>> order to see where exactly the kernel died, and hence where this
>>> bogus address originates from. As I understand it this is a kernel
>>> you built yourself - can you make said binary from exactly that
>>> build available somewhere? 
>> Yes I have it. But I get the same crash on various 4.4.X and also with
>> 4.5.3.
>>
>> **https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E 
> Well, this doesn't contain the file I'm after (vmlinux), and taking
> apart vmlinuz would be quite cumbersome.
>
> Jan
>

Oh sorry, here is the link to vmlinux

https://drive.google.com/file/d/0B6Ol0ob95UxXN0dDMWM1a29vMEk/view?usp=sharing

-- 
Sincerely,
Kevin Moraga
PGP: F258EDCB
Fingerprint: 3915 A5A9 959C D18F 0A89 B47E FB4B 55F5 F258 EDCB




signature.asc
Description: OpenPGP digital signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-09 Thread Jan Beulich
>>> On 09.05.16 at 16:52,  wrote:
> On 05/09/2016 04:08 AM, Jan Beulich wrote:
> On 09.05.16 at 00:51,  wrote:
>>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>>> and Intel Skylake processor (Intel Core i7-6600U)
>>>
>>> This kernel is crashing almost in the same way as explained in this
>>> thread... But my problem is mainly with Skylake. Because the same
>>> configuration works within another machine but with another processor
>>> (Intel Core i5-3340M). Attached are the boot logs.
>> The address the fault occurs on (806bdee0) is bogus, so
>> from the register and stack dump alone I don't think we can derive
>> much. What we'd need is access to the kernel binary used (or
>> really the vmlinux accompanying the vmlinuz that was used), in
>> order to see where exactly the kernel died, and hence where this
>> bogus address originates from. As I understand it this is a kernel
>> you built yourself - can you make said binary from exactly that
>> build available somewhere? 
> Yes I have it. But I get the same crash on various 4.4.X and also with
> 4.5.3.
> 
> **https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E 

Well, this doesn't contain the file I'm after (vmlinux), and taking
apart vmlinuz would be quite cumbersome.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-09 Thread Kevin Moraga
On 05/09/2016 04:08 AM, Jan Beulich wrote:
 On 09.05.16 at 00:51,  wrote:
>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>> and Intel Skylake processor (Intel Core i7-6600U)
>>
>> This kernel is crashing almost in the same way as explained in this
>> thread... But my problem is mainly with Skylake. Because the same
>> configuration works within another machine but with another processor
>> (Intel Core i5-3340M). Attached are the boot logs.
> The address the fault occurs on (806bdee0) is bogus, so
> from the register and stack dump alone I don't think we can derive
> much. What we'd need is access to the kernel binary used (or
> really the vmlinux accompanying the vmlinuz that was used), in
> order to see where exactly the kernel died, and hence where this
> bogus address originates from. As I understand it this is a kernel
> you built yourself - can you make said binary from exactly that
> build available somewhere? 
Yes I have it. But I get the same crash on various 4.4.X and also with
4.5.3.

**https://drive.google.com/open?id=0B6Ol0ob95UxXQV9HM1BWMmhCZ0E

Also I compiled 4.2.28 / 4.1.X and it works fine with this processor,
using i915.preliminary_hw_support, but we are experiencing problems with
suspend/wakeup (but that's another story)

> Or if you don't have it anymore, obtain
> fresh logs for whichever binary you're going to make available?
>
> Jan

Also there are more reports about the same crash with this kernel
compiled by someone else: 
**http://yum.qubes-os.org/r3.1/unstable/dom0/fc20/rpm/kernel-4.4.8-9.pvops.qubes.x86_64.rpm


signature.asc
Description: OpenPGP digital signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-09 Thread Jan Beulich
>>> On 09.05.16 at 00:51,  wrote:
> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
> and Intel Skylake processor (Intel Core i7-6600U)
> 
> This kernel is crashing almost in the same way as explained in this
> thread... But my problem is mainly with Skylake. Because the same
> configuration works within another machine but with another processor
> (Intel Core i5-3340M). Attached are the boot logs.

The address the fault occurs on (806bdee0) is bogus, so
from the register and stack dump alone I don't think we can derive
much. What we'd need is access to the kernel binary used (or
really the vmlinux accompanying the vmlinuz that was used), in
order to see where exactly the kernel died, and hence where this
bogus address originates from. As I understand it this is a kernel
you built yourself - can you make said binary from exactly that
build available somewhere? Or if you don't have it anymore, obtain
fresh logs for whichever binary you're going to make available?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-09 Thread Jan Beulich
>>> On 09.05.16 at 09:23,  wrote:
> On 08/05/2016 23:51, Kevin Moraga wrote:
>> Hi,
>> I don't know if this is the exact same issue... but is the most related
>> one that I found.
>>
>> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
>> and Intel Skylake processor (Intel Core i7-6600U)
>>
>> This kernel is crashing almost in the same way as explained in this
>> thread... But my problem is mainly with Skylake. Because the same
>> configuration works within another machine but with another processor
>> (Intel Core i5-3340M). Attached are the boot logs.
>>
>> A kernel configuration could be found in:
>>
>> https://github.com/marmarek/qubes-linux-kernel devel-4.4 branch
>>
>>
>> I don't know if anybody else is having this issue.
> 
> Can you try booting Xen with "xsave=0" on the command line.

That's what he has done for the second of the logs attached,
with no change to the crash.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-09 Thread Andrew Cooper
On 08/05/2016 23:51, Kevin Moraga wrote:
> Hi,
> I don't know if this is the exact same issue... but is the most related
> one that I found.
>
> I'm try to compile kernel 4.4.8 (using fedora 23) to run with Xen 4.6.0
> and Intel Skylake processor (Intel Core i7-6600U)
>
> This kernel is crashing almost in the same way as explained in this
> thread... But my problem is mainly with Skylake. Because the same
> configuration works within another machine but with another processor
> (Intel Core i5-3340M). Attached are the boot logs.
>
> A kernel configuration could be found in:
>
> https://github.com/marmarek/qubes-linux-kernel devel-4.4 branch
>
>
> I don't know if anybody else is having this issue.

Can you try booting Xen with "xsave=0" on the command line.

I notice dom0 found:

[0.00] x86/fpu: Supporting XSAVE feature 0x08: 'MPX bounds
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x10: 'MPX CSR'

And there are sadly usually bugs like this when PV guest kernels start using
new CPU features.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-05-08 Thread Kevin Moraga
Hi,
I don't know if this is exactly the same issue, but it is the closest match
that I found.

I'm trying to compile kernel 4.4.8 (using Fedora 23) to run with Xen 4.6.0
on an Intel Skylake processor (Intel Core i7-6600U).

This kernel crashes in almost the same way as described in this thread, but
my problem seems specific to Skylake: the same configuration works on
another machine with a different processor (Intel Core i5-3340M). Attached
are the boot logs.

The kernel configuration can be found in:

https://github.com/marmarek/qubes-linux-kernel devel-4.4 branch


I don't know if anybody else is having this issue.

Thanks,
Kevin Moraga
 Xen 4.6.0-13.fc20
(XEN) Xen version 4.6.0 (user@) (gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)) 
debug=n Thu Feb 11 03:34:22 UTC 2016
(XEN) Latest ChangeSet: 
(XEN) Console output is synchronous.
(XEN) Bootloader: GRUB 2.00
(XEN) Command line: placeholder noreboot=true sync_console 
com1=115200,8n1,0xe080,0 console=com1,vga dom0_mem=min:1024M dom0_mem=max:4096M
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)   - 0009d000 (usable)
(XEN)  0009d000 - 000a (reserved)
(XEN)  000e - 0010 (reserved)
(XEN)  0010 - b9ba4000 (usable)
(XEN)  b9ba4000 - cca77000 (reserved)
(XEN)  cca77000 - cca78000 (ACPI NVS)
(XEN)  cca78000 - d7f77000 (reserved)
(XEN)  d7f77000 - d7f78000 (ACPI NVS)
(XEN)  d7f78000 - d7f79000 (reserved)
(XEN)  d7f79000 - d7fc7000 (ACPI NVS)
(XEN)  d7fc7000 - d7fff000 (ACPI data)
(XEN)  d7fff000 - d810 (reserved)
(XEN)  d860 - dc80 (reserved)
(XEN)  f800 - fc00 (reserved)
(XEN)  fd00 - fe80 (reserved)
(XEN)  fec0 - fec01000 (reserved)
(XEN)  fed0 - fed01000 (reserved)
(XEN)  fed1 - fed1a000 (reserved)
(XEN)  fed84000 - fed85000 (reserved)
(XEN)  fee0 - fee01000 (reserved)
(XEN)  ff80 - 0001 (reserved)
(XEN)  0001 - 00082180 (usable)
(XEN) ACPI: RSDP 000F0120, 0024 (r2 LENOVO)
(XEN) ACPI: XSDT D7FD1188, 00CC (r1 LENOVO TP-R06  0 PTEC2)
(XEN) ACPI: FACP D7FF6000, 00F4 (r5 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: DSDT D7FDF000, 12692 (r2 LENOVO TP-R06   1070 INTL 20141107)
(XEN) ACPI: FACS D7FAB000, 0040
(XEN) ACPI: UEFI D7FC2000, 0042 (r1 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: SSDT D7FF8000, 4E2E (r2 LENOVO  SaSsdt  3000 INTL 20141107)
(XEN) ACPI: SSDT D7FF7000, 05C5 (r2 LENOVO PerfTune 1000 INTL 20141107)
(XEN) ACPI: ECDT D7FF5000, 0052 (r1 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: HPET D7FF4000, 0038 (r1 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: APIC D7FF3000, 00BC (r3 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: MCFG D7FF2000, 003C (r1 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: SSDT D7FDD000, 18D2 (r1 LENOVO SataAhci 1000 INTL 20141107)
(XEN) ACPI: SSDT D7FDC000, 0152 (r1 LENOVO Rmv_Batt 1000 INTL 20141107)
(XEN) ACPI: DBGP D7FDB000, 0034 (r1 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: DBG2 D7FDA000, 0054 (r0 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: BOOT D7FD9000, 0028 (r1 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: BATB D7FD8000, 0046 (r1 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: SSDT D7FD7000, 0E73 (r2 LENOVO  CpuSsdt 3000 INTL 20141107)
(XEN) ACPI: SSDT D7FD6000, 03D9 (r2 LENOVOCtdpB 1000 INTL 20141107)
(XEN) ACPI: MSDM D7FD5000, 0055 (r3 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: DMAR D7FD4000, 00A8 (r1 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: ASF! D7FD3000, 00A5 (r32 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: FPDT D7FD2000, 0044 (r1 LENOVO TP-R06   1070 PTEC2)
(XEN) ACPI: UEFI D7FA9000, 012A (r1 LENOVO TP-R06   1070 PTEC2)
(XEN) System RAM: 32179MB (32951556kB)
(XEN) Domain heap initialised
(XEN) ACPI: 32/64X FACS address mismatch in FADT - d7fab000/, 
using 32
(XEN) Processor #0 6:14 APIC version 21
(XEN) Processor #2 6:14 APIC version 21
(XEN) Processor #1 6:14 APIC version 21
(XEN) Processor #3 6:14 APIC version 21
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec0, GSI 0-119
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) Failed to enable Interrupt Remapping: Will not enable x2APIC.
(XEN) xstate_init: using cntxt_size: 0x440 and states: 0x1f
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2814.268 MHz processor.
(XEN) Initing memory sharing.
(XEN) Intel VT-d iommu 0 supported pa

Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-03-29 Thread Konrad Rzeszutek Wilk
On Mon, Mar 28, 2016 at 06:00:33PM +0100, Michael Young wrote:
> I get a crash on boot with my Fedora xen-4.6.1-3.fc24 packages. This seems
> to be related to how it is compiled because the same code compiled under
> Fedora 23 works. The boot logs are attached. The address mentioned in the
> crash has the code
>0x82d08023d3c3 :
> je 0x82d08023e90a 
> but I have compared it with the Fedora 23 version of create_bounce_frame and
> as far as I can see the code is the same, so I am a bit stuck on how to
> debug this further.

Same machine?

Oh, you are doing this as guest:
> 
>   Michael Young

>  Xen 4.6.1-3.fc24
> (XEN) Xen version 4.6.1 (mockbuild@[unknown]) (gcc (GCC) 6.0.0 20160305 (Red 
> Hat 6.0.0-0.15)) debug=n Tue Mar  8 00:10:50 UTC 2016
> (XEN) Latest ChangeSet: 
> (XEN) Bootloader: GRUB 2.02~beta3
> (XEN) Command line: placeholder loglvl=all guest_loglvl=all console=com1,vga
> (XEN) Video information:
> (XEN)  VGA is text mode 80x25, font 8x16
> (XEN) Disc information:
> (XEN)  Found 1 MBR signatures
> (XEN)  Found 1 EDD information structures
> (XEN) Xen-e820 RAM map:
> (XEN)   - 0009fc00 (usable)
> (XEN)  0009fc00 - 000a (reserved)
> (XEN)  000f - 0010 (reserved)
> (XEN)  0010 - 3ffe (usable)
> (XEN)  3ffe - 4000 (reserved)
> (XEN)  feffc000 - ff00 (reserved)
> (XEN)  fffc - 0001 (reserved)
> (XEN) System RAM: 1023MB (1048060kB)
> (XEN) ACPI: RSDP 000F6300, 0014 (r0 BOCHS )



Does this happen with normal machines?

> (XEN) ACPI: RSDT 3FFE16EE, 0034 (r1 BOCHS  BXPCRSDT1 BXPC1)
> (XEN) ACPI: FACP 3FFE0C14, 0074 (r1 BOCHS  BXPCFACP1 BXPC1)
> (XEN) ACPI: DSDT 3FFE0040, 0BD4 (r1 BOCHS  BXPCDSDT1 BXPC1)
> (XEN) ACPI: FACS 3FFE, 0040
> (XEN) ACPI: SSDT 3FFE0C88, 09B6 (r1 BOCHS  BXPCSSDT1 BXPC1)
> (XEN) ACPI: APIC 3FFE163E, 0078 (r1 BOCHS  BXPCAPIC1 BXPC1)
> (XEN) ACPI: HPET 3FFE16B6, 0038 (r1 BOCHS  BXPCHPET1 BXPC1)
> (XEN) No NUMA configuration found
> (XEN) Faking a node at -3ffe
> (XEN) Domain heap initialised
> (XEN) found SMP MP-table at 000f64e0
> (XEN) DMI 2.8 present.
> (XEN) Using APIC driver default
> (XEN) ACPI: PM-Timer IO Port: 0x608
> (XEN) ACPI: SLEEP INFO: pm1x_cnt[1:604,1:0], pm1x_evt[1:600,1:0]
> (XEN) ACPI: wakeup_vec[3ffe000c], vec_size[20]
> (XEN) ACPI: Local APIC address 0xfee0
> (XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
> (XEN) Processor #0 6:6 APIC version 20
> (XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
> (XEN) ACPI: IOAPIC (id[0x00] address[0xfec0] gsi_base[0])
> (XEN) IOAPIC[0]: apic_id 0, version 17, address 0xfec0, GSI 0-23
> (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> (XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> (XEN) ACPI: IRQ0 used by override.
> (XEN) ACPI: IRQ2 used by override.
> (XEN) ACPI: IRQ5 used by override.
> (XEN) ACPI: IRQ9 used by override.
> (XEN) ACPI: IRQ10 used by override.
> (XEN) ACPI: IRQ11 used by override.
> (XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
> (XEN) ACPI: HPET id: 0x8086a201 base: 0xfed0
> (XEN) ERST table was not found
> (XEN) Using ACPI (MADT) for SMP configuration information
> (XEN) SMP: Allowing 1 CPUs (0 hotplug CPUs)
> (XEN) IRQ limits: 24 GSI, 184 MSI/MSI-X
> (XEN) Not enabling x2APIC: depends on iommu_supports_eim.
> (XEN) XSM Framework v1.0.0 initialized
> (XEN) Flask:  Access controls disabled until policy is loaded.
> (XEN) Intel machine check reporting enabled
> (XEN) Using scheduler: SMP Credit Scheduler (credit)
> (XEN) Detected 2394.587 MHz processor.
> (XEN) Initing memory sharing.
> (XEN) alt table 82d0802d4730 -> 82d0802d5960
> (XEN) I/O virtualisation disabled
> (XEN) nr_sockets: 1
> (XEN) Enabled directed EOI with ioapic_ack_old on!
> (XEN) ENABLING IO-APIC IRQs
> (XEN)  -> Using old ACK method
> (XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
> (XEN) Platform timer is 100.000MHz HPET
> (XEN) Allocated console ring of 16 KiB.
> (XEN) mwait-idle: does not run on family 6 model 6
> (XEN) Brought up 1 CPUs
> (XEN) HPET: 0 timers usable for broadcast (3 total)
> (XEN) ACPI sleep modes: S3
> (XEN) VPMU: disabled
> (XEN) mcheck_poll: Machine check polling timer started.
> (XEN) xenoprof: Initialization failed. Intel processor family 6 model 6is not 
> supported
> (XEN) Dom0 has maximum 208 PIRQs
> (XEN) NX (Execute Disable) protection active
> (XEN) *** LOADING DOMAIN 0 ***
> (XEN)  Xen  kernel: 64-bit, lsb, compat32
> (XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x100 -> 0x2084000
> (XEN)

Re: [Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-03-29 Thread Jan Beulich
>>> On 28.03.16 at 19:00,  wrote:
> I get a crash on boot with my Fedora xen-4.6.1-3.fc24 packages. This seems 
> to be related to how it is compiled because the same code compiled under 
> Fedora 23 works. The boot logs are attached. The address mentioned in the 
> crash has the code
> 0x82d08023d3c3 :
>  je 0x82d08023e90a 
> but I have compared it with the Fedora 23 version of create_bounce_frame 
> and as far as I can see the code is the same, so I am a bit stuck on how 
> to debug this further.

Well, it doesn't look like your problem is with create_bounce_frame(),
but instead this

(XEN) d0v0: unhandled page fault (ec=)
(XEN) Pagetable walk from 81d6b665:
(XEN)  L4[0x1ff] = 3a088067 2088
(XEN)  L3[0x1fe] = 3a087067 2087
(XEN)  L2[0x00e] = 3a096067 2096 
(XEN)  L1[0x16b] = 001039d6b067 1d6b

is pointing at an issue with paging of Dom0. The walk shown doesn't,
to me, indicate any reason why a page fault would have got raised
in the first place (not even a missing TLB flush could account for
that, since any fault condition would result in a hardware re-walk).
Some of the data in the registers and on the stack suggest there
are page table manipulations going on in Dom0 around the time of
the crash, so you may want to check where exactly Dom0 was when
that crash occurred.

And then the question of course is: If the crash occurs reliably
with the F24 built binary (but not the F23 one), perhaps you need
to go and compare more than just the one function?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] crash on boot with 4.6.1 on fedora 24

2016-03-28 Thread Michael Young
I get a crash on boot with my Fedora xen-4.6.1-3.fc24 packages. This seems 
to be related to how it is compiled because the same code compiled under 
Fedora 23 works. The boot logs are attached. The address mentioned in the 
crash has the code

   0x82d08023d3c3 :
je 0x82d08023e90a 
but I have compared it with the Fedora 23 version of create_bounce_frame 
and as far as I can see the code is the same, so I am a bit stuck on how 
to debug this further.


Michael Young Xen 4.6.1-3.fc24
(XEN) Xen version 4.6.1 (mockbuild@[unknown]) (gcc (GCC) 6.0.0 20160305 (Red 
Hat 6.0.0-0.15)) debug=n Tue Mar  8 00:10:50 UTC 2016
(XEN) Latest ChangeSet: 
(XEN) Bootloader: GRUB 2.02~beta3
(XEN) Command line: placeholder loglvl=all guest_loglvl=all console=com1,vga
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)   - 0009fc00 (usable)
(XEN)  0009fc00 - 000a (reserved)
(XEN)  000f - 0010 (reserved)
(XEN)  0010 - 3ffe (usable)
(XEN)  3ffe - 4000 (reserved)
(XEN)  feffc000 - ff00 (reserved)
(XEN)  fffc - 0001 (reserved)
(XEN) System RAM: 1023MB (1048060kB)
(XEN) ACPI: RSDP 000F6300, 0014 (r0 BOCHS )
(XEN) ACPI: RSDT 3FFE16EE, 0034 (r1 BOCHS  BXPCRSDT1 BXPC1)
(XEN) ACPI: FACP 3FFE0C14, 0074 (r1 BOCHS  BXPCFACP1 BXPC1)
(XEN) ACPI: DSDT 3FFE0040, 0BD4 (r1 BOCHS  BXPCDSDT1 BXPC1)
(XEN) ACPI: FACS 3FFE, 0040
(XEN) ACPI: SSDT 3FFE0C88, 09B6 (r1 BOCHS  BXPCSSDT1 BXPC1)
(XEN) ACPI: APIC 3FFE163E, 0078 (r1 BOCHS  BXPCAPIC1 BXPC1)
(XEN) ACPI: HPET 3FFE16B6, 0038 (r1 BOCHS  BXPCHPET1 BXPC1)
(XEN) No NUMA configuration found
(XEN) Faking a node at -3ffe
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000f64e0
(XEN) DMI 2.8 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x608
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:604,1:0], pm1x_evt[1:600,1:0]
(XEN) ACPI: wakeup_vec[3ffe000c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee0
(XEN) ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
(XEN) Processor #0 6:6 APIC version 20
(XEN) ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
(XEN) ACPI: IOAPIC (id[0x00] address[0xfec0] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 0, version 17, address 0xfec0, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ5 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) ACPI: IRQ10 used by override.
(XEN) ACPI: IRQ11 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a201 base: 0xfed0
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 1 CPUs (0 hotplug CPUs)
(XEN) IRQ limits: 24 GSI, 184 MSI/MSI-X
(XEN) Not enabling x2APIC: depends on iommu_supports_eim.
(XEN) XSM Framework v1.0.0 initialized
(XEN) Flask:  Access controls disabled until policy is loaded.
(XEN) Intel machine check reporting enabled
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2394.587 MHz processor.
(XEN) Initing memory sharing.
(XEN) alt table 82d0802d4730 -> 82d0802d5960
(XEN) I/O virtualisation disabled
(XEN) nr_sockets: 1
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) Platform timer is 100.000MHz HPET
(XEN) Allocated console ring of 16 KiB.
(XEN) mwait-idle: does not run on family 6 model 6
(XEN) Brought up 1 CPUs
(XEN) HPET: 0 timers usable for broadcast (3 total)
(XEN) ACPI sleep modes: S3
(XEN) VPMU: disabled
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) xenoprof: Initialization failed. Intel processor family 6 model 6is not 
supported
(XEN) Dom0 has maximum 208 PIRQs
(XEN) NX (Execute Disable) protection active
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x100 -> 0x2084000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   3800->3c00 (225611 pages to be 
allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: 8100->82084000
(XEN)  Init. ramdisk: ->
(XEN)  Phys-Mach map: 0080->0080001d8a58
(XEN)  Start info: