On 02.09.25 14:22, Juergen Gross wrote:
On 02.09.25 12:56, Manuel Bouyer wrote:
On Tue, Sep 02, 2025 at 11:44:36AM +0100, Andrew Cooper wrote:
On 02/09/2025 11:17 am, Manuel Bouyer wrote:
Hello,
I'm trying to boot a NetBSD PVH dom0 on Xen 4.20.
The same NetBSD kernel works fine with Xen 4.18.

The boot options are:
menu=Boot netbsd-current PVH Xen420:dev hd0f:;load /netbsd-PVH console=com0 root=wd0f; multiboot /xen420-debug.gz dom0_mem=1024M console=com1 com1=38400,8n1 loglvl=all guest_loglvl=all gnttab_max_nr_frames=64 sync_console=1 dom0=pvh

and the full log from serial console is attached.

With 4.20 the boot fails with:

(XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
(XEN) Freed 664kB init memory
(XEN) d0v0 Triple fault - invoking HVM shutdown action 1
(XEN) *** Dumping Dom0 vcpu#0 state: ***
(XEN) ----[ Xen-4.20.2-pre_20250821nb0  x86_64  debug=y  Tainted:   C    ]----
(XEN) CPU:    7
(XEN) RIP:    0008:[<000000000020e268>]
(XEN) RFLAGS: 0000000000010006   CONTEXT: hvm guest (d0v0)
(XEN) rax: 000000002024c003   rbx: 000000000020e260   rcx: 00000000000dfeb7
(XEN) rdx: 0000000000100000   rsi: 0000000000103000   rdi: 000000000013e000
(XEN) rbp: 0000000080000000   rsp: 00000000014002e4   r8:  0000000000000000
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000000000011   cr4: 0000000000000000
(XEN) cr3: 0000000000000000   cr2: 0000000000000000
(XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
(XEN) ds: 0010   es: 0010   fs: 0000   gs: 0000   ss: 0010   cs: 0008

Because of the triple fault, the RIP above doesn't point to the code.

I tracked it down to this code:
         cmpl    $0,%ecx                 ;       /* zero-sized? */       \
         je      2f                      ; \
         pushl   %ebp                    ; \
         movl    RELOC(nox_flag),%ebp    ; \
1:      movl    %ebp,(PDE_SIZE-4)(%ebx) ;       /* upper 32 bits: NX */ \
         movl    %eax,(%ebx)             ;       /* store phys addr */   \
         addl    $PDE_SIZE,%ebx          ;       /* next PTE/PDE */      \
         addl    $PAGE_SIZE,%eax         ;       /* next phys page */    \
         loop    1b                      ; \
         popl    %ebp                    ; \
2:                                      ;

There are other pushl/popl pairs before this one, so I don't think that's
the problem (in fact the exact same fragment is called just before with
different inputs and it doesn't fault). So the culprit is probably the write
to (%ebx), which would be 0x20e260.
This is in the range:
(XEN)  [0000000000100000, 0000000040068e77] (usable)
so I can't see why this would be a problem.
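
For readers less used to AT&T syntax, here is a rough Python model of what the fragment above does. It is only a sketch: PDE_SIZE = 8 and PAGE_SIZE = 4096 are assumptions (neither value is stated in this thread), fill_entries is a hypothetical name, and the iteration count of 2 is arbitrary. The starting values are taken from rbx and rax in the register dump.

```python
PDE_SIZE = 8         # assumption: 64-bit page-table entries
PAGE_SIZE = 0x1000   # assumption: 4 KiB pages

def fill_entries(ebx, eax, ecx, nox_flag=0):
    """Model the loop; return (list of (address, value) stores, ebx, eax)."""
    stores = []
    if ecx == 0:                  # cmpl $0,%ecx ; je 2f
        return stores, ebx, eax
    while True:
        stores.append((ebx + PDE_SIZE - 4, nox_flag))  # upper 32 bits: NX
        stores.append((ebx, eax))                      # store phys addr
        ebx += PDE_SIZE                                # next PTE/PDE
        eax += PAGE_SIZE                               # next phys page
        ecx -= 1                  # "loop 1b": decrement, repeat if non-zero
        if ecx == 0:
            break
    return stores, ebx, eax

# Two iterations starting from the dumped register values.
stores, next_ebx, next_eax = fill_entries(0x20e260, 0x2024c003, 2)
for addr, value in stores:
    print(hex(addr), hex(value))
```

With these assumptions each iteration touches 8 bytes at %ebx, so an entry written at 0x20e260 covers 0x20e260 through 0x20e267, and the next one starts at 0x20e268.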

Any ideas, including how to debug this further, are welcome.

Even though triple faults are aborts, they're generally accurate under
virt, so 0x20e268 is most likely where things die.

But that's the RIP of the last fault, not the first one, right?
0x20e268 isn't in the text segment of the kernel. My guess is that the
first fault triggers an exception, but the exception handler isn't set up
yet, so we end up jumping to some random value.


What puzzles me is that:

- %cr2 is 0, so probably the first fault wasn't a page fault
- RIP is %ebx + 8, so maybe the code was just clobbered by the loop?

Could it be the code has been moved to this location, or is about to
be moved away afterwards?
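
The two observations can be checked directly against the register dump. This is just the arithmetic, with PDE_SIZE = 8 as an assumption (the thread does not give its value):

```python
PDE_SIZE = 8      # assumption: 64-bit page-table entries

rip = 0x20e268    # from the register dump
rbx = 0x20e260    # from the register dump
cr2 = 0x0         # from the register dump

rip_offset = rip - rbx
rip_is_next_entry = (rip_offset == PDE_SIZE)
no_page_fault_address = (cr2 == 0)

print(hex(rip_offset), rip_is_next_entry, no_page_fault_address)
```

If the assumption holds, RIP sits exactly one entry past the slot %ebx points to, which fits the idea that the loop is writing into the memory it is executing from.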

And indeed: from the full boot log I can see:

(XEN)     virt_base        = 0x0
(XEN)     elf_paddr_offset = 0x0
(XEN)     virt_offset      = 0x0
(XEN)     virt_kstart      = 0x200000
(XEN)     virt_kend        = 0x17bab90
(XEN)     virt_entry       = 0x20e4d0

So virt_entry is very near to the RIP.
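
To put a number on "very near", here is a small sketch using only the values from the boot log and the register dump. It treats the virt_* values as directly comparable with RIP, which is an assumption that seems reasonable given virt_base and virt_offset are both 0x0:

```python
virt_kstart = 0x200000    # from the boot log
virt_kend   = 0x17bab90   # from the boot log
virt_entry  = 0x20e4d0    # from the boot log
rip         = 0x20e268    # from the register dump

inside_image = virt_kstart <= rip < virt_kend
distance = virt_entry - rip

print(inside_image, hex(distance), distance)
```

So the faulting RIP lies inside the loaded kernel image, 0x268 (616) bytes below the entry point.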


Juergen
