Re: [PATCH 0/4] x86/PVH: Dom0 building adjustments
On 01.09.2021 18:13, Roger Pau Monné wrote:
> On Wed, Sep 01, 2021 at 04:19:40PM +0200, Jan Beulich wrote:
>> On 01.09.2021 15:56, Roger Pau Monné wrote:
>>> On Tue, Aug 31, 2021 at 10:53:59AM +0200, Jan Beulich wrote:
>>>> On 30.08.2021 15:01, Jan Beulich wrote:
>>>>> The code building PVH Dom0 made use of sequences of P2M changes which are disallowed as of XSA-378. First of all, population of the first Mb of memory needs to be redone. Then, largely as a workaround, checking introduced by XSA-378 needs to be slightly relaxed.
>>>>>
>>>>> Note that with these adjustments I get Dom0 to start booting on my development system, but the Dom0 kernel then gets stuck. Since it was the first time for me to try PVH Dom0 in this context (see below for why I was hesitant), I cannot tell yet whether this is due to further fallout from the XSA, or to some further, unrelated problem.
>>>
>>> If you have some time, could you check without the XSA applied? I have to admit I haven't been testing staging, so it's possible some breakage has slipped in (however osstest seemed fine with it).
>>
>> Well, I'd rather try to use the time to find the actual issue. From osstest being fine I'm kind of inferring this might be machine specific, or it might be due to one of the overly many other patches I'm carrying. So if I can't infer anything from the stack once I can actually dump it, I may indeed need to bisect my pile, which would then also include the XSA-378 patches (as I haven't had time to rebase so far).
>>
>>>>> Dom0's BSP is in VPF_blocked state while all APs are still in VPF_down. The 'd' debug key, unhelpfully, doesn't produce any output, so it's non-trivial to check whether (like PV likes to do) Dom0 has panic()ed without leaving any (visible) output.
>>>
>>> Not sure it would help much, but maybe you can post the Xen+Linux output?
>>
>> There's no Linux output yet by that point (and either "earlyprintk=xen" doesn't work in PVH mode, or it's even too early for that). All Xen has to say is
>>
>> (XEN) Dom0 callback via changed to Direct Vector 0xf3
>> (XEN) vmx.c:3265:d0v0 RDMSR 0x064e unimplemented
>> (XEN) vmx.c:3265:d0v0 RDMSR 0x0034 unimplemented
>
> Weird, I don't see why earlyprintk=xen shouldn't work in PVH mode, unless it's not properly wired up. It certainly needs checking and fixing, or else we won't be able to make much progress, I think.

Right - I'm intending to check this, including whether at least xen_raw_console_write() would work.

>>>> Correction: I did mean '0' here, producing merely
>>>>
>>>> (XEN) '0' pressed -> dumping Dom0's registers
>>>> (XEN) *** Dumping Dom0 vcpu#0 state: ***
>>>> (XEN) *** Dumping Dom0 vcpu#1 state: ***
>>>> (XEN) *** Dumping Dom0 vcpu#2 state: ***
>>>> (XEN) *** Dumping Dom0 vcpu#3 state: ***
>>>>
>>>> 'd' output supports the "system is idle" impression that was also visible from 'q' output.
>>>
>>> Can you dump the state of the VMCS and see where the IP points in Linux?
>>
>> Both that and the register dumping I now have working tell me that it's the HLT in default_idle(). IOW Dom0 gives the impression of also being idle, at first glance. The stack pointer, however, is farther away from the stack top than I would have expected, so it may still have entered default_idle() for other reasons.
>>
>> The VMCS also told me that the last VM entry was to deliver an interrupt at vector 0xf3 (i.e. the "callback" one).
>
> That's all quite weird. Did dom0 set up the vCPU timer?

Ah - I had meant to check active timers, but then forgot. OTOH I thought I could observe vCPU0 waking up from HLT, as RIP in the registers dumped has been pointing either at it or right past it. Now that I write this, though, I'm wondering whether that's an artifact rather than a reflection of something that's really happening, in particular because of this

(XEN) RSP = 0x81c03eb8 (0x81c03eb8) RIP = 0x814be422 (0x814be423)

in the VMCS dump.

> What version of Linux are you using?

5.13.2; I haven't got around to switching to 5.14 yet, but I also don't expect that to make a difference.

> It seems to get stuck very early (or else fails to output anything while booting), which seems unlikely to be related to your specific hardware.

Well, it can't be extremely early - I see the ACPI IRQ getting set up (from "iommu=debug" output mentioning GSI 9), and I see PCI device BARs being played with (from debug messages I had added to vPCI to monitor what P2M adjustments are being requested). As said on another sub-thread, I get all the way through start_kernel() and rest_init(); it's just that apparently some of the steps don't do what they're supposed to do.

I'm meanwhile wondering whether I'm using a badly configured kernel, i.e. whether there are any Kconfig settings which I ought to enable, but which are neither "select"-ed nor covered by proper "depends on".
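For illustration only: HLT is encoded as the single byte 0xF4, so the two RIP values in the VMCS dump line quoted in the message above (0x814be422 and 0x814be423) differ by exactly one HLT instruction. The sketch below uses a hypothetical helper, classify_rip(), and assumes that 0x814be422 is the address of the HLT in default_idle(); that assumption is inferred from the observation that RIP pointed "either at it or right past it". It makes no assumption about which of the two dumped values is the VMCS field and which is the saved register.

```c
#include <assert.h>
#include <stdint.h>

/* HLT is encoded as the single byte 0xF4. */
#define HLT_INSN_LEN 1u

enum rip_pos {
    RIP_AT_HLT,     /* RIP still points at the HLT instruction */
    RIP_PAST_HLT,   /* RIP points at the instruction following HLT */
    RIP_ELSEWHERE,  /* neither of the above */
};

/*
 * Hypothetical helper: classify a sampled guest RIP relative to the
 * address of a known HLT instruction (assumed here to be the one in
 * default_idle()).
 */
static enum rip_pos classify_rip(uint64_t rip, uint64_t hlt_addr)
{
    if (rip == hlt_addr)
        return RIP_AT_HLT;
    if (rip == hlt_addr + HLT_INSN_LEN)
        return RIP_PAST_HLT;
    return RIP_ELSEWHERE;
}
```

With 0x814be422 as the assumed HLT address, one dumped value classifies as "at HLT" and the other as "past HLT". Both can describe the same halted vCPU, which is consistent with the suspicion that the apparent waking up may be an artifact rather than real activity.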
Re: [PATCH 0/4] x86/PVH: Dom0 building adjustments
On Wed, Sep 01, 2021 at 04:19:40PM +0200, Jan Beulich wrote:
> On 01.09.2021 15:56, Roger Pau Monné wrote:
> > On Tue, Aug 31, 2021 at 10:53:59AM +0200, Jan Beulich wrote:
> >> On 30.08.2021 15:01, Jan Beulich wrote:
> >>> The code building PVH Dom0 made use of sequences of P2M changes which are disallowed as of XSA-378. First of all, population of the first Mb of memory needs to be redone. Then, largely as a workaround, checking introduced by XSA-378 needs to be slightly relaxed.
> >>>
> >>> Note that with these adjustments I get Dom0 to start booting on my development system, but the Dom0 kernel then gets stuck. Since it was the first time for me to try PVH Dom0 in this context (see below for why I was hesitant), I cannot tell yet whether this is due to further fallout from the XSA, or to some further, unrelated problem.
> >
> > If you have some time, could you check without the XSA applied? I have to admit I haven't been testing staging, so it's possible some breakage has slipped in (however osstest seemed fine with it).
>
> Well, I'd rather try to use the time to find the actual issue. From osstest being fine I'm kind of inferring this might be machine specific, or it might be due to one of the overly many other patches I'm carrying. So if I can't infer anything from the stack once I can actually dump it, I may indeed need to bisect my pile, which would then also include the XSA-378 patches (as I haven't had time to rebase so far).
>
> >>> Dom0's BSP is in VPF_blocked state while all APs are still in VPF_down. The 'd' debug key, unhelpfully, doesn't produce any output, so it's non-trivial to check whether (like PV likes to do) Dom0 has panic()ed without leaving any (visible) output.
> >
> > Not sure it would help much, but maybe you can post the Xen+Linux output?
>
> There's no Linux output yet by that point (and either "earlyprintk=xen" doesn't work in PVH mode, or it's even too early for that). All Xen has to say is
>
> (XEN) Dom0 callback via changed to Direct Vector 0xf3
> (XEN) vmx.c:3265:d0v0 RDMSR 0x064e unimplemented
> (XEN) vmx.c:3265:d0v0 RDMSR 0x0034 unimplemented

Weird, I don't see why earlyprintk=xen shouldn't work in PVH mode, unless it's not properly wired up. It certainly needs checking and fixing, or else we won't be able to make much progress, I think.

> > Do you have iommu debug/verbose enabled to catch iommu faults?
>
> I'll try to remember to check that, but since Linux hasn't brought up APs yet I don't think there's any device activity just yet.
>
> >> Correction: I did mean '0' here, producing merely
> >>
> >> (XEN) '0' pressed -> dumping Dom0's registers
> >> (XEN) *** Dumping Dom0 vcpu#0 state: ***
> >> (XEN) *** Dumping Dom0 vcpu#1 state: ***
> >> (XEN) *** Dumping Dom0 vcpu#2 state: ***
> >> (XEN) *** Dumping Dom0 vcpu#3 state: ***
> >>
> >> 'd' output supports the "system is idle" impression that was also visible from 'q' output.
> >
> > Can you dump the state of the VMCS and see where the IP points in Linux?
>
> Both that and the register dumping I now have working tell me that it's the HLT in default_idle(). IOW Dom0 gives the impression of also being idle, at first glance. The stack pointer, however, is farther away from the stack top than I would have expected, so it may still have entered default_idle() for other reasons.
>
> The VMCS also told me that the last VM entry was to deliver an interrupt at vector 0xf3 (i.e. the "callback" one).

That's all quite weird. Did dom0 set up the vCPU timer?

What version of Linux are you using?

It seems to get stuck very early (or else fails to output anything while booting), which seems unlikely to be related to your specific hardware.

Thanks, Roger.
Re: [PATCH 0/4] x86/PVH: Dom0 building adjustments
On 01.09.2021 17:24, Juergen Gross wrote:
> On 01.09.21 17:06, Jan Beulich wrote:
>> On 30.08.2021 15:01, Jan Beulich wrote:
>>> The code building PVH Dom0 made use of sequences of P2M changes which are disallowed as of XSA-378. First of all, population of the first Mb of memory needs to be redone. Then, largely as a workaround, checking introduced by XSA-378 needs to be slightly relaxed.
>>>
>>> Note that with these adjustments I get Dom0 to start booting on my development system, but the Dom0 kernel then gets stuck. Since it was the first time for me to try PVH Dom0 in this context (see below for why I was hesitant), I cannot tell yet whether this is due to further fallout from the XSA, or to some further, unrelated problem. Dom0's BSP is in VPF_blocked state while all APs are still in VPF_down. The '0' debug key, unhelpfully, doesn't produce any output, so it's non-trivial to check whether (like PV likes to do) Dom0 has panic()ed without leaving any (visible) output.
>>
>> Having made '0' work at least partly, I can now see that Dom0's vCPU0 enters its idle loop after having gone through all normal initialization. Clearly certain things must not have worked as intended (no APs booted, no drivers loaded AFAICT), but I'm having a hard time seeing how to find out what that might be when there's no output at all. PV Dom0 does not require any special command line option to send its output both to the VGA console and through hvc_xen (making it also go to the serial log) - is this perhaps different for PVH? I couldn't find anything under docs/ ...
>
> Did you add earlyprintk=xen to the dom0 boot parameters?

Yes (I did mention this before) - no difference at all. I guess I'll try again, just in case I made a stupid mistake.

Jan
Re: [PATCH 0/4] x86/PVH: Dom0 building adjustments
On 01.09.21 17:06, Jan Beulich wrote:
> On 30.08.2021 15:01, Jan Beulich wrote:
>> The code building PVH Dom0 made use of sequences of P2M changes which are disallowed as of XSA-378. First of all, population of the first Mb of memory needs to be redone. Then, largely as a workaround, checking introduced by XSA-378 needs to be slightly relaxed.
>>
>> Note that with these adjustments I get Dom0 to start booting on my development system, but the Dom0 kernel then gets stuck. Since it was the first time for me to try PVH Dom0 in this context (see below for why I was hesitant), I cannot tell yet whether this is due to further fallout from the XSA, or to some further, unrelated problem. Dom0's BSP is in VPF_blocked state while all APs are still in VPF_down. The '0' debug key, unhelpfully, doesn't produce any output, so it's non-trivial to check whether (like PV likes to do) Dom0 has panic()ed without leaving any (visible) output.
>
> Having made '0' work at least partly, I can now see that Dom0's vCPU0 enters its idle loop after having gone through all normal initialization. Clearly certain things must not have worked as intended (no APs booted, no drivers loaded AFAICT), but I'm having a hard time seeing how to find out what that might be when there's no output at all. PV Dom0 does not require any special command line option to send its output both to the VGA console and through hvc_xen (making it also go to the serial log) - is this perhaps different for PVH? I couldn't find anything under docs/ ...

Did you add earlyprintk=xen to the dom0 boot parameters?

Juergen
Re: [PATCH 0/4] x86/PVH: Dom0 building adjustments
On 30.08.2021 15:01, Jan Beulich wrote:
> The code building PVH Dom0 made use of sequences of P2M changes which are disallowed as of XSA-378. First of all, population of the first Mb of memory needs to be redone. Then, largely as a workaround, checking introduced by XSA-378 needs to be slightly relaxed.
>
> Note that with these adjustments I get Dom0 to start booting on my development system, but the Dom0 kernel then gets stuck. Since it was the first time for me to try PVH Dom0 in this context (see below for why I was hesitant), I cannot tell yet whether this is due to further fallout from the XSA, or to some further, unrelated problem. Dom0's BSP is in VPF_blocked state while all APs are still in VPF_down. The '0' debug key, unhelpfully, doesn't produce any output, so it's non-trivial to check whether (like PV likes to do) Dom0 has panic()ed without leaving any (visible) output.

Having made '0' work at least partly, I can now see that Dom0's vCPU0 enters its idle loop after having gone through all normal initialization. Clearly certain things must not have worked as intended (no APs booted, no drivers loaded AFAICT), but I'm having a hard time seeing how to find out what that might be when there's no output at all. PV Dom0 does not require any special command line option to send its output both to the VGA console and through hvc_xen (making it also go to the serial log) - is this perhaps different for PVH? I couldn't find anything under docs/ ...

Jan
Re: [PATCH 0/4] x86/PVH: Dom0 building adjustments
On 01.09.2021 16:19, Jan Beulich wrote:
> On 01.09.2021 15:56, Roger Pau Monné wrote:
>> Do you have iommu debug/verbose enabled to catch iommu faults?
>
> I'll try to remember to check that, but since Linux hasn't brought up APs yet I don't think there's any device activity just yet.

No IOMMU faults, as expected.

Jan
Re: [PATCH 0/4] x86/PVH: Dom0 building adjustments
On 01.09.2021 15:56, Roger Pau Monné wrote:
> On Tue, Aug 31, 2021 at 10:53:59AM +0200, Jan Beulich wrote:
>> On 30.08.2021 15:01, Jan Beulich wrote:
>>> The code building PVH Dom0 made use of sequences of P2M changes which are disallowed as of XSA-378. First of all, population of the first Mb of memory needs to be redone. Then, largely as a workaround, checking introduced by XSA-378 needs to be slightly relaxed.
>>>
>>> Note that with these adjustments I get Dom0 to start booting on my development system, but the Dom0 kernel then gets stuck. Since it was the first time for me to try PVH Dom0 in this context (see below for why I was hesitant), I cannot tell yet whether this is due to further fallout from the XSA, or to some further, unrelated problem.
>
> If you have some time, could you check without the XSA applied? I have to admit I haven't been testing staging, so it's possible some breakage has slipped in (however osstest seemed fine with it).

Well, I'd rather try to use the time to find the actual issue. From osstest being fine I'm kind of inferring this might be machine specific, or it might be due to one of the overly many other patches I'm carrying. So if I can't infer anything from the stack once I can actually dump it, I may indeed need to bisect my pile, which would then also include the XSA-378 patches (as I haven't had time to rebase so far).

>>> Dom0's BSP is in VPF_blocked state while all APs are still in VPF_down. The 'd' debug key, unhelpfully, doesn't produce any output, so it's non-trivial to check whether (like PV likes to do) Dom0 has panic()ed without leaving any (visible) output.
>
> Not sure it would help much, but maybe you can post the Xen+Linux output?

There's no Linux output yet by that point (and either "earlyprintk=xen" doesn't work in PVH mode, or it's even too early for that). All Xen has to say is

(XEN) Dom0 callback via changed to Direct Vector 0xf3
(XEN) vmx.c:3265:d0v0 RDMSR 0x064e unimplemented
(XEN) vmx.c:3265:d0v0 RDMSR 0x0034 unimplemented

> Do you have iommu debug/verbose enabled to catch iommu faults?

I'll try to remember to check that, but since Linux hasn't brought up APs yet I don't think there's any device activity just yet.

>> Correction: I did mean '0' here, producing merely
>>
>> (XEN) '0' pressed -> dumping Dom0's registers
>> (XEN) *** Dumping Dom0 vcpu#0 state: ***
>> (XEN) *** Dumping Dom0 vcpu#1 state: ***
>> (XEN) *** Dumping Dom0 vcpu#2 state: ***
>> (XEN) *** Dumping Dom0 vcpu#3 state: ***
>>
>> 'd' output supports the "system is idle" impression that was also visible from 'q' output.
>
> Can you dump the state of the VMCS and see where the IP points in Linux?

Both that and the register dumping I now have working tell me that it's the HLT in default_idle(). IOW Dom0 gives the impression of also being idle, at first glance. The stack pointer, however, is farther away from the stack top than I would have expected, so it may still have entered default_idle() for other reasons.

The VMCS also told me that the last VM entry was to deliver an interrupt at vector 0xf3 (i.e. the "callback" one).

Jan
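As a hedged illustration of how a VMCS dump reveals which event the last VM entry injected: the Intel SDM defines the VM-entry interruption-information field with the vector in bits 7:0, the event type in bits 10:8 (0 = external interrupt, 2 = NMI, 3 = hardware exception), a deliver-error-code flag in bit 11, and a valid flag in bit 31. The sketch below decodes such a field. The decode_intr_info() helper is hypothetical, and the raw value 0x800000f3 used with it is an assumed example (the thread does not quote the dumped field), chosen to match "an interrupt at vector 0xf3".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Event types in bits 10:8 of the VM-entry interruption-information field. */
#define INTR_TYPE_EXT_INTR 0u
#define INTR_TYPE_NMI      2u
#define INTR_TYPE_HW_EXC   3u

struct intr_info {
    bool valid;            /* bit 31: an event is to be injected */
    unsigned int type;     /* bits 10:8: kind of event */
    uint8_t vector;        /* bits 7:0: vector number */
    bool has_error_code;   /* bit 11: deliver an error code */
};

/* Hypothetical helper: split a raw field value into its components. */
static struct intr_info decode_intr_info(uint32_t raw)
{
    struct intr_info ii = {
        .valid = (raw >> 31) & 1u,
        .type = (raw >> 8) & 7u,
        .vector = (uint8_t)(raw & 0xffu),
        .has_error_code = (raw >> 11) & 1u,
    };

    return ii;
}
```

Decoding the assumed value 0x800000f3 yields a valid external interrupt at vector 0xf3 without an error code, i.e. the shape one would expect for delivery of the "Direct Vector 0xf3" callback announced in the Xen log.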
Re: [PATCH 0/4] x86/PVH: Dom0 building adjustments
On Tue, Aug 31, 2021 at 10:53:59AM +0200, Jan Beulich wrote:
> On 30.08.2021 15:01, Jan Beulich wrote:
> > The code building PVH Dom0 made use of sequences of P2M changes which are disallowed as of XSA-378. First of all, population of the first Mb of memory needs to be redone. Then, largely as a workaround, checking introduced by XSA-378 needs to be slightly relaxed.
> >
> > Note that with these adjustments I get Dom0 to start booting on my development system, but the Dom0 kernel then gets stuck. Since it was the first time for me to try PVH Dom0 in this context (see below for why I was hesitant), I cannot tell yet whether this is due to further fallout from the XSA, or to some further, unrelated problem.

If you have some time, could you check without the XSA applied? I have to admit I haven't been testing staging, so it's possible some breakage has slipped in (however osstest seemed fine with it).

> > Dom0's BSP is in VPF_blocked state while all APs are still in VPF_down. The 'd' debug key, unhelpfully, doesn't produce any output, so it's non-trivial to check whether (like PV likes to do) Dom0 has panic()ed without leaving any (visible) output.

Not sure it would help much, but maybe you can post the Xen+Linux output?

Do you have iommu debug/verbose enabled to catch iommu faults?

> Correction: I did mean '0' here, producing merely
>
> (XEN) '0' pressed -> dumping Dom0's registers
> (XEN) *** Dumping Dom0 vcpu#0 state: ***
> (XEN) *** Dumping Dom0 vcpu#1 state: ***
> (XEN) *** Dumping Dom0 vcpu#2 state: ***
> (XEN) *** Dumping Dom0 vcpu#3 state: ***
>
> 'd' output supports the "system is idle" impression that was also visible from 'q' output.

Can you dump the state of the VMCS and see where the IP points in Linux?

Thanks, Roger.
Re: [PATCH 0/4] x86/PVH: Dom0 building adjustments
On 30.08.2021 15:01, Jan Beulich wrote:
> The code building PVH Dom0 made use of sequences of P2M changes which are disallowed as of XSA-378. First of all, population of the first Mb of memory needs to be redone. Then, largely as a workaround, checking introduced by XSA-378 needs to be slightly relaxed.
>
> Note that with these adjustments I get Dom0 to start booting on my development system, but the Dom0 kernel then gets stuck. Since it was the first time for me to try PVH Dom0 in this context (see below for why I was hesitant), I cannot tell yet whether this is due to further fallout from the XSA, or to some further, unrelated problem. Dom0's BSP is in VPF_blocked state while all APs are still in VPF_down. The 'd' debug key, unhelpfully, doesn't produce any output, so it's non-trivial to check whether (like PV likes to do) Dom0 has panic()ed without leaving any (visible) output.

Correction: I did mean '0' here, producing merely

(XEN) '0' pressed -> dumping Dom0's registers
(XEN) *** Dumping Dom0 vcpu#0 state: ***
(XEN) *** Dumping Dom0 vcpu#1 state: ***
(XEN) *** Dumping Dom0 vcpu#2 state: ***
(XEN) *** Dumping Dom0 vcpu#3 state: ***

'd' output supports the "system is idle" impression that was also visible from 'q' output.

Jan
[PATCH 0/4] x86/PVH: Dom0 building adjustments
The code building PVH Dom0 made use of sequences of P2M changes which are disallowed as of XSA-378. First of all, population of the first Mb of memory needs to be redone. Then, largely as a workaround, checking introduced by XSA-378 needs to be slightly relaxed.

Note that with these adjustments I get Dom0 to start booting on my development system, but the Dom0 kernel then gets stuck. Since it was the first time for me to try PVH Dom0 in this context (see below for why I was hesitant), I cannot tell yet whether this is due to further fallout from the XSA, or to some further, unrelated problem. Dom0's BSP is in VPF_blocked state while all APs are still in VPF_down. The 'd' debug key, unhelpfully, doesn't produce any output, so it's non-trivial to check whether (like PV likes to do) Dom0 has panic()ed without leaving any (visible) output.

[And there was another rather basic issue to fight first (patch will be submitted separately): vPCI wasn't aware of hidden PCI devices, hitting an ASSERT(). Obviously I couldn't afford not having a functioning serial console.]

In the process I ran into an OOM condition while populating Dom0's RAM; hence the subsequent re-work of dom0_compute_nr_pages(). In turn, while putting that together, I noticed that PV Dom0, when run in shadow mode, wouldn't have its shadow allocation properly set.

1: PVH: de-duplicate mappings for first Mb of Dom0 memory
2: P2M: relax guarding of MMIO entries
3: PVH: improve Dom0 memory size calculation
4: PV: properly set shadow allocation for Dom0

Jan
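The cover letter does not show how patch 3 changes dom0_compute_nr_pages(), so the following is only a sketch of the general idea of reserving paging (P2M/shadow) memory before sizing Dom0, under stated assumptions. Both helpers, paging_pool_pages() and fit_dom0_pages(), are hypothetical. The pool formula is modeled on the heuristic of libxl's libxl_get_required_shadow_memory() (1 MiB per vCPU plus two pages per MiB of guest RAM), and 4 KiB pages are assumed.

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: pages to set aside for the paging pool of a
 * domain with nr_pages of RAM and nr_vcpus vCPUs. Modeled on libxl's
 * heuristic: 256 pages (1 MiB) per vCPU, plus 2 pages per MiB of RAM,
 * computed in KiB and rounded up to whole MiB.
 */
static unsigned long paging_pool_pages(unsigned long nr_pages,
                                       unsigned int nr_vcpus)
{
    unsigned long memkb = nr_pages << (PAGE_SHIFT - 10);  /* RAM in KiB */

    memkb = 4 * (256UL * nr_vcpus + 2 * (memkb / 1024));  /* pool in KiB */
    return ((memkb + 1023) / 1024) << (20 - PAGE_SHIFT);  /* MiB -> pages */
}

/*
 * Hypothetical sizing step: a Dom0 size (in pages) which, together with
 * its own paging pool, fits into avail pages. The pool never shrinks as
 * the domain grows, so subtracting the pool needed for all of avail is
 * conservative: the result plus its pool never exceeds avail.
 */
static unsigned long fit_dom0_pages(unsigned long avail, unsigned int nr_vcpus)
{
    unsigned long pool = paging_pool_pages(avail, nr_vcpus);

    return avail > pool ? avail - pool : 0;
}
```

For 4 vCPUs and 4 GiB (1048576 pages), the pool comes to 9216 pages (36 MiB), leaving 1039360 pages for Dom0 itself. Sizing Dom0 from the full available amount without such a reservation is one way population could run out of memory; the thread does not establish that this is exactly what happened here.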