On 01.09.2021 18:13, Roger Pau Monné wrote:
> On Wed, Sep 01, 2021 at 04:19:40PM +0200, Jan Beulich wrote:
>> On 01.09.2021 15:56, Roger Pau Monné wrote:
>>> On Tue, Aug 31, 2021 at 10:53:59AM +0200, Jan Beulich wrote:
>>>> On 30.08.2021 15:01, Jan Beulich wrote:
>>>>> The code building PVH Dom0 made use of sequences of P2M changes
>>>>> which are disallowed as of XSA-378. First of all population of the
>>>>> first Mb of memory needs to be redone. Then, largely as a
>>>>> workaround, checking introduced by XSA-378 needs to be slightly
>>>>> relaxed.
>>>>>
>>>>> Note that with these adjustments I get Dom0 to start booting on my
>>>>> development system, but the Dom0 kernel then gets stuck. Since it
>>>>> was the first time for me to try PVH Dom0 in this context (see
>>>>> below for why I was hesitant), I cannot tell yet whether this is
>>>>> due to further fallout from the XSA, or some further unrelated
>>>>> problem.
>>>
>>> If you have some time could you check without the XSA applied? I have
>>> to admit I haven't been testing staging, so it's possible some
>>> breakage has slipped in (however osstest seemed fine with it).
>>
>> Well, I'd rather try to use the time to find the actual issue. From
>> osstest being fine I'm kind of inferring this might be machine
>> specific, or this might be due to yet some other of the overly many
>> patches I'm carrying. So if I can't infer anything from the stack
>> once I can actually dump that, I may indeed need to bisect my pile,
>> which would then also include the XSA-378 patches (as I didn't have
>> time to re-base so far).
>>
>>>>> Dom0's BSP is in VPF_blocked state while all APs are
>>>>> still in VPF_down. The 'd' debug key, unhelpfully, doesn't produce
>>>>> any output, so it's non-trivial to check whether (like PV likes to
>>>>> do) Dom0 has panic()ed without leaving any (visible) output.
>>>
>>> Not sure it would help much, but maybe you can post the Xen+Linux
>>> output?
>>
>> There's no Linux output yet by that point (and either
>> "earlyprintk=xen" doesn't work in PVH mode, or it's even too early
>> for that). All Xen has to say is
>>
>> (XEN) Dom0 callback via changed to Direct Vector 0xf3
>> (XEN) vmx.c:3265:d0v0 RDMSR 0x0000064e unimplemented
>> (XEN) vmx.c:3265:d0v0 RDMSR 0x00000034 unimplemented
> 
> Weird, I don't see why earlyprintk=xen shouldn't work in PVH mode,
> unless it's not properly wired up. Certainly needs checking and
> fixing, or else we won't be able to make much progress I think.

Right - I'm intending to check this, including whether at least
xen_raw_console_write() would work.

>>>> Correction: I did mean '0' here, producing merely
>>>>
>>>> (XEN) '0' pressed -> dumping Dom0's registers
>>>> (XEN) *** Dumping Dom0 vcpu#0 state: ***
>>>> (XEN) *** Dumping Dom0 vcpu#1 state: ***
>>>> (XEN) *** Dumping Dom0 vcpu#2 state: ***
>>>> (XEN) *** Dumping Dom0 vcpu#3 state: ***
>>>>
>>>> 'd' output supports the "system is idle" that was also visible from
>>>> 'q' output.
>>>
>>> Can you dump the state of the VMCS and see where the IP points to in
>>> Linux?
>>
>> Both that and the register dumping, which I have meanwhile got working,
>> tell me that it's the HLT in default_idle(). IOW Dom0 gives the impression
>> of also being idle, at first glance. The stack pointer, however,
>> is farther away from the stack top than I would have expected, so it
>> may still have entered default_idle() for other reasons.
>>
>> The VMCS also told me that the last VM entry was to deliver an
>> interrupt at vector 0xf3 (i.e. the "callback" one).
> 
> That's all quite weird. Did dom0 set up the vCPU timer?

Ah - I had meant to check active timers, but then forgot. Otoh I
thought I could observe vCPU0 waking up from HLT, as RIP in the
registers dumped has been pointing either at it or right past it.
Now that I write this I'm wondering though whether that's an
artifact rather than a reflection of something that's really
happening, in particular because of this

(XEN) RSP = 0xffffffff81c03eb8 (0xffffffff81c03eb8)  RIP = 0xffffffff814be422 (0xffffffff814be423)

in the VMCS dump.
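
As a quick check of the arithmetic behind that observation, here is a minimal Python sketch that parses the dump line above and compares the two values printed for each register. The helper name `parse_pair` is made up for illustration. The only outside fact relied upon is that HLT is a one-byte instruction (opcode 0xF4), so a difference of exactly one byte fits "either at it or right past it". The sketch does not try to say which of the two values is the stale one.

```python
import re

# Line copied from the VMCS dump above, re-joined onto a single line.
line = ("(XEN) RSP = 0xffffffff81c03eb8 (0xffffffff81c03eb8)  "
        "RIP = 0xffffffff814be422 (0xffffffff814be423)")


def parse_pair(name, text):
    """Return the (first, parenthesized) values printed for register `name`.

    `parse_pair` is a hypothetical helper, not part of any Xen tooling.
    """
    m = re.search(rf"{name} = (0x[0-9a-f]+)\s+\((0x[0-9a-f]+)\)", text)
    if m is None:
        raise ValueError(f"{name} not found in dump line")
    return int(m.group(1), 16), int(m.group(2), 16)


rsp = parse_pair("RSP", line)
rip = parse_pair("RIP", line)

# Difference between the parenthesized value and the first value.
rsp_delta = rsp[1] - rsp[0]
rip_delta = rip[1] - rip[0]

print(f"RIP delta: {rip_delta} byte(s), RSP delta: {rsp_delta}")
```

The two RIP values differ by one byte while the two RSP values are identical, which is what one would see if one value was sampled at the HLT and the other just after it.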

> What version of Linux are you using?

5.13.2; didn't get around to switching to 5.14 yet, but I also don't
expect this to make a difference.

> It seems to get stuck very early (or else fails to output anything
> while booting), which seems unlikely to be related to your specific
> hardware.

Well, it can't be extremely early - I see the ACPI IRQ getting set
up (from "iommu=debug" output mentioning GSI 9), and I see PCI
device BARs being played with (from debug messages I had added to
vPCI to monitor what P2M adjustments are being requested). As said
on another sub-thread, I get all the way through start_kernel()
and rest_init(), just that apparently some of the steps don't do
what they're supposed to do.

I'm meanwhile wondering whether I'm using a badly configured
kernel, i.e. whether there are any Kconfig settings which I ought
to enable, but which aren't "select"-ed nor have proper
"depends on". What I did is simply take my XEN_PV=y config,
replacing that by XEN_PVH=y. I did observe that this let XEN_DOM0
go off, but according to my checking (at the time) nothing this
crucial should have been affected by that.
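
One way to re-check which symbols changed between the two configurations is to diff the resulting .config files. Below is a minimal Python sketch of that idea. The function names and the two config excerpts are hypothetical and only illustrate the XEN_PV to XEN_PVH switch described above; they are not the real configs. The kernel tree's scripts/diffconfig serves a similar purpose.

```python
def parse_config(text):
    """Map CONFIG_* symbol -> value; '# CONFIG_FOO is not set' maps to 'n'."""
    suffix = " is not set"
    symbols = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Strip the leading "# " and the trailing " is not set".
            symbols[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            symbols[name] = value
    return symbols


def diff_configs(old, new):
    """Return {symbol: (old_value, new_value)} for every symbol that differs.

    A symbol missing from one side is treated as 'n'.
    """
    changed = {}
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name, "n"), new.get(name, "n")
        if a != b:
            changed[name] = (a, b)
    return changed


# Hypothetical excerpts, for illustration only (not the real configs).
pv_config = """\
CONFIG_XEN=y
CONFIG_XEN_PV=y
CONFIG_XEN_DOM0=y
"""
pvh_config = """\
CONFIG_XEN=y
# CONFIG_XEN_PV is not set
CONFIG_XEN_PVH=y
"""

changes = diff_configs(parse_config(pv_config), parse_config(pvh_config))
for name, (a, b) in changes.items():
    print(f"{name}: {a} -> {b}")
```

Running this over the full pair of .config files would list every symbol that silently went off together with XEN_DOM0, which can then be checked against the "select" and "depends on" lines in Kconfig.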

Jan

