Domain Birth Time
Hi,

We've added a feature to Xen 4.15 such that `xl uptime -b` reports the birth
time of the domain (i.e. a value preserved across migrations). If this would
be of wider interest I can try porting this to a more recent release and
submitting it for review.

Regards,
James
Re: XSA-446 relevance on Intel
On Tue, Dec 12, 2023 at 10:56:48AM +, Andrew Cooper wrote:
> On 12/12/2023 9:43 am, James Dingwall wrote:
> > Hi,
> >
> > We were experiencing a crash during PV domU boot on several different models
> > of hardware but all with Intel CPUs. The Xen version was based on stable-4.15
> > at 4a4daf6bddbe8a741329df5cc8768f7dec664aed (XSA-444) with some local
> > patches. Since updating the branch to b918c4cdc7ab2c1c9e9a9b54fa9d9c595913e028
> > (XSA-446) we have not observed the same crash.
>
> That range covers:
>
> 1f5f515da0f6 - iommu/amd-vi: use correct level for quarantine domain page tables
> b918c4cdc7ab - x86/spec-ctrl: Remove conditional IRQs-on-ness for INT $0x80/0x82 paths
>
> so yeah - not much in the way of change.
>
> > The occurrence was on 1-2% of boots and we couldn't determine a particular
> > sequence of events that would trigger it. The kernel is based on Ubuntu's
> > 5.15.0-91 tag but we also observed the same with -85. Due to the low
> > frequency it is possible that we simply haven't observed it again since
> > updating our Xen build.
> >
> > If I have followed the early startup this is happening shortly after detection
> > of possible CPU vulnerabilities and patching in alternative instructions. As
> > the RIP was native_irq_return_iret and XSA-446 related to interrupt management
> > I wondered if it was possible that despite "Xen is not believed to be vulnerable
> > in default configurations on CPUs from other hardware vendors." there could
> > be some conditions in which an Intel CPU is affected?
>
> In short, XSA-446 isn't plausibly related. It's completely internal to
> Xen, with no alteration on guest state.
>
> It is an error that Linux has ended up in native_irq_return_iret. Linux
> cannot return to itself with an IRET instruction, and must use
> HYPERCALL_iret instead.
>
> In recent versions of Linux, this is fixed up as about the earliest
> action a PV kernel takes, but on older versions of Linux, any
> interrupt/exception early enough on boot was fatal in this way.
>
>
> This part of the backtrace is odd:
>
> [ 0.398962] ? native_iret+0x7/0x7
> [ 0.398967] ? insn_decode+0x79/0x100
> [ 0.398975] ? insn_decode+0xcf/0x100
> [ 0.398980] optimize_nops+0x68/0x150
>
> as it's not clear how we've ended up in a case wanting to return back to
> the kernel to begin with. However, it's most likely a pagefault, as
> optimize_nops() is making changes in arbitrary locations.
>
> It is possible that a change in visible features has altered the
> behaviour enough not to crash, but if everything is still the same as
> far as you can tell, then it's likely just chance that you haven't seen
> it again.
>
> This is definitely a Linux bug, so I suspect something bad has been
> backported into Ubuntu.
>
> ~Andrew

Thanks for the response. I had a look at the more recent kernels and managed
to backport "x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel()"
without too much trouble. It may still be a coincidence that we haven't
encountered the problem but it seems to have gone away for now.

Regards,
James
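
To put a number on the "coincidence" possibility, here is a rough sketch of how many clean boots would be needed before the absence of the crash means much. It assumes every boot crashes independently with a fixed probability taken from the 1-2% figure reported in this thread; the function names are made up for illustration and are not part of any Xen or kernel tooling.

```shell
# Sketch only: assumes each boot crashes independently with probability p.

# Probability of seeing no crash at all in n consecutive boots: (1 - p)^n
clean_run_probability() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", (1 - p) ^ n }'
}

# Smallest n for which a crash-free run of n boots has probability <= alpha
boots_needed() {
    awk -v p="$1" -v alpha="$2" 'BEGIN {
        x = log(alpha) / log(1 - p)   # real-valued solution of (1-p)^x = alpha
        n = int(x)
        if (n < x) n++                # round up to a whole number of boots
        print n
    }'
}

clean_run_probability 0.01 100   # chance of 100 clean boots at a 1% crash rate
clean_run_probability 0.02 100   # same at a 2% crash rate
boots_needed 0.01 0.05           # boots needed for 95% confidence at 1%
boots_needed 0.02 0.05           # boots needed for 95% confidence at 2%
```

Under those assumptions, a hundred clean boots is still quite likely to happen by chance, and something on the order of 150 to 300 clean boots would be needed before the fix looks convincing.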
XSA-446 relevance on Intel
Hi,

We were experiencing a crash during PV domU boot on several different models
of hardware but all with Intel CPUs. The Xen version was based on stable-4.15
at 4a4daf6bddbe8a741329df5cc8768f7dec664aed (XSA-444) with some local patches.
Since updating the branch to b918c4cdc7ab2c1c9e9a9b54fa9d9c595913e028
(XSA-446) we have not observed the same crash.

The occurrence was on 1-2% of boots and we couldn't determine a particular
sequence of events that would trigger it. The kernel is based on Ubuntu's
5.15.0-91 tag but we also observed the same with -85. Due to the low
frequency it is possible that we simply haven't observed it again since
updating our Xen build.

If I have followed the early startup this is happening shortly after detection
of possible CPU vulnerabilities and patching in alternative instructions. As
the RIP was native_irq_return_iret and XSA-446 related to interrupt management
I wondered if it was possible that despite "Xen is not believed to be vulnerable
in default configurations on CPUs from other hardware vendors." there could
be some conditions in which an Intel CPU is affected?

Thanks,
James

[0.374957] GDS: Unknown: Dependent on hypervisor status
[0.375007] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[0.375016] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.375022] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.375027] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[0.375033] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[0.375038] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[0.375047] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[0.375053] x86/fpu: xstate_offset[5]: 1088, xstate_sizes[5]: 64
[0.375059] x86/fpu: xstate_offset[6]: 1152, xstate_sizes[6]: 512
[0.375064] x86/fpu: xstate_offset[7]: 1664, xstate_sizes[7]: 1024
[0.375070] x86/fpu: Enabled xstate features 0xe7, context size is 2688 bytes, using 'standard' format.
[0.398765] segment-related general protection fault: e030 [#1] SMP NOPTI
[0.398784] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-91-generic #101~20.04.1
[0.398792] RIP: e030:native_irq_return_iret+0x0/0x2
[0.398806] Code: 5b 41 5b 41 5a 41 59 41 58 58 59 5a 5e 5f 48 83 c4 08 eb 0f 0f 1f 00 90 66 66 2e 0f 1f 84 00 00 00 00 00 f6 44 24 20 04 75 02 <48> cf 57 0f 01 f8 eb 12 0f 20 df 90 90 90 90 90 48 81 e7 ff e7 ff
[0.398818] RSP: e02b:82e03bd8 EFLAGS: 00010046
[0.398825] RAX: RBX: 82e03c30 RCX:
[0.398831] RDX: 000f RSI: 81e011f4 RDI: 82e03ca0
[0.398836] RBP: 82e03c10 R08: 81e011ef R09: 0005
[0.398842] R10: 0006 R11: e8ae0feb75ccff49 R12: 81e011ef
[0.398848] R13: 0006 R14: 81e011f4 R15: 0005
[0.398860] FS: () GS:88802dc0() knlGS:
[0.398866] CS: 1e030 DS: ES: CR0: 80050033
[0.398872] CR2: CR3: 02e1 CR4: 00050660
[0.398880] Call Trace:
[0.398883]
[0.398887] ? show_trace_log_lvl+0x1d6/0x2ea
[0.398896] ? show_trace_log_lvl+0x1d6/0x2ea
[0.398902] ? optimize_nops+0x68/0x150
[0.398909] ? show_regs.part.0+0x23/0x29
[0.398914] ? __die_body.cold+0x8/0xd
[0.398919] ? die_addr+0x3e/0x60
[0.398925] ? exc_general_protection+0x1c1/0x350
[0.398933] ? asm_exc_general_protection+0x27/0x30
[0.398939] ? restore_regs_and_return_to_kernel+0x20/0x2c
[0.398945] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[0.398950] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[0.398956] ? restore_regs_and_return_to_kernel+0x20/0x2c
[0.398962] ? native_iret+0x7/0x7
[0.398967] ? insn_decode+0x79/0x100
[0.398975] ? insn_decode+0xcf/0x100
[0.398980] optimize_nops+0x68/0x150
[0.398986] apply_alternatives+0x181/0x3a0
[0.398991] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[0.398996] ? fb_is_primary_device+0x25/0x73
[0.399003] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[0.399009] ? apply_alternatives+0x8/0x3a0
[0.399014] ? fb_is_primary_device+0x6e/0x73
[0.399019] ? apply_returns+0xfc/0x180
[0.399024] ? fb_is_primary_device+0x6e/0x73
[0.399029] ? sanitize_boot_params.constprop.0+0xa/0xef
[0.399035] ? fb_is_primary_device+0x73/0x73
[0.399040]
Re: xen 4.15.5: msr_relaxed required for MSR 0x1a2
On Mon, Nov 20, 2023 at 10:24:05AM +0100, Roger Pau Monné wrote: > On Mon, Nov 20, 2023 at 08:27:36AM +0000, James Dingwall wrote: > > On Fri, Nov 17, 2023 at 10:56:30AM +0100, Jan Beulich wrote: > > > On 17.11.2023 10:18, James Dingwall wrote: > > > > On Thu, Nov 16, 2023 at 04:32:47PM +, Andrew Cooper wrote: > > > >> On 16/11/2023 4:15 pm, James Dingwall wrote: > > > >>> Hi, > > > >>> > > > >>> Per the msr_relaxed documentation: > > > >>> > > > >>>"If using this option is necessary to fix an issue, please report > > > >>> a bug." > > > >>> > > > >>> After recently upgrading an environment from Xen 4.14.5 to Xen 4.15.5 > > > >>> we > > > >>> started experiencing a BSOD at boot with one of our Windows guests. > > > >>> We found > > > >>> that enabling `msr_relaxed = 1` in the guest configuration has > > > >>> resolved the > > > >>> problem. With a debug build of Xen and `hvm_debug=2048` on the > > > >>> command line > > > >>> the following messages were caught as the BSOD happened: > > > >>> > > > >>> (XEN) [HVM:11.0] ecx=0x1a2 > > > >>> (XEN) vmx.c:3298:d11v0 RDMSR 0x01a2 unimplemented > > > >>> (XEN) d11v0 VIRIDIAN CRASH: 1e c096 f80b8de81eb5 0 0 > > > >>> > > > >>> I found that MSR 0x1a2 is MSR_TEMPERATURE_TARGET and from that this > > > >>> patch > > > >>> series from last month: > > > >>> > > > >>> https://patchwork.kernel.org/project/xen-devel/list/?series=796550 > > > >>> > > > >>> Picking out just a small part of that fixes the problem for us. > > > >>> Although the > > > >>> the patch is against 4.15.5 I think it would be relevant to more > > > >>> recent > > > >>> releases too. > > > >> > > > >> Which version of Windows, and what hardware? > > > >> > > > >> The Viridian Crash isn't about the RDMSR itself - it's presumably > > > >> collateral damage shortly thereafter. > > > >> > > > >> Does filling in 0 for that MSR also resolve the issue? It's model > > > >> specific and we absolutely cannot pass it through from real hardware > > > >> like that. 
> > > >> > > > > > > > > Hi Andrew, > > > > > > > > Thanks for your response. The guest is running Windows 10 and the crash > > > > happens in a proprietary hardware driver. A little bit of knowledge as > > > > they say was enough to stop the crash but I don't understand the impact > > > > of what I've actually done... > > > > > > > > To rework the patch I'd need a bit of guidance, if I understand your > > > > suggestion I set the MSR to 0 with this change in emul-priv-op.c: > > > > > > For the purpose of the experiment suggested by Andrew ... > > > > > > > diff --git a/xen/arch/x86/pv/emul-priv-op.c > > > > b/xen/arch/x86/pv/emul-priv-op.c > > > > index ed97b1d6fcc..66f5e417df6 100644 > > > > --- a/xen/arch/x86/pv/emul-priv-op.c > > > > +++ b/xen/arch/x86/pv/emul-priv-op.c > > > > @@ -976,6 +976,10 @@ static int read_msr(unsigned int reg, uint64_t > > > > *val, > > > > *val = 0; > > > > return X86EMUL_OKAY; > > > > > > > > +case MSR_TEMPERATURE_TARGET: > > > > +*val = 0; > > > > +return X86EMUL_OKAY; > > > > + > > > > case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7): > > > > case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3): > > > > case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2: > > > > > > ... you wouldn't need this (affects PV domains only), and ... > > > > > > > and this in vmx.c: > > > > > > > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > > > > index 54023a92587..bbf37b7f272 100644 > > > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > > > @@ -3259,6 +3259,11 @@ static int vmx_msr_read_intercept(unsigned int > > > > msr, uint64_t *msr_content) > > > > if ( !nvmx_msr_read_intercept(msr, msr_content) ) > > > > goto gp_fault; > > > > break; > > > > + > > > > +case MSR_TEMPERATURE_TARGET: > > > > +*msr_content = 0; > > > > +break; > > I think the preference now is to add such handling directly in > guest_rdmsr()? 
Protected with a:
>
> if ( !(cp->x86_vendor & (X86_VENDOR_INTEL)) )
>     goto gp_fault;
>

It is possible we can patch the driver which is triggering the BSOD but it
seems unlikely we'd be able to roll that out in advance of doing the Xen
upgrade for dom0. If the problem we are encountering is specific to our
situation rather than a general case issue then we can easily carry a patch
for that.

Thanks for the help,
James
Re: xen 4.15.5: msr_relaxed required for MSR 0x1a2
On Fri, Nov 17, 2023 at 11:17:46AM +0100, Roger Pau Monné wrote: > On Fri, Nov 17, 2023 at 09:18:39AM +0000, James Dingwall wrote: > > On Thu, Nov 16, 2023 at 04:32:47PM +, Andrew Cooper wrote: > > > On 16/11/2023 4:15 pm, James Dingwall wrote: > > > > Hi, > > > > > > > > Per the msr_relaxed documentation: > > > > > > > >"If using this option is necessary to fix an issue, please report a > > > > bug." > > > > > > > > After recently upgrading an environment from Xen 4.14.5 to Xen 4.15.5 we > > > > started experiencing a BSOD at boot with one of our Windows guests. We > > > > found > > > > that enabling `msr_relaxed = 1` in the guest configuration has resolved > > > > the > > > > problem. With a debug build of Xen and `hvm_debug=2048` on the command > > > > line > > > > the following messages were caught as the BSOD happened: > > > > > > > > (XEN) [HVM:11.0] ecx=0x1a2 > > > > (XEN) vmx.c:3298:d11v0 RDMSR 0x01a2 unimplemented > > > > (XEN) d11v0 VIRIDIAN CRASH: 1e c096 f80b8de81eb5 0 0 > > > > > > > > I found that MSR 0x1a2 is MSR_TEMPERATURE_TARGET and from that this > > > > patch > > > > series from last month: > > > > > > > > https://patchwork.kernel.org/project/xen-devel/list/?series=796550 > > > > > > > > Picking out just a small part of that fixes the problem for us. > > > > Although the > > > > the patch is against 4.15.5 I think it would be relevant to more recent > > > > releases too. > > > > > > Which version of Windows, and what hardware? > > > > > > The Viridian Crash isn't about the RDMSR itself - it's presumably > > > collateral damage shortly thereafter. > > > > > > Does filling in 0 for that MSR also resolve the issue? It's model > > > specific and we absolutely cannot pass it through from real hardware > > > like that. > > > > > > > Hi Andrew, > > > > Thanks for your response. The guest is running Windows 10 and the crash > > happens in a proprietary hardware driver. 
>
> When you say proprietary you mean a custom driver made for your
> use-case, or is this some vendor driver widely available?
>

Hi Roger,

We have emulated some point of sale hardware with a custom qemu device. It is
reasonably common but limited to its particular sector. As the physical
hardware is all built to the same specification I assume the driver has made
assumptions about the availability of MSR_TEMPERATURE_TARGET and doesn't
handle the case where it is absent, which leads to the BSOD in the Windows
guest.

Regards,
James
Re: xen 4.15.5: msr_relaxed required for MSR 0x1a2
On Fri, Nov 17, 2023 at 10:56:30AM +0100, Jan Beulich wrote: > On 17.11.2023 10:18, James Dingwall wrote: > > On Thu, Nov 16, 2023 at 04:32:47PM +, Andrew Cooper wrote: > >> On 16/11/2023 4:15 pm, James Dingwall wrote: > >>> Hi, > >>> > >>> Per the msr_relaxed documentation: > >>> > >>>"If using this option is necessary to fix an issue, please report a > >>> bug." > >>> > >>> After recently upgrading an environment from Xen 4.14.5 to Xen 4.15.5 we > >>> started experiencing a BSOD at boot with one of our Windows guests. We > >>> found > >>> that enabling `msr_relaxed = 1` in the guest configuration has resolved > >>> the > >>> problem. With a debug build of Xen and `hvm_debug=2048` on the command > >>> line > >>> the following messages were caught as the BSOD happened: > >>> > >>> (XEN) [HVM:11.0] ecx=0x1a2 > >>> (XEN) vmx.c:3298:d11v0 RDMSR 0x01a2 unimplemented > >>> (XEN) d11v0 VIRIDIAN CRASH: 1e c096 f80b8de81eb5 0 0 > >>> > >>> I found that MSR 0x1a2 is MSR_TEMPERATURE_TARGET and from that this patch > >>> series from last month: > >>> > >>> https://patchwork.kernel.org/project/xen-devel/list/?series=796550 > >>> > >>> Picking out just a small part of that fixes the problem for us. Although > >>> the > >>> the patch is against 4.15.5 I think it would be relevant to more recent > >>> releases too. > >> > >> Which version of Windows, and what hardware? > >> > >> The Viridian Crash isn't about the RDMSR itself - it's presumably > >> collateral damage shortly thereafter. > >> > >> Does filling in 0 for that MSR also resolve the issue? It's model > >> specific and we absolutely cannot pass it through from real hardware > >> like that. > >> > > > > Hi Andrew, > > > > Thanks for your response. The guest is running Windows 10 and the crash > > happens in a proprietary hardware driver. A little bit of knowledge as > > they say was enough to stop the crash but I don't understand the impact > > of what I've actually done... 
> >
> > To rework the patch I'd need a bit of guidance, if I understand your
> > suggestion I set the MSR to 0 with this change in emul-priv-op.c:
>
> For the purpose of the experiment suggested by Andrew ...
>
> > diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
> > index ed97b1d6fcc..66f5e417df6 100644
> > --- a/xen/arch/x86/pv/emul-priv-op.c
> > +++ b/xen/arch/x86/pv/emul-priv-op.c
> > @@ -976,6 +976,10 @@ static int read_msr(unsigned int reg, uint64_t *val,
> >          *val = 0;
> >          return X86EMUL_OKAY;
> >
> > +    case MSR_TEMPERATURE_TARGET:
> > +        *val = 0;
> > +        return X86EMUL_OKAY;
> > +
> >      case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7):
> >      case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3):
> >      case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2:
>
> ... you wouldn't need this (affects PV domains only), and ...
>
> > and this in vmx.c:
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index 54023a92587..bbf37b7f272 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -3259,6 +3259,11 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
> >          if ( !nvmx_msr_read_intercept(msr, msr_content) )
> >              goto gp_fault;
> >          break;
> > +
> > +    case MSR_TEMPERATURE_TARGET:
> > +        *msr_content = 0;
> > +        break;
> > +
> >      case MSR_IA32_MISC_ENABLE:
> >          rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
> >          /* Debug Trace Store is not supported. */
>
> ... indeed this ought to do. An eventual real patch may want to look
> different, though.
>

Thanks Jan, based on the information I've reduced the patch to what seems the
minimum necessary to work around the BSOD. I assume simply not ending up at
X86EMUL_EXCEPTION is the resolution regardless of what value is set.

Regards,
James

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 54023a92587..bbf37b7f272 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3259,6 +3259,11 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         if ( !nvmx_msr_read_intercept(msr, msr_content) )
             goto gp_fault;
         break;
+
+    case MSR_TEMPERATURE_TARGET:
+        *msr_content = 0;
+        break;
+
     case MSR_IA32_MISC_ENABLE:
         rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
         /* Debug Trace Store is not supported. */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 8b3ad575dbc..34e800fdc01 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -498,6 +498,9 @@
 #define MSR_IA32_MISC_ENABLE_XD_DISABLE (1ULL << 34)
 
 #define MSR_IA32_TSC_DEADLINE 0x06E0
+
+#define MSR_TEMPERATURE_TARGET 0x01a2
+
 #define MSR_IA32_ENERGY_PERF_BIAS 0x01b0
 
 /* Platform Shared Resource MSRs */
Re: xen 4.15.5: msr_relaxed required for MSR 0x1a2
On Thu, Nov 16, 2023 at 04:32:47PM +, Andrew Cooper wrote: > On 16/11/2023 4:15 pm, James Dingwall wrote: > > Hi, > > > > Per the msr_relaxed documentation: > > > >"If using this option is necessary to fix an issue, please report a bug." > > > > After recently upgrading an environment from Xen 4.14.5 to Xen 4.15.5 we > > started experiencing a BSOD at boot with one of our Windows guests. We > > found > > that enabling `msr_relaxed = 1` in the guest configuration has resolved the > > problem. With a debug build of Xen and `hvm_debug=2048` on the command line > > the following messages were caught as the BSOD happened: > > > > (XEN) [HVM:11.0] ecx=0x1a2 > > (XEN) vmx.c:3298:d11v0 RDMSR 0x01a2 unimplemented > > (XEN) d11v0 VIRIDIAN CRASH: 1e c096 f80b8de81eb5 0 0 > > > > I found that MSR 0x1a2 is MSR_TEMPERATURE_TARGET and from that this patch > > series from last month: > > > > https://patchwork.kernel.org/project/xen-devel/list/?series=796550 > > > > Picking out just a small part of that fixes the problem for us. Although the > > the patch is against 4.15.5 I think it would be relevant to more recent > > releases too. > > Which version of Windows, and what hardware? > > The Viridian Crash isn't about the RDMSR itself - it's presumably > collateral damage shortly thereafter. > > Does filling in 0 for that MSR also resolve the issue? It's model > specific and we absolutely cannot pass it through from real hardware > like that. > Hi Andrew, Thanks for your response. The guest is running Windows 10 and the crash happens in a proprietary hardware driver. A little bit of knowledge as they say was enough to stop the crash but I don't understand the impact of what I've actually done... 

To rework the patch I'd need a bit of guidance, if I understand your
suggestion I set the MSR to 0 with this change in emul-priv-op.c:

diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index ed97b1d6fcc..66f5e417df6 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -976,6 +976,10 @@ static int read_msr(unsigned int reg, uint64_t *val,
         *val = 0;
         return X86EMUL_OKAY;
 
+    case MSR_TEMPERATURE_TARGET:
+        *val = 0;
+        return X86EMUL_OKAY;
+
     case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7):
     case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3):
     case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2:

and this in vmx.c:

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 54023a92587..bbf37b7f272 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3259,6 +3259,11 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         if ( !nvmx_msr_read_intercept(msr, msr_content) )
             goto gp_fault;
         break;
+
+    case MSR_TEMPERATURE_TARGET:
+        *msr_content = 0;
+        break;
+
     case MSR_IA32_MISC_ENABLE:
         rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
         /* Debug Trace Store is not supported. */

Thanks,
James
xen 4.15.5: msr_relaxed required for MSR 0x1a2
Hi,

Per the msr_relaxed documentation:

   "If using this option is necessary to fix an issue, please report a bug."

After recently upgrading an environment from Xen 4.14.5 to Xen 4.15.5 we
started experiencing a BSOD at boot with one of our Windows guests. We found
that enabling `msr_relaxed = 1` in the guest configuration has resolved the
problem. With a debug build of Xen and `hvm_debug=2048` on the command line
the following messages were caught as the BSOD happened:

(XEN) [HVM:11.0] ecx=0x1a2
(XEN) vmx.c:3298:d11v0 RDMSR 0x01a2 unimplemented
(XEN) d11v0 VIRIDIAN CRASH: 1e c096 f80b8de81eb5 0 0

I found that MSR 0x1a2 is MSR_TEMPERATURE_TARGET and from that this patch
series from last month:

https://patchwork.kernel.org/project/xen-devel/list/?series=796550

Picking out just a small part of that fixes the problem for us. Although the
patch is against 4.15.5 I think it would be relevant to more recent
releases too.

Thanks,
James

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 54023a92587..3f64471c8a8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -3259,6 +3259,14 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
         if ( !nvmx_msr_read_intercept(msr, msr_content) )
             goto gp_fault;
         break;
+
+    case MSR_TEMPERATURE_TARGET:
+        if ( !rdmsr_safe(msr, *msr_content) )
+            break;
+        /* RO for guests, MSR_PLATFORM_INFO bits set accordingly in msr.c to indicate lack of write
+         * support. */
+        goto gp_fault;
+
     case MSR_IA32_MISC_ENABLE:
         rdmsrl(MSR_IA32_MISC_ENABLE, *msr_content);
         /* Debug Trace Store is not supported. */
diff --git a/xen/arch/x86/pv/emul-priv-op.c b/xen/arch/x86/pv/emul-priv-op.c
index ed97b1d6fcc..eb9eb45e820 100644
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -976,6 +976,9 @@ static int read_msr(unsigned int reg, uint64_t *val,
         *val = 0;
         return X86EMUL_OKAY;
 
+    case MSR_TEMPERATURE_TARGET:
+        goto normal;
+
     case MSR_P6_PERFCTR(0) ... MSR_P6_PERFCTR(7):
     case MSR_P6_EVNTSEL(0) ... MSR_P6_EVNTSEL(3):
     case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR2:
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 8b3ad575dbc..34e800fdc01 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -498,6 +498,9 @@
 #define MSR_IA32_MISC_ENABLE_XD_DISABLE (1ULL << 34)
 
 #define MSR_IA32_TSC_DEADLINE 0x06E0
+
+#define MSR_TEMPERATURE_TARGET 0x01a2
+
 #define MSR_IA32_ENERGY_PERF_BIAS 0x01b0
 
 /* Platform Shared Resource MSRs */
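
For anyone trying to find out which MSR a guest is tripping over, the following is a small sketch that pulls the domain/vCPU and MSR index out of console messages shaped like the "RDMSR 0x01a2 unimplemented" line quoted in this thread. The function name is made up, and the pattern assumes the message format stays exactly as shown above, which may differ between Xen versions and build types.

```shell
# Sketch only: summarise "RDMSR ... unimplemented" lines read from stdin.
# Assumes the message format quoted in this thread.
unimplemented_msrs() {
    # Keep the dNvM tag and the MSR index, drop everything else,
    # then collapse repeated reads of the same MSR by the same vCPU.
    sed -n 's/.*:\(d[0-9]*v[0-9]*\) RDMSR \(0x[0-9a-fA-F]*\) unimplemented.*/\1 \2/p' |
        sort -u
}

# Example using the log lines from the original report
printf '%s\n' \
    '(XEN) [HVM:11.0] ecx=0x1a2' \
    '(XEN) vmx.c:3298:d11v0 RDMSR 0x01a2 unimplemented' \
    '(XEN) d11v0 VIRIDIAN CRASH: 1e c096 f80b8de81eb5 0 0' |
    unimplemented_msrs
```

On a live host the input would typically come from `xl dmesg`, bearing in mind that in the report above these messages were only captured with a debug build of Xen.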
Re: live migration fails: qemu placing pci devices at different locations
On Tue, Oct 31, 2023 at 10:07:29AM +, James Dingwall wrote: > Hi, > > I'm having a bit of trouble performing live migration between hvm guests. The > sending side is xen 4.14.5 (qemu 5.0), receiving 4.15.5 (qemu 5.1). The error > message recorded in qemu-dm---incoming.log: > > qemu-system-i386: Unknown savevm section or instance ':00:04.0/vga' 0. > Make sure that your current VM setup matches your saved VM setup, including > any hotplugged devices > > I have patched libxl_dm.c to explicitly assign `addr=xx` values for various > devices and when these are correct the domain migrates correctly. However > the configuration differences between guests means that the values are not > consistent. The domain config file doesn't allow the pci address to be > expressed in the configuration for, e.g. `soundhw="DEVICE"` > > e.g. > > diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c > index 6e531863ac0..daa7c49846f 100644 > --- a/tools/libs/light/libxl_dm.c > +++ b/tools/libs/light/libxl_dm.c > @@ -1441,7 +1441,7 @@ static int libxl__build_device_model_args_new(libxl__gc > *gc, > flexarray_append(dm_args, "-spice"); > flexarray_append(dm_args, spiceoptions); > if (libxl_defbool_val(b_info->u.hvm.spice.vdagent)) { > -flexarray_vappend(dm_args, "-device", "virtio-serial", > +flexarray_vappend(dm_args, "-device", > "virtio-serial,addr=04", > "-chardev", "spicevmc,id=vdagent,name=vdagent", > "-device", > "virtserialport,chardev=vdagent,name=com.redhat.spice.0", > NULL); > > The order of devices on the qemu command line (below) appears to be the same > so my assumption is that the internals of qemu have resulted in things being > connected in a different order. The output of a Windows `lspci` tool is > also included. > > Could anyone make any additional suggestions on how I could try to gain > consistency between the different qemu versions? After a bit more head scratching we worked out the cause and a solution for our case. 

In xen 4.15.4 d65ebacb78901b695bc5e8a075ad1ad865a78928 was introduced to stop
using the deprecated qemu `-soundhw` option. The qemu device initialisation
code looks like:

    ...
    soundhw_init();  // handles old -soundhw option
    ...
    /* init generic devices */
    rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
    qemu_opts_foreach(qemu_find_opts("device"),
                      device_init_func, NULL, &error_fatal);
    ...

So the old -soundhw option was processed before any -device options: the
sound card was assigned the next available slot on the bus and then any
further -devices were added according to the command line order. After that
xen change the sound card was added as a -device and, depending on the other
emulated hardware, would be added at a different point to the equivalent
-soundhw option.

By re-ordering the qemu command line building in libxl_dm.c we can make the
sound card be the first -device, which resolves the migration problem. I think
this would also have been a problem for live migration between 4.15.3 and
4.15.4 for a vm with a sound card and not just the major version jump we are
doing.

James
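
The ordering effect described above can be illustrated with a toy model. This is not qemu's allocation code: it simply assumes that devices without an explicit `addr=` take consecutive slots in initialisation order, and that slot 3 is the first free one (taken from the 4.14.5 `lspci` listing in the original report, where the Xen platform device sits at 00:02.0). The function name and device labels are illustrative only.

```shell
# Toy model only: hand out consecutive PCI slots, starting at $1, to the
# device labels that follow, in the order given.
assign_slots() {
    slot=$1
    shift
    for dev in "$@"; do
        printf '00:%02x.0 %s\n' "$slot" "$dev"
        slot=$((slot + 1))
    done
}

echo "sound card initialised before the -device options (old -soundhw path):"
assign_slots 3 sound-card virtio-serial VGA

echo "sound card added as a -device after VGA (command-line position):"
assign_slots 3 virtio-serial VGA sound-card
```

The first ordering reproduces the 03/04/05 layout for audio, virtio console and VGA seen in the 4.14.5 listing. In the second, every device lands in a different slot, which is the kind of mismatch the incoming qemu rejects.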
live migration fails: qemu placing pci devices at different locations
Hi,

I'm having a bit of trouble performing live migration between hvm guests. The
sending side is xen 4.14.5 (qemu 5.0), receiving 4.15.5 (qemu 5.1). The error
message recorded in qemu-dm---incoming.log:

qemu-system-i386: Unknown savevm section or instance ':00:04.0/vga' 0.
Make sure that your current VM setup matches your saved VM setup, including
any hotplugged devices

I have patched libxl_dm.c to explicitly assign `addr=xx` values for various
devices and when these are correct the domain migrates correctly. However
the configuration differences between guests means that the values are not
consistent. The domain config file doesn't allow the pci address to be
expressed in the configuration for, e.g. `soundhw="DEVICE"`

e.g.

diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
index 6e531863ac0..daa7c49846f 100644
--- a/tools/libs/light/libxl_dm.c
+++ b/tools/libs/light/libxl_dm.c
@@ -1441,7 +1441,7 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
             flexarray_append(dm_args, "-spice");
             flexarray_append(dm_args, spiceoptions);
             if (libxl_defbool_val(b_info->u.hvm.spice.vdagent)) {
-                flexarray_vappend(dm_args, "-device", "virtio-serial",
+                flexarray_vappend(dm_args, "-device", "virtio-serial,addr=04",
                     "-chardev", "spicevmc,id=vdagent,name=vdagent",
                     "-device", "virtserialport,chardev=vdagent,name=com.redhat.spice.0",
                     NULL);

The order of devices on the qemu command line (below) appears to be the same
so my assumption is that the internals of qemu have resulted in things being
connected in a different order. The output of a Windows `lspci` tool is
also included.

Could anyone make any additional suggestions on how I could try to gain
consistency between the different qemu versions?

Thanks,
James

xen 4.14.5

/usr/lib/xen/bin/qemu-system-i386 -xen-domid 19 -no-shutdown -chardev socket,id=libxl-cmd,fd=19,server,nowait -S -mon chardev=libxl-cmd,mode=control -chardev socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-19,server,nowait -mon chardev=libxenstat-cmd,mode=control -nodefaults -no-user-config -name -vnc 0.0.0.0:93 -display none -k en-us -spice port=35993,tls-port=0,addr=127.0.0.1,disable-ticketing,agent-mouse=on,disable-copy-paste,image-compression=auto_glz -device virtio-serial -chardev spicevmc,id=vdagent,name=vdagent -device virtserialport,chardev=vdagent,name=com.redhat.spice.0 -device VGA,vgamem_mb=16 -boot order=cn -usb -usbdevice tablet -soundhw hda -smp 2,maxcpus=2 -device rtl8139,id=nic0,netdev=net0,mac=00:16:3e:64:c8:68 -netdev type=tap,id=net0,ifname=vif19.0-emu,script=no,downscript=no -object tls-creds-x509,id=tls0,endpoint=client,dir=/etc/certificates/usbredir,verify-peer=yes -chardev socket,id=charredir_serial0,host=127.0.0.1,port=48052,reconnect=2,nodelay,keepalive=on,user-timeout=5 -device isa-serial,chardev=charredir_serial0 -chardev socket,id=charredir_serial1,host=127.0.0.1,port=48054,reconnect=2,nodelay,keepalive=on,user-timeout=5 -device isa-serial,chardev=charredir_serial1 -chardev socket,id=charredir_serial2,host=127.0.0.1,port=48055,reconnect=2,nodelay,keepalive=on,user-timeout=5 -device pci-serial,chardev=charredir_serial2 -trace events=/etc/xen/qemu-trace-options -machine xenfv -m 2032 -drive file=/dev/drbd1002,if=ide,index=0,media=disk,format=raw,cache=writeback -drive file=/dev/drbd1003,if=ide,index=1,media=disk,format=raw,cache=writeback -runas 131091:131072

00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
00:03.0 Audio device: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 01)
00:04.0 Communication controller: Red Hat, Inc Virtio console
00:05.0 VGA compatible controller: Device 1234: (rev 02)
00:07.0 Serial controller: Red Hat, Inc. QEMU PCI 16550A Adapter (rev 01)

xen 4.15.5

/usr/lib/xen/bin/qemu-system-i386 -xen-domid 15 -no-shutdown -chardev socket,id=libxl-cmd,fd=19,server=on,wait=off -S -mon chardev=libxl-cmd,mode=control -chardev socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-15,server=on,wait=off -mon chardev=libxenstat-cmd,mode=control -nodefaults -no-user-config -name -vnc 0.0.0.0:93 -display none -k en-us -spice port=35993,tls-port=0,addr=127.0.0.1,disable-ticketing=on,agent-mouse=on,disable-copy-paste=on,image-compression=auto_glz -device virtio-serial -chardev spicevmc,id=vdagent,name=vdagent -device
Re: [PATCH] fix invalid frontend path for set_mtu
On 2022-04-27 10:17, Anthony PERARD wrote:
> On Tue, Apr 19, 2022 at 01:04:18PM +0100, James Dingwall wrote:
> > Thank you for your feedback. I've updated the patch as suggested. I've
> > also incorporated two other changes, one is a simple style change for
> > consistency, the other is to change the test for a valid mtu from > 0
> > to >= 68. I can resubmit the original patch if either of these are a
> > problem.
>
> The style change is fine, but I'd rather have the change to the mtu
> check in a different patch. Otherwise, the patch looks better, thanks.

Here is a revised version of the patch that removes the mtu change.

Thanks,
James

commit f6ec92717522e74b4cc3aa4160b8ad6884e0b50c
Author: James Dingwall
Date:   Tue Apr 19 12:45:31 2022 +0100

    The set_mtu() function of xen-network-common.sh currently has this code:

        if [ ${type_if} = vif ]
        then
            local dev_=${dev#vif}
            local domid=${dev_%.*}
            local devid=${dev_#*.}

            local FRONTEND_PATH="/local/domain/$domid/device/vif/$devid"

            xenstore_write "$FRONTEND_PATH/mtu" ${mtu}
        fi

    This works fine if the device has its default name but if the xen config
    defines the vifname parameter the FRONTEND_PATH is incorrectly constructed.
    Learn the frontend path by reading the appropriate value from the backend.

    Also change use of `...` to $(...) for a consistent style in the script.

    Signed-off-by: James Dingwall

diff --git a/tools/hotplug/Linux/xen-network-common.sh b/tools/hotplug/Linux/xen-network-common.sh
index 42fa704e8d..7a63308a9e 100644
--- a/tools/hotplug/Linux/xen-network-common.sh
+++ b/tools/hotplug/Linux/xen-network-common.sh
@@ -171,7 +171,7 @@ set_mtu () {
     local mtu=$(xenstore_read_default "$XENBUS_PATH/mtu" "")
     if [ -z "$mtu" ]
     then
-        mtu="`ip link show dev ${bridge}| awk '/mtu/ { print $5 }'`"
+        mtu="$(ip link show dev ${bridge}| awk '/mtu/ { print $5 }')"
         if [ -n "$mtu" ]
         then
             log debug "$bridge MTU is $mtu"
@@ -184,11 +184,7 @@ set_mtu () {
 
         if [ ${type_if} = vif ]
         then
-            local dev_=${dev#vif}
-            local domid=${dev_%.*}
-            local devid=${dev_#*.}
-
-            local FRONTEND_PATH="/local/domain/$domid/device/vif/$devid"
+            local FRONTEND_PATH="$(xenstore_read "$XENBUS_PATH/frontend")"
 
             xenstore_write "$FRONTEND_PATH/mtu" ${mtu}
         fi
Re: [PATCH] fix invalid frontend path for set_mtu
Hi Anthony,

On Tue, Apr 12, 2022 at 02:03:17PM +0100, Anthony PERARD wrote:
> Hi James,
>
> On Tue, Mar 01, 2022 at 09:35:13AM +0000, James Dingwall wrote:
> > The set_mtu() function of xen-network-common.sh currently has this code:
> >
> >     if [ ${type_if} = vif ]
> >     then
> >         local dev_=${dev#vif}
> >         local domid=${dev_%.*}
> >         local devid=${dev_#*.}
> >
> >         local FRONTEND_PATH="/local/domain/$domid/device/vif/$devid"
> >
> >         xenstore_write "$FRONTEND_PATH/mtu" ${mtu}
> >     fi
> >
> > This works fine if the device has its default name but if the xen config
> > defines the vifname parameter the FRONTEND_PATH is incorrectly constructed.
> > Learn the frontend path by reading the appropriate value from the backend.
>
> The patch looks fine, thanks. It is only missing a line
> "Signed-off-by: your_name " at the end of the description.
> The meaning of this line is described in the file CONTRIBUTING, section
> "Developer's Certificate of Origin".
>

Thank you for your feedback. I've updated the patch as suggested. I've also
incorporated two other changes, one is a simple style change for consistency,
the other is to change the test for a valid mtu from > 0 to >= 68. I can
resubmit the original patch if either of these are a problem.

Thanks,
James

commit 03ad5670f8a7402e30b288a55d088e87685cd1a1
Author: James Dingwall
Date:   Tue Apr 19 12:45:31 2022 +0100

    The set_mtu() function of xen-network-common.sh currently has this code:

        if [ ${type_if} = vif ]
        then
            local dev_=${dev#vif}
            local domid=${dev_%.*}
            local devid=${dev_#*.}

            local FRONTEND_PATH="/local/domain/$domid/device/vif/$devid"

            xenstore_write "$FRONTEND_PATH/mtu" ${mtu}
        fi

    This works fine if the device has its default name but if the xen config
    defines the vifname parameter the FRONTEND_PATH is incorrectly constructed.
    Learn the frontend path by reading the appropriate value from the backend.

    Also change use of `...` to $(...) for a consistent style in the script and
    adjust the valid check from `mtu > 0` to `mtu >= 68` per RFC 791.

    Signed-off-by: James Dingwall

diff --git a/tools/hotplug/Linux/xen-network-common.sh b/tools/hotplug/Linux/xen-network-common.sh
index 42fa704e8d..9a382c39f4 100644
--- a/tools/hotplug/Linux/xen-network-common.sh
+++ b/tools/hotplug/Linux/xen-network-common.sh
@@ -171,24 +171,20 @@ set_mtu () {
     local mtu=$(xenstore_read_default "$XENBUS_PATH/mtu" "")
     if [ -z "$mtu" ]
     then
-        mtu="`ip link show dev ${bridge}| awk '/mtu/ { print $5 }'`"
+        mtu="$(ip link show dev ${bridge}| awk '/mtu/ { print $5 }')"
         if [ -n "$mtu" ]
         then
             log debug "$bridge MTU is $mtu"
         fi
     fi
-    if [ -n "$mtu" ] && [ "$mtu" -gt 0 ]
+    if [ -n "$mtu" ] && [ "$mtu" -ge 68 ]
     then
         log debug "setting $dev MTU to $mtu"
         ip link set dev ${dev} mtu ${mtu} || :
 
         if [ ${type_if} = vif ]
         then
-            local dev_=${dev#vif}
-            local domid=${dev_%.*}
-            local devid=${dev_#*.}
-
-            local FRONTEND_PATH="/local/domain/$domid/device/vif/$devid"
+            local FRONTEND_PATH="$(xenstore_read "$XENBUS_PATH/frontend")"
 
             xenstore_write "$FRONTEND_PATH/mtu" ${mtu}
         fi
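For illustration only, the stricter MTU test described in the commit message can be pulled out into a small helper. This is a sketch, not part of the submitted patch: the name mtu_is_valid is hypothetical, and the digits-only guard is an extra assumption that the patch itself does not make. The lower bound of 68 is the datagram size RFC 791 requires every internet module to forward without further fragmentation.

```shell
#!/bin/sh
# Hypothetical helper, not in xen-network-common.sh.
# Succeeds (exit 0) when $1 is a plain decimal number of at least 68.
mtu_is_valid() {
    case "$1" in
        # Reject the empty string and anything containing a non-digit.
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 68 ]
}
```

With such a helper the condition in set_mtu() would read `if mtu_is_valid "$mtu"`, which also avoids a shell error from `[` when the xenstore value is not numeric.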
[PATCH] fix invalid frontend path for set_mtu
Hi,

The set_mtu() function of xen-network-common.sh currently has this code:

    if [ ${type_if} = vif ]
    then
        local dev_=${dev#vif}
        local domid=${dev_%.*}
        local devid=${dev_#*.}

        local FRONTEND_PATH="/local/domain/$domid/device/vif/$devid"

        xenstore_write "$FRONTEND_PATH/mtu" ${mtu}
    fi

This works fine if the device has its default name but if the xen config
defines the vifname parameter the FRONTEND_PATH is incorrectly constructed.
Learn the frontend path by reading the appropriate value from the backend.

diff --git a/tools/hotplug/Linux/xen-network-common.sh b/tools/hotplug/Linux/xen-network-common.sh
index 02e2388600..cd98f0d486 100644
--- a/tools/hotplug/Linux/xen-network-common.sh
+++ b/tools/hotplug/Linux/xen-network-common.sh
@@ -163,11 +163,7 @@ set_mtu () {
 
         if [ ${type_if} = vif ]
         then
-            local dev_=${dev#vif}
-            local domid=${dev_%.*}
-            local devid=${dev_#*.}
-
-            local FRONTEND_PATH="/local/domain/$domid/device/vif/$devid"
+            local FRONTEND_PATH=$(xenstore_read "$XENBUS_PATH/frontend")
 
             xenstore_write "$FRONTEND_PATH/mtu" ${mtu}
         fi

Thanks,
James
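To make the failure mode concrete, here is a small sketch that can be run without a hypervisor. It compares the name-parsing approach with a lookup through the backend. The xenstore_read function below is a stand-in with canned data, and the interface name guest-lan and the backend path are made-up examples; only the parameter expansions are taken from the script.

```shell
#!/bin/sh
# Old approach: derive the frontend path from the interface name.
# Only correct while the interface keeps its default "vif<domid>.<devid>" name.
frontend_path_from_name() {
    dev_=${1#vif}
    domid=${dev_%.*}
    devid=${dev_#*.}
    echo "/local/domain/$domid/device/vif/$devid"
}

# Stand-in for the real xenstore_read helper (canned, hypothetical data).
xenstore_read() {
    case "$1" in
        /local/domain/2/backend/vif/5/0/frontend)
            echo "/local/domain/5/device/vif/0" ;;
        *)
            return 1 ;;
    esac
}

# New approach: ask the backend where its frontend lives.
# $1 plays the role of $XENBUS_PATH in the hotplug script.
frontend_path_from_backend() {
    xenstore_read "$1/frontend"
}

frontend_path_from_name vif5.0        # default name: a usable path
frontend_path_from_name guest-lan     # custom vifname: a nonsense path
frontend_path_from_backend /local/domain/2/backend/vif/5/0
```

The second call shows the problem: with no `vif` prefix and no dot in the name, both expansions leave the string unchanged, so the domid and devid fields are filled with the interface name itself.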
Re: [RFC] kernel: xenfs parameter to hide deprecated files
Hi Juergen,

On Fri, Feb 25, 2022 at 03:09:05PM +0100, Juergen Gross wrote:
> On 23.02.22 19:08, James Dingwall wrote:
> > Hi,
> >
> > I have been investigating a very intermittent issue we have with xenstore
> > access hanging. Typically it seems to happen when all domains are stopped
> > prior to a system reboot. xenstore is running in a stubdom and using the
> > hypervisor debug keys indicates the domain is still there.
>
> Could it be dom0 shutdown handling is unloading some modules which are
> needed for Xenstore communication? E.g. xen-evtchn?
>
> >
> > I have come across some old list threads which suggested access via
> > /proc/xen/xenbus could cause problems but it seems patches went in to the
> > kernel for 4.10. However to eliminate this entirely as a possibility
> > I came up with this kernel patch to hide deprecated entries in xenfs.
>
> I don't see how this patch could help.
>
> libxenstore is using /dev/xen/xenbus if it is available. So the only
> case where your patch would avoid accessing /proc/xen/xenbus would be
> if /dev/xen/xenbus isn't there. But this wouldn't make Xenstore more
> reactive, I guess. ;-)
>
> > I found this old thread for a similar change where the entries were made
> > conditional on kernel config options instead of a module parameter but
> > this was never merged.
> >
> > https://lkml.org/lkml/2015/11/30/761
> >
> > If this would be a useful feature I would welcome feedback.
>
> I'm not sure how helpful it is to let the user specify a boot parameter
> for hiding the files. It will probably not get used a lot.

Thank you for taking the time to look this over. I did suspect it might not
be relevant for most people. I'll keep it in our build for now to see if we
improve our xenstore stability.

Thank you also for your suggestions about why we might be having a xenstore
problem. Next time we encounter that I'll check the status of the loaded
modules.

Regards,
James
[RFC] kernel: xenfs parameter to hide deprecated files
Hi,

I have been investigating a very intermittent issue we have with xenstore
access hanging. Typically it seems to happen when all domains are stopped
prior to a system reboot. xenstore is running in a stubdom and using the
hypervisor debug keys indicates the domain is still there.

I have come across some old list threads which suggested access via
/proc/xen/xenbus could cause problems but it seems patches went in to the
kernel for 4.10. However to eliminate this entirely as a possibility
I came up with this kernel patch to hide deprecated entries in xenfs.

I found this old thread for a similar change where the entries were made
conditional on kernel config options instead of a module parameter but
this was never merged.

https://lkml.org/lkml/2015/11/30/761

If this would be a useful feature I would welcome feedback.

Thanks,
James

diff --git a/drivers/xen/xenfs/super.c b/drivers/xen/xenfs/super.c
index d7d64235010d..d02c451f6a4d 100644
--- a/drivers/xen/xenfs/super.c
+++ b/drivers/xen/xenfs/super.c
@@ -3,6 +3,11 @@
  * xenfs.c - a filesystem for passing info between the a domain and
  * the hypervisor.
  *
+ * 2022-02-12 James Dingwall Introduce hide_deprecated module parameter to
+ *                           mask:
+ *                           - xenbus (deprecated in xen 4.6.0)
+ *                           - privcmd (deprecated in xen 4.7.0)
+ *
  * 2008-10-07 Alex Zeffertt Replaced /proc/xen/xenbus with xenfs filesystem
  *                          and /proc/xen compatibility mount point.
  *                          Turned xenfs into a loadable module.
@@ -28,6 +33,13 @@ MODULE_DESCRIPTION("Xen filesystem");
 MODULE_LICENSE("GPL");
 
+static bool __read_mostly hide_deprecated = 0;
+module_param(hide_deprecated, bool, 0444);
+MODULE_PARM_DESC(hide_deprecated,
+	"Allow deprecated files to be hidden in xenfs.\n"\
+	" 0 - (default) show deprecated xenfs files."\
+	" 1 - hide deprecated xenfs files [xenbus, privcmd].\n");
+
 static ssize_t capabilities_read(struct file *file, char __user *buf,
 				 size_t size, loff_t *off)
 {
@@ -69,8 +81,32 @@ static int xenfs_fill_super(struct super_block *sb, struct fs_context *fc)
 			xen_initial_domain() ? xenfs_init_files : xenfs_files);
 }
 
+static int xenfs_fill_super_hide_deprecated(struct super_block *sb, struct fs_context *fc)
+{
+	static const struct tree_descr xenfs_files[] = {
+		[2] = { "capabilities", &capabilities_file_ops, S_IRUGO },
+		{""},
+	};
+
+	static const struct tree_descr xenfs_init_files[] = {
+		[2] = { "capabilities", &capabilities_file_ops, S_IRUGO },
+		{ "xsd_kva", &xsd_kva_file_ops, S_IRUSR|S_IWUSR},
+		{ "xsd_port", &xsd_port_file_ops, S_IRUSR|S_IWUSR},
+#ifdef CONFIG_XEN_SYMS
+		{ "xensyms", &xensyms_ops, S_IRUSR},
+#endif
+		{""},
+	};
+
+	return simple_fill_super(sb, XENFS_SUPER_MAGIC,
+			xen_initial_domain() ? xenfs_init_files : xenfs_files);
+}
+
 static int xenfs_get_tree(struct fs_context *fc)
 {
+	if(hide_deprecated)
+		return get_tree_single(fc, xenfs_fill_super_hide_deprecated);
+
 	return get_tree_single(fc, xenfs_fill_super);
 }
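As a usage sketch, and assuming a kernel carrying the RFC patch above (it was not merged), the parameter would be set at load time, for example with `modprobe xenfs hide_deprecated=1`, or with `xenfs.hide_deprecated=1` on the kernel command line when xenfs is built in. Because the parameter is declared with mode 0444 it would be readable but not writable under /sys/module. The helper below is hypothetical and only shows how the active value could be checked; the optional third argument exists so the lookup can be pointed at a directory other than /sys.

```shell
#!/bin/sh
# Hypothetical helper: print the current value of a module parameter, or
# "unavailable" when the module is not loaded or lacks the parameter.
# $1 = module name, $2 = parameter name, $3 = sysfs root (default /sys).
module_param_value() {
    param_file="${3:-/sys}/module/$1/parameters/$2"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo "unavailable"
    fi
}

# Example: module_param_value xenfs hide_deprecated
# Boolean module parameters read back as Y or N.
```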
tools: propagate MTU to vif frontends (backporting)
Hi,

I've been backporting this series to xen 4.14 and everything relating to the
backend seems to be working well. For the frontend I can see the mtu value
published to xenstore but it doesn't appear to be consumed to set the matching
mtu in the guest.

https://lists.xenproject.org/archives/html/xen-devel/2020-08/msg00458.html

Is the expected solution a custom script running in the guest to make the
necessary change or have I missed something in how this is supposed to operate?

Thanks,
James
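If a guest-side script did turn out to be the intended route, it might look roughly like the sketch below. Everything here is an assumption rather than something the series is known to provide: the function name, the idea that the guest has the xenstore-read utility installed, that a relative path resolves against the guest's own /local/domain/<domid> directory, and that the caller knows which interface belongs to which vif device id. The XENSTORE_READ and IP_CMD variables are only there so the commands can be swapped out for a dry run.

```shell
#!/bin/sh
# Hypothetical guest-side helper: copy the MTU published in the frontend's
# xenstore directory onto the matching network interface.
# $1 = interface name in the guest, $2 = vif device id.
apply_frontend_mtu() {
    iface=$1
    devid=$2
    # Nothing to do if the key is absent or xenstore cannot be read.
    mtu=$(${XENSTORE_READ:-xenstore-read} "device/vif/$devid/mtu" 2>/dev/null) || return 0
    case "$mtu" in
        # Ignore empty or non-numeric values.
        ''|*[!0-9]*) return 0 ;;
    esac
    ${IP_CMD:-ip} link set dev "$iface" mtu "$mtu"
}

# Example (assumed mapping of eth0 to vif device 0):
#   apply_frontend_mtu eth0 0
```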
Re: possible kernel/libxl race with xl network-attach
On Mon, Jan 24, 2022 at 10:07:54AM +0100, Roger Pau Monné wrote: > On Fri, Jan 21, 2022 at 03:05:07PM +0000, James Dingwall wrote: > > On Fri, Jan 21, 2022 at 03:00:29PM +0100, Roger Pau Monné wrote: > > > On Fri, Jan 21, 2022 at 01:34:54PM +0000, James Dingwall wrote: > > > > On 2022-01-13 16:11, Roger Pau Monné wrote: > > > > > On Thu, Jan 13, 2022 at 11:19:46AM +, James Dingwall wrote: > > > > > > > > > > > > I have been trying to debug a problem where a vif with the backend > > > > > > in a > > > > > > driver domain is added to dom0. Intermittently the hotplug script > > > > > > is > > > > > > not invoked by libxl (running as xl devd) in the driver domain. By > > > > > > enabling some debug for the driver domain kernel and libxl I have > > > > > > these > > > > > > messages: > > > > > > > > > > > > driver domain kernel (Ubuntu 5.4.0-92-generic): > > > > > > > > > > > > [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0: > > > > > > Successfully created xenvif > > > > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: > > > > > > /local/domain/0/device/vif/0 -> Initialising > > > > > > [Thu Jan 13 01:39:31 2022] [26] 470: > > > > > > xen_netback:backend_switch_state: backend/vif/0/0 -> InitWait > > > > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: > > > > > > /local/domain/0/device/vif/0 -> Connected > > > > > > [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready > > > > > > [Thu Jan 13 01:39:31 2022] [26] 470: > > > > > > xen_netback:backend_switch_state: backend/vif/0/0 -> Connected > > > > > > > > > > > > xl devd (Xen 4.14.3): > > > > > > > > > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 > > > > > > wpath=/local/domain/2/backend token=3/0: event > > > > > > epath=/local/domain/2/backend/vif/0/0/state > > > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700: > > > 
> > > nested ao, parent 0x5633ac567f90 > > > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 > > > > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event > > > > > > epath=/local/domain/2/backend/vif/0/0/state > > > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > > > libxl_event.c:1055:devstate_callback: backend > > > > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting > > > > > > state 4 > > > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 > > > > > > wpath=/local/domain/2/backend token=3/0: event > > > > > > epath=/local/domain/2/backend/vif/0/0/state > > > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220: > > > > > > nested ao, parent 0x5633ac567f90 > > > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 > > > > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event > > > > > > epath=/local/domain/2/backend/vif/0/0/state > > > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > > > libxl_event.c:1055:devstate_callback: backend > > > > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting > > > > > > state 4 > > > > > > 2022-01-13 01:39:51 UTC libxl: debug: > > > > > > libxl_aoutils.c:88:xswait_timeout_callback: backend > > > > > > /local/domain/2/backend/vif/0/0/state (hoping for state change to > > > > > > 2): xswait timeout (path=/local/domain/2/backend/vif/0/0/state) > > > > > > 2022-01-13 01:39:51 UTC libxl: debug: > > > > > > libxl_event.c:850:libxl__ev_xswatch_deregister: watch > > > > > > w=0x5633ac569180 wpath=/local/domain/
Re: possible kernel/libxl race with xl network-attach
On Fri, Jan 21, 2022 at 03:00:29PM +0100, Roger Pau Monné wrote: > On Fri, Jan 21, 2022 at 01:34:54PM +0000, James Dingwall wrote: > > On 2022-01-13 16:11, Roger Pau Monné wrote: > > > On Thu, Jan 13, 2022 at 11:19:46AM +0000, James Dingwall wrote: > > > > > > > > I have been trying to debug a problem where a vif with the backend > > > > in a > > > > driver domain is added to dom0. Intermittently the hotplug script is > > > > not invoked by libxl (running as xl devd) in the driver domain. By > > > > enabling some debug for the driver domain kernel and libxl I have > > > > these > > > > messages: > > > > > > > > driver domain kernel (Ubuntu 5.4.0-92-generic): > > > > > > > > [Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0: > > > > Successfully created xenvif > > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: > > > > /local/domain/0/device/vif/0 -> Initialising > > > > [Thu Jan 13 01:39:31 2022] [26] 470: > > > > xen_netback:backend_switch_state: backend/vif/0/0 -> InitWait > > > > [Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: > > > > /local/domain/0/device/vif/0 -> Connected > > > > [Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready > > > > [Thu Jan 13 01:39:31 2022] [26] 470: > > > > xen_netback:backend_switch_state: backend/vif/0/0 -> Connected > > > > > > > > xl devd (Xen 4.14.3): > > > > > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 > > > > wpath=/local/domain/2/backend token=3/0: event > > > > epath=/local/domain/2/backend/vif/0/0/state > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700: > > > > nested ao, parent 0x5633ac567f90 > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 > > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event > > > > epath=/local/domain/2/backend/vif/0/0/state > > > > 
2022-01-13 01:39:31 UTC libxl: debug: > > > > libxl_event.c:1055:devstate_callback: backend > > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting > > > > state 4 > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 > > > > wpath=/local/domain/2/backend token=3/0: event > > > > epath=/local/domain/2/backend/vif/0/0/state > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220: > > > > nested ao, parent 0x5633ac567f90 > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 > > > > wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event > > > > epath=/local/domain/2/backend/vif/0/0/state > > > > 2022-01-13 01:39:31 UTC libxl: debug: > > > > libxl_event.c:1055:devstate_callback: backend > > > > /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting > > > > state 4 > > > > 2022-01-13 01:39:51 UTC libxl: debug: > > > > libxl_aoutils.c:88:xswait_timeout_callback: backend > > > > /local/domain/2/backend/vif/0/0/state (hoping for state change to > > > > 2): xswait timeout (path=/local/domain/2/backend/vif/0/0/state) > > > > 2022-01-13 01:39:51 UTC libxl: debug: > > > > libxl_event.c:850:libxl__ev_xswatch_deregister: watch > > > > w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state > > > > token=2/1: deregister slotnum=2 > > > > 2022-01-13 01:39:51 UTC libxl: debug: > > > > libxl_event.c:1039:devstate_callback: backend > > > > /local/domain/2/backend/vif/0/0/state wanted state 2 timed out > > > > 2022-01-13 01:39:51 UTC libxl: debug: > > > > libxl_event.c:864:libxl__ev_xswatch_deregister: watch > > > > w=0x5633ac569180: deregister unregistered > > > > 2022-01-13 01:39:51 UTC libxl: debug: > > > > libxl_device.c:1092:device_backend_callback: calling > > > > device_backend_cleanup > > > > 2022-01-13 01:39:51 UTC libxl: debug: > > > > 
libxl_event.c:864:libxl__ev_xswatch_deregister: watch > > > > w=0x5633
possible kernel/libxl race with xl network-attach
Hi,

I have been trying to debug a problem where a vif with the backend in a
driver domain is added to dom0. Intermittently the hotplug script is not
invoked by libxl (running as xl devd) in the driver domain. By enabling some
debug for the driver domain kernel and libxl I have these messages:

driver domain kernel (Ubuntu 5.4.0-92-generic):

[Thu Jan 13 01:39:31 2022] [1408] 564: vif vif-0-0 vif0.0: Successfully created xenvif
[Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: /local/domain/0/device/vif/0 -> Initialising
[Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: backend/vif/0/0 -> InitWait
[Thu Jan 13 01:39:31 2022] [26] 583: xen_netback:frontend_changed: /local/domain/0/device/vif/0 -> Connected
[Thu Jan 13 01:39:31 2022] vif vif-0-0 vif0.0: Guest Rx ready
[Thu Jan 13 01:39:31 2022] [26] 470: xen_netback:backend_switch_state: backend/vif/0/0 -> Connected

xl devd (Xen 4.14.3):

2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac569700: nested ao, parent 0x5633ac567f90
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting state 4
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x7ffd416b0528 wpath=/local/domain/2/backend token=3/0: event epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:2445:libxl__nested_ao_create: ao 0x5633ac56a220: nested ao, parent 0x5633ac567f90
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:750:watchfd_callback: watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: event epath=/local/domain/2/backend/vif/0/0/state
2022-01-13 01:39:31 UTC libxl: debug: libxl_event.c:1055:devstate_callback: backend /local/domain/2/backend/vif/0/0/state wanted state 2 still waiting state 4
2022-01-13 01:39:51 UTC libxl: debug: libxl_aoutils.c:88:xswait_timeout_callback: backend /local/domain/2/backend/vif/0/0/state (hoping for state change to 2): xswait timeout (path=/local/domain/2/backend/vif/0/0/state)
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:850:libxl__ev_xswatch_deregister: watch w=0x5633ac569180 wpath=/local/domain/2/backend/vif/0/0/state token=2/1: deregister slotnum=2
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:1039:devstate_callback: backend /local/domain/2/backend/vif/0/0/state wanted state 2 timed out
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: deregister unregistered
2022-01-13 01:39:51 UTC libxl: debug: libxl_device.c:1092:device_backend_callback: calling device_backend_cleanup
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569180: deregister unregistered
2022-01-13 01:39:51 UTC libxl: error: libxl_device.c:1105:device_backend_callback: unable to add device with path /local/domain/2/backend/vif/0/0
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:864:libxl__ev_xswatch_deregister: watch w=0x5633ac569280: deregister unregistered
2022-01-13 01:39:51 UTC libxl: debug: libxl_device.c:1470:device_complete: device /local/domain/2/backend/vif/0/0 add failed
2022-01-13 01:39:51 UTC libxl: debug: libxl_event.c:2035:libxl__ao__destroy: ao 0x5633ac568f30: destroy

the xenstore content for the backend:

# xenstore-ls /local/domain/2/backend/vif/0
0 = ""
 frontend = "/local/domain/0/device/vif/0"
 frontend-id = "0"
 online = "1"
 state = "4"
 script = "/etc/xen/scripts/vif-zynstra"
 vifname = "dom0.0"
 mac = "00:16:3e:6c:de:82"
 bridge = "cluster"
 handle = "0"
 type = "vif"
 feature-sg = "1"
 feature-gso-tcpv4 = "1"
 feature-gso-tcpv6 = "1"
 feature-ipv6-csum-offload = "1"
 feature-rx-copy = "1"
 feature-rx-flip = "0"
 feature-multicast-control = "1"
 feature-dynamic-multicast-control = "1"
 feature-split-event-channels = "1"
 multi-queue-max-queues = "2"
 feature-ctrl-ring = "1"
 hotplug-status = "connected"

My guess is that the libxl callback is started waiting for the backend state
key to be set to XenbusStateInitWait (2) but the frontend in dom0 has already
triggered the backend to transition to XenbusStateConnected (4) and therefore
it does not successfully complete.

Does this seem a reasonable explanation for the problem and what would be the
best approach to try and solve it?

Thanks,
James
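As an aid for reading the "wanted state 2 still waiting state 4" lines in the libxl log, the numeric xenstore state values can be mapped back to their XenbusState names. The numbering below follows the enum in Xen's public io/xenbus.h header; the two function names are hypothetical and exist only for this sketch.

```shell
#!/bin/sh
# Translate a numeric xenbus state into its XenbusState name.
xenbus_state_name() {
    case "$1" in
        0) echo Unknown ;;
        1) echo Initialising ;;
        2) echo InitWait ;;
        3) echo Initialised ;;
        4) echo Connected ;;
        5) echo Closing ;;
        6) echo Closed ;;
        7) echo Reconfiguring ;;
        8) echo Reconfigured ;;
        *) echo invalid ;;
    esac
}

# Spell out a "wanted state X still waiting state Y" pair from the libxl log.
# $1 = wanted state, $2 = state currently reported by the backend.
describe_wait() {
    echo "wanted $(xenbus_state_name "$1") ($1), backend reports $(xenbus_state_name "$2") ($2)"
}

describe_wait 2 4
```

Read this way, the log says libxl was still waiting for InitWait while the backend had already reported Connected, which matches the guess in the message above.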
Re: xen 4.14.3 incorrect (~3x) cpu frequency reported
On Fri, Jan 07, 2022 at 12:39:04PM +0100, Jan Beulich wrote: > On 06.01.2022 16:08, James Dingwall wrote: > >>> On Wed, Jul 21, 2021 at 12:59:11PM +0200, Jan Beulich wrote: > >>> > >>>> On 21.07.2021 11:29, James Dingwall wrote: > >>>> > >>>>> We have a system which intermittently starts up and reports an > >>>>> incorrect cpu frequency: > > ... > >>> I'm sorry to ask, but have you got around to actually doing that? Or > >>> else is resolving this no longer of interest? > > > > We have experienced an occurence of this issue on 4.14.3 with 'loglvl=all' > > present on the xen command line. I have attached the 'xl dmesg' output for > > the fast MHz boot, the diff from the normal case is small so I've not added > > that log separately: > > > > --- normal-mhz/xl-dmesg.txt 2022-01-06 14:13:47.231465234 + > > +++ funny-mhz/xl-dmesg.txt 2022-01-06 13:45:43.825148510 + > > @@ -211,7 +211,7 @@ > > (XEN) cap enforcement granularity: 10ms > > (XEN) load tracking window length 1073741824 ns > > (XEN) Platform timer is 24.000MHz HPET > > -(XEN) Detected 2294.639 MHz processor. > > +(XEN) Detected 7623.412 MHz processor. > > (XEN) EFI memory map: > > (XEN) 0-07fff type=3 attr=000f > > (XEN) 08000-3cfff type=7 attr=000f > > Below is a patch (suitably adjusted for 4.14.3) which I would hope can > take care of the issue (assuming my vague guess on the reasons wasn't > entirely off). It has some debugging code intentionally left in, and > it's also not complete yet (other timer code needing similar > adjustment). Given the improvements I've observed independent of your > issue, I may not wait with submission until getting feedback from you, > since - aiui - it may take some time for you to actually run into a > case where the change would actually make an observable difference. I'll get it added to our build and see what we find... 
Thanks, James > > Jan > > x86: improve TSC / CPU freq calibration accuracy > > While the problem report was for extreme errors, even smaller ones would > better be avoided: The calculated period to run calibration loops over > can (and usually will) be shorter than the actual time elapsed between > first and last platform timer and TSC reads. Adjust values returned from > the init functions accordingly. > > On a Skylake system I've tested this on accuracy (using HPET) went from > detecting in some cases more than 220kHz too high a value to about > ±1kHz. On other systems the original error range was much smaller, with > less (in some cases only very little) improvement. > > Reported-by: James Dingwall > Signed-off-by: Jan Beulich > --- > TBD: Do we think we need to guard against the bizarre case of > "target + count" overflowing (i.e. wrapping)? > TBD: Accuracy could be slightly further improved by using a (to be > introduced) rounding variant of muldiv64(). > TBD: I'm not entirely sure how useful the conditionals are - there > shouldn't be any inaccuracies from the division when count equals > target (upon entry to the conditionals), as then the divisor is > what the original value was just multiplied by. 
> > --- a/xen/arch/x86/time.c > +++ b/xen/arch/x86/time.c > @@ -378,8 +378,9 @@ static u64 read_hpet_count(void) > > static int64_t __init init_hpet(struct platform_timesource *pts) > { > -uint64_t hpet_rate, start; > +uint64_t hpet_rate, start, expired; > uint32_t count, target; > +unsigned int i;//temp > > if ( hpet_address && strcmp(opt_clocksource, pts->id) && > cpuidle_using_deep_cstate() ) > @@ -415,16 +416,35 @@ static int64_t __init init_hpet(struct p > > pts->frequency = hpet_rate; > > +for(i = 0; i < 16; ++i) {//temp > count = hpet_read32(HPET_COUNTER); > start = rdtsc_ordered(); > target = count + CALIBRATE_VALUE(hpet_rate); > if ( target < count ) > while ( hpet_read32(HPET_COUNTER) >= count ) > continue; > -while ( hpet_read32(HPET_COUNTER) < target ) > +while ( (count = hpet_read32(HPET_COUNTER)) < target ) > continue; > > -return (rdtsc_ordered() - start) * CALIBRATE_FRAC; > +expired = rdtsc_ordered() - start; > + > +if ( likely(count > target
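The adjustment described in the commit message can be illustrated with a small standalone sketch. This is not the truncated remainder of the patch above: `scale_u64` and `adjust_elapsed` are hypothetical names, and the tick counts in the note below are illustrative rather than measured. The idea is that if the platform timer has advanced further than intended by the time the calibration loop exits, the TSC ticks counted are scaled back by intended/actual before being turned into a frequency.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: floor(a * b / c) for 32-bit b and c.
 * The intermediate product cannot overflow: (a % c) < 2^32 and b < 2^32.
 * The final result may still wrap if a * b / c itself exceeds 64 bits.
 */
static uint64_t scale_u64(uint64_t a, uint32_t b, uint32_t c)
{
    assert(c != 0);
    return (a / c) * b + (a % c) * b / c;
}

/*
 * Hypothetical helper: tsc_elapsed was counted while the platform timer
 * advanced by `actual` ticks, but the calibration window was meant to be
 * `intended` ticks. Scale the count back to the intended window.
 */
static uint64_t adjust_elapsed(uint64_t tsc_elapsed,
                               uint32_t intended, uint32_t actual)
{
    if ( actual > intended )
        tsc_elapsed = scale_u64(tsc_elapsed, intended, actual);

    return tsc_elapsed;
}
```

With an intended window of 1200000 timer ticks and a loop exit delayed until 3600000 ticks, `adjust_elapsed(330000000, 1200000, 3600000)` gives 110000000, the count the intended window alone would have produced. Without the adjustment the derived frequency would be three times too high.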
Re: xen 4.14.3 incorrect (~3x) cpu frequency reported
Hi Jan,

> > On Wed, Jul 21, 2021 at 12:59:11PM +0200, Jan Beulich wrote:
> >
> >> On 21.07.2021 11:29, James Dingwall wrote:
> >>
> >>> We have a system which intermittently starts up and reports an incorrect
> >>> cpu frequency:
...
> > I'm sorry to ask, but have you got around to actually doing that? Or
> > else is resolving this no longer of interest?

We have experienced an occurrence of this issue on 4.14.3 with 'loglvl=all' present on the xen command line. I have attached the 'xl dmesg' output for the fast MHz boot. The diff from the normal case is small so I've not added that log separately:

--- normal-mhz/xl-dmesg.txt 2022-01-06 14:13:47.231465234 +
+++ funny-mhz/xl-dmesg.txt 2022-01-06 13:45:43.825148510 +
@@ -211,7 +211,7 @@
 (XEN) cap enforcement granularity: 10ms
 (XEN) load tracking window length 1073741824 ns
 (XEN) Platform timer is 24.000MHz HPET
-(XEN) Detected 2294.639 MHz processor.
+(XEN) Detected 7623.412 MHz processor.
 (XEN) EFI memory map:
 (XEN) 0-07fff type=3 attr=000f
 (XEN) 08000-3cfff type=7 attr=000f
@@ -616,6 +616,7 @@
 (XEN) PCI add device :b7:00.1
 (XEN) PCI add device :b7:00.2
 (XEN) PCI add device :b7:00.3
+(XEN) Platform timer appears to have unexpectedly wrapped 10 or more times.
 (XEN) [VT-D]d0:PCIe: unmap :65:00.2
 (XEN) [VT-D]d32753:PCIe: map :65:00.2
 (XEN) [VT-D]d0:PCIe: unmap :65:00.1

I also have the dom0 kernel dmesg available if that would be useful, but I've left it off initially because the log is quite large. I don't see much in the diff between boots except where speed/times are reported and where things are initialised in a slightly different order.

Thanks,
James

(XEN) parameter "basevideo" unknown!
Xen 4.14.3 (XEN) Xen version 4.14.3 (@) (gcc (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0) debug=n Fri Dec 10 16:11:21 UTC 2021 (XEN) Latest ChangeSet: Fri Dec 10 16:10:15 2021 + git:a598336409-dirty (XEN) build-id: 7b441504c9977229a3c6779041ea6493 (XEN) Bootloader: EFI (XEN) Command line: console=vga,com2 com2=115200,8n1 basevideo dom0_max_vcpus=4 dom0_mem=min:6144,max:65536m iommu=on,required,intpost,verbose,debug sched=credit2 flask=enforcing gnttab_max_frames=128 xpti=off smt=on loglvl=all (XEN) Xen image load base address: 0x5d40 (XEN) Video information: (XEN) VGA is graphics mode 1024x768, 32 bpp (XEN) Disc information: (XEN) Found 0 MBR signatures (XEN) Found 2 EDD information structures (XEN) CPU Vendor: Intel, Family 6 (0x6), Model 85 (0x55), Stepping 4 (raw 00050654) (XEN) EFI RAM map: (XEN) [, 0009] (usable) (XEN) [000a, 000f] (reserved) (XEN) [0010, 6965efff] (usable) (XEN) [6965f000, 6bee5fff] (reserved) (XEN) [6bee6000, 6c0a6fff] (usable) (XEN) [6c0a7000, 6ca43fff] (ACPI NVS) (XEN) [6ca44000, 6ed16fff] (reserved) (XEN) [6ed17000, 6fff] (usable) (XEN) [7000, 8fff] (reserved) (XEN) [fd00, fe7f] (reserved) (XEN) [fed2, fed44fff] (reserved) (XEN) [ff00, ] (reserved) (XEN) [0001, 00207fff] (usable) (XEN) ACPI: RSDP 6C0A7000, 0024 (r2 SUPERM) (XEN) ACPI: XSDT 6C0A70C8, 0114 (r1 SUPERM SUPERM 1072009 AMI 10013) (XEN) ACPI: FACP 6C0E9D78, 0114 (r6 SUPERM SMCI--MB 1072009 INTL 20091013) (XEN) ACPI: DSDT 6C0A7278, 42AFC (r2 SUPERM SMCI--MB 1072009 INTL 20091013) (XEN) ACPI: FACS 6CA42080, 0040 (XEN) ACPI: FPDT 6C0E9E90, 0044 (r1 1072009 AMI 10013) (XEN) ACPI: FIDT 6C0E9ED8, 009C (r1 SUPERM SMCI--MB 1072009 AMI 10013) (XEN) ACPI: SPMI 6C0E9F78, 0041 (r5 SUPERM SMCI--MB0 AMI.0) (XEN) ACPI: UEFI 6C0E9FC0, 0048 (r1 SUPERM SMCI--MB 1072009 113) (XEN) ACPI: UEFI 6C0EA008, 005C (r1 INTEL RstUefiV0 0) (XEN) ACPI: MCFG 6C0EA068, 003C (r1 SUPERM SMCI--MB 1072009 MSFT 97) (XEN) ACPI: HPET 6C0EA0A8, 0038 (r1 SUPERM SMCI--MB1 INTL 20091013) (XEN) ACPI: APIC 6C0EA0E0, 071E (r3 
SUPERM SMCI--MB0 INTL 20091013) (XEN) ACPI: MIGT 6C0EA800, 0040 (r1 SUPERM SMCI--MB0 INTL 20091013) (XEN) ACPI: MSCT 6C0EA840, 004E (r1 SUPERM SMCI--MB1 INTL 20091013) (XEN) ACPI: PCAT 6C0EA890, 0068 (r2 SUPERM SMCI--MB2 INTL 20091013) (XEN) ACPI: PCCT 6C0EA8F8, 006E (r1 SUPERM SMCI--MB2 INTL 20091013) (XEN) ACPI: RASF 6C0EA968, 0030 (r1 SUPERM SMCI--MB1 INTL 20091013) (XEN) ACPI: SLIT 6C0EA998, 002D (r1 SUPERM SMCI--MB1 INTL 20091013) (XEN) ACPI: SRAT 6C0EA9C8
Re: xen 4.11.4 incorrect (~3x) cpu frequency reported
Hi Jan, On Fri, Nov 05, 2021 at 01:50:04PM +0100, Jan Beulich wrote: > On 26.07.2021 14:33, James Dingwall wrote: > > Hi Jan, > > > > Thank you for taking the time to reply. > > > > On Wed, Jul 21, 2021 at 12:59:11PM +0200, Jan Beulich wrote: > >> On 21.07.2021 11:29, James Dingwall wrote: > >>> We have a system which intermittently starts up and reports an incorrect > >>> cpu frequency: > >>> > >>> # grep -i mhz /var/log/kern.log > >>> Jul 14 17:47:47 dom0 kernel: [0.000475] tsc: Detected 2194.846 MHz > >>> processor > >>> Jul 14 22:03:37 dom0 kernel: [0.000476] tsc: Detected 2194.878 MHz > >>> processor > >>> Jul 14 23:05:13 dom0 kernel: [0.000478] tsc: Detected 2194.848 MHz > >>> processor > >>> Jul 14 23:20:47 dom0 kernel: [0.000474] tsc: Detected 2194.856 MHz > >>> processor > >>> Jul 14 23:57:39 dom0 kernel: [0.000476] tsc: Detected 2194.906 MHz > >>> processor > >>> Jul 15 01:04:09 dom0 kernel: [0.000476] tsc: Detected 2194.858 MHz > >>> processor > >>> Jul 15 01:27:15 dom0 kernel: [0.000482] tsc: Detected 2194.870 MHz > >>> processor > >>> Jul 15 02:00:13 dom0 kernel: [0.000481] tsc: Detected 2194.924 MHz > >>> processor > >>> Jul 15 03:09:23 dom0 kernel: [0.000475] tsc: Detected 2194.892 MHz > >>> processor > >>> Jul 15 03:32:50 dom0 kernel: [0.000482] tsc: Detected 2194.856 MHz > >>> processor > >>> Jul 15 04:05:27 dom0 kernel: [0.000480] tsc: Detected 2194.886 MHz > >>> processor > >>> Jul 15 05:00:38 dom0 kernel: [0.000473] tsc: Detected 2194.914 MHz > >>> processor > >>> Jul 15 05:59:33 dom0 kernel: [0.000480] tsc: Detected 2194.924 MHz > >>> processor > >>> Jul 15 06:22:31 dom0 kernel: [0.000474] tsc: Detected 2194.910 MHz > >>> processor > >>> Jul 15 17:52:57 dom0 kernel: [0.000474] tsc: Detected 2194.854 MHz > >>> processor > >>> Jul 15 18:51:36 dom0 kernel: [0.000474] tsc: Detected 2194.900 MHz > >>> processor > >>> Jul 15 19:07:26 dom0 kernel: [0.000478] tsc: Detected 2194.902 MHz > >>> processor > >>> Jul 15 19:43:56 dom0 kernel: [0.000154] tsc: 
Detected 6895.384 MHz > >>> processor > >> > >> Well, this is output from Dom0. What we'd need to see (in addition) > >> is the corresponding hypervisor log at maximum verbosity (loglvl=all). > > > > This was just to illustrate that the dom0 usually reports the correct > > speed. I'll update the xen boot options with loglvl=all and try to collect > > the boot messages for each case. > > > >> > >>> The xen 's' debug output: > >>> > >>> (XEN) TSC marked as reliable, warp = 0 (count=4) > >>> (XEN) dom1: mode=0,ofs=0x1d1ac8bf8e,khz=6895385,inc=1 > >>> (XEN) dom2: mode=0,ofs=0x28bc24c746,khz=6895385,inc=1 > >>> (XEN) dom3: mode=0,ofs=0x345696b138,khz=6895385,inc=1 > >>> (XEN) dom4: mode=0,ofs=0x34f2635f31,khz=6895385,inc=1 > >>> (XEN) dom5: mode=0,ofs=0x3581618a7d,khz=6895385,inc=1 > >>> (XEN) dom6: mode=0,ofs=0x3627ca68b2,khz=6895385,inc=1 > >>> (XEN) dom7: mode=0,ofs=0x36dd491860,khz=6895385,inc=1 > >>> (XEN) dom8: mode=0,ofs=0x377a57ea1a,khz=6895385,inc=1 > >>> (XEN) dom9: mode=0,ofs=0x381eb175ce,khz=6895385,inc=1 > >>> (XEN) dom10: mode=0,ofs=0x38cab2e260,khz=6895385,inc=1 > >>> (XEN) dom11: mode=0,ofs=0x397fc47387,khz=6895385,inc=1 > >>> (XEN) dom12: mode=0,ofs=0x3a552762a0,khz=6895385,inc=1 > >>> > >>> A processor from /proc/cpuinfo in dom0: > >>> > >>> processor : 3 > >>> vendor_id : GenuineIntel > >>> cpu family : 6 > >>> model : 85 > >>> model name : Intel(R) Xeon(R) D-2123IT CPU @ 2.20GHz > >>> stepping: 4 > >>> microcode : 0x265 > >>> cpu MHz : 6895.384 > >>> [...] > >>> > >>> Xen has been built at 310ab79875cb705cc2c7daddff412b5a4899f8c9 from the > >>> stable-4.12 branch. > >> > >> While this contradicts the title, both 4.11 and 4.12 are out of general > >> support. Hence it would be more helpful if you could obtain respective > >> logs with a more modern version of Xen - ideally from the master branch, > >> or else the most recent stable one (4.15). Provided of course the issue > >> continues to exist there in the first place. 
> >
> > That was my error, I meant the stable-4.11 branch. We have a development
> > environment based around 4.14.2 which I can test.
>
> I'm sorry to ask, but have you got around to actually doing that? Or
> else is resolving this no longer of interest?

We have recorded a couple of other occurrences on 4.11, but it is happening so infrequently (probably once every few hundred boots) that further investigation is low on a long list of tasks. We are also moving to 4.14.3 and so far have had no occurrences with that version.

Thanks,
James
domain never exits after using 'xl save'
Hi,

This is an issue that was observed on 4.11.3 but I have reproduced it on 4.14.3. After using the `xl save` command the associated `xl create` process exits, which later results in the domain not being cleaned up when the guest is shut down. e.g.:

# xl list -v | grep d13cc54d-dcb8-4337-9dfe-3b04f671b16
guest01 15 2048 3 -b---- 1555.9 d13cc54d-dcb8-4337-9dfe-3b04f671b16a - system_u:system_r:migrate_domU_t

# ps -ef | grep d13cc54d-dcb8-4337-9dfe-3b04f671b16
root 18694 1 0 Sep22 ? 00:00:00 /usr/sbin/xl create -p /etc/xen/config/d13cc54d-dcb8-4337-9dfe-3b04f671b16a.cfg

# xl save -p guest01 /vmsave/guest01.mem
Saving to /vmsave/guest01.mem new xl format (info 0x3/0x0/2900)
xc: info: Saving domain 15, type x86 HVM
xc: Frames: 1044480/1044480 100%
xc: End of stream: 0/0 0%

# xl list -v | grep d13cc54d-dcb8-4337-9dfe-3b04f671b16
guest01 15 2048 3 --p--- 1558.3 d13cc54d-dcb8-4337-9dfe-3b04f671b16a - system_u:system_r:migrate_domU_t

# ps -ef | grep d13cc54d-dcb8-4337-9dfe-3b04f671b16
- no matches -

# xl unpause guest01
# xl list -v | grep d13cc54d-dcb8-4337-9dfe-3b04f671b16
guest01 15 2048 3 -b---- 1559.0 d13cc54d-dcb8-4337-9dfe-3b04f671b16a - system_u:system_r:migrate_domU_t

# xl shutdown guest01
# xl list -v | grep d13cc54d-dcb8-4337-9dfe-3b04f671b16
guest01 15 2048 3 ---s-- 1575.8 d13cc54d-dcb8-4337-9dfe-3b04f671b16a 0 system_u:system_r:migrate_domU_t

What we would expect is that the `xl create` process remains running, so that when the domain is later shut down it gets cleaned up without having to manually `xl destroy` it.

tools/xl/xl_vmcontrol.c handle_domain_death() has (0 == DOMAIN_RESTART_NONE in xl.h):

    case LIBXL_SHUTDOWN_REASON_SUSPEND:
        LOG("Domain has suspended.");
        return 0;

The while(1) loop of create_domain() has a switch statement which handles this return value with:

    case DOMAIN_RESTART_NONE:
        LOG("Done. Exiting now");
        libxl_event_free(ctx, event);
        ret = 0;
        goto out;

Is this the expected behaviour? Would one way of getting the behaviour we want be to change the return value from handle_domain_death() to one which doesn't trigger the exit?

Thanks,
James
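The control flow in question can be modelled in a few lines. This is a simplified sketch, not xl's code: the `model_*` names are hypothetical, `MODEL_KEEP_WAITING` stands in for whatever non-exiting return value might be chosen, and only the fact that the "no restart" action equals 0 is taken from the message above. It shows why a monitor that exits on the suspend event never sees the later poweroff.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the shutdown reasons and monitor actions. */
enum model_reason { MODEL_REASON_POWEROFF, MODEL_REASON_SUSPEND };
enum model_action {
    MODEL_RESTART_NONE = 0,   /* monitor exits, as DOMAIN_RESTART_NONE does */
    MODEL_KEEP_WAITING        /* hypothetical: stay in the event loop */
};

/* keep_on_suspend == 0 models the behaviour described in the report. */
static enum model_action model_handle_death(enum model_reason reason,
                                            int keep_on_suspend)
{
    switch ( reason )
    {
    case MODEL_REASON_SUSPEND:
        return keep_on_suspend ? MODEL_KEEP_WAITING : MODEL_RESTART_NONE;
    case MODEL_REASON_POWEROFF:
    default:
        return MODEL_RESTART_NONE;
    }
}

/* Returns how many events the monitor handled before it exited. */
static size_t model_monitor(const enum model_reason *ev, size_t n,
                            int keep_on_suspend)
{
    size_t handled = 0;

    assert(ev != NULL);
    while ( handled < n )
    {
        enum model_action a = model_handle_death(ev[handled], keep_on_suspend);

        handled++;
        if ( a == MODEL_RESTART_NONE )
            break;
    }

    return handled;
}

/* Replays "suspend for xl save, then guest poweroff". */
static size_t model_save_then_poweroff(int keep_on_suspend)
{
    const enum model_reason events[] = {
        MODEL_REASON_SUSPEND, MODEL_REASON_POWEROFF
    };

    return model_monitor(events, 2, keep_on_suspend);
}
```

In this model `model_save_then_poweroff(0)` handles one event and exits, so the poweroff is never processed, while `model_save_then_poweroff(1)` handles both. Whether a real change along these lines has other side effects in xl is not something the sketch can answer.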
Re: xen 4.11.4 incorrect (~3x) cpu frequency reported
Hi Jan, Thank you for taking the time to reply. On Wed, Jul 21, 2021 at 12:59:11PM +0200, Jan Beulich wrote: > On 21.07.2021 11:29, James Dingwall wrote: > > We have a system which intermittently starts up and reports an incorrect > > cpu frequency: > > > > # grep -i mhz /var/log/kern.log > > Jul 14 17:47:47 dom0 kernel: [0.000475] tsc: Detected 2194.846 MHz > > processor > > Jul 14 22:03:37 dom0 kernel: [0.000476] tsc: Detected 2194.878 MHz > > processor > > Jul 14 23:05:13 dom0 kernel: [0.000478] tsc: Detected 2194.848 MHz > > processor > > Jul 14 23:20:47 dom0 kernel: [0.000474] tsc: Detected 2194.856 MHz > > processor > > Jul 14 23:57:39 dom0 kernel: [0.000476] tsc: Detected 2194.906 MHz > > processor > > Jul 15 01:04:09 dom0 kernel: [0.000476] tsc: Detected 2194.858 MHz > > processor > > Jul 15 01:27:15 dom0 kernel: [0.000482] tsc: Detected 2194.870 MHz > > processor > > Jul 15 02:00:13 dom0 kernel: [0.000481] tsc: Detected 2194.924 MHz > > processor > > Jul 15 03:09:23 dom0 kernel: [0.000475] tsc: Detected 2194.892 MHz > > processor > > Jul 15 03:32:50 dom0 kernel: [0.000482] tsc: Detected 2194.856 MHz > > processor > > Jul 15 04:05:27 dom0 kernel: [0.000480] tsc: Detected 2194.886 MHz > > processor > > Jul 15 05:00:38 dom0 kernel: [0.000473] tsc: Detected 2194.914 MHz > > processor > > Jul 15 05:59:33 dom0 kernel: [0.000480] tsc: Detected 2194.924 MHz > > processor > > Jul 15 06:22:31 dom0 kernel: [0.000474] tsc: Detected 2194.910 MHz > > processor > > Jul 15 17:52:57 dom0 kernel: [0.000474] tsc: Detected 2194.854 MHz > > processor > > Jul 15 18:51:36 dom0 kernel: [0.000474] tsc: Detected 2194.900 MHz > > processor > > Jul 15 19:07:26 dom0 kernel: [0.000478] tsc: Detected 2194.902 MHz > > processor > > Jul 15 19:43:56 dom0 kernel: [0.000154] tsc: Detected 6895.384 MHz > > processor > > Well, this is output from Dom0. What we'd need to see (in addition) > is the corresponding hypervisor log at maximum verbosity (loglvl=all). 
This was just to illustrate that the dom0 usually reports the correct speed. I'll update the xen boot options with loglvl=all and try to collect the boot messages for each case. > > > The xen 's' debug output: > > > > (XEN) TSC marked as reliable, warp = 0 (count=4) > > (XEN) dom1: mode=0,ofs=0x1d1ac8bf8e,khz=6895385,inc=1 > > (XEN) dom2: mode=0,ofs=0x28bc24c746,khz=6895385,inc=1 > > (XEN) dom3: mode=0,ofs=0x345696b138,khz=6895385,inc=1 > > (XEN) dom4: mode=0,ofs=0x34f2635f31,khz=6895385,inc=1 > > (XEN) dom5: mode=0,ofs=0x3581618a7d,khz=6895385,inc=1 > > (XEN) dom6: mode=0,ofs=0x3627ca68b2,khz=6895385,inc=1 > > (XEN) dom7: mode=0,ofs=0x36dd491860,khz=6895385,inc=1 > > (XEN) dom8: mode=0,ofs=0x377a57ea1a,khz=6895385,inc=1 > > (XEN) dom9: mode=0,ofs=0x381eb175ce,khz=6895385,inc=1 > > (XEN) dom10: mode=0,ofs=0x38cab2e260,khz=6895385,inc=1 > > (XEN) dom11: mode=0,ofs=0x397fc47387,khz=6895385,inc=1 > > (XEN) dom12: mode=0,ofs=0x3a552762a0,khz=6895385,inc=1 > > > > A processor from /proc/cpuinfo in dom0: > > > > processor : 3 > > vendor_id : GenuineIntel > > cpu family : 6 > > model : 85 > > model name : Intel(R) Xeon(R) D-2123IT CPU @ 2.20GHz > > stepping: 4 > > microcode : 0x265 > > cpu MHz : 6895.384 > > [...] > > > > Xen has been built at 310ab79875cb705cc2c7daddff412b5a4899f8c9 from the > > stable-4.12 branch. > > While this contradicts the title, both 4.11 and 4.12 are out of general > support. Hence it would be more helpful if you could obtain respective > logs with a more modern version of Xen - ideally from the master branch, > or else the most recent stable one (4.15). Provided of course the issue > continues to exist there in the first place. That was my error, I meant the stable-4.11 branch. We have a development environment based around 4.14.2 which I can test. My assumption had been that xen reads or calculates this frequency and provides it to the dom0 since it is reported in the hypervisor log before dom0 is started. Regards, James
xen 4.11.4 incorrect (~3x) cpu frequency reported
Hi, We have a system which intermittently starts up and reports an incorrect cpu frequency: # grep -i mhz /var/log/kern.log Jul 14 17:47:47 dom0 kernel: [0.000475] tsc: Detected 2194.846 MHz processor Jul 14 22:03:37 dom0 kernel: [0.000476] tsc: Detected 2194.878 MHz processor Jul 14 23:05:13 dom0 kernel: [0.000478] tsc: Detected 2194.848 MHz processor Jul 14 23:20:47 dom0 kernel: [0.000474] tsc: Detected 2194.856 MHz processor Jul 14 23:57:39 dom0 kernel: [0.000476] tsc: Detected 2194.906 MHz processor Jul 15 01:04:09 dom0 kernel: [0.000476] tsc: Detected 2194.858 MHz processor Jul 15 01:27:15 dom0 kernel: [0.000482] tsc: Detected 2194.870 MHz processor Jul 15 02:00:13 dom0 kernel: [0.000481] tsc: Detected 2194.924 MHz processor Jul 15 03:09:23 dom0 kernel: [0.000475] tsc: Detected 2194.892 MHz processor Jul 15 03:32:50 dom0 kernel: [0.000482] tsc: Detected 2194.856 MHz processor Jul 15 04:05:27 dom0 kernel: [0.000480] tsc: Detected 2194.886 MHz processor Jul 15 05:00:38 dom0 kernel: [0.000473] tsc: Detected 2194.914 MHz processor Jul 15 05:59:33 dom0 kernel: [0.000480] tsc: Detected 2194.924 MHz processor Jul 15 06:22:31 dom0 kernel: [0.000474] tsc: Detected 2194.910 MHz processor Jul 15 17:52:57 dom0 kernel: [0.000474] tsc: Detected 2194.854 MHz processor Jul 15 18:51:36 dom0 kernel: [0.000474] tsc: Detected 2194.900 MHz processor Jul 15 19:07:26 dom0 kernel: [0.000478] tsc: Detected 2194.902 MHz processor Jul 15 19:43:56 dom0 kernel: [0.000154] tsc: Detected 6895.384 MHz processor The xen 's' debug output: (XEN) TSC marked as reliable, warp = 0 (count=4) (XEN) dom1: mode=0,ofs=0x1d1ac8bf8e,khz=6895385,inc=1 (XEN) dom2: mode=0,ofs=0x28bc24c746,khz=6895385,inc=1 (XEN) dom3: mode=0,ofs=0x345696b138,khz=6895385,inc=1 (XEN) dom4: mode=0,ofs=0x34f2635f31,khz=6895385,inc=1 (XEN) dom5: mode=0,ofs=0x3581618a7d,khz=6895385,inc=1 (XEN) dom6: mode=0,ofs=0x3627ca68b2,khz=6895385,inc=1 (XEN) dom7: mode=0,ofs=0x36dd491860,khz=6895385,inc=1 (XEN) dom8: 
mode=0,ofs=0x377a57ea1a,khz=6895385,inc=1 (XEN) dom9: mode=0,ofs=0x381eb175ce,khz=6895385,inc=1 (XEN) dom10: mode=0,ofs=0x38cab2e260,khz=6895385,inc=1 (XEN) dom11: mode=0,ofs=0x397fc47387,khz=6895385,inc=1 (XEN) dom12: mode=0,ofs=0x3a552762a0,khz=6895385,inc=1 A processor from /proc/cpuinfo in dom0: processor : 3 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel(R) Xeon(R) D-2123IT CPU @ 2.20GHz stepping: 4 microcode : 0x265 cpu MHz : 6895.384 cache size : 8448 KB physical id : 0 siblings: 4 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc cpuid pni pclmulqdq monitor est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch intel_ppin ssbd ibrs ibpb stibp fsgsbase bmi1 hle avx2 bmi2 erms rtm avx512f avx512dq rdseed adx clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 md_clear bugs: null_seg cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit bogomips: 13790.76 clflush size: 64 cache_alignment : 64 address sizes : 46 bits physical, 48 bits virtual power management: Xen has been built at 310ab79875cb705cc2c7daddff412b5a4899f8c9 from the stable-4.12 branch. The system is a supermicro server, model X11SDV-4C-TP8F. I'm not sure if the incorrect value has been read from hardware or Xen has miscalculated the frequency so any pointers on things to examine would be welcome. Thanks, James
Re: [PATCH for-4.12 and older] x86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL} (again)
Hi Jan,

On Thu, Feb 04, 2021 at 10:36:06AM +0100, Jan Beulich wrote:
> X86_VENDOR_* aren't bit masks in the older trees.
>
> Reported-by: James Dingwall
> Signed-off-by: Jan Beulich
>
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -226,7 +226,8 @@ int guest_rdmsr(const struct vcpu *v, ui
>           */
>      case MSR_IA32_PERF_STATUS:
>      case MSR_IA32_PERF_CTL:
> -        if ( !(cp->x86_vendor & (X86_VENDOR_INTEL | X86_VENDOR_CENTAUR)) )
> +        if ( cp->x86_vendor != X86_VENDOR_INTEL &&
> +             cp->x86_vendor != X86_VENDOR_CENTAUR )
>              goto gp_fault;
>
>          *val = 0;

Thanks for this patch, I've applied it and the Windows guest no longer crashes.

Regards,
James
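The reason the original test misfires can be shown in isolation. A minimal sketch, assuming enumeration-style vendor IDs in which Intel is 0 and Centaur is 2: those values are inferred from the `FAULT1: 0 & 0x2` debug line reported in the VIRIDIAN CRASH thread, and the `OLD_VENDOR_*` names and helper functions are hypothetical, not Xen's.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed enumeration-style IDs (not bit masks), see the note above. */
#define OLD_VENDOR_INTEL   0u
#define OLD_VENDOR_OTHER   1u   /* hypothetical additional vendor */
#define OLD_VENDOR_CENTAUR 2u

/* Bit-mask style test: only valid if every vendor ID is a distinct bit. */
static bool faults_with_mask_test(unsigned int vendor)
{
    return !(vendor & (OLD_VENDOR_INTEL | OLD_VENDOR_CENTAUR));
}

/* Equality test, in the style of the fix: valid for any distinct IDs. */
static bool faults_with_equality_test(unsigned int vendor)
{
    return vendor != OLD_VENDOR_INTEL && vendor != OLD_VENDOR_CENTAUR;
}

/* Documents the expected outcomes; aborts if any of them does not hold. */
static void self_check(void)
{
    /* 0 & 0x2 == 0, so Intel is wrongly sent to the fault path. */
    assert(faults_with_mask_test(OLD_VENDOR_INTEL));
    /* With explicit comparisons Intel and Centaur are both accepted. */
    assert(!faults_with_equality_test(OLD_VENDOR_INTEL));
    assert(!faults_with_equality_test(OLD_VENDOR_CENTAUR));
    /* Any other vendor still faults. */
    assert(faults_with_equality_test(OLD_VENDOR_OTHER));
}
```

With an ID of 0 the AND can never be non-zero, so under these assumed values the mask test rejects Intel regardless of what it is combined with. That matches the crash being seen on an Intel host.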
Re: VIRIDIAN CRASH: 3b c0000096 75b12c5 9e7f1580 0
Hi Jan, Thank you for your reply. On Wed, Feb 03, 2021 at 03:55:07PM +0100, Jan Beulich wrote: > On 01.02.2021 16:26, James Dingwall wrote: > > I am building the xen 4.11 branch at > > 310ab79875cb705cc2c7daddff412b5a4899f8c9 which includes commit > > 3b5de119f0399cbe745502cb6ebd5e6633cc139c "86/msr: fix handling of > > MSR_IA32_PERF_{STATUS/CTL}". I think this should address this error > > recorded in xen's dmesg: > > > > (XEN) d11v0 VIRIDIAN CRASH: 3b c096 75b12c5 9e7f1580 0 > > It seems to me that you imply some information here which might > better be spelled out. As it stands I do not see the immediate > connection between the cited commit and the crash. C096 is > STATUS_PRIVILEGED_INSTRUCTION, which to me ought to be impossible > for code running in ring 0. Of course I may simply not know enough > about modern Windows' internals to understand the connection. Searching for "VIRIDIAN CRASH: 3b" led me to this thread and then to the commit based on the commit log message. https://patchwork.kernel.org/project/xen-devel/patch/20201007102032.98565-1-roger@citrix.com/ I have naively assumed that the RCX register indicated MSR_IA32_PERF_CTL based on: #define MSR_IA32_PERF_CTL 0x0199 I've added this patch: diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c index 99c848ff41..7a764907d5 100644 --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -232,12 +232,16 @@ int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val) */ case MSR_IA32_PERF_STATUS: case MSR_IA32_PERF_CTL: -if ( !(cp->x86_vendor & (X86_VENDOR_INTEL | X86_VENDOR_CENTAUR)) ) +if ( !(cp->x86_vendor & (X86_VENDOR_INTEL | X86_VENDOR_CENTAUR)) ) { +printk(KERN_DEBUG "JKD: MSR %#x FAULT1: %#x & %#x\n", msr, cp->x86_vendor, (X86_VENDOR_INTEL | X86_VENDOR_CENTAUR)); + goto gp_fault; +} *val = 0; if ( likely(!is_cpufreq_controller(d)) || rdmsr_safe(msr, *val) == 0 ) break; +printk(KERN_DEBUG "JKD: MSR FAULT2\n"); goto gp_fault; /* and now in the hypervisor log when the domain crashes: (XEN) JKD: MSR 
0x199 FAULT1: 0 & 0x2 (XEN) d11v0 VIRIDIAN CRASH: 3b c096 1146d2c5 6346d580 0 (XEN) avc: denied { reset } for domid=11 scontext=system_u:system_r:domU_t tcontext=system_u:system_r:domU_t_self tclass=event I'm not sure what is expected in cp->x86_vendor but this is running on an Intel CPU so I would have thought 0x1 based on #define X86_VENDOR_INTEL (1 << 0) I have also booted with flask=disabled to to eliminate the reported avc denial as the cause. > > > I have removed `viridian = [..]` from the xen config nut still get this > > reliably when launching PassMark Performance Test and it is collecting > > CPU information. > > > > This is recorded in the domain qemu-dm log: > > > > 21244@1612191983.279616:xen_platform_log xen platform: XEN|BUGCHECK: > > > 21244@1612191983.279819:xen_platform_log xen platform: XEN|BUGCHECK: > > SYSTEM_SERVICE_EXCEPTION: C096 F800A43C72C5 > > D0014343D580 > > 21244@1612191983.279959:xen_platform_log xen platform: XEN|BUGCHECK: > > EXCEPTION (F800A43C72C5): > > 21244@1612191983.280075:xen_platform_log xen platform: XEN|BUGCHECK: - Code > > = C148320F > > 21244@1612191983.280205:xen_platform_log xen platform: XEN|BUGCHECK: - > > Flags = 0B4820E2 > > 21244@1612191983.280346:xen_platform_log xen platform: XEN|BUGCHECK: - > > Address = A824948D4800 > > 21244@1612191983.280504:xen_platform_log xen platform: XEN|BUGCHECK: - > > Parameter[0] = 8B0769850F07 > > 21244@1612191983.280633:xen_platform_log xen platform: XEN|BUGCHECK: - > > Parameter[1] = 46B70F4024448906 > > 21244@1612191983.280754:xen_platform_log xen platform: XEN|BUGCHECK: - > > Parameter[2] = 0F2444896604 > > 21244@1612191983.280876:xen_platform_log xen platform: XEN|BUGCHECK: - > > Parameter[3] = E983C88B410646B6 > > 21244@1612191983.281012:xen_platform_log xen platform: XEN|BUGCHECK: - > > Parameter[4] = 0D7401E9831E7401 > > 21244@1612191983.281172:xen_platform_log xen platform: XEN|BUGCHECK: - > > Parameter[5] = 54B70F217502F983 > > 21244@1612191983.281304:xen_platform_log xen 
platform: XEN|BUGCHECK: - > > Parameter[6] = 54B70F15EBED4024 > > 21244@1612191983.281426:xen_platform_log xen platform: XEN|BUGCHECK: - > > Parameter[7] = EBC0B70FED664024 > > 21244@1612191983.281547:xen_platform_log xen platform: XEN|BUGCHECK: - > > Parameter[8] = 0FEC402454B70F09 > > 21244@1612191983.281668:xen_platform_log xen platform: XEN
VIRIDIAN CRASH: 3b c0000096 75b12c5 9e7f1580 0
Hi, I am building the xen 4.11 branch at 310ab79875cb705cc2c7daddff412b5a4899f8c9 which includes commit 3b5de119f0399cbe745502cb6ebd5e6633cc139c "86/msr: fix handling of MSR_IA32_PERF_{STATUS/CTL}". I think this should address this error recorded in xen's dmesg: (XEN) d11v0 VIRIDIAN CRASH: 3b c096 75b12c5 9e7f1580 0 I have removed `viridian = [..]` from the xen config nut still get this reliably when launching PassMark Performance Test and it is collecting CPU information. This is recorded in the domain qemu-dm log: 21244@1612191983.279616:xen_platform_log xen platform: XEN|BUGCHECK: > 21244@1612191983.279819:xen_platform_log xen platform: XEN|BUGCHECK: SYSTEM_SERVICE_EXCEPTION: C096 F800A43C72C5 D0014343D580 21244@1612191983.279959:xen_platform_log xen platform: XEN|BUGCHECK: EXCEPTION (F800A43C72C5): 21244@1612191983.280075:xen_platform_log xen platform: XEN|BUGCHECK: - Code = C148320F 21244@1612191983.280205:xen_platform_log xen platform: XEN|BUGCHECK: - Flags = 0B4820E2 21244@1612191983.280346:xen_platform_log xen platform: XEN|BUGCHECK: - Address = A824948D4800 21244@1612191983.280504:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[0] = 8B0769850F07 21244@1612191983.280633:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[1] = 46B70F4024448906 21244@1612191983.280754:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[2] = 0F2444896604 21244@1612191983.280876:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[3] = E983C88B410646B6 21244@1612191983.281012:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[4] = 0D7401E9831E7401 21244@1612191983.281172:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[5] = 54B70F217502F983 21244@1612191983.281304:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[6] = 54B70F15EBED4024 21244@1612191983.281426:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[7] = EBC0B70FED664024 21244@1612191983.281547:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[8] = 
0FEC402454B70F09 21244@1612191983.281668:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[9] = 448B42244489C0B6 21244@1612191983.281809:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[10] = 2444B70F06894024 21244@1612191983.281932:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[11] = 4688440446896644 21244@1612191983.282052:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[12] = 073846C74906 21244@1612191983.282185:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[13] = F883070AE900 21244@1612191983.282340:xen_platform_log xen platform: XEN|BUGCHECK: - Parameter[14] = 8B06F9850F07 21244@1612191983.282480:xen_platform_log xen platform: XEN|BUGCHECK: EXCEPTION (A824848948C2): 21244@1612191983.282617:xen_platform_log xen platform: XEN|BUGCHECK: CONTEXT (D0014343D580): 21244@1612191983.282717:xen_platform_log xen platform: XEN|BUGCHECK: - GS = 002B 21244@1612191983.282816:xen_platform_log xen platform: XEN|BUGCHECK: - FS = 0053 21244@1612191983.282914:xen_platform_log xen platform: XEN|BUGCHECK: - ES = 002B 21244@1612191983.283011:xen_platform_log xen platform: XEN|BUGCHECK: - DS = 002B 21244@1612191983.283127:xen_platform_log xen platform: XEN|BUGCHECK: - SS = 0018 21244@1612191983.283226:xen_platform_log xen platform: XEN|BUGCHECK: - CS = 0010 21244@1612191983.283332:xen_platform_log xen platform: XEN|BUGCHECK: - EFLAGS = 0202 21244@1612191983.283444:xen_platform_log xen platform: XEN|BUGCHECK: - RDI = F64D5C20 21244@1612191983.283555:xen_platform_log xen platform: XEN|BUGCHECK: - RSI = F6367280 21244@1612191983.283666:xen_platform_log xen platform: XEN|BUGCHECK: - RBX = 8011E060 21244@1612191983.283810:xen_platform_log xen platform: XEN|BUGCHECK: - RDX = F64D5C20 21244@1612191983.283972:xen_platform_log xen platform: XEN|BUGCHECK: - RCX = 0199 21244@1612191983.284350:xen_platform_log xen platform: XEN|BUGCHECK: - RAX = 0004 21244@1612191983.284523:xen_platform_log xen platform: XEN|BUGCHECK: - RBP = 4343E891 
21244@1612191983.284658:xen_platform_log xen platform: XEN|BUGCHECK: - RIP = A43C72C5 21244@1612191983.284842:xen_platform_log xen platform: XEN|BUGCHECK: - RSP = 4343DFA0 21244@1612191983.284959:xen_platform_log xen platform: XEN|BUGCHECK: - R8 = 0008 21244@1612191983.285073:xen_platform_log xen platform: XEN|BUGCHECK: - R9 = 000E 21244@1612191983.285188:xen_platform_log xen platform: XEN|BUGCHECK: - R10 = 0002 21244@1612191983.285304:xen_platform_log xen platform: XEN|BUGCHECK: - R11 = 4343E808 21244@1612191983.285420:xen_platform_log xen platform: XEN|BUGCHECK: - R12 = 21244@1612191983.285564:xen_platform_log xen platform: XEN|BUGCHECK: - R13 = F7964E50 21244@1612191983.285680:xen_platform_log xen platform: XEN|BUGCHECK: - R14 =
Re: [Xen-devel] [PATCH] xen/xenbus: fix self-deadlock after killing user process
On Tue, Oct 01, 2019 at 05:03:55PM +0200, Juergen Gross wrote:
> In case a user process using xenbus has open transactions and is killed
> e.g. via ctrl-C the following cleanup of the allocated resources might
> result in a deadlock due to trying to end a transaction in the xenbus
> worker thread:
>
> [ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
> [ 2551.492215] Tainted: P OE 5.0.0-29-generic #5
> [ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 2551.528585] xenbus          D    0    37      2 0x80000080
> [ 2551.528590] Call Trace:
> [ 2551.528603]  __schedule+0x2c0/0x870
> [ 2551.528606]  ? _cond_resched+0x19/0x40
> [ 2551.528632]  schedule+0x2c/0x70
> [ 2551.528637]  xs_talkv+0x1ec/0x2b0
> [ 2551.528642]  ? wait_woken+0x80/0x80
> [ 2551.528645]  xs_single+0x53/0x80
> [ 2551.528648]  xenbus_transaction_end+0x3b/0x70
> [ 2551.528651]  xenbus_file_free+0x5a/0x160
> [ 2551.528654]  xenbus_dev_queue_reply+0xc4/0x220
> [ 2551.528657]  xenbus_thread+0x7de/0x880
> [ 2551.528660]  ? wait_woken+0x80/0x80
> [ 2551.528665]  kthread+0x121/0x140
> [ 2551.528667]  ? xb_read+0x1d0/0x1d0
> [ 2551.528670]  ? kthread_park+0x90/0x90
> [ 2551.528673]  ret_from_fork+0x35/0x40
>
> Fix this by doing the cleanup via a workqueue instead.
>
> Reported-by: James Dingwall
> Fixes: fd8aa9095a95c ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
> Cc: <stable@vger.kernel.org> # 4.11
> Signed-off-by: Juergen Gross
> ---
>  drivers/xen/xenbus/xenbus_dev_frontend.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_dev_frontend.c b/drivers/xen/xenbus/xenbus_dev_frontend.c
> index 08adc590f631..597af455a522 100644
> --- a/drivers/xen/xenbus/xenbus_dev_frontend.c
> +++ b/drivers/xen/xenbus/xenbus_dev_frontend.c
> @@ -55,6 +55,7 @@
>  #include <linux/string.h>
>  #include <linux/slab.h>
>  #include <linux/miscdevice.h>
> +#include <linux/workqueue.h>
>  
>  #include <xen/xenbus.h>
>  #include <xen/xen.h>
> @@ -116,6 +117,8 @@ struct xenbus_file_priv {
>  	wait_queue_head_t read_waitq;
>  
>  	struct kref kref;
> +
> +	struct work_struct wq;
>  };
>  
>  /* Read out any raw xenbus messages queued up. */
> @@ -300,14 +303,14 @@ static void watch_fired(struct xenbus_watch *watch,
>  	mutex_unlock(&adap->dev_data->reply_mutex);
>  }
>  
> -static void xenbus_file_free(struct kref *kref)
> +static void xenbus_worker(struct work_struct *wq)
>  {
>  	struct xenbus_file_priv *u;
>  	struct xenbus_transaction_holder *trans, *tmp;
>  	struct watch_adapter *watch, *tmp_watch;
>  	struct read_buffer *rb, *tmp_rb;
>  
> -	u = container_of(kref, struct xenbus_file_priv, kref);
> +	u = container_of(wq, struct xenbus_file_priv, wq);
>  
>  	/*
>  	 * No need for locking here because there are no other users,
> @@ -333,6 +336,18 @@ static void xenbus_file_free(struct kref *kref)
>  	kfree(u);
>  }
>  
> +static void xenbus_file_free(struct kref *kref)
> +{
> +	struct xenbus_file_priv *u;
> +
> +	/*
> +	 * We might be called in xenbus_thread().
> +	 * Use workqueue to avoid deadlock.
> +	 */
> +	u = container_of(kref, struct xenbus_file_priv, kref);
> +	schedule_work(&u->wq);
> +}
> +
>  static struct xenbus_transaction_holder *xenbus_get_transaction(
>  	struct xenbus_file_priv *u, uint32_t tx_id)
>  {
> @@ -650,6 +665,7 @@ static int xenbus_file_open(struct inode *inode, struct file *filp)
>  	INIT_LIST_HEAD(&u->watches);
>  	INIT_LIST_HEAD(&u->read_buffers);
>  	init_waitqueue_head(&u->read_waitq);
> +	INIT_WORK(&u->wq, xenbus_worker);
>  
>  	mutex_init(&u->reply_mutex);
>  	mutex_init(&u->msgbuffer_mutex);
> --
> 2.16.4
>

We have been seeing some crashes with an Ubuntu 5.0.0-31 kernel with this
patch applied and, thanks to the pstore fix "x86/xen: Return from panic
notifier", we caught the oops below. It seems to be in the same area of
code as this patch, but I'm unsure whether it is directly related to this
change or a secondary issue. From the logs collected I can see this
happened while several parallel `xl create` processes were running. So far
I have not been able to reproduce it in a test script, but perhaps the
trace will give some clues.

Thanks,
James

<4>[53626.726580] ------------[ cut here ]------------
<2>[53626.726583] kernel BUG at /build/slowfs/ubuntu-bionic/mm/slub.c:305!
<4>[53626.739554] invalid opcode: 0000 [#1] SMP NOPTI
<4>[53626.751119] CPU: 0 PID: 38 Comm: xenwatch Tainted: P OE 5.0.0-31-generic #33~18.04.1z1
<4>[53626.763015] Hardw
Re: [Xen-devel] [PATCH] xen/xenbus: fix self-deadlock after killing user process
On Tue, Oct 01, 2019 at 01:37:24PM -0400, Boris Ostrovsky wrote:
> On 10/1/19 11:03 AM, Juergen Gross wrote:
> > In case a user process using xenbus has open transactions and is killed
> > e.g. via ctrl-C the following cleanup of the allocated resources might
> > result in a deadlock due to trying to end a transaction in the xenbus
> > worker thread:
> >
> > [ 2551.474706] INFO: task xenbus:37 blocked for more than 120 seconds.
> > [ 2551.492215] Tainted: P OE 5.0.0-29-generic #5
> > [ 2551.510263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 2551.528585] xenbus          D    0    37      2 0x80000080
> > [ 2551.528590] Call Trace:
> > [ 2551.528603]  __schedule+0x2c0/0x870
> > [ 2551.528606]  ? _cond_resched+0x19/0x40
> > [ 2551.528632]  schedule+0x2c/0x70
> > [ 2551.528637]  xs_talkv+0x1ec/0x2b0
> > [ 2551.528642]  ? wait_woken+0x80/0x80
> > [ 2551.528645]  xs_single+0x53/0x80
> > [ 2551.528648]  xenbus_transaction_end+0x3b/0x70
> > [ 2551.528651]  xenbus_file_free+0x5a/0x160
> > [ 2551.528654]  xenbus_dev_queue_reply+0xc4/0x220
> > [ 2551.528657]  xenbus_thread+0x7de/0x880
> > [ 2551.528660]  ? wait_woken+0x80/0x80
> > [ 2551.528665]  kthread+0x121/0x140
> > [ 2551.528667]  ? xb_read+0x1d0/0x1d0
> > [ 2551.528670]  ? kthread_park+0x90/0x90
> > [ 2551.528673]  ret_from_fork+0x35/0x40
> >
> > Fix this by doing the cleanup via a workqueue instead.
> >
> > Reported-by: James Dingwall
> > Fixes: fd8aa9095a95c ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
> > Cc: <stable@vger.kernel.org> # 4.11
> > Signed-off-by: Juergen Gross
>
> Reviewed-by: Boris Ostrovsky

Tested-by: James Dingwall

This patch does resolve the observed issue although for my (extreme and
not representative of our normal workload) test case the worker still gets
blocked for some time if the xenstore-rm is interrupted and no concurrent
xenstore commands can run. I assume that the worker completes the rm and
then does a rollback in the background rather than being interrupted early
as a result of the userspace program being terminated.

Thanks,
James

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
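
For readers following the thread, the idea behind the fix (do not run the cleanup in the thread that delivers xenstore replies; queue it and run it elsewhere) can be illustrated with a small userspace sketch. This is only an analogy under stated assumptions: it is single-threaded, it does not use the kernel workqueue API, and every name in it (work_item, queue_work_item, run_pending_work, file_priv, reply_thread_drop_last_ref, CONTAINER_OF) is hypothetical rather than taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct work_item {
    void (*fn)(struct work_item *w);
    struct work_item *next;
};

static struct work_item *queue_head, *queue_tail;

/* True only while the simulated reply-delivering thread is running. */
static bool in_reply_thread;

/* Append a work item to the pending queue; it is not executed here. */
static void queue_work_item(struct work_item *w)
{
    w->next = NULL;
    if (queue_tail)
        queue_tail->next = w;
    else
        queue_head = w;
    queue_tail = w;
}

/* Run everything that was queued; returns the number of items executed. */
static int run_pending_work(void)
{
    int n = 0;

    while (queue_head) {
        struct work_item *w = queue_head;

        queue_head = w->next;
        if (!queue_head)
            queue_tail = NULL;
        w->fn(w);
        n++;
    }
    return n;
}

struct file_priv {
    int open_transactions;
    bool cleaned_up;
    bool would_have_deadlocked;
    struct work_item work;
};

/* The real cleanup: ending transactions needs a reply from the reply thread. */
static void cleanup_worker(struct work_item *w)
{
    struct file_priv *u = CONTAINER_OF(w, struct file_priv, work);

    /* Running here from the reply thread would mean waiting on ourselves. */
    if (in_reply_thread)
        u->would_have_deadlocked = true;
    u->open_transactions = 0;
    u->cleaned_up = true;
}

/* Release callback: only schedules the cleanup, never performs it inline. */
static void file_free(struct file_priv *u)
{
    u->work.fn = cleanup_worker;
    queue_work_item(&u->work);
}

/* Simulates the reply thread dropping the last reference to the file. */
static void reply_thread_drop_last_ref(struct file_priv *u)
{
    in_reply_thread = true;
    file_free(u);
    in_reply_thread = false;
}
```

In this sketch the release callback returns immediately, and the cleanup only happens once run_pending_work() is called from outside the simulated reply thread. That mirrors the split of xenbus_file_free() into a scheduling stub and xenbus_worker() in the quoted patch.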
[Xen-devel] failed to launch qemu when running de-privileged (xen 4.8)
Hi,

I had a bit of a head scratcher while writing a patch for 4.8 which allows
the qemu-dm process for a stubdom to be executed as an unprivileged user.
After a liberal sprinkling of log messages I found that my problem was
related to the check of the return code from getpwnam_r. In 4.11 the
relevant code looks like this:

        ret = NAME##_r(spec, resultbuf, buf, buf_size, &resultp);   \
        if (ret == ERANGE) {                                        \
            buf_size += 128;                                        \
            continue;                                               \
        }                                                           \
        if (ret != 0)                                               \
            return ERROR_FAIL;                                      \
        if (resultp != NULL) {                                      \
            if (out) *out = resultp;                                \
            return 1;                                               \
        }                                                           \
        return 0;                                                   \

However, checking the man page for getpwnam_r (and getpwuid_r now for
4.11), it is not just 0 which can indicate that an entry is not found:

    0 or ENOENT or ESRCH or EBADF or EPERM or ...
            The given name or uid was not found.
    EINTR   A signal was caught; see signal(7).
    EIO     I/O error.
    EMFILE  The per-process limit on the number of open file descriptors
            has been reached.
    ENFILE  The system-wide limit on the total number of open files has
            been reached.
    ENOMEM  Insufficient memory to allocate passwd structure.
    ERANGE  Insufficient buffer space supplied.

In my case the domid specific qemu user was not present (just using
xen-qemuuser-shared) and I was getting an ENOENT from getpwnam_r. I'm sure
there should be a more elegant way to write the check but it solved my
case.

+        ret = getpwnam_r(username, &pwd, buf, buf_size, &user);
+        if (ret == ERANGE) {
+            buf_size += 128;
+            continue;
+        }
+        if (ret == EINTR || ret == EIO || ret == EMFILE || ret == ENFILE || ret == ENOMEM)
+            return ERROR_FAIL;
+        if (user != NULL)
+            return 1;
+        return 0;

Thanks,
James
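
As a standalone illustration of the check described above, here is a minimal sketch of a lookup loop that treats only the errno values the man page lists as genuine failures, and everything else with a NULL result as "not found". It is not the libxl code: the names lookup_status, classify_lookup and user_exists are hypothetical, the -1 failure return stands in for ERROR_FAIL, and the 2048-byte fallback buffer size is an arbitrary assumption.

```c
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for one getpwnam_r() attempt. */
enum lookup_status {
    LOOKUP_FOUND,
    LOOKUP_NOT_FOUND,
    LOOKUP_RETRY,   /* buffer too small, grow it and try again */
    LOOKUP_FAIL     /* genuine error as listed in the man page */
};

/* Map the return value and result pointer of getpwnam_r() to a status. */
static enum lookup_status classify_lookup(int ret, const struct passwd *result)
{
    switch (ret) {
    case ERANGE:
        return LOOKUP_RETRY;
    case EINTR:
    case EIO:
    case EMFILE:
    case ENFILE:
    case ENOMEM:
        return LOOKUP_FAIL;
    default:
        /* 0, ENOENT, ESRCH, EBADF, EPERM, ...: decided by the result pointer. */
        return result ? LOOKUP_FOUND : LOOKUP_NOT_FOUND;
    }
}

/* Returns 1 if the user exists, 0 if not found, -1 on a genuine error. */
static int user_exists(const char *username)
{
    struct passwd pwd, *result;
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t buf_size = hint > 0 ? (size_t)hint : 2048; /* assumed fallback */
    char *buf = NULL;

    for (;;) {
        char *bigger = realloc(buf, buf_size);
        int ret;

        if (!bigger) {
            free(buf);
            return -1;
        }
        buf = bigger;
        result = NULL;
        ret = getpwnam_r(username, &pwd, buf, buf_size, &result);

        switch (classify_lookup(ret, result)) {
        case LOOKUP_RETRY:
            buf_size += 128;
            continue;           /* retries the enclosing for loop */
        case LOOKUP_FOUND:
            free(buf);
            return 1;
        case LOOKUP_NOT_FOUND:
            free(buf);
            return 0;
        default:
            free(buf);
            return -1;
        }
    }
}
```

Keeping the errno classification in a separate pure function makes the policy easy to check without depending on the contents of the local passwd database. Whether EBADF or EPERM should really be treated as "not found" is a policy choice that follows the man page wording quoted above.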