On Mon, Jul 10, 2023 at 12:56:27PM +0200, Jan Beulich wrote:
> On 07.07.2023 11:53, Roger Pau Monne wrote:
> > The current logic to init the local APIC and the IO-APIC does init the
> > former first before doing any kind of sanitation on the IO-APIC pin
> > configuration.  It's already noted on enable_IO_APIC() that Xen
> > shouldn't trust the IO-APIC being empty at bootup.
> > 
> > At XenServer we have a system where the IO-APIC 0 is handed to Xen
> > with pin 0 unmasked, set to Fixed delivery mode, edge triggered and
> > with a vector of 0 (all fields of the RTE are zeroed).  Once the local
> > APIC is enabled periodic injections from such pin cause a storm of
> > errors:
> > 
> > APIC error on CPU0: 00(40), Received illegal vector
> > APIC error on CPU0: 40(40), Received illegal vector
> > APIC error on CPU0: 40(40), Received illegal vector
> > APIC error on CPU0: 40(40), Received illegal vector
> > APIC error on CPU0: 40(40), Received illegal vector
> > APIC error on CPU0: 40(40), Received illegal vector
> > 
> > That prevents Xen from booting.
> 
> And I expect no RTE is in ExtInt mode, so one might conclude that
> firmware meant to set up RTE 0 that way. Mainly as a remark, albeit
> of course there's then the question whether to change the RTE
> rather than masking it. What do ACPI tables say?

There's one relevant override:

[668h 1640   1]                Subtable Type : 02 [Interrupt Source Override]
[669h 1641   1]                       Length : 0A
[66Ah 1642   1]                          Bus : 00
[66Bh 1643   1]                       Source : 00
[66Ch 1644   4]                    Interrupt : 00000002
[670h 1648   2]        Flags (decoded below) : 0000
                                    Polarity : 0
                                Trigger Mode : 0

So IRQ 0 -> GSI 2, so it's likely pin 0 is the one the i8259 is
connected to.

> > Fix this by moving the masking of IO-APIC pins ahead of the enabling
> > of the local APIC.  Note that before doing such masking Xen attempts
> > to detect the pin where the legacy i8259 is connected, and that logic
> > relies on the pin being unmasked, hence the logic is also moved ahead
> > of enabling the local APIC.
> 
> A comma after "masking" might help readers here.
> 
> > --- a/xen/arch/x86/apic.c
> > +++ b/xen/arch/x86/apic.c
> > @@ -1476,6 +1476,10 @@ int __init APIC_init_uniprocessor (void)
> >          return -1;
> >      }
> >  
> > +    if ( smp_found_config && !skip_ioapic_setup && nr_ioapics )
> > +        /* Sanitize the IO-APIC pins before enabling the local APIC. */
> > +        sanitize_IO_APIC();
> 
> I'm a little puzzled by the smp_found_config part of the check here,
> but not in smp_prepare_cpus(). What's the reason for (a) the check
> and (b) the difference?

This just mimics what gates the call to setup_IO_APIC() in that same
function.  It makes no sense to call sanitize_IO_APIC() if
setup_IO_APIC() is not called, and I wasn't planning on changing the
logic that gates the call setup_IO_APIC() in this patch.

I did note the difference with smp_prepare_cpus(), and I think we
should look at unifying those paths, but didn't want to do it as part
of this fix.

> Isn't checking nr_ioapics sufficient in this
> regard? (b) probably could be addressed by moving the code to the
> beginning of verify_local_APIC(), immediately ahead of which you
> insert both call sites. (At which point the function may want naming
> verify_IO_APIC() for consistency, but that's surely minor.)

I wanted the call to sanitize_IO_APIC() to be done at the same level
where the call to setup_IO_APIC() is done, because it makes the logic
clearer.

> As to also checking skip_ioapic_setup - wouldn't the unmasked pin be
> similarly troublesome in that case?

skip_ioapic_setup is set when the IO-APIC address in the MADT is
invalid, so in that case attempting to access IO-APIC registers in
that case won't lead to anything good.

> Aiui it would not hurt only if
> the LAPIC also isn't (going to be) set up. Then again the flag,
> among other things, signals a zero base address found in the ACPI or
> MP tables, so I guess this is a (largely) separate issue anyway.

Oh, yes, indeed.  See my reply above.

Thanks, Roger.

Reply via email to