On Thu, Nov 25, 2021 at 09:57:31AM +0100, Jan Beulich wrote:
> On 24.11.2021 22:11, Andrew Cooper wrote:
> > OSSTest has identified a 3rd regression caused by this change.  Migration
> > between Xen 4.15 and 4.16 on the nocera pair of machines (AMD Opteron 4133)
> > fails with:
> > 
> >   xc: error: Failed to set CPUID policy: leaf 00000000, subleaf ffffffff, 
> > msr ffffffff (22 = Invalid argument): Internal error
> >   xc: error: Restore failed (22 = Invalid argument): Internal error
> > 
> > which is a safety check to prevent resuming the guest when the CPUID data 
> > has
> > been truncated.  The problem is caused by shrinking of the max policies, 
> > which
> > is an ABI that needs handling compatibly between different versions of Xen.
> 
> Would you mind pointing me to the flight which has hit this problem? I
> don't think I've seen any respective failure. I also don't think I've
> seen any respective discussion on irc, so I really wonder where all
> this is coming from all of the sudden. It's not like the change in
> question would have gone in just yesterday.

It's from a pair of newish boxes that Credativ and Ian where
attempting to commission yesterday. Since the boxes are not yet in
production Ian wasn't sure if the issue could be on the testing or
hardware side, so emailed the details in private for us to provide
some feedback on the issue. The error is at:

http://logs.test-lab.xenproject.org/osstest/logs/166296/test-amd64-amd64-migrupgrade/info.html

> > Furthermore, shrinking of the default policies also breaks things in some
> > cases, because certain cpuid= settings in a VM config file which used to 
> > have
> > an effect will now be silently discarded.
> 
> I'm afraid I don't see what you're talking about here. Could you give
> an example? Is this about features the flags of which live in the
> higher leaves, which would have got stripped from the default policies?
> Which would mean the stripping really should happen on the max policies
> only (where it may not have much of an effect).

I think there are two separate issues, which I tried to clarify in my
reply to this same patch.

Options set using cpuid= with xl could now be rejected when in
previous versions they were accepted just fine. That's because the
shrinking to the policies can now cause find_leaf calls in
xc_cpuid_xend_policy to fail, and thus the whole operation would
fail.

There's another likely error that only affects callers of
xc_cpuid_apply_policy that pass a featureset (so not the upstream
toolstack), where some leaves of the featureset could now be ignored
by the guest if the max leaves value doesn't cover them anymore. Note
this was already an issue even before 540d911c2813, as applying the
featureset doesn't check that the set feature leaves are below the max
leaf.

Roger.

Reply via email to