On 25.11.2021 10:24, Roger Pau Monné wrote:
> On Thu, Nov 25, 2021 at 09:57:31AM +0100, Jan Beulich wrote:
>> On 24.11.2021 22:11, Andrew Cooper wrote:
>>> OSSTest has identified a 3rd regression caused by this change.  Migration
>>> between Xen 4.15 and 4.16 on the nocera pair of machines (AMD Opteron 4133)
>>> fails with:
>>>
>>>   xc: error: Failed to set CPUID policy: leaf 00000000, subleaf ffffffff, 
>>> msr ffffffff (22 = Invalid argument): Internal error
>>>   xc: error: Restore failed (22 = Invalid argument): Internal error
>>>
>>> which is a safety check to prevent resuming the guest when the CPUID data 
>>> has
>>> been truncated.  The problem is caused by shrinking of the max policies, 
>>> which
>>> is an ABI that needs handling compatibly between different versions of Xen.
>>
>> Would you mind pointing me to the flight which has hit this problem? I
>> don't think I've seen any respective failure. I also don't think I've
>> seen any respective discussion on irc, so I really wonder where all
>> this is coming from all of the sudden. It's not like the change in
>> question would have gone in just yesterday.
> 
> It's from a pair of newish boxes that Credativ and Ian where
> attempting to commission yesterday. Since the boxes are not yet in
> production Ian wasn't sure if the issue could be on the testing or
> hardware side, so emailed the details in private for us to provide
> some feedback on the issue. The error is at:
> 
> http://logs.test-lab.xenproject.org/osstest/logs/166296/test-amd64-amd64-migrupgrade/info.html

I see. Quite lucky timing then.

>>> Furthermore, shrinking of the default policies also breaks things in some
>>> cases, because certain cpuid= settings in a VM config file which used to 
>>> have
>>> an effect will now be silently discarded.
>>
>> I'm afraid I don't see what you're talking about here. Could you give
>> an example? Is this about features the flags of which live in the
>> higher leaves, which would have got stripped from the default policies?
>> Which would mean the stripping really should happen on the max policies
>> only (where it may not have much of an effect).
> 
> I think there are two separate issues, which I tried to clarify in my
> reply to this same patch.
> 
> Options set using cpuid= with xl could now be rejected when in
> previous versions they were accepted just fine. That's because the
> shrinking to the policies can now cause find_leaf calls in
> xc_cpuid_xend_policy to fail, and thus the whole operation would
> fail.

Okay, this could be addressed by merely dropping the calls from
calculate_{pv,hvm}_def_policy(). Thinking about it, I can surely
agree they shouldn't have been put there in the first place.
Which would be quite the opposite of your initial proposal, where
you did drop them from calculate_{pv,hvm}_max_policy(). A guest
migrating in with a larger max leaf value should merely have that
max leaf value retained, but that ought to be possible without
dropping the shrinking from calculate_{pv,hvm}_max_policy(). Even
leaving aside migration, I guess an explicit request for a large
max leaf value should be honored; those possibly many trailing
leaves then would simply all be blank.

> There's another likely error that only affects callers of
> xc_cpuid_apply_policy that pass a featureset (so not the upstream
> toolstack), where some leaves of the featureset could now be ignored
> by the guest if the max leaves value doesn't cover them anymore. Note
> this was already an issue even before 540d911c2813, as applying the
> featureset doesn't check that the set feature leaves are below the max
> leaf.

If this was an issue before the commit to be reverted, I take it
the revert isn't going to help it? In which case this information
is interesting, but not applicable as justification for the
revert?

Jan


Reply via email to