On 03/06/2020 12:48, Paul Durrant wrote:
>> -----Original Message-----
>> From: Igor Druzhinin <igor.druzhi...@citrix.com>
>> Sent: 03 June 2020 12:45
>> To: p...@xen.org; 'Jan Beulich' <jbeul...@suse.com>
>> Cc: xen-devel@lists.xenproject.org; andrew.coop...@citrix.com;
>> w...@xen.org; roger....@citrix.com; george.dun...@citrix.com
>> Subject: Re: [PATCH v2] x86/svm: do not try to handle recalc NPT faults
>> immediately
>>
>> On 03/06/2020 12:28, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeul...@suse.com>
>>>> Sent: 03 June 2020 12:22
>>>> To: p...@xen.org
>>>> Cc: 'Igor Druzhinin' <igor.druzhi...@citrix.com>;
>>>> xen-devel@lists.xenproject.org; andrew.coop...@citrix.com;
>>>> w...@xen.org; roger....@citrix.com; george.dun...@citrix.com
>>>> Subject: Re: [PATCH v2] x86/svm: do not try to handle recalc NPT faults
>>>> immediately
>>>>
>>>> On 03.06.2020 12:26, Paul Durrant wrote:
>>>>>> -----Original Message-----
>>>>>> From: Jan Beulich <jbeul...@suse.com>
>>>>>> Sent: 03 June 2020 11:03
>>>>>> To: Igor Druzhinin <igor.druzhi...@citrix.com>
>>>>>> Cc: xen-devel@lists.xenproject.org; andrew.coop...@citrix.com;
>>>>>> w...@xen.org; roger....@citrix.com; george.dun...@citrix.com;
>>>>>> Paul Durrant <p...@xen.org>
>>>>>> Subject: Re: [PATCH v2] x86/svm: do not try to handle recalc NPT
>>>>>> faults immediately
>>>>>>
>>>>>> On 02.06.2020 18:56, Igor Druzhinin wrote:
>>>>>>> A recalculation NPT fault doesn't always require additional handling
>>>>>>> in hvm_hap_nested_page_fault(); moreover, in the general case, if
>>>>>>> there is no explicit handling done there, the fault is wrongly
>>>>>>> considered fatal.
>>>>>>>
>>>>>>> This covers a specific case of migration with a vGPU assigned on
>>>>>>> AMD: at the moment log-dirty is enabled globally, recalculation is
>>>>>>> requested for the whole of guest memory, including directly mapped
>>>>>>> MMIO regions of the vGPU, which causes a page fault to be raised at
>>>>>>> the first access to those; but due to the MMIO P2M type not having
>>>>>>> any explicit handling in hvm_hap_nested_page_fault(), the domain is
>>>>>>> erroneously crashed with an unhandled SVM violation.
>>>>>>>
>>>>>>> Instead of trying to be opportunistic, use a safer approach and
>>>>>>> handle P2M recalculation in a separate NPT fault by attempting to
>>>>>>> retry after making the necessary adjustments. This is aligned with
>>>>>>> Intel behavior, where there are separate VMEXITs for recalculation
>>>>>>> and EPT violations (faults) and only faults are handled in
>>>>>>> hvm_hap_nested_page_fault(). Do this by also unifying the do_recalc
>>>>>>> return code with the Intel implementation, where returning 1 means
>>>>>>> the P2M was actually changed.
>>>>>>>
>>>>>>> Since there was previously no case where
>>>>>>> p2m_pt_handle_deferred_changes() could return a positive value, it's
>>>>>>> safe to replace ">= 0" with "== 0" in the VMEXIT_NPF handler.
>>>>>>> finish_type_change() is also not affected by the change, as it is
>>>>>>> already able to deal with a >0 return value of p2m->recalc from the
>>>>>>> EPT implementation.
>>>>>>>
>>>>>>> Reviewed-by: Roger Pau Monné <roger....@citrix.com>
>>>>>>> Signed-off-by: Igor Druzhinin <igor.druzhi...@citrix.com>
>>>>>>
>>>>>> Reviewed-by: Jan Beulich <jbeul...@suse.com>
>>>>>> albeit preferably with ...
>>>>>>
>>>>>>> @@ -448,12 +451,14 @@ static int do_recalc(struct p2m_domain *p2m, unsigned long gfn)
>>>>>>>              clear_recalc(l1, e);
>>>>>>>              err = p2m->write_p2m_entry(p2m, gfn, pent, e, level + 1);
>>>>>>>              ASSERT(!err);
>>>>>>> +
>>>>>>> +            recalc_done = true;
>>>>>>>          }
>>>>>>>
>>>>>>>   out:
>>>>>>>      unmap_domain_page(table);
>>>>>>>
>>>>>>> -    return err;
>>>>>>> +    return err ?: (recalc_done ? 1 : 0);
>>>>>>
>>>>>> ... this shrunk to
>>>>>>
>>>>>>     return err ?: recalc_done;
>>>>>>
>>>>>> (easily doable while committing).
>>>>>>
>>>>>> Also Cc Paul.
>>>>>>
>>>>>
>>>>> paging_log_dirty_enable() still fails global enable if has_arch_pdevs()
>>>>> is true, so presumably there's no desperate need for this to go in 4.14?
>>>>
>>>> The MMIO case is just the particular situation here. Aiui the same issue
>>>> could potentially surface with other p2m types. Also, given that I'd
>>>> consider this a backporting candidate, while it may not be desperately
>>>> needed for the release, I think it deserves consideration beyond the
>>>> specific aspect you mention.
>>>>
>>>
>>> In which case I think the commit message probably ought to be rephrased,
>>> since it appears to focus on a case that cannot currently happen.
>>
>> This can happen without has_arch_pdevs() being true. It's enough to just
>> directly map some physical memory into a guest to get the p2m_mmio_direct
>> type present in the page tables.
>
> OK, that's fine, but when will that happen without pass-through? If we can
> have a commit message justifying the change based on what can actually
> happen now, then I would not be opposed to it going in 4.14.
Yes, it can happen now - we had regular (non-SR-IOV) vGPU migration totally
broken because of this on AMD; it had never been tested before at all. You
don't need special patches on top of Xen or anything: to get p2m_mmio_direct
you just need to call XEN_DOMCTL_memory_mapping on a domain.

Igor