On Tue, Mar 28, 2023 at 03:23:56PM +0200, Jan Beulich wrote:
> On 28.03.2023 15:04, Marek Marczykowski-Górecki wrote:
> > On Tue, Mar 28, 2023 at 02:54:38PM +0200, Jan Beulich wrote:
> >> On 25.03.2023 03:49, Marek Marczykowski-Górecki wrote:
> >>> Some firmware/devices are found to not reset MSI-X properly, leaving
> >>> MASKALL set. Xen relies on initial state being both disabled.
> >>> Especially, pci_reset_msix_state() assumes if MASKALL is set, it was Xen
> >>> setting it due to msix->host_maskall or msix->guest_maskall. Clearing
> >>> just MASKALL might be unsafe if ENABLE is set, so clear them both.
> >>
> >> But pci_reset_msix_state() comes into play only when assigning a device
> >> to a DomU. If the tool stack doing a reset doesn't properly clear the
> >> bit, how would it be cleared the next time round (i.e. after the guest
> >> stopped and then possibly was started again)? It feels like the issue
> >> wants dealing with elsewhere, possibly in the tool stack.
> > 
> > I may be misremembering some details, but AFAIR Xen intercepts
> > toolstack's (or more generally: accesses from dom0) attempt to clean
> > this up and once it enters an inconsistent state (or rather: starts with
> > such at the start of the day), there was no way to clean it up.
> 
> Iirc Roger and you already discussed that there needs to be an
> indication of device reset having happened, so that Xen can resync
> from this "behind its back" operation. That would look to be the
> point/place where such inconsistencies should be eliminated.

I think that was a different conversation with Huang Rui related to
the AMD GPU work, see:

https://lore.kernel.org/xen-devel/ZBwtaceTNvCYksmR@Air-de-Roger/

I understood the problem Marek was trying to solve was that some
devices were initialized with the MASKALL bit set (likely by the
firmware?) and that prevented Xen from using them.  But now seeing the
further replies on this patch I'm unsure whether that's the case.
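
To make the bits under discussion concrete, here is a minimal sketch of
what "clear them both" from the patch description would mean for the
MSI-X Message Control word.  This is not the patch itself: the helper
name is hypothetical, and it assumes the caller has already read the
control word from the device's MSI-X capability.

```c
#include <stdint.h>

/* MSI-X Message Control bits as laid out in the PCI specification. */
#define PCI_MSIX_FLAGS_ENABLE  0x8000u  /* bit 15: MSI-X Enable */
#define PCI_MSIX_FLAGS_MASKALL 0x4000u  /* bit 14: Function Mask */

/*
 * Hypothetical helper (not an existing Xen function): given the control
 * word found at boot, return the value to write back.  If MASKALL was
 * left set, drop ENABLE together with it, since clearing MASKALL alone
 * might be unsafe while ENABLE is set.  Other bits are left untouched.
 */
static uint16_t msix_sanitize_initial_control(uint16_t control)
{
    if (control & PCI_MSIX_FLAGS_MASKALL)
        control &= (uint16_t)~(PCI_MSIX_FLAGS_ENABLE | PCI_MSIX_FLAGS_MASKALL);

    return control;
}
```

Whether such a fixup belongs at boot, at device reset, or in the tool
stack is exactly the open question above; the sketch only shows the bit
manipulation.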

Thanks, Roger.
