On Fri, Feb 04, 2022 at 10:12:46AM +0000, Oleksandr Andrushchenko wrote:
> Hi, Jan!
> 
> On 04.02.22 11:15, Jan Beulich wrote:
> > On 04.02.2022 09:58, Oleksandr Andrushchenko wrote:
> >> On 04.02.22 09:52, Jan Beulich wrote:
> >>> On 04.02.2022 07:34, Oleksandr Andrushchenko wrote:
> >>>> @@ -285,6 +286,12 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>>>                    continue;
> >>>>            }
> >>>>    
> >>>> +        spin_lock(&tmp->vpci_lock);
> >>>> +        if ( !tmp->vpci )
> >>>> +        {
> >>>> +            spin_unlock(&tmp->vpci_lock);
> >>>> +            continue;
> >>>> +        }
> >>>>            for ( i = 0; i < ARRAY_SIZE(tmp->vpci->header.bars); i++ )
> >>>>            {
> >>>>                const struct vpci_bar *bar = &tmp->vpci->header.bars[i];
> >>>> @@ -303,12 +310,14 @@ static int modify_bars(const struct pci_dev *pdev, uint16_t cmd, bool rom_only)
> >>>>                rc = rangeset_remove_range(mem, start, end);
> >>>>                if ( rc )
> >>>>                {
> >>>> +                spin_unlock(&tmp->vpci_lock);
> >>>>                    printk(XENLOG_G_WARNING "Failed to remove [%lx, %lx]: 
> >>>> %d\n",
> >>>>                           start, end, rc);
> >>>>                    rangeset_destroy(mem);
> >>>>                    return rc;
> >>>>                }
> >>>>            }
> >>>> +        spin_unlock(&tmp->vpci_lock);
> >>>>        }
> >>> At first glance this simply looks like another unjustified (in the
> >>> description) change, as you're not converting anything here but you
> >>> actually add locking (and I realize this was there before, so I'm sorry
> >>> for not pointing this out earlier).
> >> Well, I thought that the description already says "...the lock can be
> >> used (and in a few cases is used right away) to check whether vpci
> >> is present", and that this is enough to justify uses such as this one.
> >>>    But then I wonder whether you
> >>> actually tested this, since I can't help getting the impression that
> >>> you're introducing a live-lock: The function is called from cmd_write()
> >>> and rom_write(), which in turn are called out of vpci_write(). Yet that
> >>> function already holds the lock, and the lock is not (currently)
> >>> recursive. (For the 3rd caller of the function - init_bars() - otoh
> >>> the locking looks to be entirely unnecessary.)
> >> Well, you are correct: if tmp != pdev then it is correct to acquire
> >> the lock. But if tmp == pdev and rom_only == true
> >> then we'll deadlock.
> >>
> >> It seems we need to make the locking conditional, e.g. only take
> >> the lock if tmp != pdev.
> > Which will address the live-lock, but introduce ABBA deadlock potential
> > between the two locks.
> I am not sure I can suggest a better solution here.
> @Roger, @Jan, could you please help?

I think we could set the locking order based on the memory address of
the locks, ie:

if ( &tmp->vpci_lock < &pdev->vpci_lock )
{
    spin_unlock(&pdev->vpci_lock);
    spin_lock(&tmp->vpci_lock);
    spin_lock(&pdev->vpci_lock);
    if ( !pdev->vpci || &pdev->vpci->header != header )
        /* ERROR: vpci removed or recreated. */
}
else
    spin_lock(&tmp->vpci_lock);

That however creates a window where the address of the BARs on the
current device (pdev) could be changed, so the result of the mapping
might be skewed. I think the guest would only have itself to blame for
that, as changing the position of the BARs while toggling memory
decoding is not something sensible to do.
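To make this a bit more concrete, a rough sketch of how the ordering could
be wrapped into a helper; the name vpci_lock_other() is made up, and this
assumes the caller (modify_bars()) already holds pdev->vpci_lock, as it
does when reached from vpci_write():

/*
 * Take tmp's vpci lock while pdev's lock is already held, using the
 * lock addresses to establish a global ordering and avoid ABBA
 * deadlocks.  Returns false if pdev->vpci was removed or recreated
 * while pdev's lock was temporarily dropped, in which case the caller
 * must abort the operation.
 */
static bool vpci_lock_other(struct pci_dev *pdev, struct pci_dev *tmp,
                            const struct vpci_header *header)
{
    if ( tmp == pdev )
        /* Already locked by the caller: avoid the self-deadlock. */
        return true;

    if ( &tmp->vpci_lock < &pdev->vpci_lock )
    {
        /* tmp's lock sorts first: drop pdev's lock and retake in order. */
        spin_unlock(&pdev->vpci_lock);
        spin_lock(&tmp->vpci_lock);
        spin_lock(&pdev->vpci_lock);

        if ( !pdev->vpci || &pdev->vpci->header != header )
        {
            /* pdev's vpci was removed or recreated: bail out. */
            spin_unlock(&tmp->vpci_lock);
            return false;
        }
    }
    else
        spin_lock(&tmp->vpci_lock);

    return true;
}

The unlock path in the loop would then also have to skip tmp's lock when
tmp == pdev.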

> >
> >>>> @@ -222,10 +239,10 @@ static int msix_read(struct vcpu *v, unsigned long addr, unsigned int len,
> >>>>                break;
> >>>>            }
> >>>>    
> >>>> +        msix_put(msix);
> >>>>            return X86EMUL_OKAY;
> >>>>        }
> >>>>    
> >>>> -    spin_lock(&msix->pdev->vpci->lock);
> >>>>        entry = get_entry(msix, addr);
> >>>>        offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
> >>> You're increasing the locked region quite a bit here. If this is really
> >>> needed, it wants explaining. And if this is deemed acceptable as a
> >>> "side effect", it wants justifying or at least stating imo. Same for
> >>> msix_write() then, obviously.
> >> Yes, I do increase the locked region here, but the msix variable needs
> >> to be protected the whole time, so it seems obvious that it has to
> >> remain under the lock.
> > What does the msix variable have to do with the vPCI lock? If you see
> > a need to grow the locked region here, then surely this is independent
> > of your conversion of the lock, and hence wants to be a prereq fix
> > (which may in fact want/need backporting).
> First of all, the implementation of msix_get is wrong and needs to be:
> 
> /*
>   * Note: if a vpci_msix is found, this function returns with
>   * pdev->vpci_lock held. Use msix_put to unlock.
>   */
> static struct vpci_msix *msix_get(const struct domain *d, unsigned long addr)
> {
>      struct vpci_msix *msix;
> 
>      list_for_each_entry ( msix, &d->arch.hvm.msix_tables, next )

Strictly speaking you would also need to introduce a lock here to
protect msix_tables.

This was all designed when hot-adding (or removing) PCI devices to the
domain wasn't supported.

>      {
>          const struct vpci_bar *bars;
>          unsigned int i;
> 
>          spin_lock(&msix->pdev->vpci_lock);
>          if ( !msix->pdev->vpci )
>          {
>              spin_unlock(&msix->pdev->vpci_lock);
>              continue;
>          }
> 
>          bars = msix->pdev->vpci->header.bars;
>          for ( i = 0; i < ARRAY_SIZE(msix->tables); i++ )
>              if ( bars[msix->tables[i] & PCI_MSIX_BIRMASK].enabled &&
>                   VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, i) )
>                  return msix;
> 
>          spin_unlock(&msix->pdev->vpci_lock);
>      }
> 
>      return NULL;
> }
> 
> Then, both msix_{read|write} can dereference msix->pdev->vpci early,
> which is why Roger suggested we move to msix_{get|put} here.
> And yes, we grow the locked region here, and yes, this might want a
> prereq fix. Or it could just be fixed while at it.

Ideally yes, we would need a separate fix that introduced
msix_{get,put}, because the currently unlocked regions of
msix_{read,write} do access the BAR address fields, and doing so
without holding the vpci lock would be racy. I would expect that the
writing/reading of the addr field is done in a single instruction, so
it's unlikely to be a problem in practice. That's kind of similar to
the fact that modify_bars also accesses the addr and size fields of
remote BARs without taking the respective lock.

Once the lock is moved outside of the vpci struct and it's used to
assert that pdev->vpci is present then we do need to hold it while
accessing vpci, or else the struct could be removed under our feet.
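For completeness, the put side of such a pair would be trivial; a rough
sketch matching the msix_get() above (naming and exact shape aside):

/* Counterpart of msix_get(): drop the lock taken by a successful get. */
static void msix_put(struct vpci_msix *msix)
{
    spin_unlock(&msix->pdev->vpci_lock);
}

msix_{read,write} would then bracket their whole body with the pair,
along the lines of (keeping the existing early-return behaviour):

    struct vpci_msix *msix = msix_get(d, addr);

    if ( !msix )
        return X86EMUL_RETRY;

    ...

    msix_put(msix);
    return X86EMUL_OKAY;

so that every access to msix and to msix->pdev->vpci happens with the
per-device lock held.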

Roger.
