Hello Stefano,

Stefano Stabellini <sstabell...@kernel.org> writes:

> On Wed, 31 Aug 2022, Volodymyr Babchuk wrote:
>> domain->pdevs_lock protects access to domain->pdev_list.
>> As this, it should be used when we are adding, removing on enumerating
>> PCI devices assigned to a domain.
>> 
>> This enables more granular locking instead of one huge pcidevs_lock that
>> locks entire PCI subsystem. Please note that pcidevs_lock() is still
>> used, we are going to remove it in subsequent patches.
>> 
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babc...@epam.com>
>
> I reviewed the patch, and made sure to pay extra attention to:

Thank you for doing this.

> - error paths
> - missing locks
> - lock ordering
> - interruptions

> Here is what I found:
>
>
> 1) iommu.c:reassign_device_ownership and pci_amd_iommu.c:reassign_device
> Both functions without any pdevs_lock locking do:
> list_move(&pdev->domain_list, &target->pdev_list);
>
> It seems to be it would need pdevs_lock. Maybe we need to change
> list_move into list_del (protected by the pdevs_lock of the old domain)
> and list_add (protected by the pdev_lock of the new domain).

Yes, I did as you suggested. But this leads to another potential
issue. I'll describe it below, in deassign_device() part.

[...]

>> +    spin_lock(&d->pdevs_lock);
>>      list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
>>      {
>>          bus = pdev->bus;
>>          devfn = pdev->devfn;
>>          ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>
> This causes pdevs_lock to be taken twice. deassign_device also takes a
> pdevs_lock.  Probably we need to change all the
> spin_lock(&d->pdevs_lock) into spin_lock_recursive.

You are right, I missed that deassign_device() causes
iommu*_reassign_device() call. But there lies the issue: with recursive
locks, reassign_device() will not be able to unlock source->pdevs_lock,
but will try to take target->pdevs_lock also. This potentially might
lead to deadlock, if another call to reassign_device() moves some other
pdev in the opposite way at the same time. This is why I want to avoid
recursive spinlocks if possible.

So, I am thinking: why does IOMMU code move a pdev across domains in the
first place? We are making IOMMU driver responsible of managing domain's
pdevs, which does not look right, as this is the responsibility of pci.c

I want to propose another approach: implement deassign_device() function
in IOMMU drivers. Then, instead of calling to reassign_device() we might
do the following:

1. deassign_device()
2. remove pdev from source->pdev_list
3. add pdef to target->pdev_list
4. assign_device()


[...]

>> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
>> index 1cf629e7ec..0775228ba9 100644
>> --- a/xen/include/xen/sched.h
>> +++ b/xen/include/xen/sched.h
>> @@ -457,6 +457,7 @@ struct domain
>>  
>>  #ifdef CONFIG_HAS_PCI
>>      struct list_head pdev_list;
>> +    spinlock_t pdevs_lock;
>
> I think it would be better called "pdev_lock" but OK either way

As Jan pointed out, we are locking devices as in plural. On other hand, I can 
rename
pdev_list to pdevs_lists to make this consistent.

>
>
>>  #endif
>>  
>>  #ifdef CONFIG_HAS_PASSTHROUGH
>> -- 
>> 2.36.1
>> 

Reply via email to