Re: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock
Hi Jan, Jan Beulich writes: > On 20.07.2023 02:32, Volodymyr Babchuk wrote: >> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c >> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c >> @@ -476,8 +476,13 @@ static int cf_check reassign_device( >> >> if ( devfn == pdev->devfn && pdev->domain != target ) >> { >> -list_move(&pdev->domain_list, &target->pdev_list); >> -pdev->domain = target; >> +write_lock(&pdev->domain->pci_lock); >> +list_del(&pdev->domain_list); >> +write_unlock(&pdev->domain->pci_lock); > > As mentioned on an earlier version, perhaps better (cheaper) to use > "source" here? (Same in VT-d code then.) Sorry, I saw you comment for the previous version, but missed to include this change. It will be done in the next version. >> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, >> ret = 0; >> if ( !pdev->domain ) >> { >> +write_lock(&hardware_domain->pci_lock); >> pdev->domain = hardware_domain; >> list_add(&pdev->domain_list, &hardware_domain->pdev_list); >> >> @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, >> printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret); >> list_del(&pdev->domain_list); >> pdev->domain = NULL; >> +write_unlock(&hardware_domain->pci_lock); >> goto out; > > In addition to Roger's comments about locking scope: In a case like this > one it would probably also be good to move the printk() out of the locked > area. It can be slow, after all. > > Question is why you have this wide a locked area here in the first place: > Don't you need to hold the lock just across the two list operations (but > not in between)? Strictly speaking yes, we need to hold lock only when operating on the list. For now. Next patch will use the same lock to protect the VPCI (de)alloction, so locked region will be extended anyways. I think, I'll decrease locked area in this patch and increase in the next one, it will be most logical. >> @@ -887,26 +895,62 @@ static int deassign_device(struct domain *d, uint16_t >> seg, uint8_t bus, >> >> int pci_release_devices(struct domain *d) >> { >> -struct pci_dev *pdev, *tmp; >> -u8 bus, devfn; >> -int ret; >> +int combined_ret; >> +LIST_HEAD(failed_pdevs); >> >> pcidevs_lock(); >> -ret = arch_pci_clean_pirqs(d); >> -if ( ret ) >> +write_lock(&d->pci_lock); >> +combined_ret = arch_pci_clean_pirqs(d); >> +if ( combined_ret ) >> { >> pcidevs_unlock(); >> -return ret; >> +write_unlock(&d->pci_lock); >> +return combined_ret; >> } >> -list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list ) >> + >> +while ( !list_empty(&d->pdev_list) ) >> { >> -bus = pdev->bus; >> -devfn = pdev->devfn; >> -ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret; >> +struct pci_dev *pdev = list_first_entry(&d->pdev_list, >> +struct pci_dev, >> +domain_list); >> +uint16_t seg = pdev->seg; >> +uint8_t bus = pdev->bus; >> +uint8_t devfn = pdev->devfn; >> +int ret; >> + >> +write_unlock(&d->pci_lock); >> +ret = deassign_device(d, seg, bus, devfn); >> +write_lock(&d->pci_lock); >> +if ( ret ) >> +{ >> +bool still_present = false; >> +const struct pci_dev *tmp; >> + >> +/* >> + * We need to check if deassign_device() left our pdev in >> + * domain's list. As we dropped the lock, we can't be sure >> + * that list wasn't permutated in some random way, so we >> + * need to traverse the whole list. >> + */ >> +for_each_pdev ( d, tmp ) >> +{ >> +if ( tmp == pdev ) >> +{ >> +still_present = true; >> +break; >> +} >> +} >> +if ( still_present ) >> +list_move(&pdev->domain_list, &failed_pdevs); > > In order to retain original ordering on the resulting list, perhaps better > list_move_tail()? Yes, thanks. -- WBR, Volodymyr
Re: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock
Hi Roger, Roger Pau Monné writes: > On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote: >> Add per-domain d->pci_lock that protects access to >> d->pdev_list. Purpose of this lock is to give guarantees to VPCI code >> that underlying pdev will not disappear under feet. This is a rw-lock, >> but this patch adds only write_lock()s. There will be read_lock() >> users in the next patches. >> >> This lock should be taken in write mode every time d->pdev_list is >> altered. This covers both accesses to d->pdev_list and accesses to >> pdev->domain_list fields. All write accesses also should be protected >> by pcidevs_lock() as well. Idea is that any user that wants read >> access to the list or to the devices stored in the list should use >> either this new d->pci_lock or old pcidevs_lock(). Usage of any of >> this two locks will ensure only that pdev of interest will not >> disappear from under feet and that the pdev still will be assigned to >> the same domain. Of course, any new users should use pcidevs_lock() >> when it is appropriate (e.g. when accessing any other state that is >> protected by the said lock). > > I think this needs a note about the ordering: > > "In case both the newly introduced per-domain rwlock and the pcidevs > lock is taken, the later must be acquired first." Thanks. Added. >> >> Any write access to pdev->domain_list should be protected by both >> pcidevs_lock() and d->pci_lock in the write mode. > > You also protect calls to vpci_remove_device() with the per-domain > pci_lock it seems, and that will need some explanation as it's not > obvious. Well, strictly speaking, it is not required in this patch. But it is needed in the next one. I can lock only "list_del(&pdev->domain_list);" end extend then locked area in the next patch. On other hand, this patch already protects vpci_add_handlers() call in the pci_add_device() due to the code layout, so it may be natural to protect vpci_remove_device() as well. What is your opinion? >> >> Suggested-by: Roger Pau Monné >> Suggested-by: Jan Beulich >> Signed-off-by: Volodymyr Babchuk >> >> --- >> >> Changes in v8: >> - New patch >> >> Changes in v8 vs RFC: >> - Removed all read_locks after discussion with Roger in #xendevel >> - pci_release_devices() now returns the first error code >> - extended commit message >> - added missing lock in pci_remove_device() >> - extended locked region in pci_add_device() to protect list_del() calls >> --- >> xen/common/domain.c | 1 + >> xen/drivers/passthrough/amd/pci_amd_iommu.c | 9 ++- >> xen/drivers/passthrough/pci.c | 68 + >> xen/drivers/passthrough/vtd/iommu.c | 9 ++- >> xen/include/xen/sched.h | 1 + >> 5 files changed, 74 insertions(+), 14 deletions(-) >> >> diff --git a/xen/common/domain.c b/xen/common/domain.c >> index caaa402637..5d8a8836da 100644 >> --- a/xen/common/domain.c >> +++ b/xen/common/domain.c >> @@ -645,6 +645,7 @@ struct domain *domain_create(domid_t domid, >> >> #ifdef CONFIG_HAS_PCI >> INIT_LIST_HEAD(&d->pdev_list); >> +rwlock_init(&d->pci_lock); >> #endif >> >> /* All error paths can depend on the above setup. */ >> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c >> b/xen/drivers/passthrough/amd/pci_amd_iommu.c >> index 94e3775506..e2f2e2e950 100644 >> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c >> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c >> @@ -476,8 +476,13 @@ static int cf_check reassign_device( >> >> if ( devfn == pdev->devfn && pdev->domain != target ) >> { >> -list_move(&pdev->domain_list, &target->pdev_list); >> -pdev->domain = target; > > You seem to have inadvertently dropped the above line? (and so devices > would keep the previous pdev->domain value) > Oops, yes. Thank you. I was testing those patches on Intel machine, so AMD part left not verified. >> +write_lock(&pdev->domain->pci_lock); >> +list_del(&pdev->domain_list); >> +write_unlock(&pdev->domain->pci_lock); >> + >> +write_lock(&target->pci_lock); >> +list_add(&pdev->domain_list, &target->pdev_list); >> +write_unlock(&target->pci_lock); >> } >> >> /* >> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c >> index 95846e84f2..5b4632ead2 100644 >> --- a/xen/drivers/passthrough/pci.c >> +++ b/xen/drivers/passthrough/pci.c >> @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev) >> if ( pdev->domain ) >> return; >> pdev->domain = dom_xen; >> +write_lock(&dom_xen->pci_lock); >> list_add(&pdev->domain_list, &dom_xen->pdev_list); >> +write_unlock(&dom_xen->pci_lock); >> } >> >> int __init pci_hide_device(unsigned int seg, unsigned int bus, >> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, >> ret = 0; >> if ( !pdev->domain ) >> { >> +write_l
Re: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock
On 20.07.2023 02:32, Volodymyr Babchuk wrote: > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c > @@ -476,8 +476,13 @@ static int cf_check reassign_device( > > if ( devfn == pdev->devfn && pdev->domain != target ) > { > -list_move(&pdev->domain_list, &target->pdev_list); > -pdev->domain = target; > +write_lock(&pdev->domain->pci_lock); > +list_del(&pdev->domain_list); > +write_unlock(&pdev->domain->pci_lock); As mentioned on an earlier version, perhaps better (cheaper) to use "source" here? (Same in VT-d code then.) > @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, > ret = 0; > if ( !pdev->domain ) > { > +write_lock(&hardware_domain->pci_lock); > pdev->domain = hardware_domain; > list_add(&pdev->domain_list, &hardware_domain->pdev_list); > > @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, > printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret); > list_del(&pdev->domain_list); > pdev->domain = NULL; > +write_unlock(&hardware_domain->pci_lock); > goto out; In addition to Roger's comments about locking scope: In a case like this one it would probably also be good to move the printk() out of the locked area. It can be slow, after all. Question is why you have this wide a locked area here in the first place: Don't you need to hold the lock just across the two list operations (but not in between)? > @@ -887,26 +895,62 @@ static int deassign_device(struct domain *d, uint16_t > seg, uint8_t bus, > > int pci_release_devices(struct domain *d) > { > -struct pci_dev *pdev, *tmp; > -u8 bus, devfn; > -int ret; > +int combined_ret; > +LIST_HEAD(failed_pdevs); > > pcidevs_lock(); > -ret = arch_pci_clean_pirqs(d); > -if ( ret ) > +write_lock(&d->pci_lock); > +combined_ret = arch_pci_clean_pirqs(d); > +if ( combined_ret ) > { > pcidevs_unlock(); > -return ret; > +write_unlock(&d->pci_lock); > +return combined_ret; > } > -list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list ) > + > +while ( !list_empty(&d->pdev_list) ) > { > -bus = pdev->bus; > -devfn = pdev->devfn; > -ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret; > +struct pci_dev *pdev = list_first_entry(&d->pdev_list, > +struct pci_dev, > +domain_list); > +uint16_t seg = pdev->seg; > +uint8_t bus = pdev->bus; > +uint8_t devfn = pdev->devfn; > +int ret; > + > +write_unlock(&d->pci_lock); > +ret = deassign_device(d, seg, bus, devfn); > +write_lock(&d->pci_lock); > +if ( ret ) > +{ > +bool still_present = false; > +const struct pci_dev *tmp; > + > +/* > + * We need to check if deassign_device() left our pdev in > + * domain's list. As we dropped the lock, we can't be sure > + * that list wasn't permutated in some random way, so we > + * need to traverse the whole list. > + */ > +for_each_pdev ( d, tmp ) > +{ > +if ( tmp == pdev ) > +{ > +still_present = true; > +break; > +} > +} > +if ( still_present ) > +list_move(&pdev->domain_list, &failed_pdevs); In order to retain original ordering on the resulting list, perhaps better list_move_tail()? Jan
Re: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock
On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote: > Add per-domain d->pci_lock that protects access to > d->pdev_list. Purpose of this lock is to give guarantees to VPCI code > that underlying pdev will not disappear under feet. This is a rw-lock, > but this patch adds only write_lock()s. There will be read_lock() > users in the next patches. > > This lock should be taken in write mode every time d->pdev_list is > altered. This covers both accesses to d->pdev_list and accesses to > pdev->domain_list fields. All write accesses also should be protected > by pcidevs_lock() as well. Idea is that any user that wants read > access to the list or to the devices stored in the list should use > either this new d->pci_lock or old pcidevs_lock(). Usage of any of > this two locks will ensure only that pdev of interest will not > disappear from under feet and that the pdev still will be assigned to > the same domain. Of course, any new users should use pcidevs_lock() > when it is appropriate (e.g. when accessing any other state that is > protected by the said lock). I think this needs a note about the ordering: "In case both the newly introduced per-domain rwlock and the pcidevs lock is taken, the later must be acquired first." > > Any write access to pdev->domain_list should be protected by both > pcidevs_lock() and d->pci_lock in the write mode. You also protect calls to vpci_remove_device() with the per-domain pci_lock it seems, and that will need some explanation as it's not obvious. > > Suggested-by: Roger Pau Monné > Suggested-by: Jan Beulich > Signed-off-by: Volodymyr Babchuk > > --- > > Changes in v8: > - New patch > > Changes in v8 vs RFC: > - Removed all read_locks after discussion with Roger in #xendevel > - pci_release_devices() now returns the first error code > - extended commit message > - added missing lock in pci_remove_device() > - extended locked region in pci_add_device() to protect list_del() calls > --- > xen/common/domain.c | 1 + > xen/drivers/passthrough/amd/pci_amd_iommu.c | 9 ++- > xen/drivers/passthrough/pci.c | 68 + > xen/drivers/passthrough/vtd/iommu.c | 9 ++- > xen/include/xen/sched.h | 1 + > 5 files changed, 74 insertions(+), 14 deletions(-) > > diff --git a/xen/common/domain.c b/xen/common/domain.c > index caaa402637..5d8a8836da 100644 > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -645,6 +645,7 @@ struct domain *domain_create(domid_t domid, > > #ifdef CONFIG_HAS_PCI > INIT_LIST_HEAD(&d->pdev_list); > +rwlock_init(&d->pci_lock); > #endif > > /* All error paths can depend on the above setup. */ > diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c > b/xen/drivers/passthrough/amd/pci_amd_iommu.c > index 94e3775506..e2f2e2e950 100644 > --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c > +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c > @@ -476,8 +476,13 @@ static int cf_check reassign_device( > > if ( devfn == pdev->devfn && pdev->domain != target ) > { > -list_move(&pdev->domain_list, &target->pdev_list); > -pdev->domain = target; You seem to have inadvertently dropped the above line? (and so devices would keep the previous pdev->domain value) > +write_lock(&pdev->domain->pci_lock); > +list_del(&pdev->domain_list); > +write_unlock(&pdev->domain->pci_lock); > + > +write_lock(&target->pci_lock); > +list_add(&pdev->domain_list, &target->pdev_list); > +write_unlock(&target->pci_lock); > } > > /* > diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c > index 95846e84f2..5b4632ead2 100644 > --- a/xen/drivers/passthrough/pci.c > +++ b/xen/drivers/passthrough/pci.c > @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev) > if ( pdev->domain ) > return; > pdev->domain = dom_xen; > +write_lock(&dom_xen->pci_lock); > list_add(&pdev->domain_list, &dom_xen->pdev_list); > +write_unlock(&dom_xen->pci_lock); > } > > int __init pci_hide_device(unsigned int seg, unsigned int bus, > @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, > ret = 0; > if ( !pdev->domain ) > { > +write_lock(&hardware_domain->pci_lock); > pdev->domain = hardware_domain; > list_add(&pdev->domain_list, &hardware_domain->pdev_list); > > @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, > printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret); > list_del(&pdev->domain_list); > pdev->domain = NULL; > +write_unlock(&hardware_domain->pci_lock); Strictly speaking, this could move one line earlier, as accesses to pdev->domain are not protected by the d->pci_lock? Same in other instances (above and below), as you seem to introduce a pattern to perform accesses to pdev->domain wit
[PATCH v8 01/13] pci: introduce per-domain PCI rwlock
Add per-domain d->pci_lock that protects access to d->pdev_list. Purpose of this lock is to give guarantees to VPCI code that underlying pdev will not disappear under feet. This is a rw-lock, but this patch adds only write_lock()s. There will be read_lock() users in the next patches. This lock should be taken in write mode every time d->pdev_list is altered. This covers both accesses to d->pdev_list and accesses to pdev->domain_list fields. All write accesses also should be protected by pcidevs_lock() as well. Idea is that any user that wants read access to the list or to the devices stored in the list should use either this new d->pci_lock or old pcidevs_lock(). Usage of any of this two locks will ensure only that pdev of interest will not disappear from under feet and that the pdev still will be assigned to the same domain. Of course, any new users should use pcidevs_lock() when it is appropriate (e.g. when accessing any other state that is protected by the said lock). Any write access to pdev->domain_list should be protected by both pcidevs_lock() and d->pci_lock in the write mode. Suggested-by: Roger Pau Monné Suggested-by: Jan Beulich Signed-off-by: Volodymyr Babchuk --- Changes in v8: - New patch Changes in v8 vs RFC: - Removed all read_locks after discussion with Roger in #xendevel - pci_release_devices() now returns the first error code - extended commit message - added missing lock in pci_remove_device() - extended locked region in pci_add_device() to protect list_del() calls --- xen/common/domain.c | 1 + xen/drivers/passthrough/amd/pci_amd_iommu.c | 9 ++- xen/drivers/passthrough/pci.c | 68 + xen/drivers/passthrough/vtd/iommu.c | 9 ++- xen/include/xen/sched.h | 1 + 5 files changed, 74 insertions(+), 14 deletions(-) diff --git a/xen/common/domain.c b/xen/common/domain.c index caaa402637..5d8a8836da 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -645,6 +645,7 @@ struct domain *domain_create(domid_t domid, #ifdef CONFIG_HAS_PCI INIT_LIST_HEAD(&d->pdev_list); +rwlock_init(&d->pci_lock); #endif /* All error paths can depend on the above setup. */ diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c index 94e3775506..e2f2e2e950 100644 --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c @@ -476,8 +476,13 @@ static int cf_check reassign_device( if ( devfn == pdev->devfn && pdev->domain != target ) { -list_move(&pdev->domain_list, &target->pdev_list); -pdev->domain = target; +write_lock(&pdev->domain->pci_lock); +list_del(&pdev->domain_list); +write_unlock(&pdev->domain->pci_lock); + +write_lock(&target->pci_lock); +list_add(&pdev->domain_list, &target->pdev_list); +write_unlock(&target->pci_lock); } /* diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index 95846e84f2..5b4632ead2 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev) if ( pdev->domain ) return; pdev->domain = dom_xen; +write_lock(&dom_xen->pci_lock); list_add(&pdev->domain_list, &dom_xen->pdev_list); +write_unlock(&dom_xen->pci_lock); } int __init pci_hide_device(unsigned int seg, unsigned int bus, @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, ret = 0; if ( !pdev->domain ) { +write_lock(&hardware_domain->pci_lock); pdev->domain = hardware_domain; list_add(&pdev->domain_list, &hardware_domain->pdev_list); @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret); list_del(&pdev->domain_list); pdev->domain = NULL; +write_unlock(&hardware_domain->pci_lock); goto out; } ret = iommu_add_device(pdev); @@ -768,8 +772,10 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, vpci_remove_device(pdev); list_del(&pdev->domain_list); pdev->domain = NULL; +write_unlock(&hardware_domain->pci_lock); goto out; } +write_unlock(&hardware_domain->pci_lock); } else iommu_enable_device(pdev); @@ -812,11 +818,13 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn) list_for_each_entry ( pdev, &pseg->alldevs_list, alldevs_list ) if ( pdev->bus == bus && pdev->devfn == devfn ) { +write_lock(&pdev->domain->pci_lock); vpci_remove_device(pdev); pci_cleanup_msi(pdev); ret = iommu_remove_device(pdev); if ( pdev->domain ) list_del(&pdev->domain_list); +write_unlock(&pdev->do