Re: [PATCH 0/2] iommu/iova: Solve longterm IOVA issue

2020-09-25 Thread Cong Wang
On Fri, Sep 25, 2020 at 2:56 AM John Garry wrote: > > This series contains a patch to solve the longterm IOVA issue which > leizhen originally tried to address at [0]. > > I also included the small optimisation from Cong Wang, which never seems > to have been accepted [

Re: [Patch v3 1/3] iommu: avoid unnecessary magazine allocations

2020-01-22 Thread Cong Wang
On Wed, Jan 22, 2020 at 9:07 AM Robin Murphy wrote: > > On 21/01/2020 5:21 pm, Cong Wang wrote: > > On Tue, Jan 21, 2020 at 3:11 AM Robin Murphy wrote: > >> > >> On 18/12/2019 4:39 am, Cong Wang wrote: > >>> The IOVA cache algorithm implemented in I

Re: [Patch v3 2/3] iommu: optimize iova_magazine_free_pfns()

2020-01-22 Thread Cong Wang
On Wed, Jan 22, 2020 at 9:34 AM Robin Murphy wrote: > Sorry, but without convincing evidence, this change just looks like > churn for the sake of it. The time I wasted on arguing with you isn't worth anything than the value this patch brings. So let's just drop it to save some time. Thanks.

Re: [Patch v3 2/3] iommu: optimize iova_magazine_free_pfns()

2020-01-21 Thread Cong Wang
On Tue, Jan 21, 2020 at 1:52 AM Robin Murphy wrote: > > On 18/12/2019 4:39 am, Cong Wang wrote: > > If the magazine is empty, iova_magazine_free_pfns() should > > be a nop, however it misses the case of mag->size==0. So we > > should just call iova_magazine_empty(). >

Re: [Patch v3 1/3] iommu: avoid unnecessary magazine allocations

2020-01-21 Thread Cong Wang
On Tue, Jan 21, 2020 at 3:11 AM Robin Murphy wrote: > > On 18/12/2019 4:39 am, Cong Wang wrote: > > The IOVA cache algorithm implemented in IOMMU code does not > > exactly match the original algorithm described in the paper > > "Magazines and Vmem: Extending the Sl

Re: [Patch v3 0/3] iommu: reduce spinlock contention on fast path

2020-01-20 Thread Cong Wang
On Tue, Dec 17, 2019 at 8:40 PM Cong Wang wrote: > > This patchset contains three small optimizations for the global spinlock > contention in IOVA cache. Our memcache perf test shows this reduced its > p999 latency down by 45% on AMD when IOMMU is enabled. > > (Resending v3

[Patch v3 3/3] iommu: avoid taking iova_rbtree_lock twice

2019-12-17 Thread Cong Wang
Both find_iova() and __free_iova() take iova_rbtree_lock, so there is no reason to take and release it twice inside free_iova(). Fold them into one critical section by calling the unlocked versions instead. Cc: Joerg Roedel Cc: John Garry Signed-off-by: Cong Wang --- drivers/iommu/iova.c | 8
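
The locking change described above can be illustrated with a small userspace sketch. It is not the kernel code: the toy_* names, the fixed-size table standing in for the rbtree, and the counting "lock" are all assumptions made for illustration. It only shows why folding lookup and removal into one critical section halves the number of lock acquisitions per free.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a spinlock (hypothetical): it only counts acquisitions. */
struct toy_lock {
    bool held;
    unsigned long acquisitions;
};

static void toy_lock_acquire(struct toy_lock *l) { l->held = true; l->acquisitions++; }
static void toy_lock_release(struct toy_lock *l) { l->held = false; }

#define TOY_SLOTS 8

/* Hypothetical domain: a fixed table of allocated pfns instead of an rbtree. */
struct toy_domain {
    struct toy_lock lock;
    unsigned long pfn[TOY_SLOTS];
    bool used[TOY_SLOTS];
};

/* Caller must hold d->lock. Returns the slot index, or -1 if not found. */
static int toy_find_locked(struct toy_domain *d, unsigned long pfn)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (d->used[i] && d->pfn[i] == pfn)
            return i;
    return -1;
}

/* Caller must hold d->lock. */
static void toy_remove_locked(struct toy_domain *d, int slot)
{
    d->used[slot] = false;
}

/* Before: lookup and removal each take and release the lock. */
static bool toy_free_twice(struct toy_domain *d, unsigned long pfn)
{
    toy_lock_acquire(&d->lock);
    int slot = toy_find_locked(d, pfn);
    toy_lock_release(&d->lock);
    if (slot < 0)
        return false;
    toy_lock_acquire(&d->lock);
    toy_remove_locked(d, slot);
    toy_lock_release(&d->lock);
    return true;
}

/* After: one critical section covers both steps. */
static bool toy_free_once(struct toy_domain *d, unsigned long pfn)
{
    toy_lock_acquire(&d->lock);
    int slot = toy_find_locked(d, pfn);
    if (slot >= 0)
        toy_remove_locked(d, slot);
    toy_lock_release(&d->lock);
    return slot >= 0;
}
```

Besides saving an acquisition, the single critical section also closes the window in which another thread could change the entry between lookup and removal.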

[Patch v3 0/3] iommu: reduce spinlock contention on fast path

2019-12-17 Thread Cong Wang
This patchset contains three small optimizations for the global spinlock contention in IOVA cache. Our memcache perf test shows this reduced its p999 latency down by 45% on AMD when IOMMU is enabled. (Resending v3 on Joerg's request.) Cong Wang (3): iommu: avoid unnecessary mag

[Patch v3 2/3] iommu: optimize iova_magazine_free_pfns()

2019-12-17 Thread Cong Wang
ed-off-by: Cong Wang --- drivers/iommu/iova.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index cb473ddce4cf..184d4c0e20b5 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -797,13 +797,23 @@

[Patch v3 1/3] iommu: avoid unnecessary magazine allocations

2019-12-17 Thread Cong Wang
. Together with a few other changes to make it exactly match the pseudo code in the paper. Cc: Joerg Roedel Cc: John Garry Signed-off-by: Cong Wang --- drivers/iommu/iova.c | 45 +++- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/driv

Re: [Patch v2 0/3] iommu: reduce spinlock contention on fast path

2019-12-17 Thread Cong Wang
On Tue, Dec 17, 2019 at 1:43 AM Joerg Roedel wrote: > > On Thu, Nov 28, 2019 at 04:48:52PM -0800, Cong Wang wrote: > > This patchset contains three small optimizations for the global spinlock > > contention in IOVA cache. Our memcache perf test shows this reduced its > >

[Patch v3 0/3] iommu: reduce spinlock contention on fast path

2019-12-06 Thread Cong Wang
This patchset contains three small optimizations for the global spinlock contention in IOVA cache. Our memcache perf test shows this reduced its p999 latency down by 45% on AMD when IOMMU is enabled. Cong Wang (3): iommu: avoid unnecessary magazine allocations iommu: optimize

[Patch v3 2/3] iommu: optimize iova_magazine_free_pfns()

2019-12-06 Thread Cong Wang
ed-off-by: Cong Wang --- drivers/iommu/iova.c | 22 +++--- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index cb473ddce4cf..184d4c0e20b5 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -797,13 +797,23 @@

[Patch v3 1/3] iommu: avoid unnecessary magazine allocations

2019-12-06 Thread Cong Wang
. Together with a few other changes to make it exactly match the pseudo code in the paper. Cc: Joerg Roedel Cc: John Garry Signed-off-by: Cong Wang --- drivers/iommu/iova.c | 45 +++- 1 file changed, 28 insertions(+), 17 deletions(-) diff --git a/driv

[Patch v3 3/3] iommu: avoid taking iova_rbtree_lock twice

2019-12-06 Thread Cong Wang
Both find_iova() and __free_iova() take iova_rbtree_lock, so there is no reason to take and release it twice inside free_iova(). Fold them into one critical section by calling the unlocked versions instead. Cc: Joerg Roedel Cc: John Garry Signed-off-by: Cong Wang --- drivers/iommu/iova.c | 8

Re: [Patch v2 2/3] iommu: optimize iova_magazine_free_pfns()

2019-12-03 Thread Cong Wang
On Mon, Dec 2, 2019 at 2:02 AM John Garry wrote: > > On 30/11/2019 06:02, Cong Wang wrote: > > On Fri, Nov 29, 2019 at 5:24 AM John Garry wrote: > >> > >> On 29/11/2019 00:48, Cong Wang wrote: > >>> If the maganize is empty, iova_magazine_free_pfns() sh

Re: [Patch v2 2/3] iommu: optimize iova_magazine_free_pfns()

2019-12-03 Thread Cong Wang
On Mon, Dec 2, 2019 at 8:59 AM Christoph Hellwig wrote: > > > + return (mag && mag->size == IOVA_MAG_SIZE); > > > + return (!mag || mag->size == 0); > > No need for the braces in both cases. The current code is already this, I don't want to mix coding style changes with a non-coding-style

Re: [Patch v2 1/3] iommu: match the original algorithm

2019-12-03 Thread Cong Wang
On Mon, Dec 2, 2019 at 2:55 AM John Garry wrote: > Apart from this change, did anyone ever consider kmem cache for the > magazines? You can always make any changes you want after this patch, I can't do all optimizations in one single patch. :) So, I will leave this to you. Thanks.

Re: [Patch v2 1/3] iommu: match the original algorithm

2019-12-03 Thread Cong Wang
On Mon, Dec 2, 2019 at 8:58 AM Christoph Hellwig wrote: > > I think a subject line better describes what you change, not that > it matches an original algorithm. The fact that the fix matches > the original algorithm can go somewhere towards the commit log, > preferably with a reference to the act

Re: [Patch v2 3/3] iommu: avoid taking iova_rbtree_lock twice

2019-11-29 Thread Cong Wang
On Fri, Nov 29, 2019 at 5:34 AM John Garry wrote: > > On 29/11/2019 00:48, Cong Wang wrote: > > Both find_iova() and __free_iova() take iova_rbtree_lock, > > there is no reason to take and release it twice inside > > free_iova(). > > > > Fold them into the cr

Re: [Patch v2 2/3] iommu: optimize iova_magazine_free_pfns()

2019-11-29 Thread Cong Wang
On Fri, Nov 29, 2019 at 5:24 AM John Garry wrote: > > On 29/11/2019 00:48, Cong Wang wrote: > > If the maganize is empty, iova_magazine_free_pfns() should > > magazine Good catch! > > > be a nop, however it misses the case of mag->size==0. So we > >

Re: [Patch v2 1/3] iommu: match the original algorithm

2019-11-29 Thread Cong Wang
On Fri, Nov 29, 2019 at 6:43 AM John Garry wrote: > > On 29/11/2019 00:48, Cong Wang wrote: > > The IOVA cache algorithm implemented in IOMMU code does not > > exactly match the original algorithm described in the paper. > > > > which paper? It's in drivers

[Patch v2 0/3] iommu: reduce spinlock contention on fast path

2019-11-28 Thread Cong Wang
This patchset contains three small optimizations for the global spinlock contention in IOVA cache. Our memcache perf test shows this reduced its p999 latency down by 45% on AMD when IOMMU is enabled. Cong Wang (3): iommu: match the original algorithm iommu: optimize iova_magazine_free_pfns

[Patch v2 3/3] iommu: avoid taking iova_rbtree_lock twice

2019-11-28 Thread Cong Wang
Both find_iova() and __free_iova() take iova_rbtree_lock, so there is no reason to take and release it twice inside free_iova(). Fold them into the critical section by calling the unlocked versions instead. Cc: Joerg Roedel Signed-off-by: Cong Wang --- drivers/iommu/iova.c | 8 ++-- 1 file

[Patch v2 2/3] iommu: optimize iova_magazine_free_pfns()

2019-11-28 Thread Cong Wang
If the magazine is empty, iova_magazine_free_pfns() should be a nop; however, it misses the case of mag->size==0. So we should just call iova_magazine_empty(). This should reduce the contention on iovad->iova_rbtree_lock a little bit. Cc: Joerg Roedel Signed-off-by: Cong Wang --- drivers
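
A minimal userspace sketch of the idea in this patch description follows. The toy_* types and the acquisition counter are hypothetical stand-ins, not the kernel structures; the empty test mirrors the `!mag || mag->size == 0` condition quoted elsewhere in this thread. The point is that an early return on an empty magazine skips the lock entirely.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAG_SIZE 4

/* Hypothetical magazine: a small stack of cached pfns. */
struct toy_magazine {
    unsigned long size;
    unsigned long pfns[TOY_MAG_SIZE];
};

/* Hypothetical tree state: counts lock acquisitions and freed pfns. */
struct toy_tree {
    unsigned long lock_acquisitions;
    unsigned long freed;
};

/* A missing magazine and a magazine with no entries are both "empty". */
static bool toy_magazine_empty(const struct toy_magazine *mag)
{
    return !mag || mag->size == 0;
}

static void toy_magazine_free_pfns(struct toy_magazine *mag, struct toy_tree *tree)
{
    /* Nothing to free: return before touching the (simulated) lock. */
    if (toy_magazine_empty(mag))
        return;

    tree->lock_acquisitions++;      /* stands in for taking the tree lock */
    for (unsigned long i = 0; i < mag->size; i++)
        tree->freed++;              /* stands in for freeing mag->pfns[i] */
    mag->size = 0;
}
```

Checking only for a NULL pointer would still take the lock for an allocated magazine whose size is 0, which is the case the description says was missed.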

[Patch v2 1/3] iommu: match the original algorithm

2019-11-28 Thread Cong Wang
t and only recycle them when all of them are full. Before this patch, rcache->depot[] contains either full or freed entries, after this patch, it contains either full or empty (but allocated) entries. Cc: Joerg Roedel Signed-off-by: Cong Wang --- drivers/iommu/iov

Re: [PATCH 1/3] iommu: match the original algorithm

2019-11-28 Thread Cong Wang
On Wed, Nov 27, 2019 at 10:01 AM John Garry wrote: > > On 21/11/2019 00:13, Cong Wang wrote: > > The IOVA cache algorithm implemented in IOMMU code does not > > exactly match the original algorithm described in the paper. > > > > Particularly, it doesn't nee

[PATCH 3/3] iommu: avoid taking iova_rbtree_lock twice

2019-11-20 Thread Cong Wang
Both find_iova() and __free_iova() take iova_rbtree_lock, so there is no reason to take and release it twice inside free_iova(). Fold them into the critical section by calling the unlocked versions instead. Cc: Joerg Roedel Signed-off-by: Cong Wang --- drivers/iommu/iova.c | 8 ++-- 1 file

[PATCH 0/3] iommu: reduce spinlock contention on fast path

2019-11-20 Thread Cong Wang
This patchset contains three small optimizations for the global spinlock contention in IOVA cache. Our memcache perf test shows this reduced its p999 latency down by 45% on AMD when IOMMU is enabled. Cong Wang (3): iommu: match the original algorithm iommu: optimize iova_magazine_free_pfns

[PATCH 2/3] iommu: optimize iova_magazine_free_pfns()

2019-11-20 Thread Cong Wang
If the magazine is empty, iova_magazine_free_pfns() should be a nop; however, it misses the case of mag->size==0. So we should just call iova_magazine_empty(). This should reduce the contention on iovad->iova_rbtree_lock a little bit. Cc: Joerg Roedel Signed-off-by: Cong Wang --- drivers

[PATCH 1/3] iommu: match the original algorithm

2019-11-20 Thread Cong Wang
. Cc: Joerg Roedel Signed-off-by: Cong Wang --- drivers/iommu/iova.c | 14 -- 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 41c605b0058f..92f72a85e62a 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -900,7 +