Re: [PATCH 2/7] iommu: per-cpu deferred invalidation queues

2016-04-13 Thread Adam Morrison
On Mon, Apr 11, 2016 at 11:10 PM, Benjamin Serebrin via iommu wrote: > Reviewed-by: Ben Serebrin > > Commit message nit: s/defererred/deferred/ > > intel-iommu.c Line 3615: I'd suggest making the 10ms into a #defined constant. This should probably be done in an independent patch for cleanups. I

Re: [PATCH 7/7] iommu: introduce per-cpu caching to iova allocation

2016-04-13 Thread Adam Morrison
On Tue, Apr 12, 2016 at 12:54 AM, Benjamin Serebrin via iommu wrote: >> +/* >> + * Magazine caches for IOVA ranges. For an introduction to magazines, >> + * see the USENIX 2001 paper "Magazines and Vmem: Extending the Slab >> + * Allocator to Many CPUs and Arbitrary Resources" by Bonwick and Ada

Re: [PATCH v5 3/9] dma-mapping: add dma_{map,unmap}_resource

2016-04-13 Thread Niklas Söderlund
Hi Christoph, On 2016-03-21 08:26:01 -0700, Christoph Hellwig wrote: > On Thu, Mar 17, 2016 at 01:33:51PM +0200, Laurent Pinchart wrote: > > The good news is that, given that no code uses this new API at the moment, > > there isn't much to audit. The patch series implements the resource mapping

[PATCH] iommu/arm-smmu: Don't allocate resources for bypass domains

2016-04-13 Thread Robin Murphy
Until we get fully plumbed into of_iommu_configure, our default IOMMU_DOMAIN_DMA domains just bypass translation. Since we achieve that by leaving the stream table entries set to bypass instead of pointing at a translation context, the context bank we allocate for the domain is completely wasted. C

[PATCH v2] iommu/dma: Finish optimising higher-order allocations

2016-04-13 Thread Robin Murphy
Now that we know exactly which page sizes our caller wants to use in the given domain, we can restrict higher-order allocation attempts to just those sizes, if any, and avoid wasting any time or effort on other sizes which offer no benefit. In the same vein, this also lets us accommodate a minimum
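
The size restriction described in this preview can be sketched in user space roughly as follows. This is only an illustration, not the patch's code: the 4 KiB base page size and the names `highest_bit` and `pick_order` are assumptions made for the example. The idea is to treat the domain's page-size bitmap as a mask of useful allocation orders, and to pick the largest order that still fits in the pages remaining.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB base pages */

/* Index of the highest set bit; the caller guarantees x != 0. */
static unsigned int highest_bit(uint64_t x)
{
    unsigned int bit = 0;

    while (x >>= 1)
        bit++;
    return bit;
}

/*
 * Hypothetical helper: choose the largest allocation order whose size
 * (a) is one of the page sizes in pgsize_bitmap and
 * (b) does not exceed the number of pages still needed.
 * Falls back to order 0 (a single page) when no larger size qualifies.
 */
static unsigned int pick_order(uint64_t pgsize_bitmap, uint64_t pages_left)
{
    /* Bit n set here means "an order-n block matches a supported page size". */
    uint64_t order_mask = pgsize_bitmap >> PAGE_SHIFT;
    unsigned int max_order;

    if (pages_left == 0)
        return 0;

    /* Drop every order that is larger than what is still needed. */
    max_order = highest_bit(pages_left);
    if (max_order < 63)
        order_mask &= ((uint64_t)2 << max_order) - 1;

    return order_mask ? highest_bit(order_mask) : 0;
}
```

With a bitmap advertising 4 KiB, 2 MiB and 1 GiB pages, a request with 1000 pages remaining would try order 9 (2 MiB) and never waste effort on orders 1 through 8, which correspond to no supported page size.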

[PATCH 2/7] iommu/arm-smmu: Convert ThunderX workaround to new method

2016-04-13 Thread Robin Murphy
With a framework for implementation-specific functionality in place, the currently-FDT-dependent ThunderX workaround gets to be the first user. Signed-off-by: Robin Murphy --- drivers/iommu/arm-smmu.c | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git

[PATCH 1/7] iommu/arm-smmu: Differentiate specific implementations

2016-04-13 Thread Robin Murphy
As the inevitable reality of implementation-specific errata workarounds begin to accrue alongside our integration quirk handling, it's about time the driver had a decent way of keeping track. Extend the per-SMMU data so we can identify specific implementations in an efficient and firmware-agnostic

[PATCH 3/7] iommu/arm-smmu: Work around MMU-500 prefetch errata

2016-04-13 Thread Robin Murphy
MMU-500 erratum #841119 is tickled by a particular set of circumstances interacting with the next-page prefetcher. Since said prefetcher is quite dumb and actually detrimental to performance in some cases (by causing unwanted TLB evictions for non-sequential access patterns), we lose very little by

[PATCH 0/7] arm-smmu: Implementation and context format differentiation

2016-04-13 Thread Robin Murphy
Hi all, This is almost 3 separate sub-series, but there's enough interdependency and horrid merge conflicts that I've been keeping it all together. - Patches 1-3 rationalise implementation-specific details and errata, in order to start shovelling in yet more wonderful workarounds. - Patches 3-6

[PATCH 4/7] io-64-nonatomic: Add relaxed accessor variants

2016-04-13 Thread Robin Murphy
Whilst commit 9439eb3ab9d1 ("asm-generic: io: implement relaxed accessor macros as conditional wrappers") makes the *_relaxed forms of I/O accessors universally available to drivers, in cases where writeq() is implemented via the io-64-nonatomic helpers, writeq_relaxed() will end up falling back to

[PATCH 5/7] iommu/arm-smmu: Tidy up 64-bit/atomic I/O accesses

2016-04-13 Thread Robin Murphy
With {read,write}q_relaxed now able to fall back to the common nonatomic-hi-lo helper, make use of that so that we don't have to open-code our own. In the process, also convert the other remaining split accesses, and repurpose the custom accessor to smooth out the couple of troublesome instances wh
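
As a rough illustration of what a non-atomic hi-lo 64-bit access means, the sketch below splits one 64-bit write into two 32-bit writes, high word first. It runs in user space against a simulated register; `struct fake_reg64`, `fake_writel`, `fake_hi_lo_writeq` and `fake_readq` are hypothetical names invented for this example and are not the kernel's accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated 64-bit device register built from two 32-bit halves. */
struct fake_reg64 {
    uint32_t half[2];   /* half[0] = low word, half[1] = high word */
    int      order[2];  /* which half was written first and second */
    int      nwrites;   /* total number of 32-bit writes seen */
};

/* Stand-in for a 32-bit MMIO write; also records the write order. */
static void fake_writel(struct fake_reg64 *reg, int idx, uint32_t val)
{
    reg->half[idx] = val;
    if (reg->nwrites < 2)
        reg->order[reg->nwrites] = idx;
    reg->nwrites++;
}

/* Non-atomic 64-bit write: high word first, then low word. */
static void fake_hi_lo_writeq(struct fake_reg64 *reg, uint64_t val)
{
    fake_writel(reg, 1, (uint32_t)(val >> 32));
    fake_writel(reg, 0, (uint32_t)val);
}

/* Reassemble the 64-bit value from the two halves. */
static uint64_t fake_readq(const struct fake_reg64 *reg)
{
    return ((uint64_t)reg->half[1] << 32) | reg->half[0];
}
```

The write is not atomic: between the two halves the register briefly holds a mix of old and new values, which is why the order of the halves matters to some hardware.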

[PATCH 7/7] iommu/arm-smmu: Support SMMUv1 64KB supplement

2016-04-13 Thread Robin Murphy
The 64KB Translation Granule Supplement to the SMMUv1 architecture allows an SMMUv1 implementation to support 64KB pages for stage 2 translations, using a constrained VMSAv8 descriptor format limited to 40-bit addresses. Now that we can freely mix and match context formats, we can actually handle h

[PATCH 6/7] iommu/arm-smmu: Decouple context format from kernel config

2016-04-13 Thread Robin Murphy
The way the driver currently forces an AArch32 or AArch64 context format based on the kernel config and SMMU architecture version is suboptimal, in that it makes it very hard to support oddball mix-and-match cases like the SMMUv1 64KB supplement, or situations where the reduced table depth of an AA

[PATCH v2 0/7] Intel IOMMU scalability improvements

2016-04-13 Thread Adam Morrison
This patchset improves the scalability of the Intel IOMMU code by resolving two spinlock bottlenecks, yielding up to ~5x performance improvement and approaching iommu=off performance. For example, here's the throughput obtained by 16 memcached instances running on a 16-core Sandy Bridge system, ac

[PATCH v2 2/7] iommu: per-cpu deferred invalidation queues

2016-04-13 Thread Adam Morrison
From: Omer Peleg The IOMMU's IOTLB invalidation is a costly process. When iommu mode is not set to "strict", it is done asynchronously. Current code amortizes the cost of invalidating IOTLB entries by batching all the invalidations in the system and performing a single global invalidation instea
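
The batching idea in this preview can be sketched in user space as below. This is a simplified model under stated assumptions, not the patch itself: the CPU count, the queue capacity and the names `deferred_queue`, `flush_queue` and `defer_invalidation` are hypothetical, and the timer-driven flush that a real implementation would also need is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH  4   /* hypothetical CPU count */
#define QUEUE_CAPACITY  8   /* hypothetical batch size */

/* One queue per CPU, so CPUs never contend on a shared lock. */
struct deferred_queue {
    unsigned long pfn[QUEUE_CAPACITY];
    size_t        count;
    unsigned long flushes;  /* batched invalidations issued so far */
};

static struct deferred_queue queues[NR_CPUS_SKETCH];

/* Stand-in for the costly IOTLB invalidation: one call covers the batch. */
static void flush_queue(struct deferred_queue *q)
{
    if (q->count == 0)
        return;
    q->flushes++;
    q->count = 0;
}

/* Queue an unmapped page on the given CPU's queue; flush only when full. */
static void defer_invalidation(unsigned int cpu, unsigned long pfn)
{
    struct deferred_queue *q = &queues[cpu % NR_CPUS_SKETCH];

    if (q->count == QUEUE_CAPACITY)
        flush_queue(q);
    q->pfn[q->count++] = pfn;
}
```

In this model, deferring 20 pages on one CPU costs two invalidations instead of 20, and the other CPUs' queues are never touched.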

[PATCH v2 1/7] iommu: refactoring of deferred flush entries

2016-04-13 Thread Adam Morrison
From: Omer Peleg Currently, deferred flushes' info is striped between several lists in the flush tables. Instead, move all information about a specific flush to a single entry in this table. This patch does not introduce any functional change. Signed-off-by: Omer Peleg [m...@cs.technion.ac.il:

[PATCH v2 6/7] iommu: change intel-iommu to use IOVA frame numbers

2016-04-13 Thread Adam Morrison
From: Omer Peleg Make intel-iommu map/unmap/invalidate work with IOVA pfns instead of pointers to "struct iova". This avoids using the iova struct from the IOVA red-black tree and the resulting explicit find_iova() on unmap. This patch will allow us to cache IOVAs in the next patch, in order to

[PATCH v2 7/7] iommu: introduce per-cpu caching to iova allocation

2016-04-13 Thread Adam Morrison
From: Omer Peleg IOVA allocation has two problems that impede high-throughput I/O. First, it can do a linear search over the allocated IOVA ranges. Second, the rbtree spinlock that serializes IOVA allocations becomes contended. Address these problems by creating an API for caching allocated IOVA
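
A magazine, in the sense of the Bonwick and Adams paper cited elsewhere in this thread, is essentially a small fixed-size stack of recently freed items. The sketch below shows that core in user space, assuming a made-up capacity and the hypothetical names `magazine`, `mag_push` and `mag_pop`. Per-CPU placement, size classes and the shared fallback path are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAG_SIZE 4  /* hypothetical capacity, kept small for illustration */

/* A magazine: a small stack of cached IOVA page frame numbers. */
struct magazine {
    unsigned long pfn[MAG_SIZE];
    size_t        size;
};

/*
 * Cache a freed pfn. Returns false when the magazine is full, in which
 * case the caller would fall back to the slow (locked) path.
 */
static bool mag_push(struct magazine *m, unsigned long pfn)
{
    if (m->size == MAG_SIZE)
        return false;
    m->pfn[m->size++] = pfn;
    return true;
}

/*
 * Satisfy an allocation from the cache. Returns false when the magazine
 * is empty, in which case the caller would fall back to the slow path.
 */
static bool mag_pop(struct magazine *m, unsigned long *pfn)
{
    if (m->size == 0)
        return false;
    *pfn = m->pfn[--m->size];
    return true;
}
```

Because each CPU would own its magazine, the common alloc and free paths touch neither the red-black tree nor its spinlock; only an empty or full magazine forces the slow path.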

[PATCH v2 5/7] iommu: avoid dev iotlb logic in intel-iommu for domains with no dev iotlbs

2016-04-13 Thread Adam Morrison
From: Omer Peleg This patch avoids taking the device_domain_lock in iommu_flush_dev_iotlb() for domains with no dev iotlb devices. Signed-off-by: Omer Peleg [g...@google.com: fixed locking issues] Signed-off-by: Godfrey van der Linden [m...@cs.technion.ac.il: rebased and reworded the commit me

[PATCH v2 4/7] iommu: only unmap mapped entries

2016-04-13 Thread Adam Morrison
From: Omer Peleg Current unmap implementation unmaps the entire area covered by the IOVA range, which is a power-of-2 aligned region. The corresponding map, however, only maps those pages originally mapped by the user. This discrepancy can lead to unmapping of already unmapped entries, which is u
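
The discrepancy described in this preview comes down to simple arithmetic. The sketch below assumes, purely for illustration, that the IOVA range is the mapped page count rounded up to the next power of two; `roundup_pow2` and `never_mapped_pages` are hypothetical helpers, not functions from the patch.

```c
#include <assert.h>

/* Smallest power of two that is >= n, for n >= 1. */
static unsigned long roundup_pow2(unsigned long n)
{
    unsigned long p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Pages that lie inside the (assumed) power-of-two IOVA range but were
 * never mapped by the caller. Unmapping the whole range touches these
 * pages for nothing.
 */
static unsigned long never_mapped_pages(unsigned long mapped_pages)
{
    return roundup_pow2(mapped_pages) - mapped_pages;
}
```

Under that assumption, a 5-page mapping occupies an 8-page range, so unmapping the entire range would process 3 pages that were never mapped, while unmapping only the mapped entries avoids that work.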

[PATCH v2 3/7] iommu: correct flush_unmaps pfn usage

2016-04-13 Thread Adam Morrison
From: Omer Peleg Change flush_unmaps() to correctly pass iommu_flush_iotlb_psi() dma addresses. (Intel mm and dma have the same size for pages at the moment, but this usage improves consistency.) Signed-off-by: Omer Peleg [m...@cs.technion.ac.il: rebased and reworded the commit message] Signed

Re: [PATCH v2 4/7] iommu: only unmap mapped entries

2016-04-13 Thread Shaohua Li
On Wed, Apr 13, 2016 at 09:52:00PM +0300, Adam Morrison wrote: > @@ -3738,7 +3743,16 @@ static void intel_unmap_sg(struct device *dev, struct > scatterlist *sglist, > int nelems, enum dma_data_direction dir, > struct dma_attrs *attrs) > { > -

Re: [PATCH v2 7/7] iommu: introduce per-cpu caching to iova allocation

2016-04-13 Thread Shaohua Li
On Wed, Apr 13, 2016 at 09:52:33PM +0300, Adam Morrison wrote: > From: Omer Peleg > > IOVA allocation has two problems that impede high-throughput I/O. > First, it can do a linear search over the allocated IOVA ranges. > Second, the rbtree spinlock that serializes IOVA allocations becomes > conte

Re: [PATCH 1/7] iommu/arm-smmu: Differentiate specific implementations

2016-04-13 Thread Chalamarla, Tirumalesh
On 4/13/16, 10:12 AM, "Robin Murphy" wrote: >As the inevitable reality of implementation-specific errata workarounds >begin to accrue alongside our integration quirk handling, it's about >time the driver had a decent way of keeping track. Extend the per-SMMU >data so we can identify specific

Re: [PATCH 2/7] iommu/arm-smmu: Convert ThunderX workaround to new method

2016-04-13 Thread Chalamarla, Tirumalesh
On 4/13/16, 10:12 AM, "Robin Murphy" wrote: >With a framework for implementation-specific functionality in place, the >currently-FDT-dependent ThunderX workaround gets to be the first user. > >Signed-off-by: Robin Murphy >--- > drivers/iommu/arm-smmu.c | 27 ++- > 1 fi

[PATCH] iommu/amd: Set AMD iommu callbacks for platform bus driver

2016-04-13 Thread Wan Zongshun
From: Wan Zongshun AMD has more drivers that will use the ACPI to platform bus driver later; all those devices, such as the eMMC ACPI driver, need IOMMU support. Signed-off-by: Wan Zongshun --- drivers/iommu/amd_iommu.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/iommu/amd_iommu.c b/driv