Fwd: Re: sparc64: Build failure due to commit f1600e549b94 (sparc: Make sparc64 use scalable lib/iommu-common.c functions)

2015-04-19 Thread Sowmini Varadhan
On (04/19/15 14:09), David Miller wrote: On (04/18/15 21:23), Guenter Roeck wrote: lib/built-in.o:(.discard+0x1): multiple definition of `__pcpu_unique_iommu_pool_hash' arch/powerpc/kernel/built-in.o:(.discard+0x18): first defined here .. I get a similar failure in the

[PATCH v10 1/3] Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-09 Thread Sowmini Varadhan
infrastructure. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input

[PATCH v10 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-04-09 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool -> drops badly: 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com Acked-by: Benjamin

[PATCH v10 3/3] sparc: Make LDC use common iommu poll management functions

2015-04-09 Thread Sowmini Varadhan
, with a typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from

[PATCH v10 0/3] Generic IOMMU pooled allocator

2015-04-09 Thread Sowmini Varadhan
tag, and new mail Message-Id. Sowmini Varadhan (3): Break up monolithic iommu table/lock into finer granularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions Make LDC use common iommu poll management functions arch/sparc/include/asm/iommu_64.h |7 +- arch

Re: [PATCHv9 RFC 1/3] Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-08 Thread Sowmini Varadhan
On (04/08/15 18:30), Benjamin Herrenschmidt wrote: I'm happy with your last version, feel free to add my Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org sounds good, I'll do this and resend a non-RFC version today. Thanks for all the feedback - it was very useful to me, and I'm

[PATCH v9 RFC 3/3] sparc: Make LDC use common iommu poll management functions

2015-04-05 Thread Sowmini Varadhan
, with a typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead inline these calls into ldc before

[PATCH v9 RFC 0/3] Generic IOMMU pooled allocator

2015-04-05 Thread Sowmini Varadhan
Addresses latest BenH comments: need_flush checks, add support for dma mask and align_order. Sowmini Varadhan (3): Break up monolithic iommu table/lock into finer granularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions Make LDC use common iommu poll management

[PATCHv9 RFC 1/3] Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-05 Thread Sowmini Varadhan
infrastructure. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input parameter, for the case when the iommu map size

[PATCH v9 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-04-05 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool -> drops badly: 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2: moved sparc

Re: [PATCHv9 RFC 1/3] Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-05 Thread Sowmini Varadhan
On (04/05/15 22:26), Benjamin Herrenschmidt wrote: So you decided to keep the logic here that updates the hint instead of just getting rid of need_flush altogether? Out of curiosity, what's the rationale? Did you find a reason why resetting the hint in those two cases (rather than just

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-04 Thread Sowmini Varadhan
One last question before I spin out v9.. the dma_mask code is a bit confusing to me, so I want to make sure... the code is if (limit + tbl->it_offset > mask) { limit = mask - tbl->it_offset + 1; /* If we're constrained on address range, first try * at
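The dma_mask question above concerns the clamp in the quoted allocator code. With the stripped operators restored, the arithmetic can be exercised in a small user-space sketch; `struct toy_tbl` is a minimal assumed stand-in for the kernel's table structure, not the real one.

```c
#include <assert.h>

/* Minimal stand-in for the iommu table fields used by the quoted code. */
struct toy_tbl {
    unsigned long it_offset;   /* offset of the table start, in entries */
};

/* If the device's DMA mask cannot cover (limit + it_offset) entries,
 * shrink the search limit so every allocated entry stays under the mask. */
static unsigned long clamp_limit(const struct toy_tbl *tbl,
                                 unsigned long limit, unsigned long mask)
{
    if (limit + tbl->it_offset > mask)
        limit = mask - tbl->it_offset + 1;
    return limit;
}
```

With it_offset 0x10 and mask 0x80, a requested limit of 0x100 is clamped to 0x71 (= 0x80 - 0x10 + 1); a large mask leaves the limit untouched.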

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-03 Thread Sowmini Varadhan
Just want to confirm: + again: + if (pass == 0 && handle && *handle && + (*handle >= pool->start) && (*handle < pool->end)) + start = *handle; + else + start = pool->hint; Now this means *handle might be < pool->hint, in that case you also need a lazy flush. Or

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-03 Thread Sowmini Varadhan
On (04/04/15 08:06), Benjamin Herrenschmidt wrote: No, I meant n < pool->hint, ie, the start of the newly allocated block. ah, got it. I'll do my drill with patchset and get back, probably by Monday. --Sowmini ___ Linuxppc-dev mailing list

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Sowmini Varadhan
On (04/03/15 08:57), Benjamin Herrenschmidt wrote: I only just noticed too, you completely dropped the code to honor the dma mask. Why that ? Some devices rely on this. /* Sowmini's comment about this coming from sparc origins.. */ Probably, not that many devices have limits on DMA

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Sowmini Varadhan
On (04/03/15 07:54), Benjamin Herrenschmidt wrote: + limit = pool->end; + + /* The case below can happen if we have a small segment appended + * to a large, or when the previous alloc was at the very end of + * the available space. If so, go back to the beginning and flush. +

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-04-02 Thread Sowmini Varadhan
On (03/31/15 23:12), David Miller wrote: It's much more amortized with smart buffering strategies, which are common on current generation networking cards. There you only eat one map/unmap per PAGE_SIZE / rx_pkt_size. Maybe the infiniband stuff is doing things very suboptimally, and

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-04-02 Thread Sowmini Varadhan
the other question that comes to my mind is: the whole lazy_flush optimization probably works best when there is exactly one pool, and no large pools. In most other cases, we'd end up doing a lazy_flush when we wrap within our pool itself, losing the benefit of that optimization. Given that the

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread Sowmini Varadhan
On 03/31/2015 09:01 PM, Benjamin Herrenschmidt wrote: On Tue, 2015-03-31 at 14:06 -0400, Sowmini Varadhan wrote: Having bravely said that.. the IB team informs me that they see a 10% degradation using the spin_lock as opposed to the trylock. one path going forward is to continue processing

[PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-31 Thread Sowmini Varadhan
infrastructure. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input parameter, for the case when the iommu map size

[PATCH v8 RFC 3/3] sparc: Make LDC use common iommu poll management functions

2015-03-31 Thread Sowmini Varadhan
, with a typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead inline these calls into ldc before

[PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread Sowmini Varadhan
the trylock, probably need to revisit this (and then probably start by re-examining the hash function to avoid collisions, before resorting to trylock). Sowmini Varadhan (3): Break up monolithic iommu table/lock into finer granularity pools and lock Make sparc64 use scalable lib/iommu-common.c

Re: [PATCH v8 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-31 Thread Sowmini Varadhan
On (03/31/15 15:15), David Laight wrote: I've wondered whether the iommu setup for ethernet receive (in particular) could be made much more efficient if there were a function that would unmap one buffer and map a second buffer? My thought is that iommu pte entry used by the old buffer could
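David Laight's suggestion above, rewriting the existing IOMMU PTE for the new receive buffer instead of a free-then-alloc round trip, can be sketched with a toy table. All names here are illustrative assumptions, not a real kernel API, and a real implementation would also flush the rewritten entry.

```c
#include <assert.h>

#define TOY_ENTRIES 64

/* Toy IOMMU: one PTE per entry, each mapping a physical address. */
struct toy_iommu {
    unsigned long pte[TOY_ENTRIES];
};

/* Reuse the entry that mapped the old rx buffer to map the new one:
 * a single PTE rewrite, with no trip through the allocator or its locks.
 * The DMA address (entry index) the device sees stays the same. */
static unsigned int toy_remap(struct toy_iommu *io, unsigned int entry,
                              unsigned long new_pa)
{
    io->pte[entry] = new_pa;
    return entry;
}
```

The attraction for ethernet receive is that the per-packet cost collapses to one PTE store plus one targeted flush, sidestepping exactly the allocator-lock contention the rest of this thread is about.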

Re: [PATCH v8 RFC 0/3] Generic IOMMU pooled allocator

2015-03-31 Thread Sowmini Varadhan
On (03/31/15 10:40), Sowmini Varadhan wrote: I've not heard back from the IB folks, but I'm going to make a judgement call here and go with the spin_lock. *If* they report some significant benefit from the trylock, probably need to revisit this (and then probably start by re-exmaining

[PATCH v8 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-03-31 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool -> drops badly: 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2: moved sparc

Re: [PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-30 Thread Sowmini Varadhan
On (03/30/15 21:55), Benjamin Herrenschmidt wrote: No that's not my point. The lock is only taken for a short time but might still collide, the bouncing in that case will probably (at least that's my feeling) hurt more than help. However, I have another concern with your construct.

Re: [PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-30 Thread Sowmini Varadhan
On (03/30/15 14:24), Benjamin Herrenschmidt wrote: + +#define IOMMU_POOL_HASHBITS 4 +#define IOMMU_NR_POOLS (1 << IOMMU_POOL_HASHBITS) I don't like those macros. You changed the value from what we had on powerpc. It could be that the new values are as good for us but I'd like
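For context, the macros under discussion (with the stripped shift operator restored) size a power-of-two set of pools, which lets a per-CPU hash be reduced to a pool index with a plain mask. The `pool_index` helper below is an assumed illustration of that reduction, not code from the patch.

```c
#include <assert.h>

/* Values from the quoted patch. */
#define IOMMU_POOL_HASHBITS 4
#define IOMMU_NR_POOLS      (1 << IOMMU_POOL_HASHBITS)

/* Power-of-two pool count: hash % IOMMU_NR_POOLS reduces to a mask. */
static unsigned int pool_index(unsigned int hash)
{
    return hash & (IOMMU_NR_POOLS - 1);
}
```

BenH's objection is to the constants themselves (powerpc used different values), not to the masking trick.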

Re: [PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-30 Thread Sowmini Varadhan
On (03/31/15 08:28), Benjamin Herrenschmidt wrote: Provided that the IB test doesn't come up with a significant difference, I definitely vote for the simpler version of doing a normal spin_lock. sounds good. let me wait for the confirmation from IB, and I'll send out patchv8 soon after.

Re: [PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-30 Thread Sowmini Varadhan
On (03/30/15 09:01), Sowmini Varadhan wrote: So I tried looking at the code, and perhaps there is some arch-specific subtlety here that I am missing, but where does spin_lock itself do the cpu_relax? afaict, LOCK_CONTENDED() itself does not have this. To answer my question: I'd missed

Re: [PATCH v7 0/3] Generic IOMMU pooled allocator

2015-03-27 Thread Sowmini Varadhan
On (03/26/15 08:05), Benjamin Herrenschmidt wrote: PowerPC folks, what do you think? I'll give it another look today. Cheers, Ben. Hi Ben, did you have a chance to look at this? --Sowmini

Re: Generic IOMMU pooled allocator

2015-03-26 Thread Sowmini Varadhan
On (03/25/15 21:43), casca...@linux.vnet.ibm.com wrote: However, when using large TCP send/recv (I used uperf with 64KB writes/reads), I noticed that on the transmit side, largealloc is not used, but on the receive side, cxgb4 almost only uses largealloc, while qlge seems to have a 1/1 usage

[PATCH v7 RFC 3/3] sparc: Make LDC use common iommu poll management functions

2015-03-25 Thread Sowmini Varadhan
, with a typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead inline these calls into ldc before

[PATCH v7 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-03-25 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool -> drops badly: 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2: moved sparc

[PATCH v7 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread Sowmini Varadhan
Changes from patchv6: moved pool_hash initialization to lib/iommu-common.c and cleaned up code duplication from sun4v/sun4u/ldc. Sowmini (2): Break up monolithic iommu table/lock into finer granularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions Sowmini

[PATCH v7 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-25 Thread Sowmini Varadhan
. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input parameter, for the case when the iommu map size is not very

Re: [PATCH v6 0/3] Generic IOMMU pooled allocator

2015-03-25 Thread Sowmini Varadhan
On (03/24/15 18:16), David Miller wrote: Generally this looks fine to me. But about patch #2, I see no reason to have multiple iommu_pool_hash tables. Even from a purely sparc perspective, we can always just do with just one of them. Furthermore, you can even probably move it down into

[PATCH v6 RFC 3/3] sparc: Make LDC use common iommu poll management functions

2015-03-24 Thread Sowmini Varadhan
, with a typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead inline these calls into ldc before

[PATCH v6 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-24 Thread Sowmini Varadhan
. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input parameter, for the case when the iommu map size is not very

[PATCH v6 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-03-24 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool -> drops badly: 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2: moved sparc

[PATCH v6 0/3] Generic IOMMU pooled allocator

2015-03-24 Thread Sowmini Varadhan
table/lock into finer granularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions Sowmini Varadhan (1): Make LDC use common iommu poll management functions arch/sparc/include/asm/iommu_64.h |7 +- arch/sparc/kernel/iommu.c | 182

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/23/15 12:29), David Miller wrote: In order to elide the IOMMU flush as much as possible, I implemented a scheme for sun4u wherein we always allocated from low IOMMU addresses to high IOMMU addresses. In this regime, we only need to flush the IOMMU when we rolled over back to low
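The sun4u scheme David Miller describes — allocate low-to-high and flush only when the cursor rolls over — can be modelled in a few lines. This is a toy cursor allocator assumed for illustration: no locking, no real flush, no span-boundary checks, and it ignores the case where the request cannot fit even after wrapping.

```c
#include <assert.h>

struct toy_arena {
    unsigned long hint;     /* next free entry; grows low -> high */
    unsigned long size;     /* total entries in the arena */
    unsigned long flushes;  /* how many IOMMU flushes were needed */
};

/* Allocate npages contiguous entries. Only when the cursor rolls over
 * back to the low addresses does an IOMMU flush become necessary. */
static unsigned long toy_alloc(struct toy_arena *a, unsigned long npages)
{
    unsigned long start;

    if (a->hint + npages > a->size) {
        a->hint = 0;        /* wrap to low addresses... */
        a->flushes++;       /* ...and pay for one flush */
    }
    start = a->hint;
    a->hint += npages;
    return start;
}
```

The point of the scheme is visible in the counters: many allocations amortize onto one flush per trip through the address space, which is what the thread's lazy_flush discussion is trying to preserve across multiple pools.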

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/23/15 15:05), David Miller wrote: Why add performance regressions to old machines who already are suffering too much from all the bloat we are constantly adding to the kernel? I have no personal opinion on this- it's a matter of choosing whether we want to have some extra baggage in

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/24/15 11:47), Benjamin Herrenschmidt wrote: Yes, pass a function pointer argument that can be NULL or just make it a member of the iommu_allocator struct (or whatever you call it) passed to the init function and that can be NULL. My point is we don't need a separate ops structure.

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/24/15 09:21), Benjamin Herrenschmidt wrote: So we have two choices here that I can see: - Keep that old platform use the old/simpler allocator Problem with that approach is that the base struct iommu structure for sparc gets a split personality: the older one is used with the older

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
On (03/24/15 09:36), Benjamin Herrenschmidt wrote: - One pool only - Whenever the allocation is before the previous hint, do a flush, that should only happen if a wrap around occurred or in some cases if the device DMA mask forced it. I think we always update the hint whenever we

Re: Generic IOMMU pooled allocator

2015-03-23 Thread Sowmini Varadhan
benh> It might be sufficient to add a flush counter and compare it between runs benh> if actual wall-clock benchmarks are too hard to do (especially if you benh> don't have things like very fast network cards at hand). benh> benh> Number of flush / number of packets might be a sufficient metric, it..

Re: Generic IOMMU pooled allocator

2015-03-22 Thread Sowmini Varadhan
On (03/23/15 09:02), Benjamin Herrenschmidt wrote: How does this relate to the ARM implementation? There is currently an effort going on to make that one shared with ARM64 and possibly x86. Has anyone looked at both the PowerPC and ARM ways of doing the allocation to see if we could pick

Re: Generic IOMMU pooled allocator

2015-03-22 Thread Sowmini Varadhan
Turned out that I was able to iterate over it, and remove both the ->cookie_to_index and the ->demap indirection from iommu_tbl_ops. That leaves only the odd iommu_flushall() hook, I'm trying to find the history behind that (needed for sun4u platforms, afaik, and not sure if there are other ways to

[PATCH v5 RFC 1/3] sparc: Break up monolithic iommu table/lock into finer granularity pools and lock

2015-03-22 Thread Sowmini Varadhan
. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2 changes: - incorporate David Miller editorial comments: sparc specific fields moved from iommu-common into sparc's iommu_64.h - make the npools value an input parameter, for the case when the iommu map size is not very

[PATCH v5 RFC 3/3] sparc: Make LDC use common iommu poll management functions

2015-03-22 Thread Sowmini Varadhan
, with a typical request for 1-4 pages. Thus LDC uses npools == 1. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v3: added this file to be a consumer of the common iommu library v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead inline these calls into ldc before

[PATCH v5 RFC 0/3] Generic IOMMU pooled allocator

2015-03-22 Thread Sowmini Varadhan
the skip_span_boundary argument to iommu_tbl_pool_init() for those callers like LDC which do not care about span boundary checks. Sowmini (2): Break up monolithic iommu table/lock into finer granularity pools and lock Make sparc64 use scalable lib/iommu-common.c functions Sowmini Varadhan (1

[PATCH v5 RFC 2/3] sparc: Make sparc64 use scalable lib/iommu-common.c functions

2015-03-22 Thread Sowmini Varadhan
settings (TSO enabled): 9-9.5 Gbps. Disable TSO using ethtool -> drops badly: 2-3 Gbps. After this patch, an iperf client with 10 threads can give a throughput of at least 8.5 Gbps, even when TSO is disabled. Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com --- v2: moved sparc

Re: Generic IOMMU pooled allocator

2015-03-19 Thread Sowmini Varadhan
On 03/19/2015 02:01 PM, Benjamin Herrenschmidt wrote: Ben> One thing I noticed is the asymmetry in your code between the alloc Ben> and the free path. The alloc path is similar to us in that the lock Ben> covers the allocation and that's about it, there's no actual mapping to Ben> the HW done, it's