On (04/19/15 14:09), David Miller wrote:
On (04/18/15 21:23), Guenter Roeck wrote:
lib/built-in.o:(.discard+0x1): multiple definition of
`__pcpu_unique_iommu_pool_hash'
arch/powerpc/kernel/built-in.o:(.discard+0x18): first defined here
.. I get a similar failure in the
infrastructure.
Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
Acked-by: Benjamin
, with a typical
request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from
tag, and new mail Message-Id.
Sowmini Varadhan (3):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Make LDC use common iommu poll management functions
arch/sparc/include/asm/iommu_64.h |7 +-
arch
On (04/08/15 18:30), Benjamin Herrenschmidt wrote:
I'm happy with your last version, feel free to add my
Acked-by: Benjamin Herrenschmidt b...@kernel.crashing.org
sounds good, I'll do this and resend a non-RFC version today.
Thanks for all the feedback - it was very useful to me, and
I'm
, with a typical
request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inline these calls into ldc before
Addresses latest BenH comments: need_flush checks, add support
for dma mask and align_order.
Sowmini Varadhan (3):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Make LDC use common iommu poll management
infrastructure.
Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size
settings (TSO enabled): 9-9.5 Gbps
Disable TSO using ethtool - drops badly: 2-3 Gbps.
After this patch, an iperf client with 10 threads can give a
throughput of at least 8.5 Gbps, even when TSO is disabled.
Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
---
v2: moved sparc
On (04/05/15 22:26), Benjamin Herrenschmidt wrote:
So you decided to keep the logic here that updates the hint instead of
just getting rid of need_flush altogether?
Out of curiosity, what's the rationale ? Did you find a reason why
resetting the hint in those two cases (rather than just
One last question before I spin out v9.. the dma_mask code
is a bit confusing to me, so I want to make sure... the code is
if (limit + tbl->it_offset > mask) {
limit = mask - tbl->it_offset + 1;
/* If we're constrained on address range, first try
* at
Just want to confirm:
+ again:
+ if (pass == 0 && handle && *handle &&
+ (*handle >= pool->start) && (*handle < pool->end))
+ start = *handle;
+ else
+ start = pool-hint;
Now this means handle might be < pool->hint, in that case you also
need a lazy flush. Or
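To make the exchange above concrete, here is a minimal user-space sketch of the start-selection and lazy-flush condition being discussed. The struct and the pick_start() helper are simplified stand-ins of mine, not the actual lib/iommu-common.c code:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the per-pool state in the patch set. */
struct pool {
	unsigned long start;	/* first entry owned by this pool */
	unsigned long end;	/* one past the last entry        */
	unsigned long hint;	/* where the next search begins   */
};

/* On the first pass, honor a caller-supplied handle if it falls
 * inside this pool; otherwise start at the pool hint.  Per the
 * discussion above, a lazy flush is needed whenever the chosen
 * start lies before the current hint, since entries behind the
 * hint may still be cached in the IOMMU hardware. */
static unsigned long pick_start(const struct pool *p, int pass,
				const unsigned long *handle,
				bool *need_flush)
{
	unsigned long start;

	if (pass == 0 && handle && *handle &&
	    *handle >= p->start && *handle < p->end)
		start = *handle;
	else
		start = p->hint;

	*need_flush = start < p->hint;
	return start;
}
```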
On (04/04/15 08:06), Benjamin Herrenschmidt wrote:
No, I meant n < pool->hint, ie, the start of the newly allocated
block.
ah, got it. I'll do my drill with patchset and get back, probably by
Monday.
--Sowmini
___
Linuxppc-dev mailing list
On (04/03/15 08:57), Benjamin Herrenschmidt wrote:
I only just noticed too, you completely dropped the code to honor
the dma mask. Why that ? Some devices rely on this.
/* Sowmini's comment about this coming from sparc origins.. */
Probably, not that many devices have limits on DMA
On (04/03/15 07:54), Benjamin Herrenschmidt wrote:
+ limit = pool->end;
+
+ /* The case below can happen if we have a small segment appended
+* to a large, or when the previous alloc was at the very end of
+* the available space. If so, go back to the beginning and flush.
+
On (03/31/15 23:12), David Miller wrote:
It's much more amortized with smart buffering strategies, which are
common on current generation networking cards.
There you only eat one map/unmap per PAGE_SIZE / rx_pkt_size.
Maybe the infiniband stuff is doing things very suboptimally, and
the other question that comes to my mind is: the whole lazy_flush
optimization probably works best when there is exactly one pool,
and no large pools. In most other cases, we'd end up doing a lazy_flush
when we wrap within our pool itself, losing the benefit of that
optimization.
Given that the
On 03/31/2015 09:01 PM, Benjamin Herrenschmidt wrote:
On Tue, 2015-03-31 at 14:06 -0400, Sowmini Varadhan wrote:
Having bravely said that..
the IB team informs me that they see a 10% degradation using
the spin_lock as opposed to the trylock.
one path going forward is to continue processing
the trylock, probably
need to revisit this (and then probably start by re-examining
the hash function to avoid collisions, before resorting to
trylock).
Sowmini Varadhan (3):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c
On (03/31/15 15:15), David Laight wrote:
I've wondered whether the iommu setup for ethernet receive (in particular)
could be made much more efficient if there were a function that
would unmap one buffer and map a second buffer?
My thought is that iommu pte entry used by the old buffer could
On (03/31/15 10:40), Sowmini Varadhan wrote:
I've not heard back from the IB folks, but I'm going to make
a judgement call here and go with the spin_lock. *If* they
report some significant benefit from the trylock, probably
need to revisit this (and then probably start by re-examining
On (03/30/15 21:55), Benjamin Herrenschmidt wrote:
No that's not my point. The lock is only taken for a short time but
might still collide, the bouncing in that case will probably (at least
that's my feeling) hurt more than help.
However, I have another concern with your construct.
On (03/30/15 14:24), Benjamin Herrenschmidt wrote:
+
+#define IOMMU_POOL_HASHBITS 4
+#define IOMMU_NR_POOLS (1 << IOMMU_POOL_HASHBITS)
I don't like those macros. You changed the value from what we had on
powerpc. It could be that the new values are as good for us but I'd
like
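As a rough illustration of what those two macros parameterize, here is a hypothetical standalone sketch; the pick_pool() helper and its masking hash are mine for illustration, not the patch's per-cpu random hash:

```c
#include <assert.h>

#define IOMMU_POOL_HASHBITS	4
#define IOMMU_NR_POOLS		(1 << IOMMU_POOL_HASHBITS)

/* Spread callers across the pools so CPUs rarely contend on the
 * same pool lock.  The real code hashes a per-cpu random value
 * set up at init time; masking the cpu number is just for show. */
static unsigned int pick_pool(unsigned int cpu)
{
	return cpu & (IOMMU_NR_POOLS - 1);
}
```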
On (03/31/15 08:28), Benjamin Herrenschmidt wrote:
Provided that the IB test doesn't come up with a significant difference,
I definitely vote for the simpler version of doing a normal spin_lock.
sounds good. let me wait for the confirmation from IB,
and I'll send out patchv8 soon after.
On (03/30/15 09:01), Sowmini Varadhan wrote:
So I tried looking at the code, and perhaps there is some arch-specific
subtlety here that I am missing, but where does spin_lock itself
do the cpu_relax? afaict, LOCK_CONTENDED() itself does not have this.
To answer my question:
I'd missed
On (03/26/15 08:05), Benjamin Herrenschmidt wrote:
PowerPC folks, what do you think?
I'll give it another look today.
Cheers,
Ben.
Hi Ben,
did you have a chance to look at this?
--Sowmini
On (03/25/15 21:43), casca...@linux.vnet.ibm.com wrote:
However, when using large TCP send/recv (I used uperf with 64KB
writes/reads), I noticed that on the transmit side, largealloc is not
used, but on the receive side, cxgb4 almost only uses largealloc, while
qlge seems to have a 1/1 usage
Changes from patchv6: moved pool_hash initialization to
lib/iommu-common.c and cleaned up code duplication from
sun4v/sun4u/ldc.
Sowmini (2):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Sowmini
.
Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
---
v2 changes:
- incorporate David Miller editorial comments: sparc specific
fields moved from iommu-common into sparc's iommu_64.h
- make the npools value an input parameter, for the case when
the iommu map size is not very
On (03/24/15 18:16), David Miller wrote:
Generally this looks fine to me.
But about patch #2, I see no reason to have multiple iommu_pool_hash
tables. Even from a purely sparc perspective, we can always just do
with just one of them.
Furthermore, you can even probably move it down into
table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Sowmini Varadhan (1):
Make LDC use common iommu poll management functions
arch/sparc/include/asm/iommu_64.h |7 +-
arch/sparc/kernel/iommu.c | 182
On (03/23/15 12:29), David Miller wrote:
In order to elide the IOMMU flush as much as possible, I implemented
a scheme for sun4u wherein we always allocated from low IOMMU
addresses to high IOMMU addresses.
In this regime, we only need to flush the IOMMU when we rolled over
back to low
On (03/23/15 15:05), David Miller wrote:
Why add performance regressions to old machines who already are
suffering too much from all the bloat we are constantly adding to the
kernel?
I have no personal opinion on this- it's a matter of choosing
whether we want to have some extra baggage in
On (03/24/15 11:47), Benjamin Herrenschmidt wrote:
Yes, pass a function pointer argument that can be NULL or just make it a
member of the iommu_allocator struct (or whatever you call it) passed to
the init function and that can be NULL. My point is we don't need a
separate ops structure.
On (03/24/15 09:21), Benjamin Herrenschmidt wrote:
So we have two choices here that I can see:
- Keep that old platform use the old/simpler allocator
Problem with that approach is that the base struct iommu structure
for sparc gets a split personality: the older one is used with
the older
On (03/24/15 09:36), Benjamin Herrenschmidt wrote:
- One pool only
- Whenever the allocation is before the previous hint, do a flush, that
should only happen if a wrap around occurred or in some cases if the
device DMA mask forced it. I think we always update the hint whenever we
benh> It might be sufficient to add a flush counter and compare it between runs
benh> if actual wall-clock benchmarks are too hard to do (especially if you
benh> don't have things like very fast network cards at hand).
benh>
benh> Number of flush / number of packets might be a sufficient metric, it..
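A tiny sketch of the bookkeeping that suggestion implies (illustrative only; the counter and function names are made up, not from the patch set):

```c
#include <assert.h>

static unsigned long nr_flushes;	/* IOMMU flushes issued       */
static unsigned long nr_packets;	/* DMA mappings (~= packets)  */

/* Call once per mapping; flushed says whether this allocation
 * forced an IOMMU flush. */
static void account_map(int flushed)
{
	nr_packets++;
	if (flushed)
		nr_flushes++;
}

/* Flushes per 1000 mappings, kept integral so it could be
 * reported without floating point from kernel context. */
static unsigned long flush_rate_per_1000(void)
{
	return nr_packets ? (nr_flushes * 1000) / nr_packets : 0;
}
```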
On (03/23/15 09:02), Benjamin Herrenschmidt wrote:
How does this relate to the ARM implementation? There is currently
an effort going on to make that one shared with ARM64 and possibly
x86. Has anyone looked at both the PowerPC and ARM ways of doing the
allocation to see if we could pick
Turned out that I was able to iterate over it, and remove
both the ->cookie_to_index and the ->demap indirection from
iommu_tbl_ops.
That leaves only the odd iommu_flushall() hook, I'm trying
to find the history behind that (needed for sun4u platforms,
afaik, and not sure if there are other ways to
, with a typical
request for 1-4 pages. Thus LDC uses npools == 1.
Signed-off-by: Sowmini Varadhan sowmini.varad...@oracle.com
---
v3: added this file to be a consumer of the common iommu library
v4: removed ->cookie_to_index and ->demap from iommu_tbl_ops and instead
inline these calls into ldc before
the skip_span_boundary argument to iommu_tbl_pool_init() for
those callers like LDC which do not care about span boundary checks.
Sowmini (2):
Break up monolithic iommu table/lock into finer granularity pools and
lock
Make sparc64 use scalable lib/iommu-common.c functions
Sowmini Varadhan (1
On 03/19/2015 02:01 PM, Benjamin Herrenschmidt wrote:
Ben> One thing I noticed is the asymmetry in your code between the alloc
Ben> and the free path. The alloc path is similar to us in that the lock
Ben> covers the allocation and that's about it, there's no actual mapping to
Ben> the HW done, it's