Re: [PATCH v3 1/1] swiotlb: Reduce swiotlb pool lookups

2024-07-10 Thread Petr Tesařík
On Wed, 10 Jul 2024 07:55:20 +0200
Christoph Hellwig  wrote:

> On Tue, Jul 09, 2024 at 09:08:18PM +0200, Petr Tesařík wrote:
> > I'm confused. If you're not a big fan, why are we effectively adding
> > them to more places now than before the patch?  
> 
> Because I didn't want to second guess the patch author too much.

Fair enough. I don't have any relevant test cases either, so when/if
somebody encounters an issue, let them change it then.

Thanks!
Petr T



Re: [PATCH v3 1/1] swiotlb: Reduce swiotlb pool lookups

2024-07-09 Thread Petr Tesařík
On Tue, 9 Jul 2024 13:18:12 +0200
Christoph Hellwig  wrote:

> On Tue, Jul 09, 2024 at 11:10:13AM +0200, Petr Tesařík wrote:
> > Reviewed-by: Petr Tesarik   
> 
> Thanks.
> 
> > 
> > OK, so __swiotlb_find_pool() is now always declared (so the code
> > compiles), but if CONFIG_SWIOTLB_DYNAMIC=n, it is never defined. The
> > result still links, because the compiler optimizes away the whole
> > if-clause, so there are no references to an undefined symbol in the
> > object file.
> > 
> > I think I've already seen similar constructs elsewhere in the kernel,
> > so relying on the optimization seems to be common practice.  
> 
> Yes, it's a pretty common pattern.  It's gone here now, though to not
> add the struct device field unconditionally.
> 
> > > +{
> > > + struct io_tlb_pool *pool = swiotlb_find_pool(dev, addr);
> > > +
> > > + if (unlikely(pool))
> > > + __swiotlb_tbl_unmap_single(dev, addr, size, dir, attrs, pool);
> > > +}
> > > +
> > > +static inline void swiotlb_sync_single_for_device(struct device *dev,
> > > + phys_addr_t addr, size_t size, enum dma_data_direction dir)
> > > +{
> > > + struct io_tlb_pool *pool = swiotlb_find_pool(dev, addr);
> > > +
> > > + if (unlikely(pool))
> > > + __swiotlb_sync_single_for_device(dev, addr, size, dir, pool);  
> > 
> > We're adding an unlikely() here, which wasn't originally present in
> > iommu_dma_sync_single_for_device(). OTOH it should do no harm, and it
> > was most likely an omission.   
> 
> I'm honestly not a big fan of the unlikely annotations unless they
> are proven to make a difference.  Normally the runtime branch predictor
> should do a really good job here, and for some uses this will not
> just be likely but the only case.

I'm confused. If you're not a big fan, why are we effectively adding
them to more places now than before the patch?

FWIW on all architectures I've worked with (arm64, ia64, powerpc,
s390x, x86_64), the dynamic branch predictor takes precedence over
static prediction (yes, even ia64 held on to those mandatory .spnt.few
and .sptk.many hints only if there was no entry in the branch
predictor). But on every branch instruction, the compiler must decide
which code follows the branch instruction and which code is placed
elsewhere. The latter often needs another jump instruction to
continue executing instructions _after_ the whole conditional clause,
which is slightly less efficient. The compiler must make this decision
even if there is no likely() or unlikely() annotation, but in that case
it uses some (rather crude) heuristics of its own, e.g. whether the
condition covers the majority or minority of the value range.

Long story short, in this case "if (pool)" does pretty much the same
job as "if (likely(pool))". But we say "if (unlikely(pool)))"...

You needn't change the code, but I want this documented somewhere that
people can find it, i.e. in LKML archives.

Petr T



Re: [PATCH v3 1/1] swiotlb: Reduce swiotlb pool lookups

2024-07-09 Thread Petr Tesařík
On Mon,  8 Jul 2024 12:41:00 -0700
mhkelle...@gmail.com wrote:

> From: Michael Kelley 
> 
> With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair
> in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple
> places, the pool is found and used in one function, and then must
> be found again in the next function that is called because only the
> tlb_addr is passed as an argument. These are the six call sites:
> 
> dma_direct_map_page:
> 1. swiotlb_map->swiotlb_tbl_map_single->swiotlb_bounce
> 
> dma_direct_unmap_page:
> 2. dma_direct_sync_single_for_cpu->is_swiotlb_buffer
> 3. dma_direct_sync_single_for_cpu->swiotlb_sync_single_for_cpu->
>   swiotlb_bounce
> 4. is_swiotlb_buffer
> 5. swiotlb_tbl_unmap_single->swiotlb_del_transient
> 6. swiotlb_tbl_unmap_single->swiotlb_release_slots
> 
> Reduce the number of calls by finding the pool at a higher level, and
> passing it as an argument instead of searching again. A key change is
> for is_swiotlb_buffer() to return a pool pointer instead of a boolean,
> and then pass this pool pointer to subsequent swiotlb functions.
> 
> There are 9 occurrences of is_swiotlb_buffer() used to test if a buffer
> is a swiotlb buffer before calling a swiotlb function. To reduce code
> duplication in getting the pool pointer and passing it as an argument,
> introduce inline wrappers for this pattern. The generated code is
> essentially unchanged.
> 
> Since is_swiotlb_buffer() no longer returns a boolean, rename some
> functions to reflect the change:
> * swiotlb_find_pool() becomes __swiotlb_find_pool()
> * is_swiotlb_buffer() becomes swiotlb_find_pool()
> * is_xen_swiotlb_buffer() becomes xen_swiotlb_find_pool()
> 
> With these changes, a round-trip map/unmap pair requires only 2 pool
> lookups (listed using the new names and wrappers):
> 
> dma_direct_unmap_page:
> 1. dma_direct_sync_single_for_cpu->swiotlb_find_pool
> 2. swiotlb_tbl_unmap_single->swiotlb_find_pool
> 
> These changes come from noticing the inefficiencies in a code review,
> not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC,
> __swiotlb_find_pool() is not trivial, and it uses an RCU read lock,
> so avoiding the redundant calls helps performance in a hot path.
> When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction
> is minimal and the perf benefits are likely negligible, but no
> harm is done.
> 
> No functional change is intended.
> 
> Signed-off-by: Michael Kelley 

Reviewed-by: Petr Tesarik 

Find a few notes inline, mostly FYI:

> ---
> 
> Changes in v3:
> * Add inline wrappers for swiotlb_tbl_unmap_single(),
>   swiotlb_sync_single_for_device() and swiotlb_sync_single_for_cpu() to
>   commonize the swiotlb_find_pool() tests for whether the buffer is a
>   swiotlb buffer [Christoph Hellwig]
> * Change most users of __swiotlb_find_pool() to use swiotlb_find_pool(),
>   as the extra checks in the latter aren't impactful. Remove v2 change in
>   swiotlb_find_pool() to use __swiotlb_find_pool() when
>   CONFIG_SWIOTLB_DYNAMIC is not set. [Christoph Hellwig]
> * Rework swiotlb_find_pool() to use IS_ENABLED() instead of #ifdef's.
>   To make this work, move dma_uses_io_tlb field in struct device out from
>   under #ifdef CONFIG_SWIOTLB_DYNAMIC. [Christoph Hellwig]
> * Fix line lengths > 80 chars [Christoph Hellwig]
> * Update commit message to reflect changes
> 
> Changes in v2 vs. RFC version[1]:
> * In swiotlb_find_pool(), use __swiotlb_find_pool() instead of open
>   coding when CONFIG_SWIOTLB_DYNAMIC is not set [Petr Tesařík]
> * Rename functions as described in the commit message to reflect that
>   is_swiotlb_buffer() no longer returns a boolean [Petr Tesařík,
>   Christoph Hellwig]
> * Updated commit message and patch Subject
> 
> [1] 
> https://lore.kernel.org/linux-iommu/20240607031421.182589-1-mhkli...@outlook.com/
> 
>  drivers/iommu/dma-iommu.c   |  11 ++-
>  drivers/xen/swiotlb-xen.c   |  31 +---
>  include/linux/device.h  |   3 +-
>  include/linux/scatterlist.h |   2 +-
>  include/linux/swiotlb.h | 138 +---
>  kernel/dma/direct.c |  10 +--
>  kernel/dma/direct.h |   9 +--
>  kernel/dma/swiotlb.c|  64 +
>  8 files changed, 150 insertions(+), 118 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 43520e7275cc..7b4486238427 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1081,8 +1081,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
> *dev,
>   if (!dev_is_dma_coherent(dev))
>   arch_sync_dma_for_cpu(phys, size, dir);

Re: [PATCH v2 1/1] swiotlb: Reduce swiotlb pool lookups

2024-07-07 Thread Petr Tesařík
On Sun, 7 Jul 2024 02:11:48 +
Michael Kelley  wrote:

> From: Christoph Hellwig  Sent: Friday, July 5, 2024 10:50 PM
> > 
> > Hi Michael,
> > 
> > I like the idea behind this, but can you respin it to avoid some of
> > the added code duplication.  We have a lot of this pattern:
> > 
> > pool = swiotlb_find_pool(dev, paddr);
> > if (pool)
> > swiotlb_foo(dev, ...
> > 
> > duplicated in all three swiotlb users.  If we rename the original
> > swiotlb_foo to __swiotlb_foo and add a little inline wrapper this is
> > de-duplicated and also avoids exposing swiotlb_find_pool to the
> > callers.  
> 
> This works pretty well. It certainly avoids the messiness of declaring
> a "pool" local variable and needing a separate assignment before the
> "if" statement, in each of the 9 call sites. The small downside is that
> it looks like a swiotlb function is called every time, even though
> there's usually an inline bailout. But that pattern occurs throughout
> the kernel, so not a big deal.
> 
> I initially coded this change as a separate patch that goes first. But
> the second patch ends up changing about 20 lines that are changed
> by the first patch. It's hard to cleanly tease them apart. So I've gone
> back to a single unified patch. But let me know if you think it's worth
> the extra churn to break them apart.
> 
> > 
> > If we then stub out swiotlb_find_pool to return NULL for !CONFIG_SWIOTLB,
> > we also don't need extra stubs for all the __swiotlb_ helpers as the
> > compiler will eliminate the calls as dead code.  
> 
> Yes, this works as long as the declarations for the __swiotlb_foo
> functions are *not* under CONFIG_SWIOTLB. But when compiling with
> !CONFIG_SWIOTLB on arm64 with gcc-8.5.0, two tangentially related
> compile errors occur. iommu_dma_map_page() references
> > swiotlb_tbl_map_single(). The declaration for the latter is under
> CONFIG_SWIOTLB. A similar problem occurs with dma_direct_map_page()
> and swiotlb_map(). Do later versions of gcc not complain when the
> reference is in dead code? Or are these just bugs that occurred because
> !CONFIG_SWIOTLB is rare? If the latter, I can submit a separate patch to
> move the declarations out from under CONFIG_SWIOTLB.
> 
> > 
> > I might be missing something, but what is the reason for using the
> > lower-level __swiotlb_find_pool in swiotlb_map and xen_swiotlb_map_page?
> > I can't see a reason why the simple checks in swiotlb_find_pool itself
> > are either wrong or a performance problem there.
> 
> Yes, swiotlb_find_pool() could be used instead of __swiotlb_find_pool().
> 
> > Because if we don't
> > need these separate calls we can do away with __swiotlb_find_pool
> > for !CONFIG_SWIOTLB_DYNAMIC and simplify swiotlb_find_pool quite
> > a bit like this:
> > 
> > ...
> > 
> > if (!mem)
> > return NULL;
> > 
> > if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) {  
> 
> The "IS_ENABLED" doesn't work because the dma_uses_io_tlb
> field in struct dev is under CONFIG_SWIOTLB_DYNAMIC. I guess
> it could be moved out, but that's going further afield. So I'm back
> to using #ifdef.
> 
> > smp_rmb();
> > if (!READ_ONCE(dev->dma_uses_io_tlb))
> > return NULL;
> > return __swiotlb_find_pool(dev, paddr);
> > }
> > 
> > if (paddr < mem->defpool.start || paddr >= mem->defpool.end)
> > return NULL;
> > return &dev->dma_io_tlb_mem->defpool;  
> 
> Petr Tesařík had commented [1] on my original RFC suggesting that
> __swiotlb_find_pool() be used here instead of open coding it. With
> the changes you suggest, __swiotlb_find_pool() is needed only in
> the CONFIG_SWIOTLB_DYNAMIC case, and I would be fine with just
> open coding the address of defpool here. Petr -- are you OK with
> removing __swiotlb_find_pool when !CONFIG_SWIOTLB_DYNAMIC,
> since this is the only place it would be used?

Yes. I have never had a strong opinion about it; I merely saw an
opportunity for some low-hanging fruit, but it's definitely not
worth adding complexity.

Petr T



Re: [RFC 1/1] swiotlb: Reduce calls to swiotlb_find_pool()

2024-06-30 Thread Petr Tesařík
On Mon, 1 Jul 2024 06:36:15 +0200
"h...@lst.de"  wrote:

> On Sun, Jun 30, 2024 at 02:02:52PM +, Michael Kelley wrote:
> > 1) Rename is_swiotlb_buffer() to swiotlb_find_pool(), since it
> > now returns a pool.  A NULL return value indicates that the
> > paddr is not an swiotlb buffer.
> > 
> > 2) Similarly, rename is_xen_swiotlb_buffer() to
> > xen_swiotlb_find_pool()
> > 
> > 3) The existing swiotlb_find_pool() has the same function signature,
> > but it is used only where the paddr is known to be an swiotlb buffer
> > and hence always succeeds. Rename it to __swiotlb_find_pool() as
> > the "internal" version of swiotlb_find_pool().  
> 
> Sounds good.

Agreed. Most importantly, the "nice" name swiotlb_find_pool() is the one
used by external callers. The difference between swiotlb_find_pool() and
__swiotlb_find_pool() is that the former can be used with any device,
and the latter (internal) only with devices that make some use of
swiotlb. The main reason to keep them separate is that the internal
function should not be inlined if CONFIG_SWIOTLB_DYNAMIC=y.
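
Roughly, the split looks like this (just a sketch of the shape; the actual
patch also special-cases !CONFIG_SWIOTLB_DYNAMIC):

struct io_tlb_pool *__swiotlb_find_pool(struct device *dev, phys_addr_t paddr);

static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev,
		phys_addr_t paddr)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	if (!mem)
		return NULL;	/* device does not use swiotlb at all */
	return __swiotlb_find_pool(dev, paddr);
}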

I hope somebody finds my explanation useful when they touch the code
again a few years from now. ;-)

> > 4) Do you still want is_swiotlb_buffer() as a trivial wrapper around
> > the new swiotlb_find_pool(), for use solely in dma_direct_need_sync()
> > where only a Boolean is needed and not the pool?  
> 
> If there is really just a single caller left we can skip the wrapper,
> otherwise it might be handy.

AFAICS dma_direct_need_sync() is the only such place.

Petr T



Re: [RFC 1/1] swiotlb: Reduce calls to swiotlb_find_pool()

2024-06-28 Thread Petr Tesařík
On Fri, 28 Jun 2024 08:01:29 +0200
"h...@lst.de"  wrote:

> On Thu, Jun 27, 2024 at 04:02:59PM +, Michael Kelley wrote:
> > > > Conceptually, it's still being used as a boolean function based on
> > > > whether the return value is NULL.  Renaming it to swiotlb_get_pool()
> > > > more accurately describes the return value, but obscures the
> > > > intent of determining if it is a swiotlb buffer.  I'll think about it.
> > > > Suggestions are welcome.  
> > > 
> > > Just keep is_swiotlb_buffer as a trivial inline helper that returns
> > > bool.  
> > 
> > I don't understand what you are suggesting.  Could you elaborate a bit?
> > is_swiotlb_buffer() can't be trivial when CONFIG_SWIOTLB_DYNAMIC
> > is set.  
> 
> Call the main function that finds and returns the pool swiotlb_find_pool,
> and then have an is_swiotlb_buffer wrapper that just returns bool.
> 

I see. That's not my point. After applying Michael's patch, the return
value is always used, except here:

bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
{
return !dev_is_dma_coherent(dev) ||
   is_swiotlb_buffer(dev, dma_to_phys(dev, dma_addr));
}

I don't think this one occurrence in the entire source tree is worth a
separate inline function.

If nobody has a better idea, I'm not really offended by keeping the
original name, is_swiotlb_buffer(). It would just become the only
function which starts with "is_" and provides more information in the
return value than a simple yes/no, and I thought there must be an
unwritten convention about that.
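
If we did keep a boolean helper, it would be the trivial inline wrapper
Christoph describes above, something like this sketch:

static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
{
	return !!swiotlb_find_pool(dev, paddr);
}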

Petr T



Re: [RFC 1/1] swiotlb: Reduce calls to swiotlb_find_pool()

2024-06-27 Thread Petr Tesařík
On Thu,  6 Jun 2024 20:14:21 -0700
mhkelle...@gmail.com wrote:

> From: Michael Kelley 
> 
> With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair
> in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple
> places, the pool is found and used in one function, and then must be
> found again in the next function that is called because only the
> tlb_addr is passed as an argument. These are the six call sites:
> 
> dma_direct_map_page:
> 1. swiotlb_map->swiotlb_tbl_map_single->swiotlb_bounce
> 
> dma_direct_unmap_page:
> 2. dma_direct_sync_single_for_cpu->is_swiotlb_buffer
> 3. dma_direct_sync_single_for_cpu->swiotlb_sync_single_for_cpu->
>   swiotlb_bounce
> 4. is_swiotlb_buffer
> 5. swiotlb_tbl_unmap_single->swiotlb_del_transient
> 6. swiotlb_tbl_unmap_single->swiotlb_release_slots
> 
> Reduce the number of calls by finding the pool at a higher level, and
> passing it as an argument instead of searching again. A key change is
> for is_swiotlb_buffer() to return a pool pointer instead of a boolean,
> and then pass this pool pointer to subsequent swiotlb functions.
> With these changes, a round-trip map/unmap pair requires only 2 calls
> to swiotlb_find_pool():
> 
> dma_direct_unmap_page:
> 1. dma_direct_sync_single_for_cpu->is_swiotlb_buffer
> 2. is_swiotlb_buffer
> 
> These changes come from noticing the inefficiencies in a code review,
> not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC,
> swiotlb_find_pool() is not trivial, and it uses an RCU read lock,
> so avoiding the redundant calls helps performance in a hot path.
> When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction
> is minimal and the perf benefits are likely negligible, but no
> harm is done.
> 
> No functional change is intended.
> 
> Signed-off-by: Michael Kelley 
> ---
> This patch trades off making many of the core swiotlb APIs take
> an additional argument in order to avoid duplicating calls to
> swiotlb_find_pool(). The current code seems rather wasteful in
> making 6 calls per round-trip, but I'm happy to accept others'
> judgment as to whether getting rid of the waste is worth the
> additional code complexity.
> 
>  drivers/iommu/dma-iommu.c | 27 ++--
>  drivers/xen/swiotlb-xen.c | 25 +++---
>  include/linux/swiotlb.h   | 54 +--
>  kernel/dma/direct.c   | 12 ++---
>  kernel/dma/direct.h   | 18 -
>  kernel/dma/swiotlb.c  | 43 ---
>  6 files changed, 106 insertions(+), 73 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index f731e4b2a417..ab6bc37ecf90 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1073,6 +1073,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
> *dev,
>   dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
>  {
>   phys_addr_t phys;
> + struct io_tlb_pool *pool;
>  
>   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
>   return;
> @@ -1081,21 +1082,25 @@ static void iommu_dma_sync_single_for_cpu(struct 
> device *dev,
>   if (!dev_is_dma_coherent(dev))
>   arch_sync_dma_for_cpu(phys, size, dir);
>  
> - if (is_swiotlb_buffer(dev, phys))
> - swiotlb_sync_single_for_cpu(dev, phys, size, dir);
> + pool = is_swiotlb_buffer(dev, phys);
> + if (pool)
> + swiotlb_sync_single_for_cpu(dev, phys, size, dir, pool);
>  }
>  
>  static void iommu_dma_sync_single_for_device(struct device *dev,
>   dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
>  {
>   phys_addr_t phys;
> + struct io_tlb_pool *pool;
>  
>   if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
>   return;
>  
>   phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> - if (is_swiotlb_buffer(dev, phys))
> - swiotlb_sync_single_for_device(dev, phys, size, dir);
> +
> + pool = is_swiotlb_buffer(dev, phys);
> + if (pool)
> + swiotlb_sync_single_for_device(dev, phys, size, dir, pool);
>  
>   if (!dev_is_dma_coherent(dev))
>   arch_sync_dma_for_device(phys, size, dir);
> @@ -1189,8 +1194,12 @@ static dma_addr_t iommu_dma_map_page(struct device 
> *dev, struct page *page,
>   arch_sync_dma_for_device(phys, size, dir);
>  
>   iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
> - if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
> - swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> + if (iova == DMA_MAPPING_ERROR) {
> + struct io_tlb_pool *pool = is_swiotlb_buffer(dev, phys);
> +
> + if (pool)
> + swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs, 
> pool);
> + }
>   return iova;
>  }
>  
> @@ -1199,6 +1208,7 @@ static void iommu_dma_unmap_page(struct device 

Re: [RFC 1/1] swiotlb: Reduce calls to swiotlb_find_pool()

2024-06-26 Thread Petr Tesařík
On Thu, 27 Jun 2024 08:02:51 +0200
"h...@lst.de"  wrote:

> On Wed, Jun 26, 2024 at 11:58:13PM +, Michael Kelley wrote:
> > > This patch trades off making many of the core swiotlb APIs take
> > > an additional argument in order to avoid duplicating calls to
> > > swiotlb_find_pool(). The current code seems rather wasteful in
> > > making 6 calls per round-trip, but I'm happy to accept others'
> > > judgment as to whether getting rid of the waste is worth the
> > > additional code complexity.  
> > 
> > Quick ping on this RFC.  Is there any interest in moving forward?
> > Quite a few lines of code are affected because of adding the
> > additional "pool" argument to several functions, but the change
> > is conceptually pretty simple.  
> 
> Yes, this looks sensible to me.  I'm tempted to apply it.

Oh, right. The idea is good, but I was not able to reply immediately
and then forgot about it.

For the record, I considered an alternative: Call swiotlb_* functions
unconditionally and bail out early if the pool is NULL. But it's no
good, because is_swiotlb_buffer() can be inlined, so this approach
would replace a quick check with a function call. And then there's also
swiotlb_tbl_unmap_single()...
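
In other words, the rejected variant would have looked roughly like this
sketch, paying a function call even for buffers that are not in the swiotlb:

void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t paddr,
		size_t size, enum dma_data_direction dir)
{
	struct io_tlb_pool *pool = swiotlb_find_pool(dev, paddr);

	if (!pool)
		return;		/* not a bounce buffer; bail out early */

	/* ... the actual bounce/sync work ... */
}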

I have only a very minor suggestion: Could is_swiotlb_buffer() be
renamed now that it no longer returns a bool? OTOH I have no good
immediate idea myself.

Petr T



Re: [PATCH 2/2] iommu/dma: Fix zero'ing of bounce buffer padding used by untrusted devices

2024-05-06 Thread Petr Tesařík
On Sun,  7 Apr 2024 21:11:42 -0700
mhkelle...@gmail.com wrote:

> From: Michael Kelley 
> 
> iommu_dma_map_page() allocates swiotlb memory as a bounce buffer when
> an untrusted device wants to map only part of the memory in a
> granule. The goal is to prevent the untrusted device from having
> DMA access to unrelated kernel data that may be sharing the granule.
> To meet this goal, the bounce buffer itself is zero'ed, and any
> additional swiotlb memory up to alloc_size after the bounce buffer
> end (i.e., "post-padding") is also zero'ed.
> 
> However, as of commit 901c7280ca0d ("Reinstate some of "swiotlb: rework
> "fix info leak with DMA_FROM_DEVICE"""), swiotlb_tbl_map_single()
> always initializes the contents of the bounce buffer to the original
> memory. Zero'ing the bounce buffer is redundant and probably wrong per
> the discussion in that commit. Only the post-padding needs to be
> zero'ed.
> 
> Also, when the DMA min_align_mask is non-zero, the allocated bounce
> buffer space may not start on a granule boundary. The swiotlb memory
> from the granule boundary to the start of the allocated bounce buffer
> might belong to some unrelated bounce buffer. So as described in the
> "second issue" in [1], it can't be zero'ed to protect against untrusted
> devices. But as of commit  ("swiotlb: extend buffer
> pre-padding to alloc_align_mask if necessary"), swiotlb_tbl_map_single()

This is now commit af133562d5af.

> allocates pre-padding slots when necessary to meet min_align_mask
> requirements, making it possible to zero the pre-padding area as well.
> 
> Finally, iommu_dma_map_page() uses the swiotlb for untrusted devices
> and also for certain kmalloc() memory. Current code does the zero'ing
> for both cases, but it is needed only for the untrusted device case.
> 
> Fix all of this by updating iommu_dma_map_page() to zero both the
> pre-padding and post-padding areas, but not the actual bounce buffer.
> Do this only in the case where the bounce buffer is used because
> of an untrusted device.
> 
> [1] https://lore.kernel.org/all/20210929023300.335969-1-steve...@google.com/
> 
> Signed-off-by: Michael Kelley 
> ---
> I've wondered if this code for zero'ing the pre- and post-padding
> should go in swiotlb_tbl_map_single(). The bounce buffer proper is
> already being initialized there. But swiotlb_tbl_map_single()
> would need to test for an untrusted device (or have a "zero the
> padding" flag passed in as part of the "attrs" argument), which
> adds complexity. Thoughts?

Historically, swiotlb has never cared about exposing data from a
previous user of a bounce buffer. I assume that's because it was
pointless to make an attempt at protecting system memory from a
malicious device that can do DMA to any address anyway. The situation
has changed with hardware IOMMUs, and that could be why the zeroing is
only done in the IOMMU path.

In short, if anybody can explain the value of concealing potentially
sensitive data from devices that are not behind an IOMMU, let's move
the zeroing to swiotlb. Otherwise, let's keep what we have.
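
For the archives, the zeroing described in the patch amounts to roughly the
following sketch ("granule" stands for the IOVA granule size; the actual
patch uses different helpers, so treat this as an illustration only):

	if (dev_is_untrusted(dev)) {
		void *virt = phys_to_virt(phys);
		size_t pre  = phys & (granule - 1);	/* offset from granule start */
		size_t post = ALIGN(pre + size, granule) - (pre + size);

		memset(virt - pre, 0, pre);	/* zero the pre-padding  */
		memset(virt + size, 0, post);	/* zero the post-padding */
	}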

Other than that (and the missing commit id), the patch looks good to me.

Reviewed-by: Petr Tesarik 

Petr T

> 
> The commit ID of Petr's patch is X'ed out above because Petr's patch
> hasn't gone into Linus' tree yet. We can add the real commit ID once
> this patch is ready to go in.
> 
> Also I haven't used any "Fixes:" tags. This patch really should
> be backported only if all the other recent swiotlb fixes get
> backported, and I'm unclear on whether that will happen.
> 
>  drivers/iommu/dma-iommu.c | 29 -
>  include/linux/iova.h  |  5 +
>  2 files changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index c21ef1388499..ecac39b3190d 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1154,9 +1154,6 @@ static dma_addr_t iommu_dma_map_page(struct device 
> *dev, struct page *page,
>*/
>   if (dev_use_swiotlb(dev, size, dir) &&
>   iova_offset(iovad, phys | size)) {
> - void *padding_start;
> - size_t padding_size, aligned_size;
> -
>   if (!is_swiotlb_active(dev)) {
>   dev_warn_once(dev, "DMA bounce buffers are inactive, 
> unable to map unaligned transaction.\n");
>   return DMA_MAPPING_ERROR;
> @@ -1164,24 +1161,30 @@ static dma_addr_t iommu_dma_map_page(struct device 
> *dev, struct page *page,
>  
>   trace_swiotlb_bounced(dev, phys, size);
>  
> - aligned_size = iova_align(iovad, size);
>   phys = swiotlb_tbl_map_single(dev, phys, size,
> iova_mask(iovad), dir, attrs);
>  
>   if (phys == DMA_MAPPING_ERROR)
>   return DMA_MAPPING_ERROR;
>  
> - /* Cleanup the padding area. */
> - padding_start = phys_to_virt

Re: [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()

2024-05-06 Thread Petr Tesařík
On Mon, 6 May 2024 15:14:05 +
Michael Kelley  wrote:

> From: mhkelle...@gmail.com 
> >   
> 
> Gentle ping ...
> 
> Anyone interested in reviewing this series of two patches?  It fixes
> an edge case bug in the size of the swiotlb request coming from
> dma-iommu, and plugs a hole that allows untrusted devices to see
> kernel data unrelated to the intended DMA transfer.  I think these are
> the last "known bugs" that came out of the extensive swiotlb discussion
> and patches for 6.9.
> 
> Michael
> 
> > Currently swiotlb_tbl_map_single() takes alloc_align_mask and
> > alloc_size arguments to specify an swiotlb allocation that is
> > larger than mapping_size. This larger allocation is used solely
> > by iommu_dma_map_page() to handle untrusted devices that should
> > not have DMA visibility to memory pages that are partially used
> > for unrelated kernel data.
> > 
> > Having two arguments to specify the allocation is redundant. While
> > alloc_align_mask naturally specifies the alignment of the starting
> > address of the allocation, it can also implicitly specify the size
> > by rounding up the mapping_size to that alignment.
> > 
> > Additionally, the current approach has an edge case bug.
> > iommu_dma_map_page() already does the rounding up to compute the
> > alloc_size argument. But swiotlb_tbl_map_single() then calculates
> > the alignment offset based on the DMA min_align_mask, and adds
> > that offset to alloc_size. If the offset is non-zero, the addition
> > may result in a value that is larger than the max the swiotlb can
> > allocate. If the rounding up is done _after_ the alignment offset is
> > added to the mapping_size (and the original mapping_size conforms to
> > the value returned by swiotlb_max_mapping_size), then the max that the
> > swiotlb can allocate will not be exceeded.
> > 
> > In view of these issues, simplify the swiotlb_tbl_map_single() interface
> > by removing the alloc_size argument. Most call sites pass the same
> > value for mapping_size and alloc_size, and they pass alloc_align_mask
> > as zero. Just remove the redundant argument from these callers, as they
> > will see no functional change. For iommu_dma_map_page() also remove
> > the alloc_size argument, and have swiotlb_tbl_map_single() compute
> > the alloc_size by rounding up mapping_size after adding the offset
> > based on min_align_mask. This has the side effect of fixing the
> > edge case bug but with no other functional change.
> > 
> > Also add a sanity test on the alloc_align_mask. While IOMMU code
> > currently ensures the granule is not larger than PAGE_SIZE, if
> > that guarantee were to be removed in the future, the downstream
> > effect on the swiotlb might go unnoticed until strange allocation
> > failures occurred.
> > 
> > Tested on an ARM64 system with 16K page size and some kernel
> > test-only hackery to allow modifying the DMA min_align_mask and
> > the granule size that becomes the alloc_align_mask. Tested these
> > combinations with a variety of original memory addresses and
> > sizes, including those that reproduce the edge case bug:
> > 
> > * 4K granule and 0 min_align_mask
> > * 4K granule and 0xFFF min_align_mask (4K - 1)
> > * 16K granule and 0xFFF min_align_mask
> > * 64K granule and 0xFFF min_align_mask
> > * 64K granule and 0x3FFF min_align_mask (16K - 1)
> > 
> > With the changes, all combinations pass.
> > 
> > Signed-off-by: Michael Kelley 

Looks good to me. My previous discussion was not related to this
change; I was merely trying to answer your question whether anything
else should be changed, and IIUC the conclusion was that nothing else
needs to change.
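
For the archives, the edge case being fixed boils down to arithmetic like
this (illustrative numbers, not taken from a real report; 256 KiB is the
largest contiguous swiotlb allocation):

	granule = 64 KiB, min_align_mask = 0xFFF, mapping_size = 252 KiB,
	orig_addr offset within a 4 KiB block = 0x800

	old: the caller rounds up first: alloc_size = iova_align(252K) = 256K,
	     then swiotlb adds the 0x800 offset -> 258K > 256K -> allocation fails

	new: swiotlb adds the offset first and rounds up afterwards:
	     round_up(252K + 0x800, 64K) = 256K <= 256K -> allocation succeeds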

Reviewed-by: Petr Tesarik 

Petr T

> > ---
> > I haven't used any "Fixes:" tags. This patch really should be
> > backported only if all the other recent swiotlb fixes get backported,
> > and I'm unclear on whether that will happen.
> > 
> > I saw the brief discussion about removing the "dir" parameter from
> > swiotlb_tbl_map_single(). That removal could easily be done as part
> > of this patch, since it's already changing the swiotlb_tbl_map_single()
> > parameters. But I think the conclusion of the discussion was to leave
> > the "dir" parameter for symmetry with the swiotlb_sync_*() functions.
> > Please correct me if that's wrong, and I'll respin this patch to do
> > the removal.
> > 
> >  drivers/iommu/dma-iommu.c |  2 +-
> >  drivers/xen/swiotlb-xen.c |  2 +-
> >  include/linux/swiotlb.h   |  2 +-
> >  kernel/dma/swiotlb.c  | 56 +--
> >  4 files changed, 45 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 07d087eecc17..c21ef1388499 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -1165,7 +1165,7 @@ static dma_addr_t iommu_dma_map_page(struct device 
> > *dev, struct page *page,
> > trace_swiotlb_bounced(dev, phys, size);
> > 
> > aligned_size = iova_align(iovad, siz

Re: [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()

2024-04-15 Thread Petr Tesařík
On Mon, 15 Apr 2024 13:03:30 +
Michael Kelley  wrote:

> From: Petr Tesařík  Sent: Monday, April 15, 2024 5:50 AM
> > 
> > On Mon, 15 Apr 2024 12:23:22 +
> > Michael Kelley  wrote:
> >   
> > > From: Petr Tesařík  Sent: Monday, April 15, 2024 4:46 
> > > AM  
> > > >
> > > > Hi Michael,
> > > >
> > > > sorry for taking so long to answer. Yes, there was no agreement on the
> > > > removal of the "dir" parameter, but I'm not sure it's because of
> > > > symmetry with swiotlb_sync_*(), because the topic was not really
> > > > discussed.
> > > >
> > > > The discussion was about the KUnit test suite and whether direction is
> > > > a property of the bounce buffer or of each sync operation. Since the DMA
> > > > API associates each DMA buffer with a direction, the direction
> > > > parameter passed to swiotlb_sync_*() should match what was passed to
> > > > swiotlb_tbl_map_single(), because that's how it is used by the generic
> > > > DMA code. In other words, if the parameter is kept, it should be kept
> > > > to match dma_map_*().
> > > >
> > > > However, there is also symmetry with swiotlb_tbl_unmap_single(). This
> > > > function does use the parameter for the final sync. I believe there
> > > > should be a matching initial sync in swiotlb_tbl_map_single(). In
> > > > short, the buffer sync for DMA non-coherent devices should be moved from
> > > > swiotlb_map() to swiotlb_tbl_map_single(). If this sync is not needed,
> > > > then the caller can (and should) include DMA_ATTR_SKIP_CPU_SYNC in
> > > > the flags parameter.
> > > >
> > > > To sum it up:
> > > >
> > > > * Do *NOT* remove the "dir" parameter.
> > > > * Let me send a patch which moves the initial buffer sync.
> > > >  
> > >
> > > I'm not seeing the need to move the initial buffer sync.  All
> > > callers of swiotlb_tbl_map_single() already have a subsequent
> > > check for a non-coherent device, and a call to
> > > arch_sync_dma_for_device().  And the Xen code has some
> > > special handling that probably shouldn't go in
> > > swiotlb_tbl_map_single().  Or am I missing something?  
> > 
> > Oh, sure, there's nothing broken ATM. It's merely a cleanup. The API is
> > asymmetric and thus confusing. You get a final sync by default if you
> > call swiotlb_tbl_unmap_single(),   
> 
> I don't see that final sync in swiotlb_tbl_unmap_single().  It calls
> swiotlb_bounce() to copy the data, but it doesn't deal with
> non-coherent devices or call arch_sync_dma_for_cpu().

Ouch. You're right! The buffer gets only bounced but not synced if
device DMA is non-coherent. So, how is this supposed to work?

Now I'm looking at the code in dma_direct_map_page(), and it calls
arch_sync_dma_for_device() explicitly, _except_ when using SWIOTLB. So,
maybe I should instead review all callers of swiotlb_map(), make sure
that they handle non-coherent devices, and then remove the sync from
swiotlb_map()?

I mean, the current situation seems somewhat disorganized to me.
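
For context, the flow I'm looking at is roughly this (abridged sketch of
dma_direct_map_page(), error handling omitted):

static inline dma_addr_t dma_direct_map_page(struct device *dev,
		struct page *page, unsigned long offset, size_t size,
		enum dma_data_direction dir, unsigned long attrs)
{
	phys_addr_t phys = page_to_phys(page) + offset;
	dma_addr_t dma_addr = phys_to_dma(dev, phys);

	if (is_swiotlb_force_bounce(dev) ||
	    unlikely(!dma_capable(dev, dma_addr, size, true))) {
		/* the bounce-buffer path relies on swiotlb_map() to sync */
		return swiotlb_map(dev, phys, size, dir, attrs);
	}

	if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
		arch_sync_dma_for_device(phys, size, dir);
	return dma_addr;
}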

Petr T



Re: [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()

2024-04-15 Thread Petr Tesařík
On Mon, 15 Apr 2024 12:23:22 +
Michael Kelley  wrote:

> From: Petr Tesařík  Sent: Monday, April 15, 2024 4:46 AM
> > 
> > Hi Michael,
> > 
> > sorry for taking so long to answer. Yes, there was no agreement on the
> > removal of the "dir" parameter, but I'm not sure it's because of
> > symmetry with swiotlb_sync_*(), because the topic was not really
> > discussed.
> > 
> > The discussion was about the KUnit test suite and whether direction is
> > a property of the bounce buffer or of each sync operation. Since the DMA
> > API associates each DMA buffer with a direction, the direction
> > parameter passed to swiotlb_sync_*() should match what was passed to
> > swiotlb_tbl_map_single(), because that's how it is used by the generic
> > DMA code. In other words, if the parameter is kept, it should be kept
> > to match dma_map_*().
> > 
> > However, there is also symmetry with swiotlb_tbl_unmap_single(). This
> > function does use the parameter for the final sync. I believe there
> > should be a matching initial sync in swiotlb_tbl_map_single(). In
> > short, the buffer sync for DMA non-coherent devices should be moved from
> > swiotlb_map() to swiotlb_tbl_map_single(). If this sync is not needed,
> > then the caller can (and should) include DMA_ATTR_SKIP_CPU_SYNC in
> > the flags parameter.
> > 
> > To sum it up:
> > 
> > * Do *NOT* remove the "dir" parameter.
> > * Let me send a patch which moves the initial buffer sync.
> >   
> 
> I'm not seeing the need to move the initial buffer sync.  All
> callers of swiotlb_tbl_map_single() already have a subsequent
> check for a non-coherent device, and a call to 
> arch_sync_dma_for_device().  And the Xen code has some 
> special handling that probably shouldn't go in
> swiotlb_tbl_map_single().  Or am I missing something?

Oh, sure, there's nothing broken ATM. It's merely a cleanup. The API is
asymmetric and thus confusing. You get a final sync by default if you
call swiotlb_tbl_unmap_single(), but you don't get an initial sync by
default if you call swiotlb_tbl_map_single(). This is difficult to
remember, so potential new users of the API may incorrectly assume that
an initial sync is done, or that a final sync is not done.

And yes, when moving the code, all current users of
swiotlb_tbl_map_single() should specify DMA_ATTR_SKIP_CPU_SYNC.

Petr T



Re: [PATCH 1/2] swiotlb: Remove alloc_size argument to swiotlb_tbl_map_single()

2024-04-15 Thread Petr Tesařík
On Sun,  7 Apr 2024 21:11:41 -0700
mhkelle...@gmail.com wrote:

> From: Michael Kelley 
> 
> Currently swiotlb_tbl_map_single() takes alloc_align_mask and
> alloc_size arguments to specify an swiotlb allocation that is
> larger than mapping_size. This larger allocation is used solely
> by iommu_dma_map_page() to handle untrusted devices that should
> not have DMA visibility to memory pages that are partially used
> for unrelated kernel data.
> 
> Having two arguments to specify the allocation is redundant. While
> alloc_align_mask naturally specifies the alignment of the starting
> address of the allocation, it can also implicitly specify the size
> by rounding up the mapping_size to that alignment.
> 
> Additionally, the current approach has an edge case bug.
> iommu_dma_map_page() already does the rounding up to compute the
> alloc_size argument. But swiotlb_tbl_map_single() then calculates
> the alignment offset based on the DMA min_align_mask, and adds
> that offset to alloc_size. If the offset is non-zero, the addition
> may result in a value that is larger than the max the swiotlb can
> allocate. If the rounding up is done _after_ the alignment offset is
> added to the mapping_size (and the original mapping_size conforms to
> the value returned by swiotlb_max_mapping_size), then the max that the
> swiotlb can allocate will not be exceeded.
> 
> In view of these issues, simplify the swiotlb_tbl_map_single() interface
> by removing the alloc_size argument. Most call sites pass the same
> value for mapping_size and alloc_size, and they pass alloc_align_mask
> as zero. Just remove the redundant argument from these callers, as they
> will see no functional change. For iommu_dma_map_page() also remove
> the alloc_size argument, and have swiotlb_tbl_map_single() compute
> the alloc_size by rounding up mapping_size after adding the offset
> based on min_align_mask. This has the side effect of fixing the
> edge case bug but with no other functional change.
> 
> Also add a sanity test on the alloc_align_mask. While IOMMU code
> currently ensures the granule is not larger than PAGE_SIZE, if
> that guarantee were to be removed in the future, the downstream
> effect on the swiotlb might go unnoticed until strange allocation
> failures occurred.
> 
> Tested on an ARM64 system with 16K page size and some kernel
> test-only hackery to allow modifying the DMA min_align_mask and
> the granule size that becomes the alloc_align_mask. Tested these
> combinations with a variety of original memory addresses and
> sizes, including those that reproduce the edge case bug:
> 
> * 4K granule and 0 min_align_mask
> * 4K granule and 0xFFF min_align_mask (4K - 1)
> * 16K granule and 0xFFF min_align_mask
> * 64K granule and 0xFFF min_align_mask
> * 64K granule and 0x3FFF min_align_mask (16K - 1)
> 
> With the changes, all combinations pass.
> 
> Signed-off-by: Michael Kelley 
> ---
> I haven't used any "Fixes:" tags. This patch really should be
> backported only if all the other recent swiotlb fixes get backported,
> and I'm unclear on whether that will happen.
> 
> I saw the brief discussion about removing the "dir" parameter from
> swiotlb_tbl_map_single(). That removal could easily be done as part
> of this patch, since it's already changing the swiotlb_tbl_map_single()
> parameters. But I think the conclusion of the discussion was to leave
> the "dir" parameter for symmetry with the swiotlb_sync_*() functions.
> Please correct me if that's wrong, and I'll respin this patch to do
> the removal.

Hi Michael,

sorry for taking so long to answer. Yes, there was no agreement on the
removal of the "dir" parameter, but I'm not sure it's because of
symmetry with swiotlb_sync_*(), because the topic was not really
discussed.

The discussion was about the KUnit test suite and whether direction is
a property of the bounce buffer or of each sync operation. Since the DMA
API associates each DMA buffer with a direction, the direction
parameter passed to swiotlb_sync_*() should match what was passed to
swiotlb_tbl_map_single(), because that's how it is used by the generic
DMA code. In other words, if the parameter is kept, it should be kept
to match dma_map_*().

However, there is also symmetry with swiotlb_tbl_unmap_single(). This
function does use the parameter for the final sync. I believe there
should be a matching initial sync in swiotlb_tbl_map_single(). In
short, the buffer sync for DMA non-coherent devices should be moved from
swiotlb_map() to swiotlb_tbl_map_single(). If this sync is not needed,
then the caller can (and should) include DMA_ATTR_SKIP_CPU_SYNC in
the flags parameter.

To sum it up:

* Do *NOT* remove the "dir" parameter.
* Let me send a patch which moves the initial buffer sync (sketched below).
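
Roughly, what I have in mind (a sketch of the idea, not an actual patch):

phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
		size_t mapping_size, unsigned int alloc_align_mask,
		enum dma_data_direction dir, unsigned long attrs)
{
	phys_addr_t tlb_addr;

	/* ... find slots and bounce the original buffer contents ... */

	/* do the initial sync here unless the caller opts out */
	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && !dev_is_dma_coherent(dev))
		arch_sync_dma_for_device(tlb_addr, mapping_size, dir);

	return tlb_addr;
}

The matching arch_sync_dma_for_device() call in swiotlb_map() would then go
away, and callers that already sync themselves would pass
DMA_ATTR_SKIP_CPU_SYNC.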

Petr T

>  drivers/iommu/dma-iommu.c |  2 +-
>  drivers/xen/swiotlb-xen.c |  2 +-
>  include/linux/swiotlb.h   |  2 +-
>  kernel/dma/swiotlb.c  | 56 +--
>  4 files changed, 45 insertions(+), 17 deletions(-)

Re: [PATCH v7 9/9] swiotlb: search the software IO TLB only if the device makes use of it

2023-09-07 Thread Petr Tesařík
Hi all,

sorry for my late reply; I've been away from my work setup for a
month...

On Wed, 30 Aug 2023 08:55:51 -0600
Jonathan Corbet  wrote:

> So it seems this code got merged without this question ever being
> answered.  Sorry if it's a dumb one, but I don't think this
> functionality works as advertised...

Yes, I believe the check was originally in is_swiotlb_buffer(), but it
got lost during one of the numerous rebases of this patch set. Let me
send a follow-up patch after making sure it actually works.

Petr T

> Thanks,
> 
> jon
> 
> Jonathan Corbet  writes:
> 
> > Petr Tesarik  writes:
> >  
> >> From: Petr Tesarik 
> >>
> >> Skip searching the software IO TLB if a device has never used it,
> >> making sure these devices are not affected by the introduction of
> >> multiple IO TLB memory pools.
> >>
> >> Additional memory barrier is required to ensure that the new value
> >> of the flag is visible to other CPUs after mapping a new bounce
> >> buffer. For efficiency, the flag check should be inlined, and then
> >> the memory barrier must be moved to is_swiotlb_buffer(). However,
> >> it can replace the existing barrier in swiotlb_find_pool(),
> >> because all callers use is_swiotlb_buffer() first to verify that
> >> the buffer address belongs to the software IO TLB.
> >>
> >> Signed-off-by: Petr Tesarik 
> >> ---  
> >
> > Excuse me if this is a silly question, but I'm not able to figure
> > it out on my own...
> >  
> >>  include/linux/device.h  |  2 ++
> >>  include/linux/swiotlb.h |  7 ++-
> >>  kernel/dma/swiotlb.c| 14 ++
> >>  3 files changed, 14 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/include/linux/device.h b/include/linux/device.h
> >> index 5fd89c9d005c..6fc808d22bfd 100644
> >> --- a/include/linux/device.h
> >> +++ b/include/linux/device.h
> >> @@ -628,6 +628,7 @@ struct device_physical_location {
> >>   * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver
> >> use.
> >>   * @dma_io_tlb_pools: List of transient swiotlb memory
> >> pools.
> >>   * @dma_io_tlb_lock:  Protects changes to the list of
> >> active pools.
> >> + * @dma_uses_io_tlb: %true if device has used the software IO TLB.
> >>   * @archdata: For arch-specific additions.
> >>   * @of_node:  Associated device tree node.
> >>   * @fwnode:   Associated device node supplied by platform
> >> firmware. @@ -737,6 +738,7 @@ struct device {
> >>  #ifdef CONFIG_SWIOTLB_DYNAMIC
> >>struct list_head dma_io_tlb_pools;
> >>spinlock_t dma_io_tlb_lock;
> >> +  bool dma_uses_io_tlb;  
> >
> > You add this new member here, fine...
> >  
> >>  #endif
> >>/* arch specific additions */
> >>struct dev_archdata archdata;
> >> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> >> index 8371c92a0271..b4536626f8ff 100644
> >> --- a/include/linux/swiotlb.h
> >> +++ b/include/linux/swiotlb.h
> >> @@ -172,8 +172,13 @@ static inline bool is_swiotlb_buffer(struct
> >> device *dev, phys_addr_t paddr) if (!mem)
> >>return false;
> >>  
> >> -  if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC))
> >> +  if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) {
> >> +  /* Pairs with smp_wmb() in swiotlb_find_slots()
> >> and
> >> +   * swiotlb_dyn_alloc(), which modify the RCU
> >> lists.
> >> +   */
> >> +  smp_rmb();
> >>return swiotlb_find_pool(dev, paddr);
> >> +  }
> >>return paddr >= mem->defpool.start && paddr <
> >> mem->defpool.end; }
> >>  
> >> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> >> index adf80dec42d7..d7eac84f975b 100644
> >> --- a/kernel/dma/swiotlb.c
> >> +++ b/kernel/dma/swiotlb.c
> >> @@ -730,7 +730,7 @@ static void swiotlb_dyn_alloc(struct
> >> work_struct *work) 
> >>add_mem_pool(mem, pool);
> >>  
> >> -  /* Pairs with smp_rmb() in swiotlb_find_pool(). */
> >> +  /* Pairs with smp_rmb() in is_swiotlb_buffer(). */
> >>smp_wmb();
> >>  }
> >>  
> >> @@ -764,11 +764,6 @@ struct io_tlb_pool *swiotlb_find_pool(struct
> >> device *dev, phys_addr_t paddr) struct io_tlb_mem *mem =
> >> dev->dma_io_tlb_mem; struct io_tlb_pool *pool;
> >>  
> >> -  /* Pairs with smp_wmb() in swiotlb_find_slots() and
> >> -   * swiotlb_dyn_alloc(), which modify the RCU lists.
> >> -   */
> >> -  smp_rmb();
> >> -
> >>rcu_read_lock();
> >>list_for_each_entry_rcu(pool, &mem->pools, node) {
> >>if (paddr >= pool->start && paddr < pool->end)
> >> @@ -813,6 +808,7 @@ void swiotlb_dev_init(struct device *dev)
> >>  #ifdef CONFIG_SWIOTLB_DYNAMIC
> >>INIT_LIST_HEAD(&dev->dma_io_tlb_pools);
> >>spin_lock_init(&dev->dma_io_tlb_lock);
> >> +  dev->dma_uses_io_tlb = false;  
> >
> > ...here you initialize it, fine...
> >  
> >>  #endif
> >>  }
> >>  
> >> @@ -1157,9 +1153,11 @@ static int swiotlb_find_slots(struct device
> >> *dev, phys_addr_t orig_addr, list_add_rcu(&pool->node,
> >> &dev->dma_io_tlb_pools);
> >> spin_unlock_irqrestore(&dev->dma_io_tlb_lock, flags); 
> >> -  /* Pairs with smp_rmb

Re: [PATCH v6 0/9] Allow dynamic allocation of software IO TLB bounce buffers

2023-07-31 Thread Petr Tesařík
On Mon, 31 Jul 2023 18:04:09 +0200
Christoph Hellwig  wrote:

> I was just going to apply this, but patch 1 seems to have a non-trivial
> conflict with the is_swiotlb_active removal in pci-dma.c.  Can you resend
> against the current dma-mapping for-next tree?

Sure thing, will re-send tomorrow morning.

Petr T



Re: [PATCH v4 2/8] swiotlb: add documentation and rename swiotlb_do_find_slots()

2023-07-20 Thread Petr Tesařík
On Thu, 20 Jul 2023 10:01:10 +0200
Christoph Hellwig  wrote:

> On Thu, Jul 20, 2023 at 09:56:09AM +0200, Petr Tesařík wrote:
> > On Thu, 20 Jul 2023 08:38:19 +0200
> > Christoph Hellwig  wrote:
> >   
> > > On Thu, Jul 13, 2023 at 05:23:13PM +0200, Petr Tesarik wrote:  
> > > > From: Petr Tesarik 
> > > > 
> > > > Add some kernel-doc comments and move the existing documentation of 
> > > > struct
> > > > io_tlb_slot to its correct location. The latter was forgotten in commit
> > > > 942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").
> > > > 
> > > > Use the opportunity to give swiotlb_do_find_slots() a more descriptive
> > > > name, which makes it clear how it differs from swiotlb_find_slots().
> > > 
> > > Please keep the swiotlb_ prefix.  Otherwise this looks good to me.  
> > 
> > Will do. Out of curiosity, why does it matter for a static (file-local)
> > function?  
> 
> Because it makes looking at stack traces much easier.

Got it. Thanks!

Petr T



Re: [PATCH v4 0/8] Allow dynamic allocation of software IO TLB bounce buffers

2023-07-20 Thread Petr Tesařík
On Thu, 20 Jul 2023 08:52:16 +0200
Christoph Hellwig  wrote:

> Just to add a highlevel comment here after I feel like I need a little
> more time to review the guts.
> 
> I'm still pretty concerned about the extra list that needs to be
> consulted in is_swiotlb_buffer, but I can't really think of
> anything better.  Maybe an xarray has better cache characteristics,
> but that one requires even more allocations in the low-level dma map
> path.
> 
> One thing I'd like to see for the next version is to make the
> new growing code a config option at least for now.  It is a pretty
> big change of the existing swiotlb behavior, and I want people to opt
> into it conciously.  Maybe we can drop the option again after a few
> years once everything has settled.

Fine with me. I removed it after all my testing showed no performance
impact as long as the size of the initial SWIOTLB is kept at the
default value (and sufficient for the workload), but it's OK for me if
dynamic SWIOTLB allocations are off by default.

OTOH I'd like to make it a boot-time option rather than build-time
option. Would that be OK for you?

Petr T



Re: [PATCH v4 8/8] swiotlb: search the software IO TLB only if a device makes use of it

2023-07-20 Thread Petr Tesařík
On Thu, 20 Jul 2023 08:47:44 +0200
Christoph Hellwig  wrote:

> Any reason this can't just do a list_empty_careful on the list
> instead of adding yet another field that grows struct device?

On which list?

The dma_io_tlb_pools list only contains transient pools, but a device
may use bounce buffers from a regular pool.

The dma_io_tlb_mem.pools list will always be non-empty, unless the
system runs without SWIOTLB.

On a system which does have a SWIOTLB, the flag allows differentiating
between devices that actually use bounce buffers and devices that do
not (e.g. because they do not have any addressing limitations).

Petr T



Re: [PATCH v4 2/8] swiotlb: add documentation and rename swiotlb_do_find_slots()

2023-07-20 Thread Petr Tesařík
On Thu, 20 Jul 2023 08:38:19 +0200
Christoph Hellwig  wrote:

> On Thu, Jul 13, 2023 at 05:23:13PM +0200, Petr Tesarik wrote:
> > From: Petr Tesarik 
> > 
> > Add some kernel-doc comments and move the existing documentation of struct
> > io_tlb_slot to its correct location. The latter was forgotten in commit
> > 942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").
> > 
> > Use the opportunity to give swiotlb_do_find_slots() a more descriptive
> > name, which makes it clear how it differs from swiotlb_find_slots().  
> 
> Please keep the swiotlb_ prefix.  Otherwise this looks good to me.

Will do. Out of curiosity, why does it matter for a static (file-local)
function?

Petr T



Re: [PATCH v4 1/8] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-07-20 Thread Petr Tesařík
On Thu, 20 Jul 2023 08:37:44 +0200
Christoph Hellwig  wrote:

> On Thu, Jul 13, 2023 at 05:23:12PM +0200, Petr Tesarik wrote:
> > From: Petr Tesarik 
> > 
> > SWIOTLB implementation details should not be exposed to the rest of the
> > kernel. This will allow to make changes to the implementation without
> > modifying non-swiotlb code.
> > 
> > To avoid breaking existing users, provide helper functions for the few
> > required fields.
> > 
> > As a bonus, using a helper function to initialize struct device allows to
> > get rid of an #ifdef in driver core.
> > 
> > Signed-off-by: Petr Tesarik 
> > ---
> >  arch/arm/xen/mm.c  |  2 +-
> >  arch/mips/pci/pci-octeon.c |  2 +-
> >  arch/x86/kernel/pci-dma.c  |  2 +-
> >  drivers/base/core.c|  4 +---
> >  drivers/xen/swiotlb-xen.c  |  2 +-
> >  include/linux/swiotlb.h| 25 +++-
> >  kernel/dma/swiotlb.c   | 39 +-
> >  7 files changed, 67 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> > index 3d826c0b5fee..0f32c14eb786 100644
> > --- a/arch/arm/xen/mm.c
> > +++ b/arch/arm/xen/mm.c
> > @@ -125,7 +125,7 @@ static int __init xen_mm_init(void)
> > return 0;
> >  
> > /* we can work with the default swiotlb */
> > -   if (!io_tlb_default_mem.nslabs) {
> > +   if (!is_swiotlb_allocated()) {
> > rc = swiotlb_init_late(swiotlb_size_or_default(),
> >xen_swiotlb_gfp(), NULL);
> > if (rc < 0)  
> 
> I'd much rather move the already initialized check into
> swiotlb_init_late, which is a much cleaner interface.
> 
> > /* we can work with the default swiotlb */
> > -   if (!io_tlb_default_mem.nslabs) {
> > +   if (!is_swiotlb_allocated()) {
> > int rc = swiotlb_init_late(swiotlb_size_or_default(),
> >GFP_KERNEL, xen_swiotlb_fixup);
> > if (rc < 0)  
> 
> .. and would take care of this one as well.

Oh, you're right! These are the only two places that look at
io_tlb_default_mem.nslabs, and all they need is to avoid double
initialization. Makes perfect sense to move it inside
swiotlb_init_late().
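
I.e. something along these lines (sketch only):

int swiotlb_init_late(size_t size, gfp_t gfp_mask,
		int (*remap)(void *tlb, unsigned long nslabs))
{
	if (io_tlb_default_mem.nslabs)
		return 0;	/* already initialized, nothing to do */

	/* ... existing late-allocation path ... */
}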

> > +bool is_swiotlb_allocated(void)
> > +{
> > +   return !!io_tlb_default_mem.nslabs;  
> 
> Nit: no need for the !!, we can rely on the implicit promotion to
> bool.  But with the suggestion above the need for this helper
> should go away anyway.

Eh, yes. I initially declared the return type as int and then forgot to
change the return statement. But as you say, the whole function will go
away entirely.

Petr T



Re: [PATCH v4 1/8] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-07-17 Thread Petr Tesařík
On Mon, 17 Jul 2023 08:06:07 +0200
Philippe Mathieu-Daudé  wrote:

> Hi Petr,
> 
> On 13/7/23 17:23, Petr Tesarik wrote:
> > From: Petr Tesarik 
> > 
> > SWIOTLB implementation details should not be exposed to the rest of the
> > kernel. This will allow to make changes to the implementation without
> > modifying non-swiotlb code.
> > 
> > To avoid breaking existing users, provide helper functions for the few
> > required fields.
> > 
> > As a bonus, using a helper function to initialize struct device allows to
> > get rid of an #ifdef in driver core.
> > 
> > Signed-off-by: Petr Tesarik 
> > ---
> >   arch/arm/xen/mm.c  |  2 +-
> >   arch/mips/pci/pci-octeon.c |  2 +-
> >   arch/x86/kernel/pci-dma.c  |  2 +-
> >   drivers/base/core.c|  4 +---
> >   drivers/xen/swiotlb-xen.c  |  2 +-
> >   include/linux/swiotlb.h| 25 +++-
> >   kernel/dma/swiotlb.c   | 39 +-
> >   7 files changed, 67 insertions(+), 9 deletions(-)  
> 
> 
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index 4e52cd5e0bdc..07216af59e93 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -110,7 +110,6 @@ struct io_tlb_mem {
> > atomic_long_t used_hiwater;
> >   #endif
> >   };
> > -extern struct io_tlb_mem io_tlb_default_mem;
> >   
> >   static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t 
> > paddr)
> >   {
> > @@ -128,13 +127,22 @@ static inline bool is_swiotlb_force_bounce(struct 
> > device *dev)
> >   
> >   void swiotlb_init(bool addressing_limited, unsigned int flags);
> >   void __init swiotlb_exit(void);
> > +void swiotlb_dev_init(struct device *dev);
> >   size_t swiotlb_max_mapping_size(struct device *dev);
> > +bool is_swiotlb_allocated(void);
> >   bool is_swiotlb_active(struct device *dev);
> >   void __init swiotlb_adjust_size(unsigned long size);
> > +phys_addr_t default_swiotlb_start(void);
> > +phys_addr_t default_swiotlb_limit(void);  
> 
> Usually we use start/end, base/limit, low[est]/high[est] tuples.

I'm no big fan of start/end, because the "end" sometimes means "highest
within range" and sometimes "one past range", which is responsible for an
impressive number of off-by-one errors.

But I agree. When I decided against "end", I should have also replaced
"start" with "base". Well, this patch series will certainly see a v5,
so I'll change it there. Thanks for the suggestion!

Petr T

> Possibly clearer to rename, regardless:
> Reviewed-by: Philippe Mathieu-Daudé 
> 
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index 2b83e3ad9dca..873b077d7e37 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c  
> 
> 
> > @@ -958,6 +975,26 @@ bool is_swiotlb_active(struct device *dev)
> >   }
> >   EXPORT_SYMBOL_GPL(is_swiotlb_active);
> >   
> > +/**
> > + * default_swiotlb_start() - get the start of the default SWIOTLB
> > + *
> > + * Get the lowest physical address used by the default software IO TLB 
> > pool.
> > + */
> > +phys_addr_t default_swiotlb_start(void)
> > +{
> > +   return io_tlb_default_mem.start;
> > +}
> > +
> > +/**
> > + * default_swiotlb_limit() - get the highest address in the default SWIOTLB
> > + *
> > + * Get the highest physical address used by the default software IO TLB 
> > pool.  
> 
> (note you describe lowest/highest).
> 
> > + */
> > +phys_addr_t default_swiotlb_limit(void)
> > +{
> > +   return io_tlb_default_mem.end - 1;
> > +}
> > +
> >   #ifdef CONFIG_DEBUG_FS
> >   
> >   static int io_tlb_used_get(void *data, u64 *val)  
> 




Re: [PATCH v4 3/8] swiotlb: separate memory pool data from other allocator data

2023-07-13 Thread Petr Tesařík
On Thu, 13 Jul 2023 17:23:14 +0200
Petr Tesarik  wrote:

> From: Petr Tesarik 
> 
> Carve out memory pool specific fields from struct io_tlb_mem. The original
> struct now contains shared data for the whole allocator, while the new
> struct io_tlb_pool contains data that is specific to one memory pool of
> (potentially) many.
> 
> Allocate both structures together for restricted DMA pools to keep the
> error cleanup path simple.
> 
> Signed-off-by: Petr Tesarik 
> ---
>  include/linux/device.h  |   2 +-
>  include/linux/swiotlb.h |  47 +++
>  kernel/dma/swiotlb.c    | 181 +---
>  3 files changed, 147 insertions(+), 83 deletions(-)
> 
> diff --git a/include/linux/device.h b/include/linux/device.h
> index bbaeabd04b0d..d9754a68ba95 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -625,7 +625,7 @@ struct device_physical_location {
>   * @dma_pools:   Dma pools (if dma'ble device).
>   * @dma_mem: Internal for coherent mem override.
>   * @cma_area:Contiguous memory area for dma allocations
> - * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
> + * @dma_io_tlb_mem: Software IO TLB allocator.  Not for driver use.
>   * @archdata:For arch-specific additions.
>   * @of_node: Associated device tree node.
>   * @fwnode:  Associated device node supplied by platform firmware.
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 39313c3a791a..d669e11e2827 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -62,8 +62,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
>  #ifdef CONFIG_SWIOTLB
>  
>  /**
> - * struct io_tlb_mem - IO TLB Memory Pool Descriptor
> - *
> + * struct io_tlb_pool - IO TLB memory pool descriptor
>   * @start:   The start address of the swiotlb memory pool. Used to do a quick
>   *   range check to see if the memory was in fact allocated by this
>   *   API.
> @@ -73,15 +72,36 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t 
> phys,
>   * @vaddr:   The vaddr of the swiotlb memory pool. The swiotlb memory pool
>   *   may be remapped in the memory encrypted case and store virtual
>   *   address for bounce buffer operation.
> - * @nslabs:  The number of IO TLB blocks (in groups of 64) between @start and
> - *   @end. For default swiotlb, this is command line adjustable via
> - *   setup_io_tlb_npages.
> + * @nslabs:  The number of IO TLB slots between @start and @end. For the
> + *   default swiotlb, this can be adjusted with a boot parameter,
> + *   see setup_io_tlb_npages().
> + * @used:The number of used IO TLB slots.
> + * @late_alloc:  %true if allocated using the page allocator.
> + * @nareas:  Number of areas in the pool.
> + * @area_nslabs: Number of slots in each area.
> + * @areas:   Array of memory area descriptors.
> + * @slots:   Array of slot descriptors.
> + */
> +struct io_tlb_pool {
> + phys_addr_t start;
> + phys_addr_t end;
> + void *vaddr;
> + unsigned long nslabs;
> + unsigned long used;

Oops. This member should not be re-introduced here after I removed it
with commit efa76afdde16...

I'm going to fix this in a v5, but I don't think it's critical enough
to make an immediate resend.
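
For clarity, this is what the pool descriptor should presumably look
like in v5, i.e. the quoted struct minus the stale member (a sketch,
matching the fields listed in the kernel-doc above):

struct io_tlb_pool {
	phys_addr_t start;
	phys_addr_t end;
	void *vaddr;
	unsigned long nslabs;
	bool late_alloc;
	unsigned int nareas;
	unsigned int area_nslabs;
	struct io_tlb_area *areas;
	struct io_tlb_slot *slots;
};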

Petr T



Re: [PATCH v3 4/7] swiotlb: if swiotlb is full, fall back to a transient memory pool

2023-07-10 Thread Petr Tesařík
On Sat, 8 Jul 2023 15:18:32 +
"Michael Kelley (LINUX)"  wrote:

> From: Petr Tesařík  Sent: Friday, July 7, 2023 3:22 AM
> > 
> > On Fri, 7 Jul 2023 10:29:00 +0100
> > Greg Kroah-Hartman  wrote:
> >   
> > > On Thu, Jul 06, 2023 at 02:22:50PM +, Michael Kelley (LINUX) wrote:  
> > > > From: Greg Kroah-Hartman  Sent: Thursday, 
> > > > July 6,  
> > 2023 1:07 AM  
> > > > >
> > > > > On Thu, Jul 06, 2023 at 03:50:55AM +, Michael Kelley (LINUX) 
> > > > > wrote:  
> > > > > > From: Petr Tesarik  Sent: Tuesday, 
> > > > > > June 27, 2023  
> > > > > 2:54 AM  
> > > > > > >
> > > > > > > Try to allocate a transient memory pool if no suitable slots can 
> > > > > > > be found,
> > > > > > > except when allocating from a restricted pool. The transient pool 
> > > > > > > is just
> > > > > > > > big enough for this one bounce buffer. It is inserted into a 
> > > > > > > per-device
> > > > > > > list of transient memory pools, and it is freed again when the 
> > > > > > > bounce
> > > > > > > buffer is unmapped.
> > > > > > >
> > > > > > > Transient memory pools are kept in an RCU list. A memory barrier 
> > > > > > > is
> > > > > > > required after adding a new entry, because any address within a 
> > > > > > > transient
> > > > > > > buffer must be immediately recognized as belonging to the 
> > > > > > > SWIOTLB, even if
> > > > > > > it is passed to another CPU.
> > > > > > >
> > > > > > > Deletion does not require any synchronization beyond RCU ordering
> > > > > > > guarantees. After a buffer is unmapped, its physical addresses 
> > > > > > > may no
> > > > > > > longer be passed to the DMA API, so the memory range of the 
> > > > > > > corresponding
> > > > > > > stale entry in the RCU list never matches. If the memory range 
> > > > > > > gets
> > > > > > > allocated again, then it happens only after a RCU quiescent state.
> > > > > > >
> > > > > > > Since bounce buffers can now be allocated from different pools, 
> > > > > > > add a
> > > > > > > parameter to swiotlb_alloc_pool() to let the caller know which 
> > > > > > > memory pool
> > > > > > > is used. Add swiotlb_find_pool() to find the memory pool 
> > > > > > > corresponding to
> > > > > > > an address. This function is now also used by 
> > > > > > > is_swiotlb_buffer(), because
> > > > > > > a simple boundary check is no longer sufficient.
> > > > > > >
> > > > > > > The logic in swiotlb_alloc_tlb() is taken from 
> > > > > > > __dma_direct_alloc_pages(),
> > > > > > > simplified and enhanced to use coherent memory pools if needed.
> > > > > > >
> > > > > > > Note that this is not the most efficient way to provide a bounce 
> > > > > > > buffer,
> > > > > > > but when a DMA buffer can't be mapped, something may (and will) 
> > > > > > > actually
> > > > > > > break. At that point it is better to make an allocation, even if 
> > > > > > > it may be
> > > > > > > an expensive operation.  
> > > > > >
> > > > > > I continue to think about swiotlb memory management from the 
> > > > > > standpoint
> > > > > > of CoCo VMs that may be quite large with high network and storage 
> > > > > > loads.
> > > > > > These VMs are often running mission-critical workloads that can't 
> > > > > > tolerate
> > > > > > a bounce buffer allocation failure.  To prevent such failures, the 
> > > > > > swiotlb
> > > > > > memory size must be overly large, which wastes memory.  
> > > > >
> > > > > If "mission critical workloads" are in a vm that allowes overcommit 
> > > > > and
> > > > > no control over other vms in that 

Re: [PATCH v3 4/7] swiotlb: if swiotlb is full, fall back to a transient memory pool

2023-07-07 Thread Petr Tesařík
On Fri, 7 Jul 2023 10:29:00 +0100
Greg Kroah-Hartman  wrote:

> On Thu, Jul 06, 2023 at 02:22:50PM +, Michael Kelley (LINUX) wrote:
> > From: Greg Kroah-Hartman  Sent: Thursday, July 
> > 6, 2023 1:07 AM  
> > > 
> > > On Thu, Jul 06, 2023 at 03:50:55AM +, Michael Kelley (LINUX) wrote:  
> > > > From: Petr Tesarik  Sent: Tuesday, June 
> > > > 27, 2023  
> > > 2:54 AM  
> > > > >
> > > > > Try to allocate a transient memory pool if no suitable slots can be 
> > > > > found,
> > > > > except when allocating from a restricted pool. The transient pool is 
> > > > > just
> > > > > big enough for this one bounce buffer. It is inserted into a 
> > > > > per-device
> > > > > list of transient memory pools, and it is freed again when the bounce
> > > > > buffer is unmapped.
> > > > >
> > > > > Transient memory pools are kept in an RCU list. A memory barrier is
> > > > > required after adding a new entry, because any address within a 
> > > > > transient
> > > > > buffer must be immediately recognized as belonging to the SWIOTLB, 
> > > > > even if
> > > > > it is passed to another CPU.
> > > > >
> > > > > Deletion does not require any synchronization beyond RCU ordering
> > > > > guarantees. After a buffer is unmapped, its physical addresses may no
> > > > > longer be passed to the DMA API, so the memory range of the 
> > > > > corresponding
> > > > > stale entry in the RCU list never matches. If the memory range gets
> > > > > allocated again, then it happens only after a RCU quiescent state.
> > > > >
> > > > > Since bounce buffers can now be allocated from different pools, add a
> > > > > parameter to swiotlb_alloc_pool() to let the caller know which memory 
> > > > > pool
> > > > > is used. Add swiotlb_find_pool() to find the memory pool 
> > > > > corresponding to
> > > > > an address. This function is now also used by is_swiotlb_buffer(), 
> > > > > because
> > > > > a simple boundary check is no longer sufficient.
> > > > >
> > > > > The logic in swiotlb_alloc_tlb() is taken from 
> > > > > __dma_direct_alloc_pages(),
> > > > > simplified and enhanced to use coherent memory pools if needed.
> > > > >
> > > > > Note that this is not the most efficient way to provide a bounce 
> > > > > buffer,
> > > > > but when a DMA buffer can't be mapped, something may (and will) 
> > > > > actually
> > > > > break. At that point it is better to make an allocation, even if it 
> > > > > may be
> > > > > an expensive operation.  
> > > >
> > > > I continue to think about swiotlb memory management from the standpoint
> > > > of CoCo VMs that may be quite large with high network and storage loads.
> > > > These VMs are often running mission-critical workloads that can't 
> > > > tolerate
> > > > a bounce buffer allocation failure.  To prevent such failures, the 
> > > > swiotlb
> > > > memory size must be overly large, which wastes memory.  
> > > 
> > > If "mission critical workloads" are in a vm that allowes overcommit and
> > > no control over other vms in that same system, then you have worse
> > > problems, sorry.
> > > 
> > > Just don't do that.
> > >   
> > 
> > No, the cases I'm concerned about don't involve memory overcommit.
> > 
> > CoCo VMs must use swiotlb bounce buffers to do DMA I/O.  Current swiotlb
> > code in the Linux guest allocates a configurable, but fixed, amount of guest
> > memory at boot time for this purpose.  But it's hard to know how much
> > swiotlb bounce buffer memory will be needed to handle peak I/O loads.
> > This patch set does dynamic allocation of swiotlb bounce buffer memory,
> > which can help avoid needing to configure an overly large fixed size at 
> > boot.  
> 
> But, as you point out, memory allocation can fail at runtime, so how can
> you "guarantee" that this will work properly anymore if you are going to
> make it dynamic?

In general, there is no guarantee, of course, because bounce buffers
may be requested from interrupt context. I believe Michael is looking
for the SWIOTLB_MAY_SLEEP flag that was introduced in my v2 series, so
new pools can be allocated with GFP_KERNEL instead of GFP_NOWAIT if
possible, and then there is no need to dip into the coherent pool.
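
Roughly along these lines (purely a sketch; SWIOTLB_MAY_SLEEP only ever
existed in the v2 series, and the helper name and flag value below are
made up for illustration):

#define SWIOTLB_MAY_SLEEP	BIT(2)	/* illustrative value only */

/* Pick the GFP mask for allocating a transient pool: block only when
 * the caller explicitly said it is allowed to sleep. */
static gfp_t transient_pool_gfp(unsigned int alloc_flags)
{
	if (alloc_flags & SWIOTLB_MAY_SLEEP)
		return GFP_KERNEL;
	return GFP_NOWAIT | __GFP_NOWARN;
}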

Well, I have deliberately removed all complexities from my v3 series,
but I have more WIP topic branches in my local repo:

- allow blocking allocations if possible
- allocate a new pool before existing pools are full
- free unused memory pools

I can make a bigger series, or I can send another series as RFC if this
is desired. ATM I don't feel confident enough that my v3 series will be
accepted without major changes, so I haven't invested time into
finalizing the other topic branches.

@Michael: If you know that my plan is to introduce blocking allocations
with a follow-up patch series, is the present approach acceptable?

Petr T



Re: [PATCH v3 1/7] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-06-27 Thread Petr Tesařík
On Tue, 27 Jun 2023 17:48:02 +0200
Christoph Hellwig  wrote:

> On Tue, Jun 27, 2023 at 01:30:06PM +0200, Petr Tesařík wrote:
> > Xen is the only user of an "is SWIOTLB present" interface. IIUC Xen
> > needs bounce buffers for the PCI frontend driver, but if there is no
> > other reason to have a SWIOTLB, the system does not set up one at boot.  
> 
> Please take a look at my "unexport swiotlb_active v2" series that
> unfortunately missed the 6.5 merge window waiting for reviews.

I noticed it, but it seems I missed the part which completely removes
pci_xen_swiotlb_init_late().

Then we're left only with a reference from xen_mm_init() in
arch/arm/xen/mm.c, and I believe this one can also be solved
differently.

Petr T



Re: unexport swiotlb_active v2

2023-06-27 Thread Petr Tesařík
On Mon, 19 Jun 2023 11:19:41 +0200
Christoph Hellwig  wrote:

> Any comments?  I'd really like to finish this off this merge window..

Let me second this request. My dynamic SWIOTLB patch series also
depends on this.

Petr T



Re: [PATCH v3 1/7] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-06-27 Thread Petr Tesařík
Oops, originally sent only to Robin. Restoring the recipient list here...

On Tue, 27 Jun 2023 11:55:00 +0100
Robin Murphy  wrote:

> On 27/06/2023 11:24 am, Greg Kroah-Hartman wrote:  
> > On Tue, Jun 27, 2023 at 11:54:23AM +0200, Petr Tesarik wrote:
> >> +/**
> >> + * is_swiotlb_active() - check if the software IO TLB is initialized
> >> + * @dev:  Device to check, or %NULL for the default IO TLB.
> >> + */
> >>   bool is_swiotlb_active(struct device *dev)
> >>   {
> >> -  struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> >> +  struct io_tlb_mem *mem = dev
> >> +  ? dev->dma_io_tlb_mem
> >> +  : &io_tlb_default_mem;
> > 
> > That's impossible to read and maintain over time, sorry.
> > 
> > Please use real "if () else" lines, so that it can be maintained over
> > time.
> 
> Moreover, it makes for a horrible interface anyway. If there's a need 
> for a non-specific "is SWIOTLB present at all?" check unrelated to any 
> particular device (which arguably still smells of poking into 
> implementation details...), please encapsulate it in its own distinct 
> helper like, say, is_swiotlb_present(void).  

I'm sorry for writing two replies, but I realized too late that this
part is unrelated to the MIPS Octeon platform.

Xen is the only user of an "is SWIOTLB present" interface. IIUC Xen
needs bounce buffers for the PCI frontend driver, but if there is no
other reason to have a SWIOTLB, the system does not set up one at boot.

Yeah, they should probably do things differently. At least this code in
arch/x86/kernel/pci-dma.c is fishy:

/* XXX: this switches the dma ops under live devices! */
dma_ops = &xen_swiotlb_dma_ops;

However, I don't think it's up to me to fix that...

To sum it up, I can certainly provide a separate function instead of
overloading the is_swiotlb_active() API.
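
Something along these lines, I suppose (a sketch only; in the follow-up
v4 series the helper ended up being called is_swiotlb_allocated()):

/* Distinct "is the default SWIOTLB set up at all?" helper, so Xen does
 * not have to pass a NULL device into the per-device query. */
bool is_swiotlb_present(void)
{
	return io_tlb_default_mem.nslabs != 0;
}

/* is_swiotlb_active() then keeps its plain per-device form, without
 * the hard-to-read conditional expression. */
bool is_swiotlb_active(struct device *dev)
{
	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

	return mem && mem->nslabs;
}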

Thanks for the suggestion!

Petr T



Re: [PATCH v3 1/7] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-06-27 Thread Petr Tesařík
On Tue, 27 Jun 2023 11:55:00 +0100
Robin Murphy  wrote:

> On 27/06/2023 11:24 am, Greg Kroah-Hartman wrote:
> > On Tue, Jun 27, 2023 at 11:54:23AM +0200, Petr Tesarik wrote:  
> >> +/**
> >> + * is_swiotlb_active() - check if the software IO TLB is initialized
> >> + * @dev:  Device to check, or %NULL for the default IO TLB.
> >> + */
> >>   bool is_swiotlb_active(struct device *dev)
> >>   {
> >> -  struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> >> +  struct io_tlb_mem *mem = dev
> >> +  ? dev->dma_io_tlb_mem
> >> +  : &io_tlb_default_mem;  
> > 
> > That's impossible to read and maintain over time, sorry.
> > 
> > Please use real "if () else" lines, so that it can be maintained over
> > time.  
> 
> Moreover, it makes for a horrible interface anyway. If there's a need 
> for a non-specific "is SWIOTLB present at all?" check unrelated to any 
> particular device (which arguably still smells of poking into 
> implementation details...), please encapsulate it in its own distinct 
> helper like, say, is_swiotlb_present(void).
> 
> However, the more I think about it, the more I doubt that logic like 
> octeon_pci_setup() can continue to work properly at all if SWIOTLB 
> allocation becomes dynamic... :/

Good, so I'm not alone. I don't know enough about the Octeon hardware
to understand how much magic is behind these PCI BARs, or why one of
them should (sometimes) be programmed the way it is.

OTOH it doesn't seem to me that this platform forces DMA through
SWIOTLB. At least all calls to swiotlb_init() under arch/mips take this
form:

swiotlb_init(true, SWIOTLB_VERBOSE);

This makes me believe that this PCI BAR setup is merely an optimization.

However, if nobody has a clear answer, a fallback solution is to stay
on the safe side and add a flag to struct io_tlb_mem indicating whether
the SWIOTLB can grow dynamically. The helper function would then set
this flag so that, on this Octeon platform, the SWIOTLB stays
restricted to the default pool.
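
Roughly like this (field and helper names are invented here just to
illustrate the idea; this is not from any posted patch):

struct io_tlb_mem {
	unsigned long nslabs;
	bool fixed_range;	/* true: never allocate additional pools */
	/* ... remaining members trimmed for the sketch ... */
};

/* Platform code (e.g. the Octeon PCI BAR setup) would call this before
 * relying on the default pool's address range. */
static void swiotlb_pin_to_default_pool(struct io_tlb_mem *mem)
{
	mem->fixed_range = true;
}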

Hopefully, Thomas Bogendoerfer can shed some light on that code.

Petr T



Re: [PATCH 2/4] x86: always initialize xen-swiotlb when xen-pcifront is enabling

2023-05-22 Thread Petr Tesařík
On Fri, 19 May 2023 12:10:26 +0200
Marek Marczykowski-Górecki  wrote:

> On Fri, May 19, 2023 at 06:04:05AM +0200, Christoph Hellwig wrote:
> > On Thu, May 18, 2023 at 08:18:39PM +0200, Marek Marczykowski-Górecki wrote: 
> >  
> > > On Thu, May 18, 2023 at 03:42:51PM +0200, Christoph Hellwig wrote:  
> > > > Remove the dangerous late initialization of xen-swiotlb in
> > > > pci_xen_swiotlb_init_late and instead just always initialize
> > > > xen-swiotlb in the boot code if CONFIG_XEN_PCIDEV_FRONTEND is enabled.
> > > > 
> > > > Signed-off-by: Christoph Hellwig   
> > > 
> > > Doesn't it mean all the PV guests will basically waste 64MB of RAM
> > > by default each if they don't really have PCI devices?  
> > 
> > If CONFIG_XEN_PCIDEV_FRONTEND is enabled, and the kernel's isn't booted
> > with swiotlb=noforce, yes.  
> 
> That's "a bit" unfortunate, since that might be a significant part of the
> VM memory, or if you have a lot of VMs, a significant part of the host
> memory - it quickly adds up.

I wonder if dynamic swiotlb allocation might also help with this...

Petr T




Re: [PATCH 2/4] x86: always initialize xen-swiotlb when xen-pcifront is enabling

2023-05-22 Thread Petr Tesařík
Hi Christoph,

On Sat, 20 May 2023 08:21:03 +0200
Christoph Hellwig  wrote:

> On Fri, May 19, 2023 at 02:58:57PM +0200, Christoph Hellwig wrote:
> > On Fri, May 19, 2023 at 01:49:46PM +0100, Andrew Cooper wrote:  
> > > > The alternative would be to finally merge swiotlb-xen into swiotlb, in
> > > > which case we might be able to do this later.  Let me see what I can
> > > > do there.  
> > > 
> > > If that is an option, it would be great to reduce the special-cashing.  
> > 
> > I think it's doable, and I've been wanting it for a while.  I just
> > need motivated testers, but it seems like I just found at least two :)  
> 
> So looking at swiotlb-xen it does these off things where it takes a value
> generated originally be xen_phys_to_dma, then only does a dma_to_phys
> to go back and call pfn_valid on the result.  Does this make sense, or
> is it wrong and just works by accident?  I.e. is the patch below correct?
> 
> 
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 67aa74d201627d..3396c5766f0dd8 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -90,9 +90,7 @@ static inline int range_straddles_page_boundary(phys_addr_t 
> p, size_t size)
>  
>  static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
>  {
> - unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
> - unsigned long xen_pfn = bfn_to_local_pfn(bfn);
> - phys_addr_t paddr = (phys_addr_t)xen_pfn << XEN_PAGE_SHIFT;
> + phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);


I'm no big Xen expert, but I think this is wrong. Let's go through it
line by line:

- bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));

  Take the DMA address (as seen by devices on the bus), convert it to a
  physical address (as seen by the CPU on the bus) and shift it right
  by XEN_PAGE_SHIFT. The result is a Xen machine PFN.

- xen_pfn = bfn_to_local_pfn(bfn);

  Take the machine PFN and convert it to a physical PFN.

- paddr = (phys_addr_t)xen_pfn << XEN_PAGE_SHIFT;

  Convert the physical PFN to a physical address.

The important thing here is that Xen PV does not have auto-translated
physical addresses, so physical address != machine address. Physical
addresses in Xen PV domains are "artificial", used by the kernel to
index the mem_map array, so a PFN can be easily converted to a struct
page pointer and back. However, these addresses are never used by
hardware, not even by CPU. The addresses used by the CPU are called
machine addresses. There is no address translation between VCPUs and
CPUs, because a PV domain runs directly on the CPU. After all, that's
why it is called _para_virtualized.
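
Putting the three steps back together with these annotations (this is
just the original code from the quoted hunk, commented):

/* PV domain: "physical" below means the kernel's pseudo-physical
 * address space, not a machine address. */
unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));	/* machine PFN */
unsigned long xen_pfn = bfn_to_local_pfn(bfn);			/* pseudo-physical PFN */
phys_addr_t paddr = (phys_addr_t)xen_pfn << XEN_PAGE_SHIFT;	/* pseudo-physical address */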

HTH
Petr T