On Sun, Dec 28, 2025 at 10:45:13AM +1300, Barry Song wrote:
> On Sun, Dec 28, 2025 at 9:07 AM Leon Romanovsky <[email protected]> wrote:
> >
> > On Sat, Dec 27, 2025 at 11:52:44AM +1300, Barry Song wrote:
> > > From: Barry Song <[email protected]>
> > >
> > > Currently, arch_sync_dma_for_cpu and arch_sync_dma_for_device
> > > always wait for the completion of each DMA buffer. That is,
> > > issuing the DMA sync and waiting for completion is done in a
> > > single API call.
> > >
> > > For scatter-gather lists with multiple entries, this means
> > > issuing and waiting is repeated for each entry, which can hurt
> > > performance. Architectures like ARM64 may be able to issue all
> > > DMA sync operations for all entries first and then wait for
> > > completion together.
> > >
> > > To address this, arch_sync_dma_for_* now issues DMA operations in
> > > batch, followed by a flush. On ARM64, the flush is implemented
> > > using a dsb instruction within arch_sync_dma_flush().
> > >
> > > For now, add arch_sync_dma_flush() after each
> > > arch_sync_dma_for_*() call. arch_sync_dma_flush() is defined as a
> > > no-op on all architectures except arm64, so this patch does not
> > > change existing behavior. Subsequent patches will introduce true
> > > batching for SG DMA buffers.
> > >
> > > Cc: Leon Romanovsky <[email protected]>
> > > Cc: Catalin Marinas <[email protected]>
> > > Cc: Will Deacon <[email protected]>
> > > Cc: Marek Szyprowski <[email protected]>
> > > Cc: Robin Murphy <[email protected]>
> > > Cc: Ada Couprie Diaz <[email protected]>
> > > Cc: Ard Biesheuvel <[email protected]>
> > > Cc: Marc Zyngier <[email protected]>
> > > Cc: Anshuman Khandual <[email protected]>
> > > Cc: Ryan Roberts <[email protected]>
> > > Cc: Suren Baghdasaryan <[email protected]>
> > > Cc: Joerg Roedel <[email protected]>
> > > Cc: Juergen Gross <[email protected]>
> > > Cc: Stefano Stabellini <[email protected]>
> > > Cc: Oleksandr Tyshchenko <[email protected]>
> > > Cc: Tangquan Zheng <[email protected]>
> > > Signed-off-by: Barry Song <[email protected]>
> > > ---
> > >  arch/arm64/include/asm/cache.h |  6 ++++++
> > >  arch/arm64/mm/dma-mapping.c    |  4 ++--
> > >  drivers/iommu/dma-iommu.c      | 37 +++++++++++++++++++++++++---------
> > >  drivers/xen/swiotlb-xen.c      | 24 ++++++++++++++--------
> > >  include/linux/dma-map-ops.h    |  6 ++++++
> > >  kernel/dma/direct.c            |  8 ++++++--
> > >  kernel/dma/direct.h            |  9 +++++++--
> > >  kernel/dma/swiotlb.c           |  4 +++-
> > >  8 files changed, 73 insertions(+), 25 deletions(-)
> >
> > <...>
> >
> > > +#ifndef arch_sync_dma_flush
> > > +static inline void arch_sync_dma_flush(void)
> > > +{
> > > +}
> > > +#endif
> >
> > Over the weekend I realized a useful advantage of the ARCH_HAVE_* config
> > options: they make it straightforward to inspect the entire DMA path simply
> > by looking at the .config.
> 
> I am not quite sure how much this benefits users, as the same
> information could also be obtained by grepping for
> #define arch_sync_dma_flush in the source code.

It differs slightly. Users no longer need to grep around or guess whether this
platform uses the arch_sync_dma_flush path. A simple grep for ARCH_HAVE_ in
/proc/config.gz provides the answer.
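
As a rough sketch of the difference, here is a userspace mock (not kernel
code) in which both the real implementation and the no-op hang off a single
config symbol. CONFIG_ARCH_HAVE_SYNC_DMA_FLUSH is a hypothetical name picked
for this illustration; the series does not define it, and flush_count and
have_sync_dma_flush() exist only to make the mock observable.

```c
/*
 * Userspace mock, not kernel code.
 * CONFIG_ARCH_HAVE_SYNC_DMA_FLUSH is a hypothetical symbol name.
 */
#include <stdbool.h>

/* Pretend the arch selected the option; remove to model other arches. */
#define CONFIG_ARCH_HAVE_SYNC_DMA_FLUSH 1

static int flush_count;	/* mock-only counter so the effect is visible */

#ifdef CONFIG_ARCH_HAVE_SYNC_DMA_FLUSH
static inline void arch_sync_dma_flush(void)
{
	flush_count++;	/* stands in for the arm64 dsb */
}
#else
static inline void arch_sync_dma_flush(void)
{
	/* no-op on architectures that did not select the option */
}
#endif

/* Mock-only helper: reports which variant was compiled in. */
static inline bool have_sync_dma_flush(void)
{
#ifdef CONFIG_ARCH_HAVE_SYNC_DMA_FLUSH
	return true;
#else
	return false;
#endif
}
```

With this shape the answer lives in the build configuration rather than in a
header-level #define that has to be searched for.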

> 
> >
> > Thanks,
> > Reviewed-by: Leon Romanovsky <[email protected]>
> 
> Thanks very much, Leon, for reviewing this over the weekend. One thing
> you might have missed is that I place arch_sync_dma_flush() after all
> arch_sync_dma_for_*() calls, for both single and sg cases. I also
> used a Python script to scan the code and verify that every
> arch_sync_dma_for_*() is followed by arch_sync_dma_flush(), to ensure
> that no call is left out.
> 
> In the subsequent patches, for sg cases, the per-entry flush is
> replaced by a single flush of the entire sg. Each sg case has
> different characteristics: some are straightforward, while others
> can be tricky and involve additional contexts.

I didn't overlook it, and I understand your rationale. However, this is
not how kernel patches should be structured. You should not introduce
code in patch X and then move it elsewhere in patch X + Y.

Place the code in the correct location from the start. Your patches are
small enough to review as is.
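
For illustration only, the end state for an sg list could look roughly like
the userspace mock below: issue the sync for every entry, then wait once.
This is a sketch under assumptions, not the code from the series. struct
mock_sg, mock_sync_entry() and mock_flush() are made-up stand-ins for a
scatterlist entry, arch_sync_dma_for_device() and arch_sync_dma_flush().

```c
/*
 * Userspace mock, not kernel code. All names below are hypothetical
 * stand-ins chosen for this sketch.
 */
#include <assert.h>
#include <stddef.h>

struct mock_sg {
	size_t len;
};

static int issued;	/* sync operations issued but not yet waited for */
static int flushes;	/* number of completion waits performed */

/* Stand-in for issuing one per-entry sync without waiting. */
static void mock_sync_entry(const struct mock_sg *sg)
{
	(void)sg;
	issued++;
}

/* Stand-in for the single wait that completes everything issued so far. */
static void mock_flush(void)
{
	issued = 0;
	flushes++;
}

/* Issue the sync for every entry first, then wait once for all of them. */
static void mock_sync_sg_for_device(const struct mock_sg *sgl, int nents)
{
	int i;

	assert(nents >= 0);
	for (i = 0; i < nents; i++)
		mock_sync_entry(&sgl[i]);
	if (nents > 0)
		mock_flush();
}
```

The point is that the flush sits after the loop from the first patch that
touches the sg path, so no later patch has to move it.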

Thanks
> 
> Thanks
> Barry
