On Sun, Dec 28, 2025 at 10:45:13AM +1300, Barry Song wrote:
> On Sun, Dec 28, 2025 at 9:07 AM Leon Romanovsky <[email protected]> wrote:
> >
> > On Sat, Dec 27, 2025 at 11:52:44AM +1300, Barry Song wrote:
> > > From: Barry Song <[email protected]>
> > >
> > > Currently, arch_sync_dma_for_cpu and arch_sync_dma_for_device
> > > always wait for the completion of each DMA buffer. That is,
> > > issuing the DMA sync and waiting for completion is done in a
> > > single API call.
> > >
> > > For scatter-gather lists with multiple entries, this means
> > > issuing and waiting is repeated for each entry, which can hurt
> > > performance. Architectures like ARM64 may be able to issue all
> > > DMA sync operations for all entries first and then wait for
> > > completion together.
> > >
> > > To address this, arch_sync_dma_for_* now issues DMA operations in
> > > batch, followed by a flush. On ARM64, the flush is implemented
> > > using a dsb instruction within arch_sync_dma_flush().
> > >
> > > For now, add arch_sync_dma_flush() after each
> > > arch_sync_dma_for_*() call. arch_sync_dma_flush() is defined as a
> > > no-op on all architectures except arm64, so this patch does not
> > > change existing behavior. Subsequent patches will introduce true
> > > batching for SG DMA buffers.
> > >
> > > Cc: Leon Romanovsky <[email protected]>
> > > Cc: Catalin Marinas <[email protected]>
> > > Cc: Will Deacon <[email protected]>
> > > Cc: Marek Szyprowski <[email protected]>
> > > Cc: Robin Murphy <[email protected]>
> > > Cc: Ada Couprie Diaz <[email protected]>
> > > Cc: Ard Biesheuvel <[email protected]>
> > > Cc: Marc Zyngier <[email protected]>
> > > Cc: Anshuman Khandual <[email protected]>
> > > Cc: Ryan Roberts <[email protected]>
> > > Cc: Suren Baghdasaryan <[email protected]>
> > > Cc: Joerg Roedel <[email protected]>
> > > Cc: Juergen Gross <[email protected]>
> > > Cc: Stefano Stabellini <[email protected]>
> > > Cc: Oleksandr Tyshchenko <[email protected]>
> > > Cc: Tangquan Zheng <[email protected]>
> > > Signed-off-by: Barry Song <[email protected]>
> > > ---
> > >  arch/arm64/include/asm/cache.h |  6 ++++++
> > >  arch/arm64/mm/dma-mapping.c    |  4 ++--
> > >  drivers/iommu/dma-iommu.c      | 37 +++++++++++++++++++++++++---------
> > >  drivers/xen/swiotlb-xen.c      | 24 ++++++++++++++--------
> > >  include/linux/dma-map-ops.h    |  6 ++++++
> > >  kernel/dma/direct.c            |  8 ++++++--
> > >  kernel/dma/direct.h            |  9 +++++++--
> > >  kernel/dma/swiotlb.c           |  4 +++-
> > >  8 files changed, 73 insertions(+), 25 deletions(-)
> >
> > <...>
> >
> > > +#ifndef arch_sync_dma_flush
> > > +static inline void arch_sync_dma_flush(void)
> > > +{
> > > +}
> > > +#endif
> >
> > Over the weekend I realized a useful advantage of the ARCH_HAVE_* config
> > options: they make it straightforward to inspect the entire DMA path simply
> > by looking at the .config.
>
> I am not quite sure how much this benefits users, as the same
> information could also be obtained by grepping for
> #define arch_sync_dma_flush in the source code.

It differs slightly. With a Kconfig option, users don't need to grep the
source or guess whether this platform uses the arch_sync_dma_flush path.
A simple zgrep for ARCH_HAVE_ in /proc/config.gz provides the answer.

>
> >
> > Thanks,
> > Reviewed-by: Leon Romanovsky <[email protected]>
>
> Thanks very much, Leon, for reviewing this over the weekend. One thing
> you might have missed is that I place arch_sync_dma_flush() after all
> arch_sync_dma_for_*() calls, for both single and sg cases. I also
> used a Python script to scan the code and verify that every
> arch_sync_dma_for_*() is followed by arch_sync_dma_flush(), to ensure
> that no call is left out.
>
> In the subsequent patches, for sg cases, the per-entry flush is
> replaced by a single flush of the entire sg. Each sg case has
> different characteristics: some are straightforward, while others
> can be tricky and involve additional contexts.

I didn't overlook it, and I understand your rationale. However, this is
not how kernel patches should be structured. You should not introduce
code in patch X and then move it elsewhere in patch X + Y.

Place the code in the correct location from the start. Your patches are
small enough to review as is.

Thanks

>
> Thanks
> Barry
