Re: [PATCH 21/21] dma-mapping: replace custom code with generic implementation

2023-03-27 Thread Christoph Hellwig
> +static inline void arch_dma_cache_wback(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_wback(paddr, size);
> +}
> 
> +static inline void arch_dma_cache_inv(phys_addr_t paddr, size_t size)
> +{
> +	dma_cache_inv(paddr, size);
> +}
> 
> +static inline void

Re: [PATCH 02/21] xtensa: dma-mapping: use normal cache invalidation rules

2023-03-27 Thread Max Filippov
On Mon, Mar 27, 2023 at 5:14 AM Arnd Bergmann wrote: > > From: Arnd Bergmann > > xtensa is one of the platforms that has both write-back and write-through > caches, and needs to account for both in its DMA mapping operations. > > It does this through a set of operations that is different from

Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper

2023-03-27 Thread Russell King (Oracle)
On Mon, Mar 27, 2023 at 02:13:16PM +0200, Arnd Bergmann wrote: > From: Arnd Bergmann > > The arm version of the arch_sync_dma_for_cpu() function annotates pages as > PG_dcache_clean after a DMA, but no other architecture does this here. ... because this is an arm32 specific feature.

Re: [PATCH 10/21] csky: dma-mapping: skip invalidating before DMA from device

2023-03-27 Thread Guo Ren
On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann wrote: > > From: Arnd Bergmann > > csky is the only architecture that does a full flush for the > dma_sync_*_for_device(..., DMA_FROM_DEVICE) operation. The requirement > is only to make sure there are no dirty cache lines for the buffer, > which can be

Re: [PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range

2023-03-27 Thread Russell King (Oracle)
On Mon, Mar 27, 2023 at 02:13:12PM +0200, Arnd Bergmann wrote: > From: Arnd Bergmann > > These were removed ages ago in commit 702b94bff3c5 ("ARM: dma-mapping: > remove dmac_clean_range and dmac_inv_range") in an effort to sanitize > the dma-mapping API. Really no, please no. Let's not go back

Re: [PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing

2023-03-27 Thread Arnd Bergmann
On Mon, Mar 27, 2023, at 14:56, Christophe Leroy wrote: > On 27/03/2023 at 14:13, Arnd Bergmann wrote: >> From: Arnd Bergmann >> >> The powerpc dma_sync_*_for_cpu() variants do more flushes than on other >> architectures. Reduce it to what everyone else does: >> >> - No flush is needed

Re: [PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing

2023-03-27 Thread Christophe Leroy
On 27/03/2023 at 14:13, Arnd Bergmann wrote: > From: Arnd Bergmann > > The powerpc dma_sync_*_for_cpu() variants do more flushes than on other > architectures. Reduce it to what everyone else does: > > - No flush is needed after data has been sent to a device > > - When data has been

Re: [PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper

2023-03-27 Thread Robin Murphy
On 2023-03-27 13:13, Arnd Bergmann wrote: From: Arnd Bergmann The arm version of the arch_sync_dma_for_cpu() function annotates pages as PG_dcache_clean after a DMA, but no other architecture does this here. On ia64, the same thing is done in arch_sync_dma_for_cpu(), so it makes sense to use

[PATCH 21/21] dma-mapping: replace custom code with generic implementation

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann Now that all of these have consistent behavior, replace them with a single shared implementation of arch_sync_dma_for_device() and arch_sync_dma_for_cpu() and three parameters to pick how they should operate: - If the CPU has speculative prefetching, then the cache has
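
A rough sketch of the shape such a shared implementation could take, reusing the arch_dma_cache_wback()/arch_dma_cache_inv() helper names from the hunk quoted in the reply near the top of this listing; CONFIG_ARCH_HAS_SPECULATIVE_PREFETCH is only an illustrative stand-in for one of the three behavior parameters, not a real Kconfig symbol:

    /* sketch assumes the usual kernel headers: linux/types.h, linux/dma-direction.h */
    void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
                                  enum dma_data_direction dir)
    {
            switch (dir) {
            case DMA_TO_DEVICE:
                    arch_dma_cache_wback(paddr, size);  /* make CPU writes visible to the device */
                    break;
            case DMA_FROM_DEVICE:
                    arch_dma_cache_inv(paddr, size);    /* drop stale CPU-side cache lines */
                    break;
            case DMA_BIDIRECTIONAL:
                    arch_dma_cache_wback(paddr, size);  /* invalidation is deferred to _for_cpu() */
                    break;
            default:
                    break;
            }
    }

    void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
                               enum dma_data_direction dir)
    {
            if (dir == DMA_TO_DEVICE)
                    return;                             /* nothing to do after sending data */
            if (IS_ENABLED(CONFIG_ARCH_HAS_SPECULATIVE_PREFETCH))
                    arch_dma_cache_inv(paddr, size);    /* discard lines speculatively fetched during DMA */
    }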

[PATCH 19/21] ARM: dma-mapping: use generic form of arch_sync_dma_* helpers

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann As the final step of the conversion to generic arch_sync_dma_* helpers, change the Arm implementation to look the same as the new generic version, by calling the dmac_{clean,inv,flush}_area low-level functions instead of the abstracted dmac_{map,unmap}_area version. On

[PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann The arm version of the arch_sync_dma_for_cpu() function annotates pages as PG_dcache_clean after a DMA, but no other architecture does this here. On ia64, the same thing is done in arch_sync_dma_for_cpu(), so it makes sense to use the same hook in order to have identical
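
A minimal sketch of the kind of hook being described, assuming a no-op default that individual architectures override; the arm-flavoured body below is a rough re-creation of the PG_dcache_clean handling, not code taken from the patch:

    #ifndef arch_dma_mark_clean
    static inline void arch_dma_mark_clean(phys_addr_t paddr, size_t size)
    {
            /* default: nothing to do */
    }
    #endif

    /* arm32-style override, shown only to illustrate the idea */
    static void sketch_arm_dma_mark_clean(phys_addr_t paddr, size_t size)
    {
            unsigned long pfn = PFN_DOWN(paddr);
            unsigned long last = PFN_UP(paddr + size);

            for (; pfn < last; pfn++)
                    set_bit(PG_dcache_clean, &pfn_to_page(pfn)->flags); /* mark D-cache clean */
    }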

[PATCH 18/21] ARM: drop SMP support for ARM11MPCore

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann The cache management operations for noncoherent DMA on ARMv6 work in two different ways: * When CONFIG_DMA_CACHE_RWFO is set, speculative prefetches on in-flight DMA buffers lead to data corruption when the prefetched data is written back on top of data from the

[PATCH 17/21] ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann The arm-specific iommu code in dma-mapping.c uses the page+offset based __dma_page_cpu_to_dev()/__dma_page_dev_to_cpu() helpers in place of the phys_addr_t based arch_sync_dma_for_device()/arch_sync_dma_for_cpu() wrappers around them. In order to be able to move the latter
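
An illustrative before/after of that conversion, with placeholder wrapper names; only the call shape changes, from page+offset to phys_addr_t:

    /* before: arm's page+offset based helper */
    static void sketch_sync_old(struct page *page, unsigned long off,
                                size_t size, enum dma_data_direction dir)
    {
            __dma_page_cpu_to_dev(page, off, size, dir);
    }

    /* after: the phys_addr_t based generic interface */
    static void sketch_sync_new(struct page *page, unsigned long off,
                                size_t size, enum dma_data_direction dir)
    {
            arch_sync_dma_for_device(page_to_phys(page) + off, size, dir);
    }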

[PATCH 16/21] ARM: dma-mapping: bring back dmac_{clean,inv}_range

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann These were removed ages ago in commit 702b94bff3c5 ("ARM: dma-mapping: remove dmac_clean_range and dmac_inv_range") in an effort to sanitize the dma-mapping API. Now this logic is getting moved into the generic dma-mapping implementation in order to give architectures less
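
For reference, the rough shape of the reinstated low-level operations alongside the existing dmac_flush_range(); these prototypes mirror the pre-702b94bff3c5 arm interface and are illustrative rather than copied from the patch:

    void dmac_clean_range(const void *start, const void *end);   /* writeback only */
    void dmac_inv_range(const void *start, const void *end);     /* invalidate only */
    void dmac_flush_range(const void *start, const void *end);   /* writeback + invalidate */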

[PATCH 15/21] ARM: dma-mapping: always invalidate WT caches before DMA

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann Most ARM CPUs can have write-back caches and that requires cache management to be done in the dma_sync_*_for_device() operation. This is typically done in both writeback and writethrough mode. The cache-v4.S (arm720/740/7tdmi/9tdmi) and cache-v4wt.S (arm920t, arm940t)
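
A minimal sketch of the write-through case being discussed, using a hypothetical helper name: on a write-through-only cache a clean is a no-op, so the only useful maintenance before the device writes into a buffer is an invalidate in the _for_device() path:

    static void sketch_wt_sync_for_device(phys_addr_t paddr, size_t size,
                                          enum dma_data_direction dir)
    {
            if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
                    sketch_wt_inv_range(paddr, size);   /* hypothetical WT invalidate */
            /* DMA_TO_DEVICE: nothing to write back on a write-through cache */
    }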

[PATCH 14/21] parisc: dma-mapping: use regular flush/invalidate ops

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann non-coherent devices on parisc traditionally use a full flush+invalidate before and after each DMA, which is more expensive than what we do on other architectures. Before transfers to a device, the cache only has to be written back, but apparently there is no operation for

[PATCH 13/21] arc: dma-mapping: skip invalidating before bidirectional DMA

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann Some architectures that need to invalidate buffers after bidirectional DMA because of speculative prefetching only do a simpler writeback before that DMA, while architectures that don't need to do the second invalidate tend to have a combined writeback+invalidate before the

[PATCH 12/21] mips: dma-mapping: split out cache operation logic

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann The mips arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions behave the same way as on other architectures, but in order to unify the implementations, the code needs to be rearranged to pick the type of cache operation in the outermost function. Signed-off-by: Arnd
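
A sketch of the restructuring this refers to, with illustrative names: the outermost function maps the DMA direction to a cache-operation type and passes it down, so the per-page walk no longer has to reason about directions itself:

    enum sketch_cache_op { SK_WBACK, SK_INV, SK_WBACK_INV };

    static void sketch_cache_maint(phys_addr_t paddr, size_t size, enum sketch_cache_op op)
    {
            /* placeholder body: would walk the buffer page by page and apply 'op' */
    }

    void sketch_sync_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
    {
            switch (dir) {
            case DMA_TO_DEVICE:
                    sketch_cache_maint(paddr, size, SK_WBACK);
                    break;
            case DMA_FROM_DEVICE:
                    sketch_cache_maint(paddr, size, SK_INV);
                    break;
            default:
                    sketch_cache_maint(paddr, size, SK_WBACK_INV);
                    break;
            }
    }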

[PATCH 11/21] mips: dma-mapping: skip invalidating before bidirectional DMA

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann Some architectures that need to invalidate buffers after bidirectional DMA because of speculative prefetching only do a simpler writeback before that DMA, while architectures that don't need to do the second invalidate tend to have a combined writeback+invalidate before the

[PATCH 10/21] csky: dma-mapping: skip invalidating before DMA from device

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann csky is the only architecture that does a full flush for the dma_sync_*_for_device(..., DMA_FROM_DEVICE) operation. The requirement is only to make sure there are no dirty cache lines for the buffer, which can be either done through an invalidate operation (as on most

[PATCH 09/21] riscv: dma-mapping: skip invalidation before bidirectional DMA

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann For a DMA_BIDIRECTIONAL transfer, the caches have to be cleaned first to let the device see data written by the CPU, and invalidated after the transfer to let the CPU see data written by the device. riscv also invalidates the caches before the transfer, which does not appear

[PATCH 08/21] riscv: dma-mapping: only invalidate after DMA, not flush

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann No other architecture intentionally writes back dirty cache lines into a buffer that a device has just finished writing into. If the cache is clean, this has no effect at all, but if a cacheline in the buffer has actually been written by the CPU, there is a driver bug that is

[PATCH 07/21] powerpc: dma-mapping: always clean cache in _for_device() op

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann The powerpc implementation of arch_sync_dma_for_device() is unique in that it sometimes performs a full flush for the arch_sync_dma_for_device(paddr, size, DMA_FROM_DEVICE) operation when the address is unaligned, but otherwise invalidates the caches. Since the _for_cpu()
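
A sketch of the behaviour the subject line describes, with powerpc's clean_dcache_range() used purely for illustration: the _for_device() path always does a plain writeback and leaves invalidation to the _for_cpu() side:

    static void sketch_ppc_sync_for_device(phys_addr_t paddr, size_t size,
                                           enum dma_data_direction dir)
    {
            /*
             * A clean is safe even for DMA_FROM_DEVICE: a dirty partial
             * cache line at an unaligned buffer edge gets written back
             * instead of discarded, and _for_cpu() invalidates after the
             * transfer anyway.
             */
            clean_dcache_range((unsigned long)phys_to_virt(paddr),
                               (unsigned long)phys_to_virt(paddr) + size);
    }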

[PATCH 06/21] powerpc: dma-mapping: minimize for_cpu flushing

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann The powerpc dma_sync_*_for_cpu() variants do more flushes than on other architectures. Reduce it to what everyone else does: - No flush is needed after data has been sent to a device - When data has been received from a device, the cache only needs to be invalidated to

[PATCH 05/21] powerpc: dma-mapping: split out cache operation logic

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann The powerpc arch_sync_dma_for_device()/arch_sync_dma_for_cpu() functions behave differently from all other architectures, at least for some of the operations. As a preparation for making the behavior more consistent, reorder the logic in which they decide whether to flush,

[PATCH 04/21] microblaze: dma-mapping: skip extra DMA flushes

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann The microblaze dma_sync_* implementation uses the same function for both _for_cpu() and _for_device(), which is inconsistent with other architectures and slightly more expensive. Split it up into separate functions and skip the parts that are not needed: - on

[PATCH 03/21] sparc32: flush caches in dma_sync_*for_device

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann Leon has a very minimalistic cache that has no range operations and requires being flushed entirely to deal with noncoherent DMA. Most in-order architectures do their cache management in the dma_sync_*for_device() operations rather than dma_sync_*for_cpu. Since the cache is
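
Sketch only, with a placeholder flush routine: lacking ranged operations, the LEON case ends up flushing the whole data cache in _for_device() and leaving _for_cpu() empty:

    static void sketch_leon_sync_for_device(phys_addr_t paddr, size_t size,
                                            enum dma_data_direction dir)
    {
            sketch_leon_flush_dcache_all();     /* hypothetical whole-cache flush */
    }

    static void sketch_leon_sync_for_cpu(phys_addr_t paddr, size_t size,
                                         enum dma_data_direction dir)
    {
            /* nothing to do: the flush happened before the transfer */
    }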

[PATCH 02/21] xtensa: dma-mapping: use normal cache invalidation rules

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann xtensa is one of the platforms that has both write-back and write-through caches, and needs to account for both in its DMA mapping operations. It does this through a set of operations that is different from any other architecture. This is not a problem by itself, but it makes it

[PATCH 01/21] openrisc: dma-mapping: flush bidirectional mappings

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann The cache management operations on DMA are different from the other architectures: - on DMA_TO_DEVICE, Openrisc currently invalidates the cache after the writeback, where a simple writeback without invalidation should be sufficient. - on DMA_BIDIRECTIONAL, Openrisc

[PATCH 00/21] dma-mapping: unify support for cache flushes

2023-03-27 Thread Arnd Bergmann
From: Arnd Bergmann After a long discussion about adding SoC specific semantics for when to flush caches in drivers/soc/ drivers that we determined to be fundamentally flawed[1], I volunteered to try to move that logic into architecture-independent code and make all existing architectures do the