Right now, the SMMU driver uses dma_alloc_coherent() to allocate memory for its queues and tables. Typically, on an ARM64 server, there is a single default CMA located on node 0, which can be far away from node 2, node 3, etc. With this patch, the SMMU gets memory from the local NUMA node for its command queues and page tables, which shrinks dma_unmap latency considerably.

Meanwhile, when iommu.passthrough is on, device drivers which call dma_alloc_coherent() will also get local memory and avoid cross-node traffic.
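For context, the per-NUMA CMA areas this change reserves are sized from the kernel command line; a sketch of how a deployment might enable them (the cma_pernuma= parameter name is assumed from the accompanying dma-contiguous patches in this series, not introduced by this hunk):

```shell
# Kernel command-line fragment: reserve a 16 MB CMA area on each
# NUMA node, so coherent allocations can be served node-locally.
cma_pernuma=16M
```

Without such a reservation at boot, dma_pernuma_cma_reserve() has nothing to carve out and allocations fall back to the default global CMA on node 0.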
Cc: Will Deacon <w...@kernel.org>
Cc: Robin Murphy <robin.mur...@arm.com>
Signed-off-by: Barry Song <song.bao....@hisilicon.com>
---
 arch/arm64/mm/init.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 8f0e70ebb49d..204a534982b2 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -474,6 +474,8 @@ void __init bootmem_init(void)
 	arm64_numa_init();
 
+	dma_pernuma_cma_reserve();
+
 #ifdef CONFIG_ARM64_4K_PAGES
 	hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
 #endif
-- 
2.23.0