Hi Maintainers,

We've cross-checked the barrier definitions and implementations in VPP and DPDK [1][2], and found two possible issues that we would like to discuss with the VPP community.

Issue 1. The current VPP barriers for Arm CPUs are inappropriate for device memory synchronization in the native PMD code.

Arm uses different barriers for synchronizing normal memory and device memory (mapped PCIe device memory space). VPP uses __sync_synchronize() for both. __sync_synchronize() generates a 'DMB ISH' instruction [9], which is correct for normal memory. For device memory, the appropriate instruction is 'DMB OSH'. More information on the difference between the two and on where each is deployed can be found in [7] [8]. [12] states: "Data accesses to device memory locations are coherent for all observers in the system, and correspondingly are treated as being Outer Shareable.", which suggests using the OSH qualifier on Arm for device memory.

Issue 2. The current VPP barriers for x86 CPUs are stronger than required, which will probably degrade performance on x86.

According to the x86 definitions of rte_io_*mb and rte_smp_*mb in DPDK [2], there is no difference between the barriers for normal memory and for device memory. The store barrier in VPP uses __builtin_ia32_sfence(), which generates an 'sfence' instruction [5][6], while the definitions of rte_smp_rmb/rte_smp_wmb/rte_io_rmb/rte_io_wmb in DPDK [2] show that compiler barriers are sufficient, probably because x86 is a strongly ordered architecture.

Our proposal is:

1. For Issue 1, we can replace __sync_synchronize() with 'DMB OSH' on Arm CPUs and use 'DMB OSH' for both normal and device memory. This looks like a minimal change and affects Arm CPUs only, but it may slow down synchronization of normal memory. [10]

2. To address Issue 1 thoroughly, and Issue 2 as well, we could bring the DPDK approach to VPP by implementing two suites of barriers: CLIB_MEMORY_BARRIER()/CLIB_MEMORY_STORE_BARRIER()/CLIB_MEMORY_LOAD_BARRIER() for normal memory, and CLIB_IOMEMORY_BARRIER()/CLIB_IOMEMORY_STORE_BARRIER()/CLIB_IOMEMORY_LOAD_BARRIER() for device memory synchronization in the PMD source code.
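
To make proposal 2 more concrete, here is a minimal sketch of how the two suites could look. The macro names are the ones proposed above, but the bodies are our assumption, loosely modelled on the DPDK scheme, and are not existing VPP code. SKETCH_COMPILER_BARRIER, sketch_queue_t and sketch_post_descriptor are hypothetical names used only for illustration, and the doorbell field merely stands in for a mapped device register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compiler-only barrier: prevents compiler reordering, emits no instruction.
   Hypothetical helper name, not part of VPP. */
#define SKETCH_COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

#if defined(__aarch64__)
/* Normal memory: Inner Shareable domain (assumed mapping). */
#define CLIB_MEMORY_BARRIER()         __asm__ volatile ("dmb ish" ::: "memory")
#define CLIB_MEMORY_STORE_BARRIER()   __asm__ volatile ("dmb ishst" ::: "memory")
#define CLIB_MEMORY_LOAD_BARRIER()    __asm__ volatile ("dmb ishld" ::: "memory")
/* Device memory: Outer Shareable domain (assumed mapping). */
#define CLIB_IOMEMORY_BARRIER()       __asm__ volatile ("dmb osh" ::: "memory")
#define CLIB_IOMEMORY_STORE_BARRIER() __asm__ volatile ("dmb oshst" ::: "memory")
#define CLIB_IOMEMORY_LOAD_BARRIER()  __asm__ volatile ("dmb oshld" ::: "memory")
#elif defined(__x86_64__) || defined(__i386__)
/* x86: load and store barriers reduce to compiler barriers in this sketch;
   the full barrier keeps a real fence. */
#define CLIB_MEMORY_BARRIER()         __sync_synchronize ()
#define CLIB_MEMORY_STORE_BARRIER()   SKETCH_COMPILER_BARRIER ()
#define CLIB_MEMORY_LOAD_BARRIER()    SKETCH_COMPILER_BARRIER ()
#define CLIB_IOMEMORY_BARRIER()       __sync_synchronize ()
#define CLIB_IOMEMORY_STORE_BARRIER() SKETCH_COMPILER_BARRIER ()
#define CLIB_IOMEMORY_LOAD_BARRIER()  SKETCH_COMPILER_BARRIER ()
#else
/* Other architectures: conservative fallback to a full barrier. */
#define CLIB_MEMORY_BARRIER()         __sync_synchronize ()
#define CLIB_MEMORY_STORE_BARRIER()   __sync_synchronize ()
#define CLIB_MEMORY_LOAD_BARRIER()    __sync_synchronize ()
#define CLIB_IOMEMORY_BARRIER()       __sync_synchronize ()
#define CLIB_IOMEMORY_STORE_BARRIER() __sync_synchronize ()
#define CLIB_IOMEMORY_LOAD_BARRIER()  __sync_synchronize ()
#endif

/* Hypothetical queue: a descriptor ring in normal memory plus a field that
   stands in for a device doorbell register. */
typedef struct
{
  uint64_t desc[4];
  volatile uint32_t doorbell;
} sketch_queue_t;

/* Write a descriptor, then ring the doorbell. The I/O store barrier orders
   the descriptor write before the doorbell write. */
static inline void
sketch_post_descriptor (sketch_queue_t *q, uint32_t slot, uint64_t value)
{
  assert (q != NULL && slot < 4);
  q->desc[slot] = value;
  CLIB_IOMEMORY_STORE_BARRIER ();
  q->doorbell = slot + 1;
}
```

With this split, PMD code that touches device registers would use the CLIB_IOMEMORY_* suite, while code that only synchronizes between worker threads would keep the cheaper CLIB_MEMORY_* suite.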

Could you please provide your suggestions?

[1] https://github.com/DPDK/dpdk/blob/main/lib/eal/arm/include/rte_atomic_64.h
[2] https://github.com/DPDK/dpdk/blob/main/lib/eal/x86/include/rte_atomic.h
[5] x86 clang-13: https://godbolt.org/z/x9MWKE46q
[6] x86 gcc-11: https://godbolt.org/z/deY99fo7s
[7] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
[8] https://developer.arm.com/documentation/100941/0101/Barriers
[9] https://godbolt.org/z/8sYToKq8P
[10] https://gerrit.fd.io/r/c/vpp/+/37864/1/src/vppinfra/clib.h
[11] https://developer.arm.com/documentation/100941/0101/Memory-attributes
[12] https://developer.arm.com/documentation/ddi0487/ia

Thanks.