Hi Maintainers,
We've cross-checked the barrier definitions and implementations in VPP and 
DPDK [1][2]. There may be two issues that we would like to discuss with the 
VPP community.

Issue 1. The current VPP barriers for Arm CPUs are inappropriate for device 
memory synchronization in the native PMD code.
Arm uses different barriers to synchronize normal memory and device memory 
(mapped PCIe device memory space), whereas VPP uses __sync_synchronize() for 
both. __sync_synchronize() generates a 'DMB ISH' instruction [9], which is 
correct for normal memory. For device memory, the appropriate instruction is 
'DMB OSH'. More information on the difference between the two, and on where 
each one applies, can be found in [7] [8].
A comment in [12] says "Data accesses to device memory locations are coherent 
for all observers in the system, and correspondingly are treated as being 
Outer Shareable.", which suggests using the OSH qualifier on Arm for device 
memory.
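
To make the distinction concrete, here is a minimal sketch of the two barrier 
flavours and of where the device flavour would sit in a PMD transmit path. The 
helper names (sketch_barrier_normal, sketch_barrier_device, 
sketch_ring_doorbell) are hypothetical and exist in neither VPP nor DPDK. The 
non-Arm fallback is there only so that the sketch builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names; not part of VPP or DPDK. */
#if defined(__aarch64__)
/* Inner Shareable domain: what __sync_synchronize() emits today [9]. */
#define sketch_barrier_normal() __asm__ volatile ("dmb ish" ::: "memory")
/* Outer Shareable domain: the qualifier suggested for device memory. */
#define sketch_barrier_device() __asm__ volatile ("dmb osh" ::: "memory")
#else
/* Fallback so that the sketch also builds on other architectures. */
#define sketch_barrier_normal() __sync_synchronize ()
#define sketch_barrier_device() __sync_synchronize ()
#endif

/* Fill a descriptor, then notify the device through a doorbell register.
 * The descriptor write has to be ordered before the doorbell write, so the
 * device barrier is placed between the two stores. */
static inline void
sketch_ring_doorbell (uint64_t *desc, uint64_t buf_addr,
                      volatile uint32_t *doorbell, uint32_t tail)
{
  assert (desc != NULL && doorbell != NULL);
  *desc = buf_addr;
  sketch_barrier_device ();
  *doorbell = tail;
}
```

In a real driver the doorbell pointer would refer to a mapped PCIe BAR; in 
this sketch any uint32_t location works for experimenting.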

Issue 2. The current VPP barriers for x86 CPUs are stronger than required, 
which will probably degrade performance on x86 CPUs.
Per the x86 definitions of rte_io_*mb and rte_smp_*mb in DPDK [2], there is 
no difference between the barriers for normal memory and device memory.
The store barrier in VPP uses __builtin_ia32_sfence(), which generates an 
'sfence' instruction [5][6], while the definitions of 
rte_smp_rmb/rte_smp_wmb/rte_io_rmb/rte_io_wmb in DPDK [2] show that compiler 
barriers are sufficient, probably because x86 is a strongly ordered 
architecture.
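
As a rough illustration of the two store-barrier options on x86, the sketch 
below publishes a payload and then a ready flag using only a compiler barrier. 
All names are hypothetical. It assumes ordinary (write-back) stores; 
non-temporal stores and write-combining memory are outside its scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names; not part of VPP or DPDK. */
#if defined(__x86_64__)
/* Emits an 'sfence' instruction, as the current VPP store barrier does. */
#define sketch_store_barrier_sfence() __builtin_ia32_sfence ()
#else
/* Fallback so that the sketch also builds on other architectures. */
#define sketch_store_barrier_sfence() __sync_synchronize ()
#endif

/* Emits no instruction; it only prevents the compiler from moving memory
 * accesses across this point. */
#define sketch_compiler_barrier() __asm__ volatile ("" ::: "memory")

struct sketch_msg
{
  uint32_t payload;
  uint32_t ready;
};

/* Write the payload, then set the ready flag. On x86, ordinary stores are
 * not reordered with other ordinary stores, so only the compiler has to be
 * restrained. On a weakly ordered CPU this would NOT be sufficient. */
static inline void
sketch_publish (volatile struct sketch_msg *m, uint32_t value)
{
  assert (m != NULL);
  m->payload = value;
  sketch_compiler_barrier ();
  m->ready = 1;
}
```

Replacing sketch_compiler_barrier() with sketch_store_barrier_sfence() in 
sketch_publish() gives the stronger variant for comparison.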

Our proposal is:
1. For Issue 1, we can replace __sync_synchronize() with 'DMB OSH' on Arm 
CPUs and use 'DMB OSH' for both normal and device memory. This is a minimal 
change that affects Arm CPUs only, but it may slow down normal memory 
synchronization. [10]
2. To address Issue 1 thoroughly, along with Issue 2, we could bring a similar 
idea from DPDK into VPP by implementing two suites of barriers: 
CLIB_MEMORY_BARRIER()/CLIB_MEMORY_STORE_BARRIER()/CLIB_MEMORY_LOAD_BARRIER() 
for normal memory, and 
CLIB_IOMEMORY_BARRIER()/CLIB_IOMEMORY_STORE_BARRIER()/CLIB_IOMEMORY_LOAD_BARRIER() 
for device memory synchronization in the PMD source code.
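
A possible shape for option 2 is sketched below. The macro names are the ones 
proposed above, but every definition here is an assumption loosely modelled on 
the DPDK approach, not the existing VPP code and not a reviewed patch. 
SKETCH_COMPILER_BARRIER, struct sketch_rx_desc and sketch_rx_poll are 
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: restrains the compiler only, emits no instruction. */
#define SKETCH_COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

#if defined(__aarch64__)
/* Normal memory: Inner Shareable domain. */
#define CLIB_MEMORY_BARRIER()         __asm__ volatile ("dmb ish" ::: "memory")
#define CLIB_MEMORY_STORE_BARRIER()   __asm__ volatile ("dmb ishst" ::: "memory")
#define CLIB_MEMORY_LOAD_BARRIER()    __asm__ volatile ("dmb ishld" ::: "memory")
/* Device memory: Outer Shareable domain. */
#define CLIB_IOMEMORY_BARRIER()       __asm__ volatile ("dmb osh" ::: "memory")
#define CLIB_IOMEMORY_STORE_BARRIER() __asm__ volatile ("dmb oshst" ::: "memory")
#define CLIB_IOMEMORY_LOAD_BARRIER()  __asm__ volatile ("dmb oshld" ::: "memory")
#elif defined(__x86_64__)
/* Assumed: compiler barriers for load/store ordering, full fence otherwise. */
#define CLIB_MEMORY_BARRIER()         __sync_synchronize ()
#define CLIB_MEMORY_STORE_BARRIER()   SKETCH_COMPILER_BARRIER ()
#define CLIB_MEMORY_LOAD_BARRIER()    SKETCH_COMPILER_BARRIER ()
#define CLIB_IOMEMORY_BARRIER()       __sync_synchronize ()
#define CLIB_IOMEMORY_STORE_BARRIER() SKETCH_COMPILER_BARRIER ()
#define CLIB_IOMEMORY_LOAD_BARRIER()  SKETCH_COMPILER_BARRIER ()
#else
/* Conservative fallback for other architectures. */
#define CLIB_MEMORY_BARRIER()         __sync_synchronize ()
#define CLIB_MEMORY_STORE_BARRIER()   __sync_synchronize ()
#define CLIB_MEMORY_LOAD_BARRIER()    __sync_synchronize ()
#define CLIB_IOMEMORY_BARRIER()       __sync_synchronize ()
#define CLIB_IOMEMORY_STORE_BARRIER() __sync_synchronize ()
#define CLIB_IOMEMORY_LOAD_BARRIER()  __sync_synchronize ()
#endif

/* Hypothetical receive descriptor that a device fills in. */
struct sketch_rx_desc
{
  uint32_t length;
  uint32_t done;
};

/* Return the packet length if the descriptor is marked done, else 0. The
 * load barrier keeps the read of 'length' from being satisfied before the
 * read of 'done'. */
static inline uint32_t
sketch_rx_poll (const volatile struct sketch_rx_desc *d)
{
  assert (d != NULL);
  if (!d->done)
    return 0;
  CLIB_IOMEMORY_LOAD_BARRIER ();
  return d->length;
}
```

With this split, PMD code would call the CLIB_IOMEMORY_* suite where it 
synchronizes with the device, and everything else would keep using the 
CLIB_MEMORY_* suite.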

Could you please provide your suggestions?

[1] https://github.com/DPDK/dpdk/blob/main/lib/eal/arm/include/rte_atomic_64.h
[2] https://github.com/DPDK/dpdk/blob/main/lib/eal/x86/include/rte_atomic.h
[5] x86 clang-13 https://godbolt.org/z/x9MWKE46q
[6] x86 gcc-11 https://godbolt.org/z/deY99fo7s
[7] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
[8] https://developer.arm.com/documentation/100941/0101/Barriers
[9] https://godbolt.org/z/8sYToKq8P
[10] https://gerrit.fd.io/r/c/vpp/+/37864/1/src/vppinfra/clib.h
[11] https://developer.arm.com/documentation/100941/0101/Memory-attributes
[12] https://developer.arm.com/documentation/ddi0487/ia

Thanks.