+1 Florin
> On Jul 5, 2021, at 4:27 AM, Dave Barach <v...@barachs.net> wrote: > > “Going across the whole codebase and replacing prefetch macros is something > we should definitely avoid.” > > +1, for sure... FWIW... Dave > > From: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io> <vpp-dev@lists.fd.io > <mailto:vpp-dev@lists.fd.io>> On Behalf Of Damjan Marion via lists.fd.io > <http://lists.fd.io/> > Sent: Monday, July 5, 2021 4:50 AM > To: Lijian Zhang <lijian.zh...@arm.com <mailto:lijian.zh...@arm.com>> > Cc: vpp-dev <vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io>>; nd > <n...@arm.com <mailto:n...@arm.com>>; Honnappa Nagarahalli > <honnappa.nagaraha...@arm.com <mailto:honnappa.nagaraha...@arm.com>>; > Govindarajan Mohandoss <govindarajan.mohand...@arm.com > <mailto:govindarajan.mohand...@arm.com>>; Tianyu Li <tianyu...@arm.com > <mailto:tianyu...@arm.com>>; Nitin Saxena <nsax...@marvell.com > <mailto:nsax...@marvell.com>> > Subject: Re: [vpp-dev] Prefetches improvement for VPP Arm generic image > > Hi Lijian, > > It will be good to know if 128 byte cacheline is something ARM platforms will > be using in the future or it is just historical. > > Cacheline size problem is not just about prefetching, even bigger issue is > false sharing, so we need to address both. > Probably best solution is to have 2 VPP images, one for 128 and one for 64 > byte cacheline size. > > Going across the whole codebase and replacing prefetch macros is something we > should definitely avoid. > > — > Damjan > > > On 05.07.2021., at 07:28, Lijian Zhang <lijian.zh...@arm.com > <mailto:lijian.zh...@arm.com>> wrote: > > Hi Damjan, > I committed several patches to address some issues around cache-line > definitions in VPP. > > Patch [1.1] is to resolve the build error [2] on 64Byte cache line Arm CPUs, > e.g., ThunderX2, NeoverseN1, caused by the commit > (https://gerrit.fd.io/r/c/vpp/+/32996 <https://gerrit.fd.io/r/c/vpp/+/32996>, > build: remove unused files and sections). > It also supports building Arm generic image (with command of “make > build-release”) with 128Byte cache line definition, and building native image > with 64Byte cache line definition on some Arm CPUs, e.g., ThunderX2, > NeoverseN1 (with command of “make build-release TARGET_PLATFORM=native”). > > Patch [1.5] is to set the default cache line definition in Arm generic image > from 128Byte to 64Byte. > Setting cache line definition to 128Byte for Arm generic image is required > for ThunderX1 (with 128Byte physical cache line), which is also the build > machine in FD.io <http://fd.io/> lab. I’m thinking for setting 64Byte cache > line definition in VPP for Arm image, which will affect ThunderX1 and > OcteonTX2 CPUs. So it requires the confirmation by Marvell. > > Arm architecture CPUs have 128Byte or 64Byte physical designs. So no matter > the cache line definition is 128Byte or 64Byte in VPP source code, the > prefetch functions in generic image will not work properly on all Arm CPUs. > Patches [1.2] [1.3] [1.4] are to resolve the issue. > > For example when running Arm generic image (cache-line-size is defined as > 128B in Makefile for all Arm architectures) on 64Byte cache-line-size CPUs, > e.g., Neoverse-N1, Ampere altra, ThunderX2. > > [3] shows the prefetch macro definitions in VPP. Using CLIB_PREFETCH(), you > can prefetch data resides in multiple cache lines. > [4] shows some usage examples of the prefetch macros in VPP. When running Arm > generic image (128B cache-line-size definition) on 64B cache-line CPUs (N1SDP > for example), 4.2, 4.3 and 4.4 have issues. > For 4.2, the input for size parameter is 68. On N1SDP with 64B > cache-line-size, there should be two prefetch instructions executed, but due > to 68 is less than CLIB_CACHE_LINE_BYTES (128Byte definition in VPP), only > the first prefetch instruction is executed. > For 4.3, if sizeof (ip0[0]) equals 68 or any other values larger than 64B, > there will be the same issue as 4.2. > For 4.4, the code is trying to prefetch the first 128B of packet content. It > assumes CLIB_CACHE_LINE_BYTES is 64B always. In Arm generic image, the input > for size parameter is 256B, which will execute prefetches on unexpected > cache-lines (expected prefetches on 64B-0 and 64B-1, but actually on B64-0 > and B64-2) . > Packet content: [64B-0][64B-1][64B-2][64B-3] > > Our proposal is introduce a macro CLIB_N_CACHELINE_BYTES via VPP multi-arch > feature (check patch [1.2]), to reflect the runtime CPU cache-line-size in > Arm generic image, so that the prefetch instructions can be executed > correctly. > Then for 4.4, we will need to modify the parameter for size, from > 2*CLIB_CACHE_LINE_BYTES to 128B, to reflect the actual intention. > > Some additional macros [1.3] can be added for users to do prefetch based on > number of cache-lines, besides number of bytes. > > Could you please suggest on the issue and proposal? > > [1]. Patches > [1.1] build: support 128B/64B cache line size in Arm image, > https://gerrit.fd.io/r/c/vpp/+/32968/2 > <https://gerrit.fd.io/r/c/vpp/+/32968/2> > [1.2] vppinfra: refactor prefetch macro, > https://gerrit.fd.io/r/c/vpp/+/32969/3 > <https://gerrit.fd.io/r/c/vpp/+/32969/3> > [1.3] vppinfra: fix functions to prefetch single line, > https://gerrit.fd.io/r/c/vpp/+/32970/2 > <https://gerrit.fd.io/r/c/vpp/+/32970/2> > [1.4] misc: correct prefetch macro usage, > https://gerrit.fd.io/r/c/vpp/+/32971/3 > <https://gerrit.fd.io/r/c/vpp/+/32971/3> > [1.5] build: set 64B cache line size in Arm image, > https://gerrit.fd.io/r/c/vpp/+/32972/2 > <https://gerrit.fd.io/r/c/vpp/+/32972/2> > > [2]. Error message > src/plugins/dpdk/device/init.c:1916:3: error: static_assert failed due to > requirement '128 == 1 << 6' "DPDK RTE CACHE LINE SIZE does not match with > 1<<CLIB_LOG2_CACHE_LINE_BYTES" > STATIC_ASSERT (RTE_CACHE_LINE_SIZE == 1 << CLIB_LOG2_CACHE_LINE_BYTES, > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > /home/lijian/tasks/plsremove/src/vppinfra/error_bootstrap.h:111:34: note: > expanded from macro 'STATIC_ASSERT' > #define STATIC_ASSERT(truth,...) _Static_assert(truth, __VA_ARGS__) > ^ ~~~~~ > 1 error generated. > > [3]. Prefetch macro definitions in VPP source code. > ‘’’ > #define _CLIB_PREFETCH(n,size,type) > \ > if ((size) > (n)*CLIB_CACHE_LINE_BYTES) > \ > __builtin_prefetch (_addr + (n)*CLIB_CACHE_LINE_BYTES, \ > CLIB_PREFETCH_##type, > \ > /* locality */ 3); > > #define CLIB_PREFETCH(addr,size,type) \ > do { > \ > void * _addr = (addr); \ > > \ > ASSERT ((size) <= 4*CLIB_CACHE_LINE_BYTES); \ > _CLIB_PREFETCH (0, size, type); \ > _CLIB_PREFETCH (1, size, type); \ > _CLIB_PREFETCH (2, size, type); \ > _CLIB_PREFETCH (3, size, type); \ > } while (0) > ‘’’ > > [4] > 4.1 CLIB_PREFETCH (p2->pre_data, 32, STORE); > 4.2 CLIB_PREFETCH (p2->pre_data, 68, STORE); > 4.3 CLIB_PREFETCH (b[4]->data, sizeof (ip0[0]), LOAD); > 4.4 CLIB_PREFETCH (p2->data, 2*CLIB_CACHE_LINE_BYTES, LOAD); > > Thanks. > > > > > >
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#19709): https://lists.fd.io/g/vpp-dev/message/19709 Mute This Topic: https://lists.fd.io/mt/83991823/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-