Re: [vpp-dev] Prefetches improvement for VPP Arm generic image

Florin Coras Mon, 05 Jul 2021 12:08:13 -0700

+1

Florin


> On Jul 5, 2021, at 4:27 AM, Dave Barach <v...@barachs.net> wrote:
> 
> “Going across the whole codebase and replacing prefetch macros is something 
> we should definitely avoid.”
>
> +1, for sure... FWIW... Dave
>
> From: vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io> <vpp-dev@lists.fd.io 
> <mailto:vpp-dev@lists.fd.io>> On Behalf Of Damjan Marion via lists.fd.io 
> <http://lists.fd.io/>
> Sent: Monday, July 5, 2021 4:50 AM
> To: Lijian Zhang <lijian.zh...@arm.com <mailto:lijian.zh...@arm.com>>
> Cc: vpp-dev <vpp-dev@lists.fd.io <mailto:vpp-dev@lists.fd.io>>; nd 
> <n...@arm.com <mailto:n...@arm.com>>; Honnappa Nagarahalli 
> <honnappa.nagaraha...@arm.com <mailto:honnappa.nagaraha...@arm.com>>; 
> Govindarajan Mohandoss <govindarajan.mohand...@arm.com 
> <mailto:govindarajan.mohand...@arm.com>>; Tianyu Li <tianyu...@arm.com 
> <mailto:tianyu...@arm.com>>; Nitin Saxena <nsax...@marvell.com 
> <mailto:nsax...@marvell.com>>
> Subject: Re: [vpp-dev] Prefetches improvement for VPP Arm generic image
>
> Hi Lijian,
>
> It will be good to know if 128 byte cacheline is something ARM platforms will 
> be using in the future or it is just historical.
>
> Cacheline size problem is not just about prefetching, even bigger issue is 
> false sharing, so we need to address both.
> Probably best solution is to have 2 VPP images, one for 128 and one for 64 
> byte cacheline size.
>
> Going across the whole codebase and replacing prefetch macros is something we 
> should definitely avoid.
>
> — 
> Damjan
> 
> 
> On 05.07.2021., at 07:28, Lijian Zhang <lijian.zh...@arm.com 
> <mailto:lijian.zh...@arm.com>> wrote:
>
> Hi Damjan,
> I committed several patches to address some issues around cache-line 
> definitions in VPP.
>
> Patch [1.1] is to resolve the build error [2] on 64Byte cache line Arm CPUs, 
> e.g., ThunderX2, NeoverseN1, caused by the commit 
> (https://gerrit.fd.io/r/c/vpp/+/32996 <https://gerrit.fd.io/r/c/vpp/+/32996>, 
> build: remove unused files and sections).
> It also supports building Arm generic image (with command of “make 
> build-release”) with 128Byte cache line definition, and building native image 
> with 64Byte cache line definition on some Arm CPUs, e.g., ThunderX2, 
> NeoverseN1 (with command of “make build-release TARGET_PLATFORM=native”).
>
> Patch [1.5] is to set the default cache line definition in Arm generic image 
> from 128Byte to 64Byte.
> Setting cache line definition to 128Byte for Arm generic image is required 
> for ThunderX1 (with 128Byte physical cache line), which is also the build 
> machine in FD.io <http://fd.io/> lab. I’m thinking for setting 64Byte cache 
> line definition in VPP for Arm image, which will affect ThunderX1 and 
> OcteonTX2 CPUs. So it requires the confirmation by Marvell.
>
> Arm architecture CPUs have 128Byte or 64Byte physical designs. So no matter 
> the cache line definition is 128Byte or 64Byte in VPP source code, the 
> prefetch functions in generic image will not work properly on all Arm CPUs. 
> Patches [1.2] [1.3] [1.4] are to resolve the issue.
>
> For example when running Arm generic image (cache-line-size is defined as 
> 128B in Makefile for all Arm architectures) on 64Byte cache-line-size CPUs, 
> e.g., Neoverse-N1, Ampere altra, ThunderX2.
>
> [3] shows the prefetch macro definitions in VPP. Using CLIB_PREFETCH(), you 
> can prefetch data resides in multiple cache lines.
> [4] shows some usage examples of the prefetch macros in VPP. When running Arm 
> generic image (128B cache-line-size definition) on 64B cache-line CPUs (N1SDP 
> for example), 4.2, 4.3 and 4.4 have issues.
> For 4.2, the input for size parameter is 68. On N1SDP with 64B 
> cache-line-size, there should be two prefetch instructions executed, but due 
> to 68 is less than CLIB_CACHE_LINE_BYTES (128Byte definition in VPP), only 
> the first prefetch instruction is executed.
> For 4.3, if sizeof (ip0[0]) equals 68 or any other values larger than 64B, 
> there will be the same issue as 4.2.
> For 4.4, the code  is trying to prefetch the first 128B of packet content. It 
> assumes  CLIB_CACHE_LINE_BYTES is 64B always. In Arm generic image, the input 
> for size parameter is 256B, which will execute prefetches on unexpected 
> cache-lines (expected prefetches on 64B-0 and 64B-1, but actually on B64-0 
> and B64-2) .
> Packet content: [64B-0][64B-1][64B-2][64B-3]
>
> Our proposal is introduce a macro CLIB_N_CACHELINE_BYTES via VPP multi-arch 
> feature (check patch [1.2]), to reflect the runtime CPU cache-line-size in 
> Arm generic image, so that the prefetch instructions can be executed 
> correctly.
> Then for 4.4, we will need to modify the parameter for size, from 
> 2*CLIB_CACHE_LINE_BYTES to 128B, to reflect the actual intention.
>
> Some additional macros [1.3] can be added for users to do prefetch based on 
> number of cache-lines, besides number of bytes.
>
> Could you please suggest on the issue and proposal?
>
> [1]. Patches
> [1.1] build: support 128B/64B cache line size in Arm image, 
> https://gerrit.fd.io/r/c/vpp/+/32968/2 
> <https://gerrit.fd.io/r/c/vpp/+/32968/2>
> [1.2] vppinfra: refactor prefetch macro, 
> https://gerrit.fd.io/r/c/vpp/+/32969/3 
> <https://gerrit.fd.io/r/c/vpp/+/32969/3>
> [1.3] vppinfra: fix functions to prefetch single line, 
> https://gerrit.fd.io/r/c/vpp/+/32970/2 
> <https://gerrit.fd.io/r/c/vpp/+/32970/2>
> [1.4] misc: correct prefetch macro usage, 
> https://gerrit.fd.io/r/c/vpp/+/32971/3 
> <https://gerrit.fd.io/r/c/vpp/+/32971/3>
> [1.5] build: set 64B cache line size in Arm image, 
> https://gerrit.fd.io/r/c/vpp/+/32972/2 
> <https://gerrit.fd.io/r/c/vpp/+/32972/2>
>
> [2]. Error message
> src/plugins/dpdk/device/init.c:1916:3: error: static_assert failed due to 
> requirement '128 == 1 << 6' "DPDK RTE CACHE LINE SIZE does not match with 
> 1<<CLIB_LOG2_CACHE_LINE_BYTES"
>   STATIC_ASSERT (RTE_CACHE_LINE_SIZE == 1 << CLIB_LOG2_CACHE_LINE_BYTES,
>   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> /home/lijian/tasks/plsremove/src/vppinfra/error_bootstrap.h:111:34: note: 
> expanded from macro 'STATIC_ASSERT'
> #define STATIC_ASSERT(truth,...) _Static_assert(truth, __VA_ARGS__)
>                                  ^              ~~~~~
> 1 error generated.
>
> [3]. Prefetch macro definitions in VPP source code.
> ‘’’
> #define _CLIB_PREFETCH(n,size,type)                                           
>      \
>   if ((size) > (n)*CLIB_CACHE_LINE_BYTES)                                     
>     \
>     __builtin_prefetch (_addr + (n)*CLIB_CACHE_LINE_BYTES,       \
>                                            CLIB_PREFETCH_##type,              
>                             \
>                                            /* locality */ 3);
>
> #define CLIB_PREFETCH(addr,size,type)               \
> do {                                                                          
>      \
>   void * _addr = (addr);                               \
>                                                                               
>         \
>   ASSERT ((size) <= 4*CLIB_CACHE_LINE_BYTES); \
>   _CLIB_PREFETCH (0, size, type);                            \
>   _CLIB_PREFETCH (1, size, type);                            \
>   _CLIB_PREFETCH (2, size, type);                            \
>   _CLIB_PREFETCH (3, size, type);                            \
> } while (0)
> ‘’’
>
> [4]
> 4.1 CLIB_PREFETCH (p2->pre_data, 32, STORE);
> 4.2 CLIB_PREFETCH (p2->pre_data, 68, STORE);
> 4.3 CLIB_PREFETCH (b[4]->data, sizeof (ip0[0]), LOAD);
> 4.4 CLIB_PREFETCH (p2->data, 2*CLIB_CACHE_LINE_BYTES, LOAD);
>
> Thanks.
> 
> 
>
> 
> 
>

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#19709): https://lists.fd.io/g/vpp-dev/message/19709
Mute This Topic: https://lists.fd.io/mt/83991823/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [vpp-dev] Prefetches improvement for VPP Arm generic image

Reply via email to