Re: [vpp-dev] Future VPP release plan information

2021-07-14 Thread Andrew Yourtchenko
Hi Ashish,

We just discussed the 21.10 release plan yesterday in the VPP community meeting.

It’s now linked off the VPP project page in the usual place.

--a

> On 14 Jul 2021, at 08:58, ashish.sax...@hsc.com wrote:
> 
> Hi Devs,
> 
> Can someone please let me know how I can get information regarding future VPP 
> releases.
> 
> Thanks and Regards,
> Ashish 
> 
> 

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#19765): https://lists.fd.io/g/vpp-dev/message/19765
Mute This Topic: https://lists.fd.io/mt/84196453/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] High packet drop under high binary-api call rate #binapi #vpp #vpp-dev #vapi

2021-07-14 Thread Govindarajan Mohandoss

Are there any suggestions for achieving a lower packet-drop rate under a high 
binary-API call rate?

Implementing a read-copy-update (RCU) scheme in VPP could address this issue:
https://doc.dpdk.org/guides/prog_guide/rcu_lib.html
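
For reference, a minimal sketch of how DPDK's QSBR-based RCU (the library in the
link above) is typically wired up; this is illustrative only, not existing VPP
code, and the thread-id handling and the protected table slot are hypothetical:

/* Readers (worker threads) report a quiescent state once per main-loop
 * iteration; the writer (API/control path) publishes an update and waits one
 * grace period before freeing the old object -- no worker barrier needed. */
#include <stdint.h>
#include <stdlib.h>
#include <rte_rcu_qsbr.h>

static struct rte_rcu_qsbr *qsv;

/* one-time setup, max_threads = number of worker (reader) threads */
static int
rcu_setup (uint32_t max_threads)
{
  size_t sz = rte_rcu_qsbr_get_memsize (max_threads);
  qsv = calloc (1, sz);          /* rte_zmalloc () in a real DPDK app */
  return qsv ? rte_rcu_qsbr_init (qsv, max_threads) : -1;
}

/* worker thread: register once at startup ... */
static void
worker_register (unsigned int thread_id)
{
  rte_rcu_qsbr_thread_register (qsv, thread_id);
  rte_rcu_qsbr_thread_online (qsv, thread_id);
}

/* ... and report quiescence once per loop iteration, after it has stopped
 * referencing shared classifier/table entries for that iteration */
static void
worker_loop_end (unsigned int thread_id)
{
  rte_rcu_qsbr_quiescent (qsv, thread_id);
}

/* control plane: publish the new entry, wait for one grace period,
 * then free the old one */
static void
writer_update (void **slot, void *new_entry)
{
  void *old = __atomic_exchange_n (slot, new_entry, __ATOMIC_RELEASE);
  rte_rcu_qsbr_synchronize (qsv, RTE_QSBR_THRID_INVALID);
  free (old);
}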

Thanks
Govind

From: vpp-dev@lists.fd.io  On Behalf Of benleungbuild via 
lists.fd.io
Sent: Wednesday, July 14, 2021 1:14 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] High packet drop under high binary-api call rate #binapi 
#vpp #vpp-dev #vapi


Hi vpp developers,




**Background:**



Based on my understanding, handling a (non-thread-safe) binary API call requires 
stopping all other worker threads.

I ran a test to study the impact of binary-API calls on the packet-drop rate.



topology:

(dpdk traffic in) ---> VPP1 ---(memif)---> VPP2 ---(dpdk traffic out)--->



The test ran for around 10 minutes with 8.5 Gbps of traffic between the two VPP 
instances and around 100 binary API calls per second (classify-add-del) issued to 
VPP2 by ligato-vpp-agent. Overall, around 15-20% of packets were dropped on the 
path from VPP1 to VPP2.

With no binary API calls, no packets are dropped.




**Questions:**

1. Are there any suggestions for achieving a lower packet-drop rate under a high 
binary-API call rate?

2. Are there any plans in a future VPP release to improve the "locking" of worker 
threads for non-thread-safe binary API calls?



Best

Ben


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#19766): https://lists.fd.io/g/vpp-dev/message/19766
Mute This Topic: https://lists.fd.io/mt/84196118/21656
Mute #vpp:https://lists.fd.io/g/vpp-dev/mutehashtag/vpp
Mute #binapi:https://lists.fd.io/g/vpp-dev/mutehashtag/binapi
Mute #vapi:https://lists.fd.io/g/vpp-dev/mutehashtag/vapi
Mute #vpp-dev:https://lists.fd.io/g/vpp-dev/mutehashtag/vpp-dev
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] Prefetches improvement for VPP Arm generic image

2021-07-14 Thread Damjan Marion via lists.fd.io

I spent a bit of time looking at this and coming up with a reasonable solution.

First, the 128-byte cacheline is not dead; the recently announced Marvell Octeon 10 
has a 128-byte cacheline.

In the current code, the cacheline size defines both the amount of data a prefetch 
instruction prefetches and the alignment of data structures needed to avoid false 
sharing.

So I think ideally we should have the following:

- on x86:
  - number of bytes prefetch instruction prefetches set to 64
  - data structures should be aligned to 64 bytes
  - due to the fact that there is an adjacent-cacheline prefetcher on x86, it may 
be worth investigating if aligning to 128 brings some value

- on AArch64
  - number of bytes prefetch instruction prefetches set to 64 or 128, based on 
multiarch variant running
  - data structures should be aligned to 128 bytes as that value prevents false 
sharing for both 64 and 128 byte cacheline systems

The main problem is the abuse of the CLIB_PREFETCH() macro in our codebase.
The original idea was good: somebody wanted to provide a macro which transparently 
emits 1-4 prefetch instructions based on the data size, recognising that there may 
be systems with different cacheline sizes.

Like:
  CLIB_PREFETCH (p, sizeof (ip6_header_t), LOAD);

But in reality, most of the time we have:
  CLIB_PREFETCH (p, CLIB_CACHE_LINE_BYTES, LOAD);

where it is assumed that the cacheline size is 64, which just wastes resources on a 
system with a 128-byte cacheline.

Also, most places in our codebase are perfectly fine with whatever the cacheline 
size is, so I’m thinking about the following:

1. set CLIB_CACHE_LINE_BYTES to 64 on x86 and 128 on ARM, which will make sure 
false sharing does not happen

2. introduce CLIB_CACHE_PREFETCH_BYTES which can be set to a different value for 
each multiarch variant (64 for N1, 128 for ThunderX2)

3. modify the CLIB_PREFETCH macro to use CLIB_CACHE_PREFETCH_BYTES to emit the 
proper number of prefetch instructions for cases where the data size is specified 
(a rough sketch follows below)

4. take a stab at replacing all `CLIB_PREFETCH (p, CLIB_CACHE_LINE_BYTES, LOAD);` 
with `clib_prefetch_load (p);`.
   There may be exceptions, but those lines typically mean: "I want to prefetch a 
few (<=64) bytes at this address and I really don't care what the cache line 
size is".

5. analyse the remaining few cases where CLIB_PREFETCH() is used with a size 
specified via CLIB_CACHE_LINE_BYTES.

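To make items 2-4 concrete, here is a rough sketch; the names 
CLIB_CACHE_PREFETCH_BYTES and clib_prefetch_load follow the proposal above, and the 
macro body is only an illustration of the intent, not the actual patch:

#include <stdint.h>

#ifndef CLIB_CACHE_PREFETCH_BYTES
#define CLIB_CACHE_PREFETCH_BYTES 64   /* selected per multiarch variant */
#endif

/* item 4: plain one-line prefetch, no size argument at all */
static inline void
clib_prefetch_load (void *p)
{
  __builtin_prefetch (p, 0 /* read */, 3 /* keep in all cache levels */);
}

/* items 2+3: emit one prefetch per CLIB_CACHE_PREFETCH_BYTES covering 'size',
 * so CLIB_PREFETCH (p, sizeof (ip6_header_t), LOAD) stays correct on both
 * 64- and 128-byte cacheline parts */
#define CLIB_PREFETCH_SKETCH(addr, size)                        \
do                                                              \
  {                                                             \
    uintptr_t _a = (uintptr_t) (addr);                          \
    uintptr_t _end = _a + (size);                               \
    for (; _a < _end; _a += CLIB_CACHE_PREFETCH_BYTES)          \
      __builtin_prefetch ((void *) _a, 0, 3);                   \
  }                                                             \
while (0)
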
Thoughts?

— 
Damjan

> On 06.07.2021., at 03:48, Lijian Zhang  wrote:
> 
> Thanks Damjan for your comments. Some replies in lines.
> 
> Hi Lijian,
>  
> It will be good to know if 128 byte cacheline is something ARM platforms will 
> be using in the future or it is just historical.
> [Lijian] Existing ThunderX1 and OcteonTX2 CPUs are 128 byte cache-line. To my 
> knowledge, there may be more CPUs with 128 byte cache-line in the future.
>  
> Cacheline size problem is not just about prefetching, even bigger issue is 
> false sharing, so we need to address both.
> [Lijian] Yes, there may be false-sharing issue when running VPP image with 
> 64B definition on 128B cache-line CPUs. We will do some scalability testing 
> with that case, and check the multi-core performance.
>  
> Probably best solution is to have 2 VPP images, one for 128 and one for 64 
> byte cacheline size.
> [Lijian] For native built image, that’s fine. But I’m not sure if it’s 
> possible for cloud binaries installed via “apt-get install”.
>  
> Going across the whole codebase and replacing prefetch macros is something we 
> should definitely avoid.
> [Lijian] I got your concerns on large scope replacement. My concern is when 
> CLIB_PREFETCH() is used to prefetch packet content into cache as below 
> example, cache-line (CLIB_CACHE_LINE_BYTES) seems to be assumed as 64 bytes 
> always.
> CLIB_PREFETCH (p2->data, 3 * CLIB_CACHE_LINE_BYTES, LOAD);
>  
> — 
> Damjan
> 
> 
> On 05.07.2021., at 07:28, Lijian Zhang  wrote:
>  
> Hi Damjan,
> I committed several patches to address some issues around cache-line 
> definitions in VPP.
>  
> Patch [1.1] is to resolve the build error [2] on 64Byte cache line Arm CPUs, 
> e.g., ThunderX2, NeoverseN1, caused by the commit 
> (https://gerrit.fd.io/r/c/vpp/+/32996, build: remove unused files and 
> sections).
> It also supports building Arm generic image (with command of “make 
> build-release”) with 128Byte cache line definition, and building native image 
> with 64Byte cache line definition on some Arm CPUs, e.g., ThunderX2, 
> NeoverseN1 (with command of “make build-release TARGET_PLATFORM=native”).
>  
> Patch [1.5] is to set the default cache line definition in Arm generic image 
> from 128Byte to 64Byte.
> Setting cache line definition to 128Byte for Arm generic image is required 
> for ThunderX1 (with 128Byte physical cache line), which is also the build 
> machine in FD.io lab. I’m thinking of setting a 64Byte cache line definition 
> in VPP for the Arm image, which will affect ThunderX1 and OcteonTX2 CPUs, so it 
> requires confirmation from Marvell.
>  
> Arm architecture CPUs hav

Re: [vpp-dev] Prefetches improvement for VPP Arm generic image

2021-07-14 Thread Florin Coras
+1

Florin

> On Jul 14, 2021, at 10:47 AM, Damjan Marion via lists.fd.io 
>  wrote:
> 

[vpp-dev] MPLS protection

2021-07-14 Thread Gudimetla, Leela Sankar via lists.fd.io
Hello,

I am trying out MPLS configurations to find out which features are currently 
supported on stable 1908.

I don’t explicitly see LSP protection such as HA or FRR support, so I am wondering 
whether it is supported yet, or whether I may be missing something.

Can someone please share (documentation, code) which protection mechanisms are 
supported for MPLS?

Thanks,
Leela sankar

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#19769): https://lists.fd.io/g/vpp-dev/message/19769
Mute This Topic: https://lists.fd.io/mt/84209148/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



[vpp-dev] "who should review my change?"

2021-07-14 Thread Andrew Yourtchenko
I wrote a small and relatively hacky script that may or may not be useful; I
wanted to let folks know and get feedback, if any...

https://gerrit.fd.io/r/c/vpp/+/33139

The idea is to get the component data from MAINTAINERS for the changed files,
and then use it to generate the list of reviewers. Potentially this could also
be used to auto-set the component name in the commit message (provided
MAINTAINERS is accurate).

I wrote it in Perl to avoid any dependencies, because I assume most Linux
installations will have Perl installed by default (which may or may not be true).

Without arguments, it uses the commit at HEAD; if you supply a hash, it will use
that commit instead.

This is what it looks like:

ayourtch@ayourtch-lnx:~/vpp$ ./extras/scripts/vpp-review
commit 4b18066a18f0129e2758d6b6f4f0126ac721e48f
Author: Andrew Yourtchenko 
Date:   Wed Jul 14 22:44:05 2021 +0200

misc: experimental script to get the list of the reviewers for a commit

accepts zero or one argument (the commit hash), and outputs
the detected components, the component maintainers,
and the final suggested reviewer list

Change-Id: Ief671fe837c6201bb11fd05d02af881822b0bb33
Type: improvement
Signed-off-by: Andrew Yourtchenko 

:00 100755 0 b9968161b Aextras/scripts/vpp-review


Components of files in the commit:
misc: extras/scripts/vpp-review

Component misc maintainers:
   vpp-dev Mailing List 


Final reviewer list:
   vpp-dev Mailing List 
ayourtch@ayourtch-lnx:~/vpp$


--a

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#19770): https://lists.fd.io/g/vpp-dev/message/19770
Mute This Topic: https://lists.fd.io/mt/84211716/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



[vpp-dev] ACL IPV6 rule addition using the "set acl_plugin acl" command from "vppctl" #vppctl #acl #acl_plugin #ipv6

2021-07-14 Thread RaviKiran Veldanda
Hi Experts,
We were trying to create some ACL rules for IPv6 addresses:
* "set acl-plugin acl permit src 2001:5b0::1150::0/64" in vppctl
* "set acl-plugin acl permit ipv6 src 2001:5b0::1150::0/64" in vppctl
Both commands return an ACL index, but when I check "show acl-plugin acl" it does 
not show any information.

vpp# set acl-plugin acl ipv6 permit src 2001:5b0::1150::0/64
ACL index:1
vpp# show acl-plugin acl
acl-index 0 count 1 tag {cli}
0: ipv4 permit src 172.25.169.0/24 dst 0.0.0.0/0 proto 0 sport 0-65535 dport 
0-65535
acl-index 1 count 0 tag {cli}
vpp#
We are using the VPP 20.05 stable version. We were not able to set the ACL for IPv6, 
and we are not seeing any error message in the logs.
We were able to set an ACL for IPv4 without any issue.
We tried the same thing from vpp_api_test and still could not set the IPv6 rule.
Can you please provide some pointers for creating an ACL rule for IPv6?
Thanks for your help.

//Ravi

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#19771): https://lists.fd.io/g/vpp-dev/message/19771
Mute This Topic: https://lists.fd.io/mt/84212274/21656
Mute #acl_plugin:https://lists.fd.io/g/vpp-dev/mutehashtag/acl_plugin
Mute #ipv6:https://lists.fd.io/g/vpp-dev/mutehashtag/ipv6
Mute #vppctl:https://lists.fd.io/g/vpp-dev/mutehashtag/vppctl
Mute #acl:https://lists.fd.io/g/vpp-dev/mutehashtag/acl
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] ACL IPV6 rule addition using the "set acl_plugin acl" command from "vppctl" #vppctl #acl #acl_plugin #ipv6

2021-07-14 Thread Andrew Yourtchenko
Ravi,

It appears that commit 2f8cd914514fe54f91974c6d465d4769dfac8de8 has
hardcoded the IP address family in the CLI handler to IPv4:

40490db79b src/plugins/acl/acl.c (Neale Ranns    2020-03-24 15:09:41 +0000 2873)   else if (unformat (line_input, "src %U/%d",
bf883bb086 src/plugins/acl/acl.c (Neale Ranns    2020-04-23 16:01:20 +0000 2874)      unformat_ip46_address, &src, IP46_TYPE_ANY,
bf883bb086 src/plugins/acl/acl.c (Neale Ranns    2020-04-23 16:01:20 +0000 2875)      &src_prefix_length))
40490db79b src/plugins/acl/acl.c (Neale Ranns    2020-03-24 15:09:41 +0000 2876)     {
40490db79b src/plugins/acl/acl.c (Neale Ranns    2020-03-24 15:09:41 +0000 2877)       vec_validate_acl_rules (rules, rule_idx);
2f8cd91451 src/plugins/acl/acl.c (Jakub Grajciar 2020-03-27 06:55:06 +0100 2878)       ip_address_encode (&src, IP46_TYPE_ANY,
2f8cd91451 src/plugins/acl/acl.c (Jakub Grajciar 2020-03-27 06:55:06 +0100 2879)          &rules[rule_idx].src_prefix.address);
2f8cd91451 src/plugins/acl/acl.c (Jakub Grajciar 2020-03-27 06:55:06 +0100 2880)       rules[rule_idx].src_prefix.address.af = ADDRESS_IP4;
2f8cd91451 src/plugins/acl/acl.c (Jakub Grajciar 2020-03-27 06:55:06 +0100 2881)       rules[rule_idx].src_prefix.len = src_prefix_length;
40490db79b src/plugins/acl/acl.c (Neale Ranns    2020-03-24 15:09:41 +0000 2882)     }
40490db79b src/plugins/acl/acl.c (Neale Ranns    2020-03-24 15:09:41 +0000 2883)   else if (unformat (line_input, "dst %U/%d",
bf883bb086 src/plugins/acl/acl.c (Neale Ranns    2020-04-23 16:01:20 +0000 2884)      unformat_ip46_address, &dst, IP46_TYPE_ANY,


I am including the commit author for clarification on how that
code is supposed to work for the IPv6 case.

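For illustration, one possible direction for a fix (untested sketch reusing the
variables shown in the blame above, not a reviewed patch) would be to derive the
address family from the parsed ip46 address instead of hardcoding it:

  /* untested sketch around acl.c:2878-2880 */
  ip_address_encode (&src, IP46_TYPE_ANY, &rules[rule_idx].src_prefix.address);
  rules[rule_idx].src_prefix.address.af =
    ip46_address_is_ip4 (&src) ? ADDRESS_IP4 : ADDRESS_IP6;
  rules[rule_idx].src_prefix.len = src_prefix_length;

(and the same for the dst branch).
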
A workaround is to use the "binary-api" command, which goes through the VAT code
path and will work for you:

vpp# binary-api acl_add_replace -1 permit src 2001:db8::1/128
vl_api_acl_add_replace_reply_t_handler:72: ACL index: 0
vpp# show acl acl
acl-index 0 count 1 tag {}
  0: ipv6 permit src 2001:db8::1/128 dst ::/0 proto 0 sport
0-65535 dport 0-65535
vpp#

--a


On 7/14/21, RaviKiran Veldanda  wrote:

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#19772): https://lists.fd.io/g/vpp-dev/message/19772
Mute This Topic: https://lists.fd.io/mt/84212274/21656
Mute #acl_plugin:https://lists.fd.io/g/vpp-dev/mutehashtag/acl_plugin
Mute #ipv6:https://lists.fd.io/g/vpp-dev/mutehashtag/ipv6
Mute #vppctl:https://lists.fd.io/g/vpp-dev/mutehashtag/vppctl
Mute #acl:https://lists.fd.io/g/vpp-dev/mutehashtag/acl
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



[vpp-dev] Please advise on 0 copy memif and memif mode #vpp-memif

2021-07-14 Thread PRANAB DAS
Hi

We have a user-mode DPDK or non-DPDK application that needs to receive and 
transmit packets from/to VPP. We plan to use a memif interface in VPP and memif 
in the application.
Could you provide some guidelines on which mode (master or slave) the memif 
interface should be created with in VPP and in the application?
We would like to achieve zero memory copy in both directions.

<------------------ VPP ------------------>   <--- my-user-mode-application/DPDK app --->

NIC ---> VPP DPDK rx-poll ---> My-VPP_plugin ---> memif interface ---> rx packet 
without packet copy, in the same rx buffer from the NIC
NIC <--- VPP DPDK rx-poll <--- My-VPP_plugin <--- memif interface
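
Not an authoritative answer, but for reference, the VPP side of a memif pairing is
usually created along these lines (socket path, id and addresses below are
placeholders, and whether true zero-copy is achieved depends on which side is the
slave and on the memif implementations involved):

  create memif socket id 1 filename /run/vpp/memif-app.sock
  create interface memif socket-id 1 id 0 master
  set interface state memif1/0 up
  set interface ip address memif1/0 192.0.2.1/24

The application would then attach to the same socket in the opposite role (slave in
this example), e.g. via libmemif or the DPDK memif PMD.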

Re: [vpp-dev] Prefetches improvement for VPP Arm generic image

2021-07-14 Thread Lijian Zhang
It looks good.
Thanks.
From: Florin Coras 
Subject: Re: [vpp-dev] Prefetches improvement for VPP Arm generic image

+1

Florin


On Jul 14, 2021, at 10:47 AM, Damjan Marion via lists.fd.io 
<dmarion=me@lists.fd.io> wrote:

