Re: [vpp-dev] ACL plugin optimization

2020-05-29 Thread Govindarajan Mohandoss
Thanks Andrew. I will fix the issue and get back to you.

> -----Original Message-----
> From: vpp-dev@lists.fd.io On Behalf Of Andrew Yourtchenko via lists.fd.io
> Sent: Wednesday, May 27, 2020 4:51 PM
> To: Govindarajan Mohandoss
> Cc: vpp-dev@lists.fd.io; Lijian Zhang; Jieqiang Wang; Honnappa Nagarahalli; nd
> Subject: Re: [vpp-dev] ACL plugin optimization
> 
> Hi Govind,
> 
> 1) According to Jenkins, this patch permits some of the packets that should
> be denied, hence JJB voted "-1".
> 
> 2) If you suspect merely the prefetches are the issue, just commenting out
> the body of prefetch_session_entry() in the original code should turn it into
> a no-op that doesn't break anything.
> 
> Hard to say anything else given the functionality is not correct.
> 
> In general - ensure you run "EXTENDED_TESTS=y TEST=acl* make test" as a
> sanity check before extensive perf-tests. It's not a 100% guarantee but it
> does catch a few naughty cases.
> 
> Also - take a look at f1cd92d8d9, which got about 30% improvement back in
> the day, and is the source of much of the trickiness in that node.
> 
> --a
> 
> 
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16582): https://lists.fd.io/g/vpp-dev/message/16582
Mute This Topic: https://lists.fd.io/mt/74507621/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] ACL plugin optimization

2020-05-29 Thread Govindarajan Mohandoss
Thanks Neale. I will fix it and recheck.

> -----Original Message-----
> From: Neale Ranns (nranns)
> Sent: Thursday, May 28, 2020 1:56 AM
> To: Andrew Yourtchenko; Govindarajan Mohandoss
> Cc: vpp-dev@lists.fd.io; Lijian Zhang; Jieqiang Wang; Honnappa Nagarahalli; nd
> Subject: Re: [vpp-dev] ACL plugin optimization
> 
> 
> Hi Govind,
> 
> As well as removing the prefetches, you've also removed the per-packet call
> to acl_fa_find_session_with_hash(). So IIUC you've removed the per-packet
> session lookup and instead re-use the lookup of packet 0 each time. That'll
> make things quicker but it's not functionally correct.
> 
> /neale
> 

View/Reply Online (#16581): https://lists.fd.io/g/vpp-dev/message/16581


Re: [vpp-dev] ACL plugin optimization

2020-05-28 Thread Neale Ranns via lists.fd.io

Hi Govind,

As well as removing the prefetches, you've also removed the per-packet call to
acl_fa_find_session_with_hash(). So IIUC you've removed the per-packet session
lookup and instead re-use the lookup of packet 0 each time. That'll make things
quicker but it's not functionally correct.

/neale


View/Reply Online (#16545): https://lists.fd.io/g/vpp-dev/message/16545


Re: [vpp-dev] ACL plugin optimization

2020-05-27 Thread Andrew Yourtchenko
Hi Govind,

1) According to Jenkins, this patch permits some of the packets that
should be denied, hence JJB voted "-1".

2) If you suspect merely the prefetches are the issue, just commenting
out the body of prefetch_session_entry() in the original code should
turn it into a no-op that doesn't break anything.

Hard to say anything else given the functionality is not correct.

In general - ensure you run "EXTENDED_TESTS=y TEST=acl* make test" as
a sanity check before extensive perf-tests. It's not a 100% guarantee
but it does catch a few naughty cases.

Also - take a look at f1cd92d8d9, which got about 30% improvement back
in the day, and is the source of much of the trickiness in that node.

--a


View/Reply Online (#16543): https://lists.fd.io/g/vpp-dev/message/16543


[vpp-dev] ACL plugin optimization

2020-05-27 Thread Govindarajan Mohandoss
Hi Andrew,

While profiling the ACL plugin node with the perf tool on the ARM Neoverse
platform, I found that the Bihash-related prefetches showed up as a bottleneck.

Removing those prefetches improves performance on ARM N1, TX2 and Intel Skylake
servers. Testing was done with Ingress ACL/IPv4 forwarding in both stateful
(SF) and stateless (SL) modes.

As the code change is common to Ingress/Egress ACL for both IPv4 and IPv6, a
performance improvement is expected for those cases as well.

The following are the test results for Ingress ACL / IPv4 / 1 core / 64B @ MRR
on ARM N1, TX2 and Intel Skylake servers:



Legend:
===
N1 - ARM Neoverse
TX2 - ARM Thunder X2
SKX - Intel Skylake
SL: % imp - Performance improvement in stateless mode
SF: % imp - Performance improvement in stateful mode

Num Rules  Matching Rules  SKX SL    SKX SF    N1 SL     N1 SF     TX2 SL    TX2 SF
                           Avg % imp Avg % imp % imp     % imp     % imp     % imp
1          1               0.99      12.09     8.38      10.41     4.48      4.63
50         1 (50th)        0.79      9.63      8.76      10.06     5.32      4.63
100        1 (100th)       4.34      10.75     8.60      10.06     6.98      4.63
1000       1 (1000th)      4.18      13.06     8.61      11.14     6.17      5.58
100        100             3.70      11.70     6.65      14        2.82      6.53
1000       1000            1.84      15.96     5.52      27.72     4.72      8.69





Please find the patch here: https://gerrit.fd.io/r/c/vpp/+/27167



I ran the per-patch regression on an ARM Taishan server in the CSIT lab. The
results for stateless and stateful modes follow:

1. perftest-3n-tsh acl_statelessAND1cAND64b:

   https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n-tsh/23/consoleFull

   The log shows comparative numbers between the parent and the current build
   (my patch) for 45 test cases. Searching the log for "Difference of averages
   relative to parent", 41 of the 45 test cases show around 4% improvement
   with the patch. The remaining 4 test cases stayed neutral.

2. perftest-3n-tsh acl_statefulAND1cAND64b:

   https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n-tsh/25/

   Performance improvement is seen in all 36 test cases.



Please provide your comments.



Thanks

Govind


View/Reply Online (#16539): https://lists.fd.io/g/vpp-dev/message/16539