Re: [vpp-dev] ACL plugin optimization
Thanks Andrew. I will fix the issue and get back to you. > -Original Message- > From: vpp-dev@lists.fd.io On Behalf Of Andrew > Yourtchenko via lists.fd.io > Sent: Wednesday, May 27, 2020 4:51 PM > To: Govindarajan Mohandoss > Cc: vpp-dev@lists.fd.io; Lijian Zhang ; Jieqiang > Wang ; Honnappa Nagarahalli > ; nd > Subject: Re: [vpp-dev] ACL plugin optimization > > Hi Govind, > > 1) According to Jenkins, this patch permits some of the packets that should > be denied, hence JJB voted "-1". > > 2) If you suspect merely the prefetches are the issue, just commenting out > the body of prefetch_session_entry() in the original code should turn it into > a > no-op that doesn't break anything. > > Hard to say anything else given the functionality is not correct. > > In general - ensure you run "EXTENDED_TESTS=y TEST=acl* make test" as a > sanity check before extensive perf-tests. It's not a 100% guarantee but it > does > catch a few naughty cases. > > Also - take a look at f1cd92d8d9, which got about 30% improvement back in > the day, and is the source of much of the trickiness in that node. > > --a > > > On 5/27/20, Govindarajan Mohandoss > wrote: > > Hi Andrew, > > > > While profiling the ACL plugin node using perf tool in ARM Neoverse > > platform, Bihash related prefetches were shown as bottleneck. > > > > Performance improvement is seen in ARM N1, TX2 and Intel Skylake > > servers after removing those prefetches. Testing is done with Ingress > > ACL/IPv4 forwarding in both SF and SL modes. > > > > As the code change is common for Ingress/Egress ACL for both IPv4 and > > IPv6, performance improvement is expected for those cases also. > > > > Following are the test results for Ingress ACL / IPv4 / 1 core / 64B @ > > MRR in ARM N1, TX2 and Intel Skylake servers: > > > > > > > > Legend: > > > > === > > > > N1 - ARM Neoverse > > > > TX2 - ARM Thunder X2 > > > > SKX - Intel Skylake > > > > SL: % imp - Performance improvement in stateless mode > > > > SF: % imp - Performance improvement in stateful mode > > > > > > > > > > > > > > SKX > > N1 > > TX2 > > Num Rules > > Matching Rules > > SL: Avg % imp > > SF: Avg % imp > > SL: % imp > > SF: % imp > > SL: % imp > > SF: % imp > > 1 > > 1 > > 0.99 > > 12.09 > > 8.38 > > 10.41 > > 4.48 > > 4.63 > > 50 > > 1 (50th) > > 0.79 > > 9.63 > > 8.76 > > 10.06 > > 5.32 > > 4.63 > > 100 > > 1 (100th) > > 4.34 > > 10.75 > > 8.60 > > 10.06 > > 6.98 > > 4.63 > > 1000 > > 1(1000th) > > 4.18 > > 13.06 > > 8.61 > > 11.14 > > 6.17 > > 5.58 > > 100 > > 100 > > 3.70 > > 11.70 > > 6.65 > > 14 > > 2.82 > > 6.53 > > 1000 > > 1000 > > 1.84 > > 15.96 > > 5.52 > > 27.72 > > 4.72 > > 8.69 > > > > > > > > > > > > Please find the patch here: https://gerrit.fd.io/r/c/vpp/+/27167 > > > > > > > > I ran per patch regression on ARM Taishan server in CSIT lab. > > Following are the results for Stateless and Stateful modes: > > > > 1. perftest-3n-tsh acl_statelessAND1cAND64b: > > > > > > https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n-tsh/23/consol > > eFull > > > > In the log, I can see the comparative numbers between parent and > > current (my patch) for 45 test cases. > > > > I searched for "Difference of averages relative to parent" in the > > log - > > 41/45 test cases have shown around 4% improvement with the patch. > > Rest of the 4 test cases stayed neutral. > > > > > > > > 2. perftest-3n-tsh acl_statefulAND1cAND64b: > > > > https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n-tsh/25/ > > > > Performance improvement is seen in all 36 test cases. > > > > > > > > Please provide your comments. > > > > > > > > Thanks > > > > Govind > > > > > > -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#16582): https://lists.fd.io/g/vpp-dev/message/16582 Mute This Topic: https://lists.fd.io/mt/74507621/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
Re: [vpp-dev] ACL plugin optimization
Thanks Neale. If will fix it and recheck. > -Original Message- > From: Neale Ranns (nranns) > Sent: Thursday, May 28, 2020 1:56 AM > To: Andrew Yourtchenko ; Govindarajan Mohandoss > > Cc: vpp-dev@lists.fd.io; Lijian Zhang ; Jieqiang > Wang ; Honnappa Nagarahalli > ; nd > Subject: Re: [vpp-dev] ACL plugin optimization > > > Hi Govind, > > As well as removing the prefetches, you've also removed the per packet call > to acl_fa_find_session_with_hash(). So IIUC you've removed the per-packet > session lookup and instead re-use the lookup of packet 0 each time. that'll > make things quicker but it's not functionally correct. > > /neale > > On 27/05/2020 23:51, "vpp-dev@lists.fd.io on behalf of Andrew > Yourtchenko" wrote: > > Hi Govind, > > 1) According to Jenkins, this patch permits some of the packets that > should be denied, hence JJB voted "-1". > > 2) If you suspect merely the prefetches are the issue, just commenting > out the body of prefetch_session_entry() in the original code should > turn it into a no-op that doesn't break anything. > > Hard to say anything else given the functionality is not correct. > > In general - ensure you run "EXTENDED_TESTS=y TEST=acl* make test" as > a sanity check before extensive perf-tests. It's not a 100% guarantee > but it does catch a few naughty cases. > > Also - take a look at f1cd92d8d9, which got about 30% improvement back > in the day, and is the source of much of the trickiness in that node. > > --a > > > On 5/27/20, Govindarajan Mohandoss > wrote: > > Hi Andrew, > > > > While profiling the ACL plugin node using perf tool in ARM Neoverse > > platform, Bihash related prefetches were shown as bottleneck. > > > > Performance improvement is seen in ARM N1, TX2 and Intel Skylake > servers > > after removing those prefetches. Testing is done with Ingress ACL/IPv4 > > forwarding in both SF and SL modes. > > > > As the code change is common for Ingress/Egress ACL for both IPv4 and > IPv6, > > performance improvement is expected for those cases also. > > > > Following are the test results for Ingress ACL / IPv4 / 1 core / 64B @ > MRR > > in ARM N1, TX2 and Intel Skylake servers: > > > > > > > > Legend: > > > > === > > > > N1 - ARM Neoverse > > > > TX2 - ARM Thunder X2 > > > > SKX - Intel Skylake > > > > SL: % imp - Performance improvement in stateless mode > > > > SF: % imp - Performance improvement in stateful mode > > > > > > > > > > > > > > SKX > > N1 > > TX2 > > Num Rules > > Matching Rules > > SL: Avg % imp > > SF: Avg % imp > > SL: % imp > > SF: % imp > > SL: % imp > > SF: % imp > > 1 > > 1 > > 0.99 > > 12.09 > > 8.38 > > 10.41 > > 4.48 > > 4.63 > > 50 > > 1 (50th) > > 0.79 > > 9.63 > > 8.76 > > 10.06 > > 5.32 > > 4.63 > > 100 > > 1 (100th) > > 4.34 > > 10.75 > > 8.60 > > 10.06 > > 6.98 > > 4.63 > > 1000 > > 1(1000th) > > 4.18 > > 13.06 > > 8.61 > > 11.14 > > 6.17 > > 5.58 > > 100 > > 100 > > 3.70 > > 11.70 > > 6.65 > > 14 > > 2.82 > > 6.53 > > 1000 > > 1000 > > 1.84 > > 15.96 > > 5.52 > > 27.72 > > 4.72 > > 8.69 > > > > > > > > > > > > Please find the patch here: https://gerrit.fd.io/r/c/vpp/+/27167 > > > > > > > > I ran per patch regression on ARM Taishan server in CSIT lab. Following > are > > the results for Stateless and Stateful modes: > > > > 1. perftest-3n-tsh acl_statelessAND1cAND64b: > > > > > > https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n- > tsh/23/consoleFull > > > > In the log, I can see the comparative numbers between parent and > > current (my patch) for 45 test cases. > > > > I searched for "Difference of averages relative to parent" in the > log - > > 41/45 test cases have shown around 4% improvement with the patch. > Rest of > > the 4 test cases stayed neutral. > > > > > > > > 2. perftest-3n-tsh acl_statefulAND1cAND64b: > > > > https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n-tsh/25/ > > > > Performance improvement is seen in all 36 test cases. > > > > > > > > Please provide your comments. > > > > > > > > Thanks > > > > Govind > > > > > > -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#16581): https://lists.fd.io/g/vpp-dev/message/16581 Mute This Topic: https://lists.fd.io/mt/74507621/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
Re: [vpp-dev] ACL plugin optimization
Hi Govind, As well as removing the prefetches, you've also removed the per packet call to acl_fa_find_session_with_hash(). So IIUC you've removed the per-packet session lookup and instead re-use the lookup of packet 0 each time. that'll make things quicker but it's not functionally correct. /neale On 27/05/2020 23:51, "vpp-dev@lists.fd.io on behalf of Andrew Yourtchenko" wrote: Hi Govind, 1) According to Jenkins, this patch permits some of the packets that should be denied, hence JJB voted "-1". 2) If you suspect merely the prefetches are the issue, just commenting out the body of prefetch_session_entry() in the original code should turn it into a no-op that doesn't break anything. Hard to say anything else given the functionality is not correct. In general - ensure you run "EXTENDED_TESTS=y TEST=acl* make test" as a sanity check before extensive perf-tests. It's not a 100% guarantee but it does catch a few naughty cases. Also - take a look at f1cd92d8d9, which got about 30% improvement back in the day, and is the source of much of the trickiness in that node. --a On 5/27/20, Govindarajan Mohandoss wrote: > Hi Andrew, > > While profiling the ACL plugin node using perf tool in ARM Neoverse > platform, Bihash related prefetches were shown as bottleneck. > > Performance improvement is seen in ARM N1, TX2 and Intel Skylake servers > after removing those prefetches. Testing is done with Ingress ACL/IPv4 > forwarding in both SF and SL modes. > > As the code change is common for Ingress/Egress ACL for both IPv4 and IPv6, > performance improvement is expected for those cases also. > > Following are the test results for Ingress ACL / IPv4 / 1 core / 64B @ MRR > in ARM N1, TX2 and Intel Skylake servers: > > > > Legend: > > === > > N1 - ARM Neoverse > > TX2 - ARM Thunder X2 > > SKX - Intel Skylake > > SL: % imp - Performance improvement in stateless mode > > SF: % imp - Performance improvement in stateful mode > > > > > > > SKX > N1 > TX2 > Num Rules > Matching Rules > SL: Avg % imp > SF: Avg % imp > SL: % imp > SF: % imp > SL: % imp > SF: % imp > 1 > 1 > 0.99 > 12.09 > 8.38 > 10.41 > 4.48 > 4.63 > 50 > 1 (50th) > 0.79 > 9.63 > 8.76 > 10.06 > 5.32 > 4.63 > 100 > 1 (100th) > 4.34 > 10.75 > 8.60 > 10.06 > 6.98 > 4.63 > 1000 > 1(1000th) > 4.18 > 13.06 > 8.61 > 11.14 > 6.17 > 5.58 > 100 > 100 > 3.70 > 11.70 > 6.65 > 14 > 2.82 > 6.53 > 1000 > 1000 > 1.84 > 15.96 > 5.52 > 27.72 > 4.72 > 8.69 > > > > > > Please find the patch here: https://gerrit.fd.io/r/c/vpp/+/27167 > > > > I ran per patch regression on ARM Taishan server in CSIT lab. Following are > the results for Stateless and Stateful modes: > > 1. perftest-3n-tsh acl_statelessAND1cAND64b: > > > https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n-tsh/23/consoleFull > > In the log, I can see the comparative numbers between parent and > current (my patch) for 45 test cases. > > I searched for "Difference of averages relative to parent" in the log - > 41/45 test cases have shown around 4% improvement with the patch. Rest of > the 4 test cases stayed neutral. > > > > 2. perftest-3n-tsh acl_statefulAND1cAND64b: > > https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n-tsh/25/ > > Performance improvement is seen in all 36 test cases. > > > > Please provide your comments. > > > > Thanks > > Govind > > > -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#16545): https://lists.fd.io/g/vpp-dev/message/16545 Mute This Topic: https://lists.fd.io/mt/74507621/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
Re: [vpp-dev] ACL plugin optimization
Hi Govind, 1) According to Jenkins, this patch permits some of the packets that should be denied, hence JJB voted "-1". 2) If you suspect merely the prefetches are the issue, just commenting out the body of prefetch_session_entry() in the original code should turn it into a no-op that doesn't break anything. Hard to say anything else given the functionality is not correct. In general - ensure you run "EXTENDED_TESTS=y TEST=acl* make test" as a sanity check before extensive perf-tests. It's not a 100% guarantee but it does catch a few naughty cases. Also - take a look at f1cd92d8d9, which got about 30% improvement back in the day, and is the source of much of the trickiness in that node. --a On 5/27/20, Govindarajan Mohandoss wrote: > Hi Andrew, > > While profiling the ACL plugin node using perf tool in ARM Neoverse > platform, Bihash related prefetches were shown as bottleneck. > > Performance improvement is seen in ARM N1, TX2 and Intel Skylake servers > after removing those prefetches. Testing is done with Ingress ACL/IPv4 > forwarding in both SF and SL modes. > > As the code change is common for Ingress/Egress ACL for both IPv4 and IPv6, > performance improvement is expected for those cases also. > > Following are the test results for Ingress ACL / IPv4 / 1 core / 64B @ MRR > in ARM N1, TX2 and Intel Skylake servers: > > > > Legend: > > === > > N1 - ARM Neoverse > > TX2 - ARM Thunder X2 > > SKX - Intel Skylake > > SL: % imp - Performance improvement in stateless mode > > SF: % imp - Performance improvement in stateful mode > > > > > > > SKX > N1 > TX2 > Num Rules > Matching Rules > SL: Avg % imp > SF: Avg % imp > SL: % imp > SF: % imp > SL: % imp > SF: % imp > 1 > 1 > 0.99 > 12.09 > 8.38 > 10.41 > 4.48 > 4.63 > 50 > 1 (50th) > 0.79 > 9.63 > 8.76 > 10.06 > 5.32 > 4.63 > 100 > 1 (100th) > 4.34 > 10.75 > 8.60 > 10.06 > 6.98 > 4.63 > 1000 > 1(1000th) > 4.18 > 13.06 > 8.61 > 11.14 > 6.17 > 5.58 > 100 > 100 > 3.70 > 11.70 > 6.65 > 14 > 2.82 > 6.53 > 1000 > 1000 > 1.84 > 15.96 > 5.52 > 27.72 > 4.72 > 8.69 > > > > > > Please find the patch here: https://gerrit.fd.io/r/c/vpp/+/27167 > > > > I ran per patch regression on ARM Taishan server in CSIT lab. Following are > the results for Stateless and Stateful modes: > > 1. perftest-3n-tsh acl_statelessAND1cAND64b: > > > https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n-tsh/23/consoleFull > > In the log, I can see the comparative numbers between parent and > current (my patch) for 45 test cases. > > I searched for "Difference of averages relative to parent" in the log - > 41/45 test cases have shown around 4% improvement with the patch. Rest of > the 4 test cases stayed neutral. > > > > 2. perftest-3n-tsh acl_statefulAND1cAND64b: > > https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n-tsh/25/ > > Performance improvement is seen in all 36 test cases. > > > > Please provide your comments. > > > > Thanks > > Govind > > > -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#16543): https://lists.fd.io/g/vpp-dev/message/16543 Mute This Topic: https://lists.fd.io/mt/74507621/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-
[vpp-dev] ACL plugin optimization
Hi Andrew, While profiling the ACL plugin node using perf tool in ARM Neoverse platform, Bihash related prefetches were shown as bottleneck. Performance improvement is seen in ARM N1, TX2 and Intel Skylake servers after removing those prefetches. Testing is done with Ingress ACL/IPv4 forwarding in both SF and SL modes. As the code change is common for Ingress/Egress ACL for both IPv4 and IPv6, performance improvement is expected for those cases also. Following are the test results for Ingress ACL / IPv4 / 1 core / 64B @ MRR in ARM N1, TX2 and Intel Skylake servers: Legend: === N1 - ARM Neoverse TX2 - ARM Thunder X2 SKX - Intel Skylake SL: % imp - Performance improvement in stateless mode SF: % imp - Performance improvement in stateful mode SKX N1 TX2 Num Rules Matching Rules SL: Avg % imp SF: Avg % imp SL: % imp SF: % imp SL: % imp SF: % imp 1 1 0.99 12.09 8.38 10.41 4.48 4.63 50 1 (50th) 0.79 9.63 8.76 10.06 5.32 4.63 100 1 (100th) 4.34 10.75 8.60 10.06 6.98 4.63 1000 1(1000th) 4.18 13.06 8.61 11.14 6.17 5.58 100 100 3.70 11.70 6.65 14 2.82 6.53 1000 1000 1.84 15.96 5.52 27.72 4.72 8.69 Please find the patch here: https://gerrit.fd.io/r/c/vpp/+/27167 I ran per patch regression on ARM Taishan server in CSIT lab. Following are the results for Stateless and Stateful modes: 1. perftest-3n-tsh acl_statelessAND1cAND64b: https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n-tsh/23/consoleFull In the log, I can see the comparative numbers between parent and current (my patch) for 45 test cases. I searched for "Difference of averages relative to parent" in the log - 41/45 test cases have shown around 4% improvement with the patch. Rest of the 4 test cases stayed neutral. 2. perftest-3n-tsh acl_statefulAND1cAND64b: https://jenkins.fd.io/job/vpp-csit-verify-perf-master-3n-tsh/25/ Performance improvement is seen in all 36 test cases. Please provide your comments. Thanks Govind -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#16539): https://lists.fd.io/g/vpp-dev/message/16539 Mute This Topic: https://lists.fd.io/mt/74507621/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-