Re: [vpp-dev] vlib_validate_buffer_enqueue
Dave,

Thanks for your (deep) explanation. Same for Chris, thank you :-)

Justin

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
Re: [vpp-dev] vlib_validate_buffer_enqueue
Dear Justin,

Quad-loops are generally not effective for table-lookup-intensive tasks. At a certain point, gcc runs out of registers and starts putting hot variables onto the stack. I've converted a number of dual loops into quad loops, only to discover that they're no faster than the dual-loop version.

Rather than having the sample plugin propagate a bunch of "fetch me a rock" coding work, I went with a dual-single loop. When doing new development, I shut off the dual loop, make the single loop work, then build the dual (or quad) loop.

With experience, building a dual (or quad) loop becomes a mechanical exercise easily done during a boring meeting.

In viable quad-loop use-cases, there's no performance benefit in also providing a dual loop. The dual-loop code will run at most one time; there's no chance of fixed overhead amortization.

Thanks… Dave

-----Original Message-----
From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On Behalf Of Justin Iurman
Sent: Monday, November 13, 2017 5:51 AM
To: vpp-dev <vpp-dev@lists.fd.io>
Subject: [vpp-dev] vlib_validate_buffer_enqueue

Hey guys,

In buffer_node.h, there are the following macros:
- vlib_validate_buffer_enqueue_x1
- vlib_validate_buffer_enqueue_x2
- vlib_validate_buffer_enqueue_x4

I was just wondering what the idea is behind using them in a node. Is it for speed? I mean, you're obviously faster if you process 4 packets horizontally than one after the other. Why then, in the sample plugin, is the "x4" version not used? A "perfect" plugin would use each of them to cover each case, right? Also, why not have an "x8" (or more) version? I guess it's either for a performance reason or to stop at a specific ceiling.

Thanks!

Justin
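For readers following the thread, the dual-single loop shape Dave describes can be sketched in plain C. This is only an illustration of the loop structure, not VPP code: `classify_one`, `process_dual_single`, `USE_DUAL_LOOP`, and the flat arrays standing in for packet buffers and next indices are all hypothetical, and none of the vlib_validate_buffer_enqueue macros are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch: set to 0 to "shut off" the dual loop while
   getting the single loop right, as described in the message above. */
#ifndef USE_DUAL_LOOP
#define USE_DUAL_LOOP 1
#endif

/* Hypothetical per-packet work: a small table lookup picking a next index. */
static inline uint16_t
classify_one (const uint16_t *table, size_t table_len, uint32_t pkt)
{
  return table[pkt % table_len];
}

/* Dual-single loop: process two packets per iteration while at least
   two remain, then drain whatever is left in the single loop.
   Returns the number of packets processed. */
static size_t
process_dual_single (const uint32_t *from, size_t n_left,
                     const uint16_t *table, size_t table_len,
                     uint16_t *next)
{
  size_t n_done = 0;

  assert (table_len > 0);

  /* Dual loop */
  while (USE_DUAL_LOOP && n_left >= 2)
    {
      uint32_t p0 = from[0];
      uint32_t p1 = from[1];

      next[0] = classify_one (table, table_len, p0);
      next[1] = classify_one (table, table_len, p1);

      from += 2;
      next += 2;
      n_left -= 2;
      n_done += 2;
    }

  /* Single loop: handles the odd packet, or everything when the
     dual loop is switched off. */
  while (n_left > 0)
    {
      next[0] = classify_one (table, table_len, from[0]);

      from += 1;
      next += 1;
      n_left -= 1;
      n_done += 1;
    }

  return n_done;
}
```

Building with `-DUSE_DUAL_LOOP=0` sends every packet through the single loop, which mirrors the "make the single loop work first" workflow; the results should be identical either way.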
Re: [vpp-dev] vlib_validate_buffer_enqueue
The x4 variant was introduced chronologically after the sample plugin and nobody went back to update it. However, generally speaking, the four-wide stride is only beneficial in some cases; the reasoning is a bit arcane and rests on things like how likely it is that the CPU cache can be kept primed. As best I can tell, there's a bit of judgement involved, based on the empirical experience of a handful of the wizened. By extension, the gains from an x8 version are likely marginal.

Chris
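As a rough illustration of the four-wide stride and the "cache primed" idea, here is a plain C sketch of a quad loop that prefetches the next group of four before working on the current one, followed by a single loop for the remainder. It is not taken from VPP: `pkt_t` and `process_quad_single` are hypothetical names, and `__builtin_prefetch` is the GCC/Clang builtin, used here as a stand-in for whatever prefetch helper a real node would call. Whether this beats a dual loop depends on the workload and has to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor: a lookup key and the chosen next index. */
typedef struct
{
  uint32_t key;
  uint16_t next;
} pkt_t;

/* Quad loop plus single loop. While at least four packets remain,
   prefetch the following group of four (if there is one), then
   process the current four. Leftovers go through the single loop.
   Returns the number of packets processed. */
static size_t
process_quad_single (pkt_t **pkts, size_t n_left,
                     const uint16_t *table, size_t table_len)
{
  size_t n_done = 0;

  assert (table_len > 0);

  while (n_left >= 4)
    {
      /* Only touch pkts[4..7] when they actually exist. */
      if (n_left >= 8)
        {
          __builtin_prefetch (pkts[4]);
          __builtin_prefetch (pkts[5]);
          __builtin_prefetch (pkts[6]);
          __builtin_prefetch (pkts[7]);
        }

      pkts[0]->next = table[pkts[0]->key % table_len];
      pkts[1]->next = table[pkts[1]->key % table_len];
      pkts[2]->next = table[pkts[2]->key % table_len];
      pkts[3]->next = table[pkts[3]->key % table_len];

      pkts += 4;
      n_left -= 4;
      n_done += 4;
    }

  /* Single loop drains the last 0 to 3 packets. */
  while (n_left > 0)
    {
      pkts[0]->next = table[pkts[0]->key % table_len];

      pkts += 1;
      n_left -= 1;
      n_done += 1;
    }

  return n_done;
}
```

There is deliberately no dual loop between the two: it would run at most once per call, so it would add code without amortizing any fixed overhead.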
[vpp-dev] vlib_validate_buffer_enqueue
Hey guys,

In buffer_node.h, there are the following macros:
- vlib_validate_buffer_enqueue_x1
- vlib_validate_buffer_enqueue_x2
- vlib_validate_buffer_enqueue_x4

I was just wondering what the idea is behind using them in a node. Is it for speed? I mean, you're obviously faster if you process 4 packets horizontally than one after the other. Why then, in the sample plugin, is the "x4" version not used? A "perfect" plugin would use each of them to cover each case, right? Also, why not have an "x8" (or more) version? I guess it's either for a performance reason or to stop at a specific ceiling.

Thanks!

Justin