Re: [vpp-dev] Performance test issues : Average cpu time cost changes as the tx speed changes.
You just demonstrated one of the basic properties of vector packet processing: as the offered load increases, the cost per vector element decreases.

Although you didn’t explicitly report the vector sizes involved, the vector size necessarily increases as the offered load increases. Anyhow, it’s easy to fish that statistic out of the node runtime stats: “clear run”, then “show run”.

You might ask: OK, why should the cost per packet decrease as the number of packets in a vector increases?

When you first enter a dispatch function, none of the code involved is likely to be in the i-cache. The first packet incurs a bunch of fixed overhead to drag code into the i-cache and to warm up the branch predictor. All of the other packets in the vector profit. On a per-packet basis, cost decreases as the vector size increases.

There are a number of secondary effects with the same net result. Until the vector size reaches 2, none of the graph nodes bother with prefetching. When dealing with quad-looped nodes: ‘s/2/4/’.

This property gives rise to a second interesting property: given a specific offered load and configuration, the vector size reaches a stable equilibrium. Imagine the circuit time in equilibrium. Add a small delay [clock interrupt at kernel level?] which increases the graph dispatch circuit time (rx ... process ... tx ... repeat). The next rx vector size will be larger, but since it will be processed more efficiently, the vector size will eventually return to the equilibrium value.

HTH... Dave

From: vpp-dev@lists.fd.io On Behalf Of Mikado
Sent: Thursday, November 22, 2018 10:22 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Performance test issues : Average cpu time cost changes as the tx speed changes.

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11385): https://lists.fd.io/g/vpp-dev/message/11385
Mute This Topic: https://lists.fd.io/mt/28292345/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-
Re: [vpp-dev] Performance test issues : Average cpu time cost changes as the tx speed changes.
On 23 Nov 2018, at 04:22, Mikado wrote:
>
> Hi,
> Recently I’ve been developing a plugin based on VPP 18.07 to decode specified
> packets. I added it between dpdk-input and interface-output, using the same
> method as the sample plugin in the VPP source code. To measure its
> performance, I used clib_cpu_time_now() to calculate the average CPU time
> spent when packets go through my plugin. When I send packets to the device
> running VPP and my plugin at different tx speeds, the average CPU time
> changes as the tx speed changes. At first, I assumed this was caused by the
> CPU time VPP spends moving packets to the next node, so I calculated the
> average CPU time of each node, but the result appears to be the same. Then I
> ran the same test with the sample plugin. The result is similar, although it
> does not fluctuate as much.
> Now I’m confused. Don’t all packets go through the same code and cost the
> same CPU time?

More packets in a batch means the code runs more efficiently and the per-packet CPU time is lower. Use the "show run" debug CLI and monitor the vector size; VPP is most efficient when the vector size is close to 256. Also note that "show run" shows per-node clocks/packet data (which might be inaccurate if Turbo Boost is enabled).

--
Damjan

View/Reply Online (#11384): https://lists.fd.io/g/vpp-dev/message/11384
[vpp-dev] Performance test issues : Average cpu time cost changes as the tx speed changes.
Hi,

Recently I’ve been developing a plugin based on VPP 18.07 to decode specified packets. I added it between dpdk-input and interface-output, using the same method as the sample plugin in the VPP source code.

To measure its performance, I used clib_cpu_time_now() to calculate the average CPU time spent when packets go through my plugin. When I send packets to the device running VPP and my plugin at different tx speeds, the average CPU time changes as the tx speed changes. At first, I assumed this was caused by the CPU time VPP spends moving packets to the next node, so I calculated the average CPU time of each node, but the result appears to be the same. Then I ran the same test with the sample plugin. The result is similar, although it does not fluctuate as much.

Now I’m confused. Don’t all packets go through the same code and cost the same CPU time?

Here is my code and the test result.

Code added in sample/node.c:

    static uword
    sample_node_fn (vlib_main_t * vm, vlib_node_runtime_t * node,
                    vlib_frame_t * frame)
    {
      /* Declarations as in the stock sample node */
      u32 n_left_from, *from, *to_next;
      sample_next_t next_index;
      u32 pkts_swapped = 0;

      from = vlib_frame_vector_args (frame);
      n_left_from = frame->n_vectors;
      next_index = node->cached_next_index;

      /* Timestamp at entry, and count the packets in this frame */
      sample_main.last_cpu_time = clib_cpu_time_now ();
      sample_main.total_pkts += n_left_from;

      ………

      while (n_left_from > 0)
        {
          ………
        }

      ………

      vlib_node_increment_counter (vm, sample_node.index,
                                   SAMPLE_ERROR_SWAPPED, pkts_swapped);

      /* Add the clock ticks spent in this call to the running total */
      sample_main.total_cpu_time +=
        clib_cpu_time_now () - sample_main.last_cpu_time;
      return frame->n_vectors;
    }

Test result:

    Tx speed (Mb/s)    Average CPU time (clock ticks per packet)
    190                14
    475                12
    665                 9
    950                 7
    1140                7
    1900                6
    2895                6

View/Reply Online (#11383): https://lists.fd.io/g/vpp-dev/message/11383