Re: [vpp-dev] Performance test issues : Average cpu time cost changes as the tx speed changes.

2018-11-23 Thread Dave Barach via Lists.Fd.Io
You just demonstrated one of the basic properties of vector packet processing: 
as the offered load increases, the cost per vector element decreases. Although 
you didn’t explicitly report the vector sizes involved, the vector size 
necessarily increases as the offered load increases. Anyhow, it’s easy to fish 
that statistic out of the node runtime stats:


“clear run”

“show run”


You might ask: OK, why should the cost per packet decrease as the number of 
packets in a vector increases? When you first enter a dispatch function, none 
of the code involved is likely to be in the i-cache. The first packet incurs a 
bunch of fixed overhead to drag code into the i-cache, and to warm up the 
branch predictor. All of the other packets in the vector profit. On a 
per-packet basis, cost decreases as the vector size increases.

There are a number of secondary effects with the same net result. Until the 
vector size reaches 2, none of the graph nodes bother with prefetching. For 
quad-looped nodes, substitute 4 for 2: ‘s/2/4/’.

This property gives rise to a second interesting property: given a specific 
offered load and configuration, the vector size reaches a stable equilibrium. 
Imagine the circuit time in equilibrium. Add a small delay [clock interrupt at 
kernel level?] which increases the graph dispatch circuit time (rx ... process 
... tx ... repeat).

The next rx vector size will be larger, but since it will be processed more 
efficiently, the vector size will eventually return to the equilibrium value.

HTH... Dave


From: vpp-dev@lists.fd.io  On Behalf Of Mikado
Sent: Thursday, November 22, 2018 10:22 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Performance test issues : Average cpu time cost changes as 
the tx speed changes.

Hi,
I’m developing a plugin based on VPP 18.07 to decode specified packets. I 
added it between dpdk-input and interface-output, using the same method by 
which the sample plugin is added in the VPP source code. To test its 
performance, I used clib_cpu_time_now() to calculate the average CPU time 
cost of packets going through my plugin. When I send packets at different tx 
speeds to the device running VPP and my plugin, the average CPU time changes 
as the tx speed changes. At first, I assumed this was caused by the CPU time 
VPP spends moving packets to the next node. So I calculated the average CPU 
time of each node, but it appears the same. Then I ran the same test with the 
sample plugin. The result is similar, although it does not fluctuate as much.
Now I’m confused. Don’t all packets go through the same code and cost the 
same CPU time?

Here is my code and test result.

Code added in sample/node.c:
static uword
sample_node_fn (vlib_main_t * vm,
                vlib_node_runtime_t * node,
                vlib_frame_t * frame)
{
  from = vlib_frame_vector_args (frame);
  n_left_from = frame->n_vectors;
  next_index = node->cached_next_index;

  /* Timestamp taken at the start of this dispatch (one frame). */
  sample_main.last_cpu_time = clib_cpu_time_now ();

  /* Count every packet in the frame. */
  sample_main.total_pkts += n_left_from;
  ………
  while (n_left_from > 0)
    {
      ………
    }
  ………
  vlib_node_increment_counter (vm, sample_node.index,
                               SAMPLE_ERROR_SWAPPED, pkts_swapped);

  /* Add the clocks spent on this frame to the running total. */
  sample_main.total_cpu_time += clib_cpu_time_now () - sample_main.last_cpu_time;

  return frame->n_vectors;
}


Test result:
Tx speed(Mb/s)  Average cpu time
190             14
475             12
665             9
950             7
1140            7
1900            6
2895            6
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11385): https://lists.fd.io/g/vpp-dev/message/11385
Mute This Topic: https://lists.fd.io/mt/28292345/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Performance test issues : Average cpu time cost changes as the tx speed changes.

2018-11-23 Thread Damjan Marion via Lists.Fd.Io


> On 23 Nov 2018, at 04:22, Mikado  wrote:
> 
> Hi,
> I’m developing a plugin based on VPP 18.07 to decode specified packets. I 
> added it between dpdk-input and interface-output, using the same method by 
> which the sample plugin is added in the VPP source code. To test its 
> performance, I used clib_cpu_time_now() to calculate the average CPU time 
> cost of packets going through my plugin. When I send packets at different 
> tx speeds to the device running VPP and my plugin, the average CPU time 
> changes as the tx speed changes. At first, I assumed this was caused by the 
> CPU time VPP spends moving packets to the next node. So I calculated the 
> average CPU time of each node, but it appears the same. Then I ran the same 
> test with the sample plugin. The result is similar, although it does not 
> fluctuate as much.
> Now I’m confused. Don’t all packets go through the same code and cost the 
> same CPU time?

More packets in a batch means the code runs more efficiently, so per-packet 
CPU time is lower.
Use the "show run" debug CLI and monitor the vector size; VPP is most 
efficient when the vector size is close to 256.
Also, note that "show run" shows per-node clocks/packet data (it might be 
inaccurate if Turbo Boost is enabled).


-- 
Damjan
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#11384): https://lists.fd.io/g/vpp-dev/message/11384
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] Performance test issues : Average cpu time cost changes as the tx speed changes.

2018-11-23 Thread Mikado
Hi,
I’m developing a plugin based on VPP 18.07 to decode specified packets. I 
added it between dpdk-input and interface-output, using the same method by 
which the sample plugin is added in the VPP source code. To test its 
performance, I used clib_cpu_time_now() to calculate the average CPU time 
cost of packets going through my plugin. When I send packets at different tx 
speeds to the device running VPP and my plugin, the average CPU time changes 
as the tx speed changes. At first, I assumed this was caused by the CPU time 
VPP spends moving packets to the next node. So I calculated the average CPU 
time of each node, but it appears the same. Then I ran the same test with the 
sample plugin. The result is similar, although it does not fluctuate as much.
Now I’m confused. Don’t all packets go through the same code and cost the 
same CPU time?

Here is my code and test result.

Code added in sample/node.c:
static uword
sample_node_fn (vlib_main_t * vm,
                vlib_node_runtime_t * node,
                vlib_frame_t * frame)
{
  from = vlib_frame_vector_args (frame);
  n_left_from = frame->n_vectors;
  next_index = node->cached_next_index;

  /* Timestamp taken at the start of this dispatch (one frame). */
  sample_main.last_cpu_time = clib_cpu_time_now ();

  /* Count every packet in the frame. */
  sample_main.total_pkts += n_left_from;
  ………
  while (n_left_from > 0)
    {
      ………
    }
  ………
  vlib_node_increment_counter (vm, sample_node.index,
                               SAMPLE_ERROR_SWAPPED, pkts_swapped);

  /* Add the clocks spent on this frame to the running total. */
  sample_main.total_cpu_time += clib_cpu_time_now () - sample_main.last_cpu_time;

  return frame->n_vectors;
}


Test result:
Tx speed(Mb/s)  Average cpu time
190             14
475             12
665             9
950             7
1140            7
1900            6
2895            6
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#11383): https://lists.fd.io/g/vpp-dev/message/11383
-=-=-=-=-=-=-=-=-=-=-=-