Re: [vpp-dev] Poor L3/L4 Performance

2017-09-25 Thread Dave Barach (dbarach)
As discussed off-list: please stick to best-practice coding patterns. 
Single-packet frames simply cannot perform, etc.

Thanks… Dave

From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On 
Behalf Of Alessio Silvestro
Sent: Monday, September 25, 2017 10:13 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Poor L3/L4 Performance

Dear all,

I am performing some experiments on VPP in order to get some performance 
metrics for specific applications.

I am working on vpp v17.04.2-2.

In order to have a baseline of my system, I ran L2 XConnect (XC) as in 
[https://perso.telecom-paristech.fr/~drossi/paper/vpp-bench-techrep.pdf].

In this case I can achieve ~13 Mpps, similar to the paper, which suggests that 
the current setup is correct.

I implemented 2 further experiments:

1) L3-Xconnect

I implemented a new node that listens for traffic with a specific ether_type, 
registered with the following API call:

ethernet_register_input_type(vm, ETHERNET_TYPE_X, my_node.index)

Once the traffic is received, the node sends the traffic directly to l2_output 
without any further processing.

The achieved packet rate is less than 5 Mpps.

2) L4-Xconnect

I implemented another node that listens for UDP traffic on a specific port, 
registered with the following API call:


udp_register_dst_port (vm, UDP_DST_PORT_vxlan, vxlan_input_node.index, 1 /* is_ip4 */);

Once the traffic is received, the node sends the traffic directly to l2_output 
without any further processing.

The achieved packet rate is less than 4 Mpps.


The testbed is composed of 2 servers. The first server runs VPP, whereas 
the second server runs the traffic generator (packetgen). The servers are 
equipped with dual-port Intel NICs with 10 Gbps full-duplex links. 
Generated packets are 64 bytes in size.

VPP is configured to run with one main thread and one worker thread. Therefore, 
the figures above are for a single CPU core.

In my opinion those values are a bit too low compared to other state-of-the-art 
approaches.

Do you have any idea why this is happening and, if this is my fault, how I 
can fix it?

Thanks,
Alessio

___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] Poor L3/L4 Performance

2017-09-25 Thread Damjan Marion (damarion)

Dear Alessio,

It is hard to guess where the problem is from your description,
but I would not be surprised if your implementation of those graph nodes is 
not properly performance-tuned.
One missing prefetch can hurt performance really badly.

If you are able to share your code I can take a quick look...

Thanks,

Damjan



[vpp-dev] Poor L3/L4 Performance

2017-09-25 Thread Alessio Silvestro
Dear all,

I am performing some experiments on VPP in order to get some performance
metrics for specific applications.

I am working on vpp v17.04.2-2.

In order to have a baseline of my system, I ran L2 XConnect (XC) as in
[https://perso.telecom-paristech.fr/~drossi/paper/vpp-bench-techrep.pdf].

In this case I can achieve ~13 Mpps, similar to the paper, which suggests
that the current setup is correct.

I implemented 2 further experiments:

*1) L3-Xconnect *

I implemented a new node that listens for traffic with a specific ether_type,
registered with the following API call:

ethernet_register_input_type(vm, ETHERNET_TYPE_X, my_node.index)

Once the traffic is received, the node sends the traffic directly to
l2_output without any further processing.

The achieved packet rate is less than 5 Mpps.

*2) L4-Xconnect*

I implemented another node that listens for UDP traffic on a specific port,
registered with the following API call:


udp_register_dst_port (vm, UDP_DST_PORT_vxlan, vxlan_input_node.index, 1 /* is_ip4 */);

Once the traffic is received, the node sends the traffic directly to
l2_output without any further processing.

The achieved packet rate is less than 4 Mpps.


The testbed is composed of 2 servers. The first server runs VPP, whereas
the second server runs the traffic generator (packetgen). The servers are
equipped with dual-port Intel NICs with 10 Gbps full-duplex links.
Generated packets are 64 bytes in size.

VPP is configured to run with one main thread and one worker thread.
Therefore, the figures above are for a single CPU core.

In my opinion those values are a bit too low compared to other
state-of-the-art approaches.

Do you have any idea why this is happening and, if this is my fault, how
I can fix it?

Thanks,
Alessio