Hello,
Just a reminder about this, see below.
Best regards,
Elias

-------- Forwarded Message --------
From: Elias Rudberg <elias.rudb...@bahnhof.net>
To: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io>
Subject: [vpp-dev] NAT port number selection problem, leads to wrong 
thread index for some sessions
Date: Thu, 02 Jul 2020 20:43:12 +0000

Hello VPP experts,

There seems to be a problem with the way the port number is selected
for NAT: sometimes the selected port number leads to a different thread
index being selected for out2in packets, making that session useless.
This applies to the current master branch as well as the latest stable
branches, I think.

Here is the story as I understand it, please correct me if I have
misunderstood something. Each NAT thread has a range of port numbers
that it can use, and when a new session is created a port number is
picked at random from within that range. That happens when an in2out
packet is NATed. Then later when a response comes as an out2in packet,
VPP needs to make sure it is handled by the correct thread, the same
thread that created the session.

The port number to use for a new session is selected in
nat_alloc_addr_and_port_default() like this:

portnum = (port_per_thread * snat_thread_index) + snat_random_port(1,
port_per_thread) + 1024;

where port_per_thread is the number of ports each thread is allowed to
use, and snat_random_port() returns a random number in the given range.
This means that the smallest possible portnum is 1025, which occurs
when snat_thread_index is zero.

The corresponding calculation to get the thread index back based on the
port number is essentially this:

(portnum - 1024) / port_per_thread

This works in all cases except when snat_random_port() returns the
largest possible value, port_per_thread. In that case portnum - 1024
equals port_per_thread * (snat_thread_index + 1), so the division gives
snat_thread_index + 1 instead of snat_thread_index, and we end up with
the wrong thread index. That means that out2in packets arriving for
that session get handed off to another thread. The other thread is
unaware of that session, so all out2in packets for that session are
dropped.
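
The two calculations above can be sketched in isolation as follows.
This is only an illustration: port_for_offset() and thread_for_port()
are hypothetical helper names, not functions in VPP, and the random
offset is passed in explicitly instead of calling snat_random_port().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers mirroring the formulas quoted above;
   they are not VPP's actual code. */

/* Port chosen for a new session, where offset stands in for the value
   returned by snat_random_port(1, port_per_thread). */
static uint32_t
port_for_offset (uint32_t port_per_thread, uint32_t thread_index,
                 uint32_t offset)
{
  assert (offset >= 1 && offset <= port_per_thread);
  return port_per_thread * thread_index + offset + 1024;
}

/* Thread index recovered from the port number of an out2in packet. */
static uint32_t
thread_for_port (uint32_t port_per_thread, uint32_t portnum)
{
  assert (port_per_thread > 0 && portnum >= 1024);
  return (portnum - 1024) / port_per_thread;
}
```

Taking port_per_thread = 8000 as a round figure and thread index 2,
offset 7999 gives port 25023, which maps back to thread 2, while offset
8000 gives port 25024, which maps back to thread 3.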

Since each thread has thousands of port numbers to choose from and the
problem only appears for one particular choice, only a small fraction
of all sessions are affected by this. In my tests there were 8 NAT
threads, so the port_per_thread value was about 8000 and the
probability of failure was about 1/8000, or roughly 0.0125% of all
sessions.

The test I used was simply to run many separate ping commands with the
"-c 1" option. All of them should give the normal result "1 packets
transmitted, 1 received, 0% packet loss", but due to this problem some
of the pings fail. Note that they need to be separate ping commands so
that VPP creates a new session for each of them. Provided that you test
a large enough number of sessions, it is straightforward to reproduce
the problem.

It could be fixed in different ways. One way is to simply shift the
arguments to snat_random_port() down by one:
snat_random_port(1, port_per_thread)
-->
snat_random_port(0, port_per_thread-1)
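
As a rough check of this change, the sketch below counts, for one
thread, how many offsets in a given range produce a port number that
maps back to a different thread. count_misrouted() is a hypothetical
helper written for this illustration, not part of VPP, and it walks
every offset instead of drawing random ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not VPP code: counts the offsets in [lo, hi]
   for which (portnum - 1024) / port_per_thread differs from the
   thread index that allocated the port. */
static uint32_t
count_misrouted (uint32_t port_per_thread, uint32_t thread_index,
                 uint32_t lo, uint32_t hi)
{
  uint32_t bad = 0;
  assert (port_per_thread > 0 && lo <= hi);
  for (uint32_t offset = lo; offset <= hi; offset++)
    {
      uint32_t portnum = port_per_thread * thread_index + offset + 1024;
      if ((portnum - 1024) / port_per_thread != thread_index)
        bad++;
    }
  return bad;
}
```

With port_per_thread = 8000 as a round figure, the old range
count_misrouted(8000, 3, 1, 8000) reports one bad offset, while the
shifted range count_misrouted(8000, 3, 0, 7999) reports none.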

I pushed such a change to gerrit, here: 
https://gerrit.fd.io/r/c/vpp/+/27786

The smallest port number used then becomes 1024 instead of 1025 as it
has been so far. I suppose that should be OK, since it is the
"well-known ports" from 0 to 1023 that should be avoided, so port 1024
should be fine to use. What do you think, does it make sense to fix it
in this way?

Best regards,
Elias
