Hi Ben,

Thanks for your answer.

Now I think I have found the problem: it looks like a bug in
plugins/rdma/input.c, related to what happens when the list of input
packets wraps around to the beginning of the ring buffer.
To fix it, the following change is needed:

diff --git a/src/plugins/rdma/input.c b/src/plugins/rdma/input.c
index 30fae83e0..f9979545d 100644
--- a/src/plugins/rdma/input.c
+++ b/src/plugins/rdma/input.c
@@ -318,7 +318,7 @@ rdma_device_input_inline (vlib_main_t * vm, vlib_node_runtime_t * node,
                            &bt);
   if (n_tail < n_rx_packets)
     n_rx_bytes +=
-      rdma_device_input_bufs (vm, rd, &to_next[n_tail], &rxq->bufs[0], wc,
+      rdma_device_input_bufs (vm, rd, &to_next[n_tail], &rxq->bufs[0], &wc[n_tail],
                              n_rx_packets - n_tail, &bt);
   rdma_device_input_ethernet (vm, node, rd, next_index);

At that point in the code, rdma_device_input_bufs() is called twice to
handle the n_rx_packets that have arrived: first for the part up to the
end of the ring buffer, and then a second time for the remaining part,
starting from the beginning of the ring. The problem is that the same
"wc" pointer (the array of work completions) is passed both times, when
it in fact needs to be moved forward for the second call: we need
&wc[n_tail] instead of just wc, where n_tail is the number of packets
handled by the first rdma_device_input_bufs() call.
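
Since the pointer arithmetic is easy to get wrong, here is a minimal
standalone sketch of the split (with made-up types and a hypothetical
consume_bufs() standing in for rdma_device_input_bufs(), so names and
details differ from the real code):

#include <stdint.h>

typedef uint32_t u32;
typedef struct { u32 byte_len; } wc_t;  /* stand-in for a work completion */
typedef struct { u32 len; } buf_t;      /* stand-in for a vlib buffer */

/* Hypothetical stand-in for rdma_device_input_bufs(): copy each
 * completion's byte count into the matching ring buffer. */
static void
consume_bufs (buf_t * bufs, wc_t * wc, u32 n)
{
  for (u32 i = 0; i < n; i++)
    bufs[i].len = wc[i].byte_len;
}

/* A burst of n_rx_packets completions starting at ring index 'head'
 * in a ring of 'size' entries. */
static void
process_burst (buf_t * ring, u32 size, u32 head, wc_t * wc,
               u32 n_rx_packets)
{
  u32 n_tail = size - head;     /* entries left before the ring end */
  if (n_tail >= n_rx_packets)
    {
      consume_bufs (&ring[head], wc, n_rx_packets);  /* no wrap */
      return;
    }
  consume_bufs (&ring[head], wc, n_tail);  /* first part, up to the end */
  /* The completions continue where the first part stopped, so this
   * second call must read &wc[n_tail].  Passing plain 'wc' here (the
   * bug) re-reads the first n_tail completions and attaches their byte
   * counts to the wrong buffers, which is exactly the kind of length
   * mismatch we have been seeing. */
  consume_bufs (&ring[0], &wc[n_tail], n_rx_packets - n_tail);
}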

In my tests so far the above change appears to fix the problem
completely: after the fix there are no longer any "ip4 length > l2
length" errors.

This explanation fits with what we saw in our earlier tests, where the
erroneous packets became less frequent when the queue size was
increased: the second call to rdma_device_input_bufs() only comes into
play when the end of the ring buffer is reached, which happens more
rarely when the ring is larger. (After the fix above there is no longer
any need to increase the queue size.)
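
To illustrate the effect of the ring size, here is a small standalone
simulation (made-up burst sizes, nothing VPP-specific) counting how
often a burst crosses the end of the ring; the wrap, and with it the
buggy second call, happens roughly once every size/average_burst
bursts, so a larger ring makes the bad path rarer but never eliminates
it:

#include <stdio.h>
#include <stdint.h>

/* Toy PRNG giving variable burst sizes, 1..max (illustration only). */
static uint32_t state = 12345;
static uint32_t
random_burst (uint32_t max)
{
  state = state * 1103515245u + 12345u;
  return (state >> 16) % max + 1;
}

/* Count bursts that cross the end of a ring of 'size' entries. */
static unsigned
count_wraps (uint32_t size, uint32_t n_bursts)
{
  uint32_t head = 0;
  unsigned wraps = 0;
  for (uint32_t i = 0; i < n_bursts; i++)
    {
      uint32_t burst = random_burst (256);
      if (head + burst > size)
        wraps++;                /* this burst takes the split path */
      head = (head + burst) % size;
    }
  return wraps;
}

int
main (void)
{
  for (uint32_t size = 1024; size <= 8192; size *= 2)
    {
      state = 12345;            /* same burst sequence for each size */
      printf ("ring %4u: %u wraps per 1M bursts\n",
              size, count_wraps (size, 1000000));
    }
  return 0;
}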

What do you think -- does this seem right?

Best regards,
Elias



On Mon, 2020-02-17 at 15:38 +0000, Benoit Ganne (bganne) via
Lists.Fd.Io wrote:
> Hi Elias,
> 
> As the problem only arises with the VPP rdma driver and not the DPDK
> driver, it is fair to say it is a VPP rdma driver issue.
> I'll try to reproduce the issue on my setup and keep you posted.
> In the meantime I do not see a big issue with increasing the
> rx-queue-size to mitigate it.
> 
> ben
> 
> > -----Original Message-----
> > From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Elias Rudberg
> > Sent: Friday, 14 February 2020 16:56
> > To: vpp-dev@lists.fd.io
> > Subject: [vpp-dev] VPP ip4-input drops packets due to "ip4 length > l2
> > length" errors when using rdma with Mellanox mlx5 cards
> > 
> > Hello VPP developers,
> > 
> > We have a problem with VPP used for NAT on Ubuntu 18.04 servers
> > equipped with Mellanox ConnectX-5 network cards (ConnectX-5 EN network
> > interface card; 100GbE dual-port QSFP28; PCIe3.0 x16; tall bracket;
> > ROHS R6).
> > 
> > VPP is dropping packets in the ip4-input node due to "ip4 length > l2
> > length" errors when we use the RDMA plugin.
> > 
> > The interfaces are configured like this:
> > 
> > create int rdma host-if enp101s0f1 name Interface101 num-rx-queues 1
> > create int rdma host-if enp179s0f1 name Interface179 num-rx-queues 1
> > 
> > (We have set num-rx-queues 1 for now to simplify while
> > troubleshooting; in production we use num-rx-queues 4.)
> > 
> > We see some packets dropped due to "ip4 length > l2 length", for
> > example in TCP tests at around 100 Mbit/s -- running such a test for
> > a few seconds already gives some errors. More traffic gives more
> > errors. The drops seem unrelated to the contents of the packets; they
> > happen quite randomly, and already at such moderate amounts of
> > traffic, very far below what should be the capacity of the hardware.
> > 
> > Only a small fraction of packets is dropped: in tests at 100 Mbit/s
> > with packet size 500, about 3 or 4 packets per million get the
> > "ip4 length > l2 length" drop problem. However, the effect appears
> > stronger at larger amounts of traffic and has impacted some of our
> > end users, who observe decreased TCP speed as a result of these
> > drops.
> > 
> > The "ip4 length > l2 length" errors can be seen using vppctl "show
> > errors":
> > 
> >     142                ip4-input               ip4 length > l2
> > length
> > 
> > To get more info about the "ip4 length > l2 length" error we printed
> > the involved sizes when the error happens (ip_len0 and cur_len0 in
> > src/vnet/ip/ip4_input.h). This shows that the actual packet size is
> > often much smaller than ip_len0, the IP packet size according to the
> > IP header. For example, when ip_len0=500, as is the case for many of
> > our packets in the test runs, the cur_len0 value is sometimes much
> > smaller. The smallest case we have seen was cur_len0 = 59 with
> > ip_len0 = 500: the IP header said the packet was 500 bytes, but the
> > actual size was only 59 bytes. So it seems some data is lost; packets
> > have been truncated, sometimes with large parts missing.
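> > 
> > The comparison behind the error counter is roughly the following (a
> > simplified standalone sketch, not the exact VPP code; ip4_input.h
> > compares the header's total-length field against the bytes actually
> > present in the buffer):
> > 
> >   #include <stdint.h>
> >   #include <string.h>
> >   #include <arpa/inet.h>  /* ntohs */
> > 
> >   /* Returns 0 in the "ip4 length > l2 length" case, i.e. when the
> >    * IPv4 header claims more bytes (ip_len0) than the driver actually
> >    * delivered (cur_len0). */
> >   static int
> >   ip4_length_ok (const uint8_t * ip_header, uint32_t cur_len)
> >   {
> >     uint16_t ip_len;    /* total-length field: header bytes 2..3,
> >                            big-endian */
> >     memcpy (&ip_len, ip_header + 2, sizeof (ip_len));
> >     return cur_len >= ntohs (ip_len);
> >   }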
> > 
> > The problems disappear if we skip the RDMA plugin and use the (old?)
> > DPDK way of handling the interfaces; then there are no
> > "ip4 length > l2 length" drops at all. That makes us think there is
> > something wrong with the rdma plugin, perhaps a bug or something
> > wrong with how it is configured.
> > 
> > We have tested this with both the current master branch and the
> > stable/1908 branch; we see the same problem for both.
> > 
> > We tried updating the Mellanox driver from v4.6 to v4.7 (latest
> > version) but that did not help.
> > 
> > After trying some different values of the rx-queue-size parameter to
> > the "create int rdma" command, it seems like the number of
> > "ip4 length > l2 length" errors becomes smaller as rx-queue-size is
> > increased, perhaps indicating that the problem has to do with what
> > happens when the end of that queue is reached.
> > 
> > Do you agree that the above points to a problem with the RDMA plugin
> > in VPP?
> > 
> > Are there known bugs or other issues that could explain the
> > "ip4 length > l2 length" drops?
> > 
> > Does it seem like a good idea to set a very large value of the
> > rx-queue-size parameter if that alleviates the "ip4 length > l2
> > length" problem, or are there big downsides to using a large
> > rx-queue-size value?
> > 
> > What else could we do to troubleshoot this further? Are there
> > configuration options for the RDMA plugin that could be used to
> > solve this and/or get more information about what is happening?
> > 
> > Best regards,
> > Elias
