Hello VPP experts,

There seems to be a problem with the RDMA driver in VPP when using
Mellanox ConnectX-5 network interfaces. The problem appears on the
master branch and on the stable/2005 branch, while stable/2001 is not
affected.

The problem is that when a frame with 2 packets is to be sent, only the
first packet is sent immediately while the second packet gets delayed.
The second packet seems to go out only later, when some other frame
with new packets is sent; that new traffic then flushes out the delayed
packet as well.

Perhaps this can go undetected if there is lots of traffic all the
time, since new traffic constantly flushes out any delayed packets from
earlier. To reproduce it, it seems best to use a test setup with very
little traffic, with gaps of several seconds between packets; in that
case packets can apparently be delayed for several seconds. Note that
the delay is not visible inside VPP: packet traces make it look like
the packets are sent immediately, so VPP believes they have left, but
it seems some packets are held in the NIC and only sent later on.
Monitoring the traffic arriving at the other end shows that there was a
delay.

The behavior seems reproducible, except when other traffic is sent soon
afterwards, since that causes the delayed packets to be flushed out.

The specific case where this came up for us was using VPP for NAT with
ipfix logging enabled, while running some ping tests. A single ICMP
echo request being NATed usually works fine, but sometimes an ipfix
logging packet needs to be sent as well and ends up in the same frame,
so the frame has 2 packets. The ipfix logging packet is then sent
immediately while the ICMP packet is delayed, sometimes so long that
the ping times out and fails. I don't think the problem has anything to
do with NAT or ipfix logging; it looks like a more general problem in
the rdma plugin.

Testing previous commits indicates that the problem started with this
commit:

dc812d9a7 (rdma: introduce direct verb for Cx4/5 tx, 2019-12-16)

That commit exists in master and in stable/2005 but not in stable/2001,
which is consistent with the problem being seen on master and
stable/2005 but not on stable/2001.

I tried updating to the latest Mellanox driver (v5.0-2.1.8), but that
did not help.

In the code in src/plugins/rdma/output.c it seems like the function
rdma_device_output_tx_mlx5() handles the packets, but I was not able to
fully understand how it works. There is a concept of a "doorbell"
there: apparently, when packets are to be sent, descriptors for the
packets are prepared and then the "doorbell" is rung to tell the NIC
that there is something to send. From my limited understanding, it
looks like the doorbell currently results in only the first packet
actually being sent by the NIC right away, while the remaining packets
are somehow left queued and sent later. So far I don't understand
exactly why that happens or how to fix it.
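
To explain what I mean (and maybe expose a misunderstanding on my
part), here is a highly simplified sketch of how I imagine the
doorbell pattern working. All the names here (my_txq_t, my_tx_doorbell,
dbrec, bf_reg, ...) are made up for illustration and are NOT taken from
the actual rdma plugin code, which is more involved:

  /* Hypothetical, simplified sketch of an mlx5-style tx doorbell,
     only to illustrate my (possibly wrong) understanding. */
  #include <stdint.h>
  #include <string.h>

  #define WQE_SIZE 64

  typedef struct { uint8_t data[WQE_SIZE]; } my_wqe_t;

  typedef struct
  {
    my_wqe_t *ring;             /* send queue ring of WQEs */
    uint32_t size;              /* ring size (power of 2) */
    uint32_t tail;              /* producer index */
    volatile uint32_t *dbrec;   /* doorbell record (read by the NIC) */
    volatile uint64_t *bf_reg;  /* doorbell / BlueFlame MMIO register */
  } my_txq_t;

  static void
  my_tx_doorbell (my_txq_t *txq, my_wqe_t *wqes, uint32_t n_wqes)
  {
    uint32_t mask = txq->size - 1;

    /* 1. copy the WQEs describing all n_wqes packets into the ring */
    for (uint32_t i = 0; i < n_wqes; i++)
      txq->ring[(txq->tail + i) & mask] = wqes[i];
    txq->tail += n_wqes;

    /* 2. make the WQEs visible before updating the doorbell record */
    __atomic_thread_fence (__ATOMIC_RELEASE);

    /* 3. publish the new producer index in the doorbell record */
    *txq->dbrec = txq->tail;

    /* 4. ring the doorbell: write (part of) a WQE to the MMIO register.
       My suspicion is that if this step only advertises the FIRST new
       WQE of the batch, the NIC may transmit that one immediately and
       leave the rest sitting in the queue until the next doorbell,
       which would match the "second packet delayed until more traffic
       arrives" symptom. */
    __atomic_thread_fence (__ATOMIC_RELEASE);
    memcpy ((void *) txq->bf_reg,
            &txq->ring[(txq->tail - 1) & mask], sizeof (uint64_t));
  }

Again, this is just my mental model, not the plugin's actual logic, so
I may well be misreading what rdma_device_output_tx_mlx5() does around
the doorbell write.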

As a workaround, simply reverting the entire rdma plugin to its state
on the stable/2001 branch seems to make the problem disappear. But that
probably means losing the performance gains and other improvements in
the newer code.

Could someone with insight into the rdma plugin please help fix this?

Best regards,
Elias