On 04/01/2018 22:42, hiren panchasara wrote:
On 01/04/18 at 09:52P, Steven Hartland wrote:
On 04/01/2018 20:50, Eugene Grosbein wrote:
05.01.2018 3:05, Steven Hartland wrote:

Author: smh
Date: Thu Jan  4 20:05:47 2018
New Revision: 327559
URL: https://svnweb.freebsd.org/changeset/base/327559

Log:
    Disabled the use of flowid for lagg by default

    Disabled the use of RSS hash from the network card aka flowid for
    lagg(4) interfaces by default as it's currently incompatible with
    the lacp and loadbalance protocols.

    The incompatibility is due to the fact that the flowid isn't known
    for the first packet of a new outbound stream which can result in
    the hash calculation method changing and hence a stream being
    incorrectly split across multiple interfaces during normal
    operation.

    This can be re-enabled by setting the following in loader.conf:
    net.link.lagg.default_use_flowid="1"

    Discussed with:     kmacy
    Sponsored by:       Multiplay

RSS by definition only has meaning for a received stream. What is an "outbound"
stream in this context, why can the hash calculation method change, and what
exactly does "a stream being incorrectly split" mean?
Yes, RSS does indeed apply to a received stream, but the resulting flowid is
used by lagg for the lacp and loadbalance protocols to decide which port of
the lagg to "send" the packet out of. As the flowid is not known when a new
"output" stream is instigated, the current code falls back to manual hash
calculation to determine which port to send the initial packet from. Once a
response is received, tx then uses the flowid. This change of hash calculation
method can result in the initial packet being sent from a different port
than the rest of the stream; this is what I meant by "incorrectly split".
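
To make the split concrete, here is a minimal sketch in Python. It is a simplified model and not the lagg(4) code: the names Packet, software_hash and choose_port are made up for illustration, crc32 merely stands in for the manual hash, and the flowid value is picked by hand so that it maps to a different port than the manual hash does.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    flow: tuple                   # (src_ip, src_port, dst_ip, dst_port)
    flowid: Optional[int] = None  # NIC-supplied hash; None until it is known


def software_hash(flow: tuple) -> int:
    """Stand-in for the manual hash calculation (assumption: crc32)."""
    return zlib.crc32(repr(flow).encode())


def choose_port(pkt: Packet, nports: int, use_flowid: bool) -> int:
    """Pick an output port; fall back to the manual hash without a flowid."""
    if use_flowid and pkt.flowid is not None:
        return pkt.flowid % nports
    return software_hash(pkt.flow) % nports


def ports_used(packets, nports: int, use_flowid: bool) -> set:
    """Set of ports that one stream's packets end up on."""
    return {choose_port(p, nports, use_flowid) for p in packets}


NPORTS = 2
flow = ("10.0.0.1", 40000, "10.0.0.2", 80)

first = Packet(flow)                      # first outbound packet, no flowid yet
sw_port = choose_port(first, NPORTS, use_flowid=True)
later = Packet(flow, flowid=sw_port + 1)  # hand-picked flowid mapping elsewhere

print(len(ports_used([first, later], NPORTS, use_flowid=True)))   # 2 ports: split
print(len(ports_used([first, later], NPORTS, use_flowid=False)))  # 1 port
```

With use_flowid off, every packet of the stream goes through the same manual hash, so the stream stays on one port.
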
For my understanding, is this just an issue for the first packet when we
originate the flow? Once we have a response and if flowid is there, we'd
use it, right? Or am I missing something?
Initially yes, but that can cause a whole cascading set of problems. If the source machine sends from two different ports then the flow can traverse the network using different paths and hence arrive at the destination on different ports too, causing the corresponding issue on the other side.
And with this change, we'd always go and do manual calculation even when
we have a valid flowid (i.e. we didn't initiate a connection)?
Correct, but there's potentially no easy way to correctly determine what the flowid, and hence the hash, should be in this case; it's likely impossible if the lagg consists of different interface types.

In addition if the hardware hash doesn't match the requested one as per laggproto then additional issues could also be triggered.

Our TCP stack seems fragile during connection setup when faced with the out-of-order packets that this multipath behavior causes. We've seen this on our load balancers, which is what triggered the investigation. The concrete result is many aborted TCP connections: over 300k, roughly 2%, on the machine I'm looking at.

I hope there are some improvements that can be made. For example, if we can determine that the stream was instigated remotely then the flowid would always be valid, so we could use it, assuming it matches the requested spec. Alternatively, we could make it clear to the user that the laggproto in use is not the one they requested. I'm open to ideas.
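
As a rough illustration of the first idea only, and not a proposed patch: the Python sketch below trusts the flowid solely when the stream was instigated remotely and the hardware hash type matches the requested one. Everything in it is hypothetical, including the function names, the remote_initiated flag (how to determine it is exactly the open question) and the hash type labels.

```python
import zlib
from typing import Optional


def software_hash(flow: tuple) -> int:
    """Stand-in for the manual hash calculation (assumption: crc32)."""
    return zlib.crc32(repr(flow).encode())


def choose_port(flow: tuple,
                flowid: Optional[int],
                hw_hash_type: Optional[str],
                wanted_hash_type: str,
                remote_initiated: bool,
                nports: int) -> int:
    """Use the flowid only when it is valid for the stream's whole life."""
    trust_flowid = (
        remote_initiated                      # flowid known from first packet
        and flowid is not None
        and hw_hash_type == wanted_hash_type  # hardware hash matches the spec
    )
    if trust_flowid:
        return flowid % nports
    return software_hash(flow) % nports


flow = ("10.0.0.2", 80, "10.0.0.1", 40000)

# Remotely instigated, matching hash type: the flowid decides the port.
print(choose_port(flow, 7, "l3l4", "l3l4", True, 2))

# Locally instigated: always the manual hash, so the port never changes
# once a flowid shows up later in the stream.
print(choose_port(flow, None, None, "l3l4", False, 2)
      == choose_port(flow, 7, "l3l4", "l3l4", False, 2))
```
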

    Regards
    Steve

_______________________________________________
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"