Hi again,

I took another look at this problem, and now I'm even more convinced that what we really need is IP_BIND_ADDRESS_NO_PORT. Here's why.
If torrc OutboundBindAddress is configured, tor calls bind(2) on every outgoing connection:
https://gitlab.torproject.org/tpo/core/tor/-/blob/tor-0.4.7.12/src/core/mainloop/connection.c#L2245
with sockaddr_in.sin_port set to 0 on #L2438. The kernel doesn't know that we won't be using this socket for listen(2), so it tries to find an unused local two-tuple (in the terminology of [1]; strictly it is a three-tuple: <protocol, source ip, source port>).

The bind syscall is handled by inet_bind:
https://elixir.bootlin.com/linux/v5.15.56/source/net/ipv4/af_inet.c#L438
which calls __inet_bind, which in turn calls sk->sk_prot->get_port on #L531 (notice the "if" on #L529).

get_port is implemented by inet_csk_get_port in inet_connection_sock.c:
https://elixir.bootlin.com/linux/v5.15.56/source/net/ipv4/inet_connection_sock.c#L362
On #L375, it calls inet_csk_find_open_port (defined on #L190) to find a free port.

inet_csk_find_open_port gets the local port range on #L206 (i.e. net.ipv4.ip_local_port_range), selects a random starting point (#L222), and loops through the ports until it finds one that is free (#L230). For every candidate port that is already in use (#L240), it calls inet_csk_bind_conflict (#L241), which is defined on #L133. As far as I understand, inet_csk_bind_conflict's job is to determine whether it is safe to bind to the port anyway (e.g. the existing connection could be in TCP_TIME_WAIT and SO_REUSEPORT could be set on the socket). This is where your server spends so much time.

Increasing net.ipv4.ip_local_port_range doesn't solve the problem; it only makes it more likely that a free port is found quickly.

Let's trace back to the "if" in __inet_bind on #L529:
https://elixir.bootlin.com/linux/v5.15.56/source/net/ipv4/af_inet.c#L529
Since we call bind with sockaddr_in.sin_port set to 0, snum is 0, and we can avoid the whole call chain by setting inet->bind_address_no_port to 1. That is the flag the IP_BIND_ADDRESS_NO_PORT socket option sets.
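To make the userspace side concrete, here is a minimal sketch of binding an outgoing socket with the option set. It is not tor's code: bind_outbound_no_port is a hypothetical helper name, the fallback #define assumes the Linux value of the option (24, available since Linux 4.2), and the setsockopt call is treated as best effort so that older kernels fall back to the plain bind behaviour.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption: older libc headers may lack the constant; 24 is the
 * value used by Linux (linux/in.h), where the option exists since 4.2. */
#ifndef IP_BIND_ADDRESS_NO_PORT
#define IP_BIND_ADDRESS_NO_PORT 24
#endif

/* Hypothetical helper (not part of tor): create a TCP socket bound to
 * src_ip with port 0. Returns the fd, or -1 on error. */
int bind_outbound_no_port(const char *src_ip)
{
    struct sockaddr_in src;
    int one = 1;
    int fd;

    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = 0; /* port 0: let the kernel choose */
    if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1)
        return -1; /* not a valid IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Ask the kernel to defer source port selection until connect(2).
     * Best effort: if the option is unsupported we ignore the error
     * and bind(2) reserves a port as before. */
    (void)setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT,
                     &one, sizeof(one));

    if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The caller would then connect(2) on the returned fd as usual; with the option in effect, the kernel picks the source port at connect time, when the destination is known.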
Here is the patch that sets the flag:
https://gitlab.torproject.org/tpo/core/tor/-/merge_requests/579/diffs?commit_id=b65ffa6f06b2d7bc313e0780f3d76a8acb499ac9#a65580094313324792dd24fed1904263b271abd5_2227_2230

That should allow the kernel to reuse source ports that are already in use, as long as the TCP 4-tuple is unique.

Please include it in the next tor release! :)

[1]: https://blog.cloudflare.com/how-to-stop-running-out-of-ephemeral-ports-and-start-to-love-long-lived-connections/

- Anders

On Fri, Dec 9, 2022 at 10:47 AM Alexander Færøy <a...@torproject.org> wrote:
> On 2022/12/01 20:35, Christopher Sheats wrote:
> > Does anyone have experience troubleshooting and/or fixing this problem?
>
> Like I wrote in [1], I think it would be interesting to hear if the
> patch from pseudonymisaTor in ticket #26646[2] would be of any help in
> the given situation. The patch allows an exit operator to specify a
> range of IP addresses for binding purposes for outbound connections. I
> would think this could split the load wasted on trying to resolve port
> conflicts in the kernel amongst the set of IP's you have available for
> outbound connections.
>
> All the best,
> Alex.
>
> [1]: https://mastodon.social/@ahf/109382411984106226
> [2]:
> https://gitlab.torproject.org/tpo/core/tor/-/issues/26646#note_2795959
>
> --
> Alexander Færøy
> _______________________________________________
> tor-relays mailing list
> tor-relays@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays
>