> It is surprising, isn't it? It certainly feels like calling connect
> without first binding to an address should have the same effect as
> manually binding to an address and then calling connect, especially if
> the address you bind to is the same as the kernel would have chosen
> automatically. It seems like it might be a bug, but I'm not qualified to
> judge that.

Yes, I'm starting to think so too. And it is strange that Cloudflare doesn't
mention stumbling upon this problem in their blog post on running out of
ephemeral ports. [1]
If I find the time, I'll make an attempt at understanding exactly what is
going on in the kernel.
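For anyone following along, the three call sequences being compared in this
thread can be sketched roughly as below. This is only a minimal, Linux-only
sketch and not the test code referred to in the thread: the helper names
connect_plain and connect_bound are made up for illustration, and the
fallback value 24 is the Linux definition of IP_BIND_ADDRESS_NO_PORT from
<linux/in.h>, used in case the Python build does not export the constant.

```python
import socket

# Assumption: Linux. 24 is the value of IP_BIND_ADDRESS_NO_PORT in
# <linux/in.h>; it is only a fallback for Python builds that do not
# export the constant themselves.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)


def connect_plain(dst):
    """connect() without bind(): the kernel picks source address and port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect(dst)
    except OSError:
        s.close()
        raise
    return s


def connect_bound(dst, src_ip, no_port=False):
    """bind() to src_ip with port 0, then connect().

    With no_port=False, bind() reserves a concrete ephemeral port right away.
    With no_port=True, IP_BIND_ADDRESS_NO_PORT asks the kernel to postpone
    the port choice until connect(), when the destination is known.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if no_port:
            s.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
        s.bind((src_ip, 0))
        s.connect(dst)
    except OSError:
        # EADDRNOTAVAIL from connect() would surface here as OSError.
        s.close()
        raise
    return s
```

Mixing connect_bound(..., no_port=False) with the other two paths on one
host is the "middle ground" case discussed further down; using only one
style everywhere corresponds to the two extremes.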
> If I am interpreting your results correctly, it means that either of the
> two extremes is safe

Yes. That is what I think too.

> Anyway, thank you for the insight. I apologize if I was inconsiderate
> in my prior reply.

Likewise!

Best regards
Anders Trier Olesen

[1] https://blog.cloudflare.com/how-to-stop-running-out-of-ephemeral-ports-and-start-to-love-long-lived-connections/

On Mon, Dec 12, 2022 at 4:16 PM David Fifield <da...@bamsoftware.com> wrote:

> On Mon, Dec 12, 2022 at 12:39:50AM +0100, Anders Trier Olesen wrote:
> > I wrote some tests[1] which showed behaviour I did not expect.
> > IP_BIND_ADDRESS_NO_PORT seems to work as it should, but calling bind
> > without it enabled turns out to be even worse than I thought.
> > This is what I think is happening: A successful bind() on a socket
> > without IP_BIND_ADDRESS_NO_PORT enabled, with or without an explicit
> > port configured, makes the assigned (or supplied) port unavailable for
> > new connect()s (on different sockets), no matter the destination. I.e.,
> > if you exhaust the entire net.ipv4.ip_local_port_range with bind() (no
> > matter what IP you bind to!), connect() will stop working, no matter
> > what IP you attempt to connect to. You can work around this by manually
> > doing a bind() (with or without an explicit port, but without
> > IP_BIND_ADDRESS_NO_PORT) on the socket before connect().
> >
> > What blows my mind is that after running test2, you cannot connect to
> > anything without manually doing a bind() beforehand (as shown by test1
> > and test3 above)!
> > This also means that after running test2, software like ssh stops
> > working:
> >
> > When using IP_BIND_ADDRESS_NO_PORT, we don't have this problem (1 5 6
> > can be run in any order):
>
> Thank you for preparing that experiment. It's really valuable, and it
> looks a lot like what I was seeing on the Snowflake bridge: calls to
> connect would fail with EADDRNOTAVAIL unless first bound concretely to a
> port number.
> IP_BIND_ADDRESS_NO_PORT causes bind not to set a concrete
> port number, so in that respect it's the same as calling connect without
> calling bind first.
>
> It is surprising, isn't it? It certainly feels like calling connect
> without first binding to an address should have the same effect as
> manually binding to an address and then calling connect, especially if
> the address you bind to is the same as the kernel would have chosen
> automatically. It seems like it might be a bug, but I'm not qualified to
> judge that.
>
> If I am interpreting your results correctly, it means that either of the
> two extremes is safe: either everything that needs to bind to a source
> address should call bind with IP_BIND_ADDRESS_NO_PORT, or else
> everything (whether it needs a specific source address or not) should
> call bind *without* IP_BIND_ADDRESS_NO_PORT. (The latter situation is
> what we've arrived at on the Snowflake bridge.) The middle ground, where
> some connections use IP_BIND_ADDRESS_NO_PORT and some do not, is what
> causes trouble, because connections that do not use
> IP_BIND_ADDRESS_NO_PORT somehow "poison" the ephemeral port pool for
> connections that do use IP_BIND_ADDRESS_NO_PORT (and for connections
> that do not bind at all). It would explain why causing HAProxy not to
> use IP_BIND_ADDRESS_NO_PORT resolved errors in my case.
>
> > > Removing the IP_BIND_ADDRESS_NO_PORT option from HAProxy and
> > > *doing nothing else* is sufficient to resolve the problem.
> >
> > Maybe there are other processes on the same host which call bind()
> > without IP_BIND_ADDRESS_NO_PORT, and block the ports? E.g.
> > OutboundBindAddress or similar in torrc?
>
> OutboundBindAddress is a likely culprit. We did end up setting
> OutboundBindAddress on the bridge during the period of intense
> performance debugging at the end of September.
>
> One thing doesn't quite add up, though.
> The earliest EADDRNOTAVAIL log messages started at 2022-09-28 10:57:26:
>
> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40198
>
> Whereas according to the change history of /etc on the bridge,
> OutboundBindAddress was first set some time between 2022-09-29 21:38:37
> and 2022-09-29 22:37:06, over 30 hours later. I would be tempted to say
> this is a case of what you initially suspected, simple tuple exhaustion
> between two static IP addresses, if not for the fact that pre-binding an
> address resolved the problem in that case as well ("I get EADDRNOTAVAIL
> sometimes even with netcat, making a connection to the haproxy port -- but
> not if I specify a source address in netcat"). But I only ran that
> netcat test after OutboundBindAddress had been set, so there may have
> been many factors being conflated.
>
> Anyway, thank you for the insight. I apologize if I was inconsiderate
> in my prior reply.
> _______________________________________________
> tor-relays mailing list
> tor-relays@lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-relays