On 21/09/2013 9:24 a.m., Alex Rousskov wrote:
Hello, I am dealing with a Squid hierarchy where network connectivity between a child Squid and several parent Squids is insecure and comes with long round-trip times. To combat insecurity, the child Squid encrypts connections to its peers, which makes connection establishment even slower because it adds an SSL handshake to the TCP handshake.

All proxies involved are using HTTP persistent connections to the extent possible, but when a child Squid does not talk to a parent for a while, pconns time out (not to mention non-idempotent requests, network errors, and other connection closure reasons). Due to erratic user behavior, complex peer routing, network errors, and stringent performance demands, it is not possible to just configure a long-enough timeout to always keep a few connections open and ready for use.

The unlucky child Squid user whose requests are going to hit that unused-for-a-while parent suffers from significant initial delays as the child Squid has to establish several new connections to its peer, performing TCP and SSL handshakes (accumulating many round-trip times). After a few long seconds, the connections are established and the first "page" finally loads. Unfortunately, these users need nearly-instant responses, and it is especially difficult to pacify them because they see decent performance on established connections (so they know that the available connectivity can support their needs well).

It might be possible to utilize some kind of TCP-level proxy that would maintain a pool of TCP connections between the two proxies, but that solution would significantly increase administrative overheads and would not integrate well with the existing idle pconn pool already maintained by Squid (not to mention existing Squid peer configuration and management code).

The best solution I can come up with is to modify Squid so that it proactively maintains a pool of N idle connections between two peers. If an idle connection is removed from the pool (for any reason), the child Squid opens a new one, even if there is no request that can use that newly opened connection immediately. The size of this "steady" pool would be configurable, of course. Squid may also need to pace the connection opening rate to avoid creating the impression of a DoS attack, especially when priming large steady pools. There are some other minor aspects of this algorithm that may need to be configurable.

I think this "steady" connection pool feature would be useful in other performance-sensitive environments with a fixed number of peers where regular connection establishment is just too slow for the given user delay tolerance.

Is there a better way to solve this problem? If not, any objections to adding such a feature to Squid?
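To put rough numbers on the handshake cost described above (the 300 ms RTT below is an illustrative assumption, not a figure from the report), a cold encrypted connection pays:

  TCP handshake:              1 RTT
  full SSL/TLS handshake:     2 RTT
  first request + response:   1 RTT
  --------------------------------
  first byte arrives after:   4 RTT  (~1.2 s at 300 ms RTT)

and that is before DNS lookups, certificate checks, or packet loss are counted, which is how the "few long seconds" accumulate once several such connections have to be opened while the user waits.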
There are a few things that I think need to be done in the current 3.x which can probably help resolve this. Listed here by ease of implementation:
1) the background neighbour probe which opens a TCP connection to the peer then drops it should at least be saving that connection in the pconn pool for re-use if possible. This also brings Squid more in line with browsers, which open multiple connections and then leave some idle until needed. NP: there is talk in the HTTPbis WG about the negative side effects these idle connections have on servers. It would be beneficial to shove out a small dummy request such as OPTIONS to ensure that the server knows it is a legitimate connection and can log things like the Squid UA instead of treating it as part of a DoS (a sample probe request is sketched after this list).
2) porting the 2.7 cache_peer idle= option, which keeps a minimum number of idle connections open at all times to the relevant peer (an example directive follows this list). --> this seems to be what you are describing adding to 3.x. It can be combined with (1) to reduce the TCP overheads imposed on the server.
3) happy-eyeballs TCP connections based on the peer FwdState destination list, where N of the potential destinations are attempted simultaneously and the N-1 which lose the race get pushed into the pconn pool for later use.
4) server-side pipelining, so that all requests for the page get queued on the pipeline and only one TCP+SSL setup cost is paid. This will also help push us towards HTTP/2 preparedness.
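For (1), the dummy probe could be as small as an asterisk-form OPTIONS request. A sketch of what might go on the wire (hostname and header values here are placeholders, not what Squid currently sends):

  OPTIONS * HTTP/1.1
  Host: parent.example.com:3128
  User-Agent: Squid
  Connection: keep-alive

The asterisk form asks about the server as a whole, so it is cheap for the peer to answer and gives it something sensible to log.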
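For (2), the ported option might look like this in a 3.x squid.conf (syntax mirrors the 2.7 idle=N directive; the hostname and the count of 10 are placeholders):

  cache_peer parent.example.com parent 3128 0 no-query ssl idle=10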
What I have been thinking of implementing for some time is to have the neighbour.cc function peerProbeConnectDone() push the connection into the pconn pool on success (1 above). Then, for (2 above), the pconn pool idle timeout/close/pop functions would check the peer's idle count and call peerProbeConnect() if the idle count is smaller than the configured setting. One complication is that a peer configured with a hostname will have multiple IP addresses and thus multiple idle pools, since the cache_peer related pools are indexed by destination IP:port pair.
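As a toy model of that control loop (self-contained C++, not real Squid code; in Squid the refill callback would be peerProbeConnect() and the pool would be the existing pconn pool):

#include <cstddef>
#include <functional>
#include <iostream>

// Toy model of a "steady" idle pool for one destination IP:port pair.
// A hostname-configured peer would own several of these, one per address.
class SteadyIdlePool {
public:
    SteadyIdlePool(std::size_t minIdle, std::function<void()> openConn)
        : minIdle_(minIdle), openConn_(openConn) { topUp(); }

    // A probe or finished request donates its connection to the pool
    // (step 1: what peerProbeConnectDone() would do on success).
    void push() { ++idle_; }

    // A pooled connection is used, times out, or closes
    // (step 2: the pconn pop/timeout/close hooks).
    void pop() {
        if (idle_ > 0)
            --idle_;
        topUp();
    }

private:
    // Open replacements until the configured minimum is restored;
    // a real implementation would pace this to avoid looking like a DoS.
    void topUp() {
        while (idle_ < minIdle_) {
            openConn_();  // in Squid: peerProbeConnect(peer)
            ++idle_;      // optimistic: assumes the probe joins the pool;
                          // real code would track in-flight opens separately
        }
    }

    std::size_t idle_ = 0;
    std::size_t minIdle_;
    std::function<void()> openConn_;
};

int main() {
    SteadyIdlePool pool(3, [] { std::cout << "opening replacement conn\n"; });
    pool.pop();  // an idle conn timed out -> pool refills back toward 3
}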
Amos
