Assume that Wget has retrieved a document from host A, and that A,
honoring Wget's keep-alive request, has left the connection open.

Then Wget needs to connect to host B, which is really the same machine
as A because the provider uses DNS-based (i.e. name-based) virtual
hosts.  Is it OK to reuse the connection to A to talk to B?
Specifically, Wget does this:

  0. Assume Wget has a file descriptor, old_fd, connected to A.
     Connection to B is needed to get a URL `http://B/...'.

  1. Resolve B, getting an array of IP addresses that it resolves to.
     This would need to be done anyway, so no additional overhead is
     introduced.

  2. Use getpeername(old_fd) to determine the address of the endpoint
     old_fd is connected to.

  3. If old_fd's peer address is one of the addresses that B resolves
     to, determined in step 1, use old_fd to communicate with B.
     Since Wget uses the `Host' header to specify the virtual host to
     talk to, everything works.  (IP-based virtual hosts are not a
     problem because they have different addresses in the first
     place.)
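
To make steps 1-3 concrete, here is a minimal sketch in Python (not
Wget's actual C code).  The function names resolved_addresses and
can_reuse are made up for illustration.  The port comparison is my own
assumption and is not one of the steps above, and addresses are
compared as text, which could miss equivalent forms such as
IPv4-mapped IPv6 addresses.

```python
import socket


def resolved_addresses(host, port):
    """Step 1: return the set of IP address strings that host resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual IP address for both IPv4 and IPv6.
    return {info[4][0] for info in infos}


def can_reuse(old_sock, host, port):
    """Steps 2-3: is old_sock's peer one of the addresses host resolves to?"""
    peer = old_sock.getpeername()          # step 2: getpeername(old_fd)
    peer_ip, peer_port = peer[0], peer[1]
    # Assumption (not in the steps above): the port must match as well.
    if peer_port != port:
        return False
    # Step 3: reuse only if the peer address is among B's addresses.
    return peer_ip in resolved_addresses(host, port)
```

If can_reuse(old_sock, "B", 80) returns true, the request for
`http://B/...' would be sent over old_sock with `Host: B'; otherwise a
fresh connection to B is opened as usual.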

I cannot decide if this is totally evil or perfectly fine.  On the one
hand, persistent connections are (AFAIK) supposed to be a mere
optimization, and should not modify the semantics of the
communication.  So if I have the connection to the endpoint, I should
be able to reuse it.  But on the other hand, a server might decide to
connect a file descriptor to a handler for a specific virtual host,
which would be unable to serve anything else.  FWIW, it works fine
with Apache.
