Hi. I've been working to add prefetching to squid3. It works by analyzing HTML and looking for various tags that a graphical browser can be expected to request.
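As a rough illustration of the tag scan described above, here is a minimal sketch. It is not the code from the branch: it uses Python's standard-library HTML parser rather than libxml2. The tag/attribute table, the `PrefetchScanner` class, the `find_prefetchables` helper, and the per-page de-duplication are all assumptions made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical table of tags a graphical browser would normally fetch,
# mapped to the attribute that carries the URL.
PREFETCH_ATTRS = {
    "img": "src",
    "script": "src",
    "link": "href",
}


class PrefetchScanner(HTMLParser):
    """Collects absolute URLs of objects referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attr_name = PREFETCH_ATTRS.get(tag)
        if attr_name is None:
            return
        attr_map = dict(attrs)
        # Assumption: only stylesheet links are worth prefetching.
        if tag == "link":
            rel = (attr_map.get("rel") or "").lower().split()
            if "stylesheet" not in rel:
                return
        value = attr_map.get(attr_name)
        if value:
            # Resolve relative references against the page URL.
            url = urljoin(self.base_url, value)
            if url not in self.urls:  # skip repeats within one page
                self.urls.append(url)


def find_prefetchables(html, base_url):
    """Return the prefetch candidates found in html, in document order."""
    scanner = PrefetchScanner(base_url)
    scanner.feed(html)
    scanner.close()
    return scanner.urls
```

Calling `find_prefetchables(page_text, page_url)` returns the list of absolute URLs that would then be handed to the fetch step.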
So far, it seems to just barely work. What works is checking the Content-Type of the document, avoiding encoded (gzip'ed) documents, analyzing the HTML using libxml2 in "tag soup" mode, resolving the full URL from relative references, and fetching the files into the cache. (I would, of course, appreciate code reviews of the branch before I diverge too far!)

However, I've run into a few problems.

To prefetch a page, we call clientBeginRequest. I've already had to extend this interface a little. The main problem is that it opens a new socket for each call. On a page with 100 prefetchables, it will open 100 TCP connections to the remote server. That's not nice. I need a way to re-use a connection for multiple requests. How should I do this? I'd like clientBeginRequest to be smart enough to handle this behind the scenes.

Occasionally I see duplicate prefetches. I think what's going on here is that the object is uncacheable. The only way I can think of to solve this is by adding an "uncacheable" entry type to the store -- but that just seems wrong, conceptually.

On a related note, maybe we could terminate a prefetch as soon as we receive the headers and notice that the object is uncacheable. Currently, we download the whole thing and just discard it (after analyzing it for more prefetchables if it's HTML).

Finally, does anyone have suggestions for how to measure the performance improvement due to prefetching?

Thanks,
Nick Lewycky