Reuben Farrelly wrote:
>
> Changing the subject a little, there have been many new people introduce
> themselves on this list maybe with good intentions of working on squid,
> who seem to vanish as fast as they arrive. I wonder if they've simply
> (a) never intended to contribute in the first place, (b) done some work
> privately but never released it or (c) taken a good look at the code,
> and run away fast deciding it was all too hard ;-)

Option c. Let me tell you what happened in my research group. It started with one undergraduate thesis at my University. The intent was to do what I've done in my branch: add prefetching by interpreting HTML. The thesis student is a competent programmer, but after four months of work (reading code) and consulting other non-Squid programmers, he still didn't know how to attack the problem. It had gotten so bad that he was reduced to reading the log files for fetched URLs and then trying to work out which entry in the cache directory corresponded to each newly fetched object. Four months into an eight-month thesis, the advising professor started looking for another student to help and found me. I only managed it through judicious use of doxygen, profiling and debugging tools, trial and error (my first version hooked all the code into http.cc) and, finally, a help session with Robert Collins.

The Squid source code is miserable. I'm not saying it's the worst code I've seen, because it isn't, but it's really, really bad. Functions don't always do what they say they'll do. Take urlParse, for example. You'd think it parses a URL, but it actually takes a URL and returns a new HttpRequest. Sure, HttpRequests contain URLs, but they also contain a whole ton of other stuff that goes well beyond breaking up a URL. "urlCanonicalClean" is the function that converts an HttpRequest back to a URL. That's not what the name sounds like.

There's also some confusion about the architecture. Many mailing list posts begin with "which part is the cache?", to which the answer is invariably "the whole thing is a cache!" An HTTP parser is not a cache. A client stream is not a cache. The config system is not a cache. The "store" is. Now, some parts of Squid are clearly delineated. Others are fuzzier. Can you tell me where the bloom filter is? And if you do know which functions are responsible, why haven't you grouped them into a bloom_filter.cc?

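To make the bloom-filter point concrete, here is a minimal C++ sketch of what a self-contained bloom filter module could look like when everything lives in one place behind a small, honestly named interface. The class name `BloomFilter`, its `add`/`mayContain` methods, and the double-hashing scheme built on `std::hash` are hypothetical illustrations chosen for this sketch; they are not Squid's existing code or API. The comments use Doxygen's `///` style.

```cpp
// Hypothetical bloom_filter.cc-style module: a sketch, not Squid's actual code.
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

/// A minimal Bloom filter over string keys (e.g. URLs).
/// It may report false positives but never false negatives.
class BloomFilter {
public:
    /// \param numBits   size of the bit array (must be > 0)
    /// \param numHashes number of bit positions set per key (must be > 0)
    BloomFilter(std::size_t numBits, unsigned numHashes)
        : bits_(numBits, false), numHashes_(numHashes) {
        assert(numBits > 0 && numHashes > 0);
    }

    /// Records \a key as present by setting numHashes bits.
    void add(const std::string &key) {
        for (unsigned i = 0; i < numHashes_; ++i)
            bits_[index(key, i)] = true;
    }

    /// \return false if \a key was definitely never added,
    ///         true if it probably was.
    bool mayContain(const std::string &key) const {
        for (unsigned i = 0; i < numHashes_; ++i)
            if (!bits_[index(key, i)])
                return false;
        return true;
    }

private:
    /// Double hashing (an assumed scheme): position_i = (h1 + i * h2) mod numBits.
    std::size_t index(const std::string &key, unsigned i) const {
        const std::size_t h1 = std::hash<std::string>{}(key);
        // Hash a salted copy for the second function; OR with 1 keeps it non-zero.
        const std::size_t h2 = std::hash<std::string>{}(key + "#") | 1;
        // Unsigned arithmetic wraps on overflow, which is well defined here.
        return (h1 + i * h2) % bits_.size();
    }

    std::vector<bool> bits_;  ///< the filter's bit array
    unsigned numHashes_;      ///< bits set/tested per key
};
```

Usage would look like `BloomFilter f(1024, 3); f.add(url); if (f.mayContain(url)) { ... }`. The point of the sketch is less the hashing details than the layout: one file, one class, and function names that say what they do.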
The good news is that newer code being checked in is usually clearer and better commented. Perhaps Squid3 just looks more complicated than it is because it is in the middle of several unfinished transitions (C->C++, the new ClientStream system, etc.). Either that or I'm starting to understand the internals better.

I would recommend putting the doxygen output up on the website, then learning the doxygen comment format and starting to use it. That'd be a start. Beyond that, finish the transitions, I guess, and try to make Squid a more straightforward pipeline.

Sorry if I sound too negative. I'm trying to be helpful with this email, but I find it very difficult to point at one thing and say "there's your problem". As for me, I want squid-prefetching off my hands, whether by landing it in HEAD or just walking away from it.

Nick Lewycky
