Re: so what is involved in calling squid-3.0 'stable'?

Nick Lewycky Sat, 22 Apr 2006 08:41:57 -0700

Reuben Farrelly wrote:
> 
> Changing the subject a little, there have been many new people introduce
> themselves on this list maybe with good intentions of working on squid,
> who seem to vanish as fast as they arrive.  I wonder if they've simply
> (a) never intended to contribute in the first place, (b) done some work
> privately but never released it or (c) taken a good look at the code,
> and run away fast deciding it was all too hard ;-)


Option c.

Let me tell you what happened in my research group. It started with one
undergraduate thesis at my university. The intent was to do what I've
done in my branch: add prefetching by interpreting HTML. The thesis
student is a competent programmer, but after four months of work
(reading code) and consulting other non-Squid programmers, he still
didn't know how to attack the problem. It had gotten so bad that he was
just trying to read the log files for fetched URLs and then figure out
which entry in the cache directory corresponded to the newly fetched
object. Four months into an eight-month thesis, the advising professor
started looking for another student to help and found me. I only managed
it through judicious use of doxygen, profiling and debugging tools, trial
and error (my first version hooked all the code into http.cc), and
finally a help session with Robert Collins.

The Squid source code is miserable. I'm not saying it's the worst code
I've seen, because it isn't, but it's really really bad. Functions don't
always do what they say they'll do. Take urlParse for example. You'd
think it parses a URL, but it actually takes a URL and returns a new
HttpRequest. Sure, HttpRequests contain URLs, but they also contain a
whole ton of other stuff that goes well beyond the idea of breaking
up a URL. "urlCanonicalClean" is the function that converts an
HttpRequest back to a URL. That's not what the name sounds like.

There's some confusion about the architecture. A lot of mailing list
posts begin with "which part is the cache?" to which the answer is
invariably "the whole thing is a cache!" An HTTP parser is not a cache.
A client stream is not a cache. The config system is not a cache. The
"store" is. Now, some parts of Squid are clearly delineated. Some parts
are fuzzier. Can you tell me where the bloom filter is? And if you do
know which functions are responsible, why haven't you grouped them into
a bloom_filter.cc?

The good news is that it looks like newer code that gets checked in is
usually clearer and better commented. Perhaps it's because Squid3 is
undergoing various transitions (C->C++, new ClientStream system, etc.)
which are unfinished and so it looks more complicated than it is. Either
that or I'm starting to understand the internals better.

I would recommend putting doxygen output up on the website, then learning
the doxygen comment format and starting to use it. That'd be a start.
Beyond that, finish the transitions, I guess. Try to make Squid a more
straightforward pipeline.

Sorry if I sound too negative. I'm trying to be helpful with this email,
but I find it very difficult to point at one thing and say "there's your
problem". As for me, I want squid-prefetching off my hands, whether by
landing it into HEAD or just walking away from it.

Nick Lewycky
