Michael Wray wrote:
>
> db/porn/domains
> contains playboy.com
>
> I also note that there are several entries in the domains that
> also contain "playboy.com" and a few urls in the urls file that
> also contain playboy.com.
>
> playboy.com is redirected just fine.
> www.playboy.com is not redirected at all

That's the "domain/subdomain" problem. Let's see if I can explain it:

Let's say blacklist #1 contains the single entry:

    nudepictures.com

From that entry, squidGuard will block:

    *nudepictures.com*    (where '*' = anything or nothing)

Blacklist #2 contains 2 entries:

    nudepictures.com
    dirty.nudepictures.com

Blacklist #2 will block:

    nudepictures.com*
    dirty.nudepictures.com*

Here are some examples that would be blocked by blacklist #1 but passed by blacklist #2:

    www.nudepictures.com
    girlie.nudepictures.com
    big.girlie.nudepictures.com
    horrible.nudepictures.com
    etc.

From a strictly logical standpoint, I think I can see some sense in it. Adding a subdomain to a domains file IMPLIES that there must be other subdomains that should not be blocked. If the entire domain were bad, the domain itself would have been listed and everything would be blocked.

Practically speaking, however, it creates a tremendous sanity check problem for all of our blacklist processing. If the owner of <pornsite.com> can get <bad.pornsite.com> added to our blacklists, he has a new lease on life with almost limitless possibilities - <nasty.pornsite.com>, <nastier.pornsite.com>, <nastiest.pornsite.com>, etc.

Most blacklist admins don't know this problem exists. Even if they are aware of the problem, how do you implement a solution? In some cases, playboy.com for example, the domain should be in the file, and you should not allow any subdomains of playboy.com to be added to the domains file. In other cases, the domain might belong to an ISP. Some of their customers put up porn sites, most do not. In that case the subdomains should be in the domains file, not the domain.

As you've noted, it can blow a sizeable hole in your coverage.

I have a utility (written by someone on this list) that will identify domains and subdomains that are listed together and separate them into new files. I'll post it or provide a link to it if anyone is interested. I'll need to look back to see who wrote it so that I can give him proper credit, and I'll write up a paragraph or two on its usage based on my experience with it.

Rick
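
For anyone who wants a quick check of their own lists in the meantime, here is a minimal Python sketch of the kind of test described above: it reports every domain that is listed together with one or more of its own subdomains. This is not the utility mentioned above, only an illustration. The function name find_overlaps and the sample entries are made up for the example, and it assumes the domains file has already been read into a list with one domain per entry.

```python
def find_overlaps(entries):
    """Map each listed domain to the listed entries that are subdomains of it.

    Hypothetical helper for illustration; it only compares names and does
    not model how squidGuard itself matches requests.
    """
    # Normalise and de-duplicate, skipping blank lines.
    listed = {e.strip().lower() for e in entries if e.strip()}
    overlaps = {}
    for entry in sorted(listed):
        labels = entry.split(".")
        # Walk up the parents: a.b.example.com -> b.example.com -> example.com.
        # Stop before the bare top-level label (e.g. "com").
        for i in range(1, len(labels) - 1):
            parent = ".".join(labels[i:])
            if parent in listed:
                overlaps.setdefault(parent, []).append(entry)
    return overlaps


if __name__ == "__main__":
    sample = [
        "nudepictures.com",
        "dirty.nudepictures.com",
        "playboy.com",
        "example.org",
    ]
    for parent, subs in find_overlaps(sample).items():
        print(parent, "is listed together with:", ", ".join(subs))
```

Each domain it reports still needs a human decision: either drop the subdomains and keep the domain (the playboy.com case), or drop the domain and keep only the subdomains (the ISP case).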
