On Tuesday, July 20, 2004, 10:59:27 AM, Marc Kool wrote:
(David Hooton wrote:)
>> However SURBL's in general don't use subdomains, I've just run a test
>> on my personal SURBL and SpamCopURI doesn't currently look at
>> subdomains.  I suspect because of the requirement for a lookup per
>> domain level which would obviously both make things inefficient and
>> also leave room for a denial of service.

> Hmmm. I am afraid that spammers will abuse this property of SpamCopURI.

Actually the design decision to reduce subdomains to base domains
was made to eliminate the abuse by spammers of using randomized
subdomains....

Since AOL, ATT, MSN, and other legitimate ISPs and their subdomains
are not often professional spammer destinations, it seemed more
important to catch the deliberate randomizers.  It looks like
that tradeoff may hold less well for sex sites.

> This is what I stated in the original proposal: let's make a SURBL list
> for adult-related URI's, not necessarily spammers.
> I know that SURBL is meant to fight spam, but it is relatively easy to extend
> with functionality to ban emails that refer to adult sites, that I think
> SURBL is the place to do it instead of creating a new mechanism in SA.

I agree that there is some value in this, certainly for squid use.
I can think of a few different ways to proceed:

1.  Discard all subdomains: probably too drastic for squid use
since some legitimate sites could be lost, but probably
appropriate for SURBL use.

2.  Fold subdomains to registrar domains: creates too
many false positives (at least for SURBL use) of sites hosted on
otherwise legitimate hosting providers like att.net, etc.
Would also break some squid matches.

3.  Include the subdomains (the fully qualified domain names) in
the list as they appear in the data: this will prevent the
registrar domains (like att.net) from matching in SURBLs, and
it's also faithful to the original data, which can be a good
thing in general and is probably preferable for squid use.
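
To make the three options concrete, here is a minimal Python sketch of
how each one would transform the same raw host list.  Everything in it
is an assumption made for illustration: the function names, the
*.example hostnames (bigisp.example stands in for a provider like
att.net), the tiny two-level suffix table, and the reading of option 1
as "drop every entry that has a subdomain part".

```python
# Hypothetical, deliberately tiny table of two-level suffixes; a real
# tool would need a much fuller list.
TWO_LEVEL_SUFFIXES = {"co.uk", "com.au"}


def base_domain(host):
    """Reduce a hostname to its base (registered) domain."""
    labels = host.lower().rstrip(".").split(".")
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def discard_subdomains(hosts):
    """Option 1 (as read here): keep only entries that are already base domains."""
    return sorted({h.lower() for h in hosts if h.lower() == base_domain(h)})


def fold_to_base(hosts):
    """Option 2: fold every entry down to its base domain."""
    return sorted({base_domain(h) for h in hosts})


def keep_as_is(hosts):
    """Option 3: list the fully qualified names exactly as they appear."""
    return sorted({h.lower() for h in hosts})


if __name__ == "__main__":
    raw = ["adult.example", "customer.bigisp.example"]
    print("option 1:", discard_subdomains(raw))
    print("option 2:", fold_to_base(raw))
    print("option 3:", keep_as_is(raw))
```

With this input, only option 2 puts the provider's own base domain
(bigisp.example) into the list, which is the false-positive risk
described under #2; option 1 loses the hosted site entirely.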

The main problem is that most code for using SURBLs on the
client (mail server) side tries to reduce subdomains down to
base domains, so it will tend not to match deliberately
included subdomains.  That can be an OK thing for SURBLs:
essentially it tells SURBLs to ignore the subdomain entries.
If we wanted SURBLs to actually match these spam sites, the
clients would have to check the full subdomains.
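
A rough Python sketch of that client-side difference, under stated
assumptions: an in-memory set stands in for the real DNS-based SURBL
query, the function names and *.example hosts are made up, and the
base-domain reduction is simplified to "last two labels".  It is not
how SpamCopURI or any particular client is actually written.

```python
from urllib.parse import urlsplit


def base_domain(host):
    """Simplified reduction: keep only the last two labels."""
    return ".".join(host.split(".")[-2:])


def reducing_client_hit(uri, listed):
    """Client that reduces the URI host to its base domain before checking."""
    host = (urlsplit(uri).hostname or "").lower()
    return base_domain(host) in listed


def full_name_client_hit(uri, listed):
    """Client that checks every level, from the full name down to two labels.

    This costs one lookup per domain level.
    """
    labels = (urlsplit(uri).hostname or "").lower().split(".")
    candidates = [".".join(labels[i:]) for i in range(len(labels) - 1)]
    return any(name in listed for name in candidates)


if __name__ == "__main__":
    # Option 3 style list: only the fully qualified name is listed.
    listed = {"customer.bigisp.example"}
    uri = "http://customer.bigisp.example/page.html"
    print("reducing client:", reducing_client_hit(uri, listed))
    print("full-name client:", full_name_client_hit(uri, listed))
```

The reducing client looks up only bigisp.example and so misses the
listed subdomain, while the full-name client matches it at the price of
the extra per-level lookups.  Neither client flags other hosts under
bigisp.example.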

For squid use, #3 is probably the most desirable since it best
captures the original data.

So #3 would probably get the best results for both squid and
SURBLs (for SURBLs, as a side effect of not matching the registrar
domains).  It's probably the best compromise given how both squid
and SURBLs are currently designed to be used.

Comments?

Jeff C.
