At 7/9/02 6:36 PM, Mark Reynolds wrote:
>Hi Robert and Justin,
>
>I've documented the original idea (well, saved some emails :-)
>at http://bl.reynolds.net.au/ksi/
>
>I've been focusing on learning how to run a dnsbl service,
>scanning, and integrating it all together. Current system
>was spread over 4 servers, so I'm also merging it all onto
>one.
>
>http://bl.reynolds.net.au/
>
>So I think I've got that sussed now, and will be moving onto
>the ksi project in the next few months.
Sounds excellent ("Key Spam Indicators" is a good name, too -- I couldn't
think of one). I'd appreciate it if you could post progress reports so I
(and others) could help out where it would be useful. (I could also
provide some US bandwidth and a domain name.)
I'd definitely recommend including phone numbers as well as URLs and
e-mail addresses. I have a manual content blocking list that I maintain
for egregious spammers, and I've found that the most spam is blocked by
URLs, followed by phone numbers, then e-mail addresses. I think this is
just because of the relative permanency of these: a spammer who has
bought a domain name or a phone number can't ditch it as quickly as he
can sign up for a new free e-mail address.
The main problem with phone numbers in my scheme is that spammers
disguise the format by writing idiotic things like "88 8 - 729 89 76",
which makes it harder for my current Postfix content filter to catch 'em
unless I write CPU-intensive rules for each phone number. Of course,
smarter code would strip out the extraneous characters before trying a
lookup.
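
A minimal sketch of that "strip first, then look up" idea, in Python. The function names and the in-memory blocklist are hypothetical, and it assumes that reducing a number to its digits is enough normalization; a real filter would also have to decide how to treat country codes and prefixes.

```python
import re

def normalize_phone(text: str) -> str:
    """Drop everything except ASCII digits so disguised variants compare equal."""
    return re.sub(r"[^0-9]", "", text)

# Hypothetical blocklist, stored in normalized (digits-only) form.
BLOCKED_PHONES = {normalize_phone("888-729-8976")}

def is_blocked_phone(candidate: str) -> bool:
    """One cheap set lookup instead of one regex rule per number."""
    return normalize_phone(candidate) in BLOCKED_PHONES
```

With this, "88 8 - 729 89 76" and "888-729-8976" both reduce to the same key, so a single list entry covers every spacing and punctuation variant.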
And as I mentioned offlist to Jason, I think the hardest part of this
system is automating the submission process. The trouble is that spammers
do sometimes include other people's e-mail addresses and so forth in
their spam -- for example, I get plenty of spam saying "your site
www.tigertech.net is not listed in search engines!" -- and an automated
process would presumably tag that. Obviously, people submitting spam
could be presented with a list of contact info and then uncheck anything
that isn't related to the spammer, as SpamCop does, but people frequently
screw up the SpamCop reports (I've accidentally reported myself a couple
of times). In addition, it would be nice to accept reports based on a
spamassassin -r report, which isn't interactive (and also to parse input
from spamtraps, NANAE feeds, and so forth).
Probably the solution there is to simply not list an indicator unless
it's been reported by multiple people, as you suggested, and to increase
the resulting weight as it gets reported by more and more people.
Obviously, then, low rankings should be taken with a grain of salt. It
might be a good idea to run two separate RBL facilities -- one of which
returns a weight (for clients smart enough to deal with that), and one of
which just returns a yes/no answer (for less intelligent clients) based on
whether the weight exceeds a certain level.
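
A rough sketch of the multiple-reporter rule and the two kinds of answer, in Python. Everything here is an assumption for illustration: the class name, using the count of distinct reporters as the weight, and the two threshold values.

```python
from collections import defaultdict

class IndicatorList:
    """Tracks which distinct reporters have submitted each indicator."""

    def __init__(self, min_reporters: int = 2, listing_threshold: int = 3):
        self.min_reporters = min_reporters          # below this, not listed at all
        self.listing_threshold = listing_threshold  # cutoff for the yes/no answer
        self._reporters = defaultdict(set)

    def report(self, indicator: str, reporter: str) -> None:
        # A set means repeat reports from one person do not add weight.
        self._reporters[indicator].add(reporter)

    def weight(self, indicator: str) -> int:
        """Weighted answer: number of distinct reporters, or 0 if too few."""
        count = len(self._reporters.get(indicator, ()))
        return count if count >= self.min_reporters else 0

    def is_listed(self, indicator: str) -> bool:
        """Yes/no answer for clients that cannot handle a weight."""
        return self.weight(indicator) >= self.listing_threshold
```

Both answers come from the same data, so the two RBL facilities could simply be two views of one list.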
I also agree that just removing anyone who asks is a good idea; I doubt
it would become a problem.
Finally, an observation about:
>- trim off any text after "?" or "&" (to avoid URLs like
> http://foo.com/?49435 and http://foo.com/?9438 being treated as
> differing, when they are not)
Sometimes the part on the end is an affiliate ID, and only one affiliate
is spamming, so it might be useful to retain in some cases. (On the other
hand, if one affiliate is spamming, it's probably some kind of shady
scheme that will just encourage other idiots to do it as well, so perhaps
it doesn't matter so much.)
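
A small sketch of the trimming rule with an escape hatch for affiliate IDs, in Python. The host set and function name are hypothetical, and how a host would be recognized as an affiliate program is left open.

```python
import re
from urllib.parse import urlsplit

# Hypothetical set of hosts where the trailing ID identifies an affiliate.
AFFILIATE_HOSTS = {"affiliate.example"}

def url_key(url: str) -> str:
    """Return the lookup key for a URL found in a message."""
    host = urlsplit(url).hostname or ""
    if host in AFFILIATE_HOSTS:
        # Keep the full URL so one spamming affiliate can be listed alone.
        return url
    # Otherwise trim everything from the first "?" or "&" onward.
    return re.split(r"[?&]", url, maxsplit=1)[0]
```

Here http://foo.com/?49435 and http://foo.com/?9438 collapse to the same key, while URLs on a host in the affiliate set stay distinct.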
------------------------------------
Robert L Mathews, Tiger Technologies
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk