My plan is to create another free reputation service, like a combination of
a whitelist and a blacklist, except that it provides the actual data instead
of just yes/no/maybe.  The goal is to help SpamAssassin filtering.

The data I'm planning to provide is, for every IP address, the percentage
of email from it that was ham (normalized like the S/O value in
SpamAssassin ruleqa), and the logarithm of the total count of recent emails
from that IP.  Here is the output based on my own email:

http://www.chaosreigns.com/iprep/iprep.txt
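
A minimal Python sketch of how the two per-IP values described above could
be computed.  It assumes the normalization works like ruleqa's S/O: each
count is first turned into a fraction of its own corpus, so that unequal ham
and spam corpus sizes don't skew the ratio.  The function name
ip_reputation and the base-10 logarithm are illustrative assumptions, not a
description of what iprep.pl does.

```python
import math

def ip_reputation(ham_hits, spam_hits, total_ham, total_spam):
    """Return (ham_percent, log_count) for one IP address.

    ham_hits / spam_hits: messages from this IP found in each corpus.
    total_ham / total_spam: sizes of the whole ham and spam corpora.
    At least one of ham_hits / spam_hits must be non-zero.
    """
    # Convert raw counts to fractions of each corpus, so the larger
    # corpus does not dominate the ratio.
    ham_frac = ham_hits / total_ham
    spam_frac = spam_hits / total_spam
    ham_percent = 100.0 * ham_frac / (ham_frac + spam_frac)
    # Assumption: base-10 logarithm of the total message count.
    log_count = math.log10(ham_hits + spam_hits)
    return ham_percent, log_count

# Corpus sizes are the ones from this post; the per-IP counts are made up.
print(ip_reputation(ham_hits=40, spam_hits=0, total_ham=2618, total_spam=2956))
```

An IP seen only in ham comes out at 100% and one seen only in spam at 0%,
which matches the "100% spam or 100% ham" pattern in the data.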


With my 2618 hams and 2956 spams, there were only *two* IP addresses that
were not 100% spam or 100% ham (both belong to Google).  This kind of thing
is why blacklists and whitelists are useful for predicting whether an email
is spam or ham.  The highest-ranked test in SpamAssassin is RCVD_IN_XBL, a
spamhaus.org blacklist.  #7 is RCVD_IN_PSBL, and #11 is RCVD_IN_DNSWL_HI,
which is also the highest-ranking "nice" rule.


To do this, I need data from you.

Create a folder containing only email you've confirmed is ham, and another
containing only email you've confirmed is spam.  Then download the script:

http://www.chaosreigns.com/iprep/dl/iprep.pl

and run it against those folders:

./iprep.pl ham:dir:~/masscheckwork/ham spam:dir:~/masscheckwork/spam/

The arguments are the same as the "targets" used by SpamAssassin's
mass-check (iprep.pl uses SpamAssassin's Perl modules):

    <class>:<format>:<location>
    <class>       is "spam" or "ham"
    <format>      is "dir", "file", "mbx", "mbox", or "detect"
    <location>    is a file or directory name.  globbing of ~ and * is supported

You can specify many targets at once.  
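
To make the target syntax concrete, here is a small Python sketch that
splits and validates one target string.  It only illustrates the
<class>:<format>:<location> layout described above; the names Target and
parse_target are hypothetical, and it does not expand ~ or * the way the
real script does.

```python
from typing import NamedTuple

VALID_CLASSES = {"spam", "ham"}
VALID_FORMATS = {"dir", "file", "mbx", "mbox", "detect"}

class Target(NamedTuple):
    msg_class: str
    fmt: str
    location: str

def parse_target(spec):
    """Split '<class>:<format>:<location>' into its three fields."""
    # Split on the first two colons only, so a location that itself
    # contains ':' is kept intact.
    parts = spec.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected <class>:<format>:<location>, got {spec!r}")
    msg_class, fmt, location = parts
    if msg_class not in VALID_CLASSES:
        raise ValueError(f"unknown class {msg_class!r}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unknown format {fmt!r}")
    if not location:
        raise ValueError("empty location")
    return Target(msg_class, fmt, location)

# Several targets at once, as on the iprep.pl command line.
for arg in ["ham:dir:~/masscheckwork/ham", "spam:dir:~/masscheckwork/spam/"]:
    print(parse_target(arg))
```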

Please run it as a daily cron job.

The required ~/.ipreprc config file:

    $trusted_networks = '<space delimited list of trusted hosts>';
    $user = 'username';
    $pass = 'password';

$trusted_networks is very important, and needs to contain everything from
both your trusted_networks and internal_networks values from SpamAssassin,
which are documented here:  
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html#network_test_options
http://wiki.apache.org/spamassassin/TrustPath
This is to prevent reporting the IP of your trusted relays instead of the
actual IP sending the email.  
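
A rough Python sketch of why the list matters: walking the relay IPs from
the most recent Received header backwards, the IP worth reporting is the
first one that is not in your trusted list.  This is an illustration under
assumptions, not iprep.pl's code: the function name first_untrusted_relay
is hypothetical, the relay IPs are assumed to be already extracted from the
headers, and only plain addresses and CIDR blocks are handled (SpamAssassin
itself accepts more notations).

```python
import ipaddress

def first_untrusted_relay(relay_ips, trusted_networks):
    """Return the first relay IP outside the trusted networks, or None.

    relay_ips: IPs taken from Received headers, most recent hop first.
    trusted_networks: space-delimited addresses or CIDR blocks.
    """
    nets = [ipaddress.ip_network(n, strict=False)
            for n in trusted_networks.split()]
    for ip_text in relay_ips:
        ip = ipaddress.ip_address(ip_text)
        trusted = any(ip.version == net.version and ip in net for net in nets)
        if not trusted:
            return ip_text
    return None

# Documentation-range addresses, purely for illustration.
relays = ["10.0.0.5", "192.0.2.10", "203.0.113.7"]
print(first_untrusted_relay(relays, "10.0.0.0/8 192.0.2.10"))
print(first_untrusted_relay(relays, "10.0.0.0/8"))
```

With the complete list the sender 203.0.113.7 is found; with 192.0.2.10
left out, that relay would be reported instead of the real sender.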

Email me to get an account to upload the data.  Please email me from a
non-freemail account, that is, one whose domain is not listed in
http://svn.apache.org/repos/asf/spamassassin/trunk/rules/20_freemail_domains.cf
Major examples of freemail domains, which I don't want you to email me from,
are:  gmail.com, yahoo.com, and hotmail.com.  This is just to make it
slightly harder for spammers to send me bad data.  And if you're on this
list, I know you have a non-freemail account.

I won't tell anybody your email address, and I consider the uploaded data
confidential.


I'm thinking about providing the data only via rsync, instead of via DNS,
because I think that should reduce network load.  I'd create a plugin that
would grab the data directly.


Just as a disclosure, I have been involved with dnswl.org since November
2006.  I have no plan to use any of their data, other than to look for
problems in my data.

-- 
"Let's just say that if complete and utter chaos was lightning, then
he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
copper armour and shouting 'All gods are bastards'." - The Color of Magic
http://www.ChaosReigns.com
