For Dominic Raferd: Another approach also works for me: if you can automatically capture the addresses you've sent mail to, these addresses make a perfect, self-maintaining whitelist. If you're running Postfix, you can use its automatic BCC option to feed a copy of all mail, including outbound messages, to whatever process you use to build a list of your mail recipients. Other MTAs probably have a similar ability, but I don't use them, so I can't comment further.
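To illustrate the harvesting step, here is a minimal Python sketch that pulls recipient addresses out of one BCC'd copy of a message. It is an illustration only: the function name `recipient_addresses` and the `own_addresses` parameter are hypothetical, and it assumes the BCC'd copies (for instance from Postfix's `always_bcc` setting) arrive as raw RFC 5322 text. Because the BCC stream contains inbound mail too, the sketch only harvests from messages whose From address is one of your own.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def recipient_addresses(raw_message: str, own_addresses: set[str]) -> set[str]:
    """Return the lower-cased To/Cc/Bcc addresses of an outbound message.

    own_addresses is a hypothetical set of the addresses you send from;
    messages from anyone else are treated as inbound and yield nothing.
    """
    msg = message_from_string(raw_message)

    # Only mail that we sent ourselves should feed the whitelist.
    sender = parseaddr(msg.get("From", ""))[1].lower()
    if sender not in own_addresses:
        return set()

    # Collect every recipient header; each may occur more than once.
    headers = []
    for name in ("To", "Cc", "Bcc"):
        headers.extend(msg.get_all(name, []))

    # getaddresses() splits comma-separated lists into (name, address) pairs.
    return {addr.lower() for _name, addr in getaddresses(headers) if "@" in addr}
```

Lower-casing the addresses keeps later duplicate checks simple, at the cost of ignoring the (rarely significant) case of the local part.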
A database makes a convenient place to keep your correspondent list, because discarding duplicate addresses then becomes a built-in facility, and writing an SA plugin plus an associated rule to interrogate the list and add negative points to the message is simple.

My correspondent list is part of my mail archive, which is held in a PostgreSQL database. The associated functions I use to maintain and interrogate the correspondent list are:

a) a BCC directive added to the Postfix configuration, or the equivalent if you use a different MTA
b) a Java application run each night to load the previous day's mail, both received and sent, into the database
c) an SQL view that selects any message(s) in the archive that were sent to the address being checked
d) a Perl plugin to execute the view using the message's sender as its search key and return TRUE if any messages were selected
e) an SA rule to trigger the Perl plugin and add a negative score if the Perl plugin returns TRUE

You'd need code to implement all five functions, but if you store your correspondent address list as a sorted text file, then all the code would be much simplified:

- 'b' could be a Perl or awk script run as an additional 'logwatch' report that scans the previous day's part of the mail log and adds any new addresses to the sorted list
- 'c' and 'd' could be combined as a single Perl plugin.

Martin
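
For illustration, the sorted-text-file variant might look roughly like the Python sketch below. It is not the Perl, awk or Java code described above: the function names and the one-address-per-line file format are assumptions made for the example. It shows the two operations the list needs, adding new addresses without duplicates and checking whether a sender is already a known correspondent.

```python
import bisect
from pathlib import Path


def load_list(path: str) -> list[str]:
    """Read the address file (one address per line) into a sorted list."""
    p = Path(path)
    if not p.exists():
        return []
    lines = (line.strip().lower() for line in p.read_text().splitlines())
    return sorted({line for line in lines if line})


def add_addresses(known: list[str], new: list[str]) -> int:
    """Insert new addresses into the sorted list in place, skipping duplicates.

    Returns the number of addresses actually added.
    """
    added = 0
    for addr in new:
        addr = addr.strip().lower()
        if not addr:
            continue
        i = bisect.bisect_left(known, addr)
        if i == len(known) or known[i] != addr:
            known.insert(i, addr)
            added += 1
    return added


def is_correspondent(known: list[str], sender: str) -> bool:
    """Binary-search the sorted list for the sender's address."""
    addr = sender.strip().lower()
    i = bisect.bisect_left(known, addr)
    return i < len(known) and known[i] == addr


def save_list(path: str, known: list[str]) -> None:
    """Write the sorted list back, one address per line."""
    Path(path).write_text("".join(addr + "\n" for addr in known))
```

A nightly job would call `load_list`, `add_addresses` and `save_list` in turn, while the spam-check side only needs `load_list` and `is_correspondent`.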