After trying to figure out Michael's problems with AWL, I've managed to get similarly confusing results... although I know exactly HOW they got this way... and how to fix 'em.
SETUP: I'm running SA 2.63 (Debian Linux) with a system-wide Bayes setup. I receive mail from a list that is normally fine (bugtraq@securityfocus.com). Someone recently posted there from a domain listed in a blacklist, resulting in an overall score of 88 ("insanely spammy").

1. After realizing the problem, I REMOVED the blacklisting from the .cf (it's a popular .cf) and restarted spamd. The message's score dropped to (results edited):

    Content analysis details: (38.4 points, 5.0 required)
    -4.9 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000]
    -15 USER_IN_DEF_WHITELIST From: address is in the default white-list
    50 AWL AWL: Auto-whitelist adjustment

2. Noticing the AWL score adjustment, I su'ed to the receiving user and ran both:

    spamassassin -R < msg
    spamassassin --remove-addr-from-whitelist="address"

   and the score appeared to drop (results edited):

    Content analysis details: (18.4 points, 5.0 required)
    -4.9 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000]
    -15 USER_IN_DEF_WHITELIST From: address is in the default white-list
    30 AWL AWL: Auto-whitelist adjustment

3. I restarted spamd (having thought per-user whitelist adjustments did NOT require a restart), and got:

    Content analysis details: (13.4 points, 5.0 required)
    -4.9 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000]
    -15 USER_IN_DEF_WHITELIST From: address is in the default white-list
    25 AWL AWL: Auto-whitelist adjustment

   I thought that indicated a restart mattered... but on further reflection, it doesn't. The AWL data is persistent.

4. I figured out that /usr/share/spamassassin/60_whitelist.cf contains:

    def_whitelist_from_rcvd [EMAIL PROTECTED] securityfocus.com

   which I left in, but am relieved to have discovered. (Note: No similar entry for debian!)

5. I fed the message into spamassassin without AWL, and it scored negative (non-spam), as expected.

6. It (finally) dawned on me that spamd is doing the AWL scoring, and IT is running as a different user. I su'ed to THAT user and did:

    spamassassin --remove-addr-from-whitelist="address"

7. NOW it's working as expected:

    Content analysis details: (-11.6 points, 5.0 required)
    -4.9 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000]
    -15 USER_IN_DEF_WHITELIST From: address is in the default white-list

8. After poking about, I found check_whitelist under /usr/share/doc/spamassassin and moved it to /usr/local/bin. I verified that the address was gone after step 6, and that it is now back after step 7:

    check_whitelist | grep domain
    <other entries -- actual spammers!>
    -11.6 (-11.6/1) -- [EMAIL PROTECTED]|ip=X.X

So, AWL is working AS IT SHOULD, and re-testing the message WITHOUT clearing the AWL database (for the spamassassin/spamd user) results in progressively lower scores each pass (AWL levelling out). The key to FIXING AWL after white/blacklist problems seems to be judicious use of check_whitelist and --remove-addr-from-whitelist= .

So, if AWL seems WAY off:

1. Check ALL .cf files to make sure there are no erroneous black or whitelist entries. Remove any that are causing problems. THESE CAN CONTINUE TO CAUSE AWL PROBLEMS EVEN LONG AFTER DELETION!
2. Verify that you're running as the user that actually does the AWL lookups (spamassassin or spamd).
3. Take a look at the AWL database with check_whitelist.
4. Drop any "insane" scores in there with spamassassin --remove-addr-from-whitelist=<name> (regardless of whether it was a white OR BLACKlist entry that broke it). Maybe just delete the AWL database files (good idea?) if they're all out of whack.
5. Verify they're gone with check_whitelist.
6. Repeat testing (save those erroneously scored messages!).
7. Continue using AWL and enjoy the benefits it provides.

Does this seem a sound approach? Didn't find anything similar on the Wiki.
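
On the "levelling out" in steps 1 to 3, here is a rough sketch of the arithmetic. It ASSUMES the AWL adjustment is (mean of the sender's stored scores - this message's pre-AWL score) * auto_whitelist_factor, with the factor at its documented default of 0.5; this is not verified against the 2.63 source. awl_adjust is a hypothetical helper name, not part of SpamAssassin. The TOTAL and COUNT arguments mirror the (total/count) pair that check_whitelist prints. The histories in the usage lines are ones picked because they reproduce the 50 / 30 / 25 adjustments above (taking the bad score as 88.4 and the clean score as -11.6); they were not read from the real database.

```shell
#!/bin/sh
# Sketch of the AWL arithmetic, NOT SpamAssassin's actual code.
# Assumption: adjustment = (stored mean - current pre-AWL score) * factor,
# where factor defaults to 0.5 (documented default of auto_whitelist_factor).

# awl_adjust TOTAL COUNT SCORE [FACTOR]   (hypothetical helper)
# TOTAL/COUNT: sum and number of scores stored for the sender.
# SCORE:       this message's score before the AWL rule is applied.
awl_adjust() {
    awk -v total="$1" -v count="$2" -v score="$3" -v factor="${4:-0.5}" \
        'BEGIN { printf "%.1f\n", (total / count - score) * factor }'
}

# One stored message at 88.4 (the blacklisted hit), clean score -11.6:
awl_adjust 88.4 1 -11.6     # prints 50.0

# Illustrative history: three entries at 88.4 plus two at -11.6 (total 242.0):
awl_adjust 242.0 5 -11.6    # prints 30.0

# One more clean entry added (total 230.4 over six messages):
awl_adjust 230.4 6 -11.6    # prints 25.0
```

If that assumption holds, every re-test adds another clean score to the sender's history, so the mean (and with it the AWL penalty) shrinks a bit each pass but never resets until the entry is removed from the database that spamd actually uses.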

Aren't the default whitelists (which I'd previously overlooked) a potential exploit waiting to happen? I haven't been able to exploit them, but then I've barely tried.

In retrospect, it seems that white/blacklisting is a very heavy-handed and error-prone approach, especially if the domains or users later change their ways. I was actually lucky that I only had ONE user (spamd) to fix, but I can imagine having to do hundreds or thousands. A change to a SYSTEM config file can ripple its way into USER configurations in ways that can be tricky to pinpoint (for a non-SA-expert like myself).

I think I will convert black/whitelist entries into scored rules with "less insane" values to minimize these problems in the future. Perhaps only enough to offset any Bayes scoring adjustments.

Any thoughts or feedback?

- Bob