More testing and a possible fix for skewed AWL scoring (was Re: AWL vs. mailing lists)

Bob George 26 Feb 2004 20:10:16 -0000

After trying to figure out Michael's problems with AWL, I've managed to get
similarly confusing results... although I know exactly HOW they got this way...
and how to fix 'em.


SETUP:
I'm running SA 2.63 (debian linux), with a system-wide bayes setup. I receive
mail on a list that is normally fine (bugtraq@securityfocus.com). Someone
recently posted there from a domain listed in a blacklist, resulting in an
overall score of 88 ("insanely spammy").

1. After realizing the problem, I REMOVED the blacklisting from the .cf
(it's a popular .cf) and restarted spamd. The message's score dropped to
(results edited):

Content analysis details:   (38.4 points, 5.0 required)
-4.9 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.0000]
 -15 USER_IN_DEF_WHITELIST  From: address is in the default white-list
  50 AWL                    AWL: Auto-whitelist adjustment


2. Noticing the AWL score adjustment, I su'ed to the receiving user, and ran
both:

    spamassassin -R < msg

and

    spamassassin --remove-addr-from-whitelist="address"

and the score (appeared) to drop (results edited):

Content analysis details:   (18.4 points, 5.0 required)
-4.9 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.0000]
 -15 USER_IN_DEF_WHITELIST  From: address is in the default white-list
  30 AWL                    AWL: Auto-whitelist adjustment

3. I restarted spamd (having thought per-user whitelist adjustments did NOT
require restart), and got:

Content analysis details:   (13.4 points, 5.0 required)
-4.9 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.0000]
 -15 USER_IN_DEF_WHITELIST  From: address is in the default white-list
  25 AWL                    AWL: Auto-whitelist adjustment

I thought that indicated a restart mattered... but on further reflection, it
doesn't. The AWL data is persistent, so the drop came from re-testing the
message, not from the restart.

4. I figured out that /usr/share/spamassassin/60_whitelist.cf contains:

    def_whitelist_from_rcvd  [EMAIL PROTECTED]  securityfocus.com

which I left in, but am relieved to have discovered. (Note: No similar entry
for debian!)

5. I fed the message into spamassassin without AWL, and it gets a negative
(non-spam) score, as expected.

6. It (finally) dawned on me that spamd is doing the AWL scoring, and IT is
running as a different user. I su'ed to THAT user and did:

    spamassassin --remove-addr-from-whitelist="address"

7. NOW it's working as expected:

Content analysis details:   (-11.6 points, 5.0 required)
-4.9 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.0000]
 -15 USER_IN_DEF_WHITELIST  From: address is in the default white-list

8. After poking about, I found check_whitelist under
/usr/share/doc/spamassassin and moved it to /usr/local/bin. I verified that the
address was gone after step 6, and that it is now back after step 7:

    check_whitelist | grep domain
<other entries -- actual spammers!>
   -11.6       (-11.6/1)  --  [EMAIL PROTECTED]|ip=X.X

So, AWL is working AS IT SHOULD, and re-testing the message WITHOUT clearing
the AWL database (for the spamassassin/spamd user) results in progressively
lower scores on each pass (AWL levelling out). The key to FIXING AWL after
white/blacklist problems seems to be judicious use of check_whitelist
and --remove-addr-from-whitelist= .
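
To illustrate the levelling out, here is a rough Python sketch of the AWL
arithmetic. It assumes the formula given in the SpamAssassin docs for
auto_whitelist_factor (final = score + (mean - score) * factor, default
factor 0.5) and assumes the pre-AWL score is what gets added to the sender's
history on each pass. The function names are mine, not SpamAssassin's. The
starting values (one stored score of 88, a pre-AWL score of -11.6) come from
the results above, but since those results were edited, treat this as an
approximation rather than an exact reconstruction.

```python
def awl_adjust(score, mean, factor=0.5):
    """Pull the pre-AWL score toward the sender's historical mean."""
    return score + (mean - score) * factor


def replay(base_score, history_total, history_count, passes, factor=0.5):
    """Re-test the same message several times without clearing the AWL.

    Returns a list of (awl_adjustment, final_score) tuples, one per pass.
    Assumption: each pass adds the pre-AWL score to the stored total.
    """
    results = []
    total, count = history_total, history_count
    for _ in range(passes):
        mean = total / count
        final = awl_adjust(base_score, mean, factor)
        results.append((round(final - base_score, 1), round(final, 1)))
        # The history now includes this pass, so the mean drifts downward.
        total += base_score
        count += 1
    return results


if __name__ == "__main__":
    # One bad entry of 88 in the history, message really scores -11.6.
    for adjustment, final in replay(-11.6, 88.0, 1, 4):
        print(f"AWL {adjustment:+6.1f}  ->  final {final:6.1f}")
```

The first two passes give adjustments of about 50 and 25, which is close to
what I saw in steps 1 and 3. Each further pass shrinks the adjustment, but a
single "insane" entry takes many passes to wash out, which is why removing
the address outright is quicker.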

So, if AWL seems WAY off:

1. Check ALL .cf files to make sure there are no erroneous black or whitelist
entries. Remove any that are causing problems. THESE CAN CONTINUE TO CAUSE AWL
PROBLEMS EVEN LONG AFTER DELETION!

2. Verify that you're running as the user that actually does the AWL lookups
(spamassassin or spamd).

3. Take a look at the AWL database with check_whitelist.

4. Drop any "insane" scores in there with
spamassassin --remove-addr-from-whitelist=<name> (regardless of whether it was
a white OR BLACKlist entry that broke it.) Maybe just delete the AWL database
files (good idea?) if they're all out of whack.

5. Verify they're gone with check_whitelist.

6. Repeat testing (save those erroneously scored messages!)

7. Continue using AWL and enjoy the benefits it provides.
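
Steps 3 and 4 could be partly automated. The following is a minimal Python
sketch that scans check_whitelist output for entries whose mean score looks
"insane". It assumes the line layout shown in step 8 above (mean, then
(total/count), then "--", then address|ip=...). The threshold of 20, the
function name, and the sample addresses are all made up for illustration.

```python
import re

# Matches lines such as:
#    -11.6       (-11.6/1)  --  someone@example.com|ip=10.1
LINE_RE = re.compile(
    r"^\s*(?P<mean>-?\d+(?:\.\d+)?)\s+"
    r"\((?P<total>-?\d+(?:\.\d+)?)/(?P<count>\d+)\)\s+--\s+"
    r"(?P<key>\S+)\s*$"
)


def suspicious_entries(listing, limit=20.0):
    """Return (key, mean, count) for entries with |mean| >= limit.

    `limit` is an arbitrary cut-off chosen for this example.
    Lines that do not match the expected layout are skipped.
    """
    hits = []
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        mean = float(m.group("mean"))
        if abs(mean) >= limit:
            hits.append((m.group("key"), mean, int(m.group("count"))))
    return hits


# Hypothetical sample data, not taken from a real AWL database.
SAMPLE = """\
   -11.6       (-11.6/1)  --  list-poster@example.com|ip=10.1
    88.0       (88.0/1)  --  blacklisted@example.net|ip=192.0
     3.2       (9.6/3)  --  friend@example.org|ip=172.16
"""

if __name__ == "__main__":
    for key, mean, count in suspicious_entries(SAMPLE):
        address = key.split("|", 1)[0]
        print(f"{address}: mean {mean} over {count} message(s)")
```

Anything it flags still needs a human look before being passed to
--remove-addr-from-whitelist, since a genuine spammer will also have a high
mean. Remember to run check_whitelist as the user that owns the AWL database
(step 2).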

Does this seem a sound approach? Didn't find anything similar on the Wiki.

Aren't the default whitelists (which I'd previously overlooked) a potential
exploit waiting to happen? I haven't been able to exploit them, but then I've
barely tried.

In retrospect, it seems that white/blacklisting is a very heavy-handed and
error-prone approach, especially if the domains or users later change their
ways. I was actually lucky that I only had ONE user (spamd) to fix, but I can
imagine doing hundreds or thousands. A change to a SYSTEM config file can
ripple its way into USER configurations that can be tricky to pinpoint (for
a non-SA-expert like myself). I think I will convert black/whitelist entries
into scored rules with "less insane" values to minimize these problems in the
future. Perhaps only enough to offset any bayes scoring adjustments.

Any thoughts or feedback?

- Bob
