Odd Spamassassin/Bayes behavior

Kevin Lewis 25 Mar 2004 00:37:25 -0000

I'm experiencing a strange problem with SpamAssassin, Bayes and
auto-whitelist-from.  I've looked through everything I can find to figure
this one out, but I'm stuck.  Anyone got any suggestions?  Here's some
background.


SA has been running sitewide on my RH mail gateway fine for more than a
year.  I do very little other than update additional rule sets such as
bigevil and run sa-learn on the few new spams that make it through.  Last
week I was doing this, except that when I pointed sa-learn at my spams I
forgot to add the --mbox flag, and I guess it ran through the file as one
big message.  So I figured I had hosed the Bayes database.  I let it run a
day or so and noticed that I wasn't getting ANY Bayes hits, even though it
had 50K+ hams and spams.  So I decided to delete the Bayes files and
relearn with a corpus of about 10k hams and spams I had stashed away.
Everything went fine, but still no Bayes hits.  None.  OK, so I went to
one of my backup tapes and restored the Bayes files from there...  STILL
no Bayes hits.

I also began to notice that none of my auto-whitelist-from sites are
getting their proper scores either.  So what's up?  --lint -D shows no
errors, but here's what's really odd.  I'm using sendmail and spamd
sitewide.  When a spam comes through the normal channel it is getting
scanned, but it looks as if not all the rules are getting triggered: no
Bayes hits, and the white-list-from domains are not getting applied.  For
example, here's the header of a spam that came through the normal
channels.

X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on
        cookie.aetn.org
X-Spam-Report:
        *  1.0 OACYS_CONS_6 BODY: Email has six consonants in a row
        *  0.1 LG_4C_2V_3C BODY: Gibberish found?
        *  0.0 RM_rb_FONT BODY: Testing for HTML Font tag in emails
        *  0.0 RM_rb_ANCHOR BODY: Testing for HTML end of anchor in emails
        *  0.0 RM_rb_PARA BODY: Testing for HTML Paragraph in emails
        *  0.0 RM_rb_BREAK BODY: Testing for HTML Break in emails
        *  0.0 RM_rb_HTML BODY: Testing for HTML tag in emails
        *  0.4 FVGT_m_MULTI_ODD5 Contains multiple odd letter combinations
        *  1.1 FVGT_m_MULTI_ODD2 Contains multiple odd letter combinations
        *  0.3 FVGT_m_MULTI_ODD3 Contains multiple odd letter combinations
        *  0.3 FVGT_m_MULTI_ODD4 Contains multiple odd letter combinations
        *  0.0 AWL AWL: Auto-whitelist adjustment
X-Spam-Level: ***
X-Spam-Status: No, hits=3.2 required=5.0 tests=AWL,FVGT_m_MULTI_ODD2,
        FVGT_m_MULTI_ODD3,FVGT_m_MULTI_ODD4,FVGT_m_MULTI_ODD5,LG_4C_2V_3C,
        OACYS_CONS_6,RM_rb_ANCHOR,RM_rb_BREAK,RM_rb_FONT,RM_rb_HTML,
        RM_rb_PARA autolearn=no version=2.60


If I run the same message through spamassassin -t < test.txt, I get this:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on
        cookie.aetn.org
X-Spam-Report:
        *  0.8 SUBJ_YOUR_DEBT Subject contains "Your Bills" or similar
        *  1.0 OACYS_CONS_6 BODY: Email has six consonants in a row
        *  0.1 LG_4C_2V_3C BODY: Gibberish found?
        *  1.5 HTML_WEB_BUGS BODY: Image tag intended to identify you
        *  0.1 HTML_FONTCOLOR_RED BODY: HTML font color is red
        *  5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
        *      [score: 1.0000]
        *  0.3 HTML_MESSAGE BODY: HTML included in message
        *  0.0 RM_rb_FONT BODY: Testing for HTML Font tag in emails
        *  0.0 RM_rb_ANCHOR BODY: Testing for HTML end of anchor in emails
        *  0.0 RM_rb_PARA BODY: Testing for HTML Paragraph in emails
        *  0.0 RM_rb_BREAK BODY: Testing for HTML Break in emails
        *  0.0 RM_rb_HTML BODY: Testing for HTML tag in emails
        *  1.0 URI_OFFERS URI: Message has link to company offers
        *  1.1 FVGT_m_MULTI_ODD2 Contains multiple odd letter combinations
        *  0.3 FVGT_m_MULTI_ODD3 Contains multiple odd letter combinations
        *  0.3 FVGT_m_MULTI_ODD4 Contains multiple odd letter combinations
        *  0.4 FVGT_m_MULTI_ODD5 Contains multiple odd letter combinations
X-Spam-Status: Yes, hits=12.3 required=5.0 tests=BAYES_99,FVGT_m_MULTI_ODD2,
        FVGT_m_MULTI_ODD3,FVGT_m_MULTI_ODD4,FVGT_m_MULTI_ODD5,
        HTML_FONTCOLOR_RED,HTML_MESSAGE,HTML_WEB_BUGS,LG_4C_2V_3C,
        OACYS_CONS_6,RM_rb_ANCHOR,RM_rb_BREAK,RM_rb_FONT,RM_rb_HTML,
        RM_rb_PARA,SUBJ_YOUR_DEBT,URI_OFFERS autolearn=no version=2.60
X-Spam-Level: ************

Output of --lint -D

debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/kerberos/bin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/usr/X11R6/bin', keeping.
debug: PATH included '/opt/ARCserveIT/bin', keeping.
debug: PATH included '/opt/ARCserveIT/sbin', keeping.
debug: PATH included '/home/klewis/bin', which doesn't exist, dropping.
debug: Final PATH set to: /usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/ARCserveIT/bin:/opt/ARCserveIT/sbin
debug: ignore: using a test message to lint rules
debug: using "/usr/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/home/klewis/.spamassassin" for user state dir
debug: using "/home/klewis/.spamassassin/user_prefs" for user prefs file
debug: bayes: 24854 tie-ing to DB file R/O /var/local/bayes/bayes_toks
debug: bayes: 24854 tie-ing to DB file R/O /var/local/bayes/bayes_seen
debug: bayes: found bayes db version 2
debug: Score set 3 chosen.
debug: Initialising learner
debug: running header regexp tests; score so far=0
debug: running body-text per-line regexp tests; score so far=2.077
debug: bayes corpus size: nspam = 466, nham = 1239
debug: uri tests: Done uriRE
debug: tokenize: header tokens for *F = "U*ignore D*compiling.spamassassin.taint.org D*spamassassin.taint.org D*taint.org D*org"
debug: tokenize: header tokens for *m = " 1080174745 lint_rules "
debug: bayes token 'somewhat' => 0.0489090909090909
debug: bayes token 'N:H*m:NNNNNNNNNN' => 0.0509685368818047
debug: bayes token 'H*F:D*org' => 0.078531083962566
debug: bayes: score = 0.00495606550449745
debug: bayes: 24854 untie-ing
debug: bayes: 24854 untie-ing db_toks
debug: bayes: 24854 untie-ing db_seen
debug: Razor2 is not available
debug: running raw-body-text per-line regexp tests; score so far=2.077
debug: running uri tests; score so far=2.077
debug: uri tests: Done uriRE
debug: running full-text regexp tests; score so far=2.077
debug: Razor2 is not available
debug: DCCifd is not available: no r/w dccifd socket found.
debug: Current PATH is: /usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/opt/ARCserveIT/bin:/opt/ARCserveIT/sbin
debug: DCC is not available: no executable dccproc found.
debug: Pyzor is not available: pyzor not found
debug: all '*From' addrs: [EMAIL PROTECTED]
debug: all '*To' addrs:
debug: is Net::DNS::Resolver available? yes
debug: trying (3) slashdot.org...
debug: looking up MX for 'slashdot.org'
debug: MX for 'slashdot.org' exists? 1
debug: MX lookup of slashdot.org succeeded => Dns available (set dns_available to hardcode)
debug: is DNS available? 1
debug: running meta tests; score so far=2.077
debug: is spam? score=-2.823 required=5 tests=BAYES_00,DATE_MISSING,NO_REAL_NAME
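
The Bayes-related lines can be pulled out of that debug output to see
which database files were opened and how many messages they hold.  Again
this is only a rough sketch in Python, assuming the output has been saved
as text; summarize_bayes is a made-up helper name, and the sample lines
are copied from the output above.

```python
import re

# Bayes-related lines copied from the --lint -D output.
LINT_OUTPUT = """\
debug: bayes: 24854 tie-ing to DB file R/O /var/local/bayes/bayes_toks
debug: bayes: 24854 tie-ing to DB file R/O /var/local/bayes/bayes_seen
debug: bayes: found bayes db version 2
debug: bayes corpus size: nspam = 466, nham = 1239
debug: bayes: score = 0.00495606550449745
"""


def summarize_bayes(debug_text):
    """Collect the Bayes DB paths and corpus counts from debug output."""
    # The word after "DB file" is the access mode (R/O here); the path follows.
    db_files = re.findall(r"tie-ing to DB file \S+ (\S+)", debug_text)
    size = re.search(
        r"bayes corpus size: nspam = (\d+), nham = (\d+)", debug_text
    )
    return {
        "db_files": db_files,
        "nspam": int(size.group(1)) if size else None,
        "nham": int(size.group(2)) if size else None,
    }


summary = summarize_bayes(LINT_OUTPUT)
print(summary["db_files"])
print("nspam =", summary["nspam"], "nham =", summary["nham"])
```

For this run it shows the two files under /var/local/bayes and a corpus of
466 spams and 1239 hams.  If debug output from spamd is available, the
same extraction would show whether spamd opens the same files.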

Kevin Lewis
AETN
Dir. of Information Technology
