From: <[EMAIL PROTECTED]>

Kristopher Austin wrote:
RANK    RULE NAME                     COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
------------------------------------------------------------
   1    HTML_MESSAGE                  45870     5.13   27.72   70.37   55.36

Wait... so 27% of all mail is HTML, 70% of spam is HTML, and 55% of ham is HTML?

<<jdow>>
So what's the problem? (He's not running Bayes or it's badly broken, though.)

TOP SPAM RULES FIRED
------------------------------------------------------------
RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
------------------------------------------------------------
  1    BAYES_99                          962     4.81   32.97   93.04    0.11
  2    RCVD_IN_XBL                       574     2.87   19.67   55.51    0.05
  3    HTML_MESSAGE                      571     2.86   19.57   55.22    7.91
  4    URIBL_BLACKB                      563     2.82   19.29   54.45    0.05
  5    URIBL_JP_SURBL                    484     2.42   16.59   46.81    0.00
  6    URIBL_SC_SURBL                    479     2.40   16.42   46.32    0.00
  7    RCVD_IN_BL_SPAMCOP_NET            440     2.20   15.08   42.55    3.34
  8    URIBL_OB_SURBL                    409     2.05   14.02   39.56    0.00
  9    URIBL_WS_SURBL                    403     2.02   13.81   38.97    0.00
 10    URIBL_SBL                         397     1.99   13.61   38.39    0.05
 11    URIBL_AB_SURBL                    368     1.84   12.61   35.59    0.00
 12    JD_TO_EARTHLINK                   357     1.79   12.23   34.53    2.71
 13    RCVD_IN_SORBS_DUL                 270     1.35    9.25   26.11    0.53
 14    RCVD_IN_DSBL                      253     1.27    8.67   24.47    0.00
 15    URIBL_XS_SURBL                    241     1.21    8.26   23.31    0.00
 16    LW_MULT_RECIP3                    237     1.19    8.12   22.92    2.60
 17    JD_MY_NAME                        231     1.16    7.92   22.34    2.71
 18    DNS_FROM_RFC_POST                 194     0.97    6.65   18.76    0.11
 19    MIME_HTML_ONLY                    192     0.96    6.58   18.57    0.80
 20    JD_TO_EARTHLINKCOM                189     0.95    6.48   18.28    0.00
------------------------------------------------------------

TOP HAM RULES FIRED
------------------------------------------------------------
RANK    RULE NAME                       COUNT %OFRULES %OFMAIL %OFSPAM  %OFHAM
------------------------------------------------------------
  1    BAYES_00                         1654    20.19   56.68    0.39   87.79
  2    JD_LKML_RELAY                     787     9.61   26.97    0.77   41.77
  3    JD_PATCH_SUBJ                     316     3.86   10.83    0.00   16.77
  4    RATWR10a_MESSID                   287     3.50    9.84    2.71   15.23
  5    JD_CHICKENPOX                     247     3.02    8.46   11.90   13.11
  6    NOT_TO_ME                         231     2.82    7.92   16.54   12.26
  7    RCVD_BY_IP                        183     2.23    6.27   11.70    9.71
  8    HTML_MESSAGE                      149     1.82    5.11   55.22    7.91
  9    UHS_BCW                           135     1.65    4.63    0.10    7.17
 10    SARE_MSGID_LONG40                 120     1.46    4.11    0.19    6.37
 11    JD_MANGY_MORTGAGES                118     1.44    4.04   11.61    6.26
 12    USER_IN_WHITELIST                 111     1.35    3.80    0.00    5.89
 13    JD_GENERIC                         90     1.10    3.08    0.87    4.78
 14    BAYES_50                           78     0.95    2.67    0.97    4.14
 15    HELO_EQ_LT4_SA                     76     0.93    2.60    4.16    4.03
 16    BAYES_20                           63     0.77    2.16    0.10    3.34
 17    RCVD_IN_BL_SPAMCOP_NET             63     0.77    2.16   42.55    3.34
 18    FM_MULTI_ODD2                      61     0.74    2.09    6.09    3.24
 19    WHITELIST_NTDEV                    61     0.74    2.09    0.00    3.24
 20    JD_MANGY_MORTGAGE                  60     0.73    2.06    0.48    3.18
------------------------------------------------------------
==========8<-------------
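You can check the arithmetic behind the "27% / 70% / 55%" question against the two tables above. The sketch below is inferred from the printed numbers and is not the report script's actual code. In the spam table, COUNT appears to count spam hits only, and %OFMAIL is COUNT divided by all mail. %OFSPAM and %OFHAM are a rule's hits divided by the number of spam and ham messages. The corpus sizes N_SPAM = 1034 and N_HAM = 1884 are back-calculated from the HTML_MESSAGE row, so treat them as assumptions.

```python
# Sketch of how the report columns appear to relate. Inferred from the
# numbers in the tables above, NOT taken from the report script itself.

N_SPAM = 1034            # assumed: 571 / 0.5522 (HTML_MESSAGE spam row)
N_HAM = 1884             # assumed: 149 / 0.0791 (HTML_MESSAGE ham row)
N_MAIL = N_SPAM + N_HAM  # 2918 messages in total


def pct(part, whole):
    """Percentage rounded to two places, as the report prints it."""
    return round(100.0 * part / whole, 2)


def spam_table_row(spam_hits, ham_hits):
    """COUNT, %OFMAIL, %OFSPAM, %OFHAM for a row of the spam table.

    COUNT and %OFMAIL use spam hits only; %OFMAIL divides by ALL mail.
    """
    return (spam_hits, pct(spam_hits, N_MAIL),
            pct(spam_hits, N_SPAM), pct(ham_hits, N_HAM))


def ham_table_row(spam_hits, ham_hits):
    """Same columns for a row of the ham table: COUNT is ham hits only."""
    return (ham_hits, pct(ham_hits, N_MAIL),
            pct(spam_hits, N_SPAM), pct(ham_hits, N_HAM))


if __name__ == "__main__":
    print(spam_table_row(571, 149))  # HTML_MESSAGE, spam table
    print(ham_table_row(571, 149))   # HTML_MESSAGE, ham table
    print(spam_table_row(962, 2))    # BAYES_99
    print(ham_table_row(4, 1654))    # BAYES_00
```

On that reading, the HTML_MESSAGE, BAYES_99 and BAYES_00 rows come out as printed. Read the same way, Kristopher's 27.72% is his 45870 spam-side HTML hits divided by all mail. It is not the share of all mail that is HTML, which explains why it can sit below both %OFSPAM and %OFHAM. Back-calculating his figures this way puts his overall HTML share at roughly 61%.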
Note that there are a lot of rules in there which I intentionally score at
the 0.01 point level so I see them explicitly. I use them in meta rules to
create some interesting special cases that are rather effective. (And as for
the HTML, consider that I am on a lot of high-volume mailing lists that are
basically text only.)
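Scoring marker rules at 0.01 and combining them in meta rules works as follows. Each marker adds almost nothing to the total but still shows up in reports. A meta rule fires only when a boolean combination of markers is true. In SpamAssassin itself this is written with `meta` and `score` config lines. The Python below only mimics the arithmetic. JD_HYPOTHETICAL_META, its expression and its 2.0 score are invented for illustration. The JD_* marker names come from the report, but their actual definitions are not shown in the thread.

```python
from typing import Callable, Dict, Set, Tuple

# Marker subrules scored at 0.01 so they appear in reports (per the note).
# Names are from the report; what they actually match is not known here.
MARKER_SCORE = 0.01
MARKERS = {"JD_TO_EARTHLINK", "JD_MY_NAME"}

# HYPOTHETICAL meta rule: fires when mail is addressed to EarthLink but
# does not use my name. Expression and score are illustrative only.
META_RULES: Dict[str, Tuple[Callable[[Set[str]], bool], float]] = {
    "JD_HYPOTHETICAL_META": (
        lambda hits: "JD_TO_EARTHLINK" in hits and "JD_MY_NAME" not in hits,
        2.0,
    ),
}


def score(hits: Set[str]) -> Tuple[float, Set[str]]:
    """Sum marker scores, then add any meta rules whose expression holds."""
    fired = set(hits)
    total = sum(MARKER_SCORE for rule in hits if rule in MARKERS)
    for name, (expr, points) in META_RULES.items():
        if expr(hits):
            fired.add(name)
            total += points
    return round(total, 2), fired


if __name__ == "__main__":
    print(score({"JD_TO_EARTHLINK"}))                # meta rule fires
    print(score({"JD_TO_EARTHLINK", "JD_MY_NAME"}))  # only the markers count
```

On their own, the markers barely move the score. The points come from the combination, which is what makes a rule that is weak by itself useful in a meta rule.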

Note that the results are a little "skewed" relative to reality. I live with a
fairly high false positive rate for the LKML nonsense, so that will affect
the false positive rate on some of the BLs, for example.

{^_-}
