On 03/09/16 07:33, Dave Funk wrote:
On Tue, 8 Mar 2016, Marc Perkel wrote:

This is from the for-what-it's-worth department.

I've generated the following rules combination lists.

The ham list is the rule combinations that have 0 spam hits, sorted by the number of ham hits. The spam list is the rule combinations that have 0 ham hits, sorted by the number of spam hits.
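
The tally described above could be sketched roughly as follows. This is only an illustration, not the actual learner: the function name exclusive_combos, the (label, rules) input format, and the cap of three rules per combination are all assumptions.

```python
from collections import Counter
from itertools import combinations


def exclusive_combos(messages, max_size=3):
    """Return (ham_only, spam_only) lists of (combo, hits), most hits first.

    messages: iterable of (label, rules), where label is "ham" or "spam"
    and rules is an iterable of rule names (hypothetical input format).
    """
    counts = {"ham": Counter(), "spam": Counter()}
    for label, rules in messages:
        names = sorted(set(rules))  # canonical order so each combo has one key
        for size in range(1, max_size + 1):
            for combo in combinations(names, size):
                counts[label][combo] += 1

    # Keep only combinations that never appeared in the opposite class.
    ham_only = [(c, n) for c, n in counts["ham"].most_common()
                if c not in counts["spam"]]
    spam_only = [(c, n) for c, n in counts["spam"].most_common()
                 if c not in counts["ham"]]
    return ham_only, spam_only
```

With exact counts like these, a single rule that lands in the ham list can never be part of a combination in the spam list, so any such overlap points to counts that were reset or collected over different message sets.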

There are some of my personal rules mixed in.

Just posting this to see if anyone sees any value in it.

SPAM RULES:

    11648 HTML_MESSAGE RAZOR2_CF_RANGE_51_100 SUBJ_GROUP
    11308 HTML_MESSAGE RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
    11212 RAZOR2_CF_RANGE_51_100 RAZOR2_CF_RANGE_E8_51_100 SUBJ_GROUP
    10749 RAZOR2_CF_RANGE_51_100 RAZOR2_CHECK SUBJ_GROUP
    10646 RAZOR2_CF_RANGE_E8_51_100 RAZOR2_CHECK SUBJ_GROUP
     5042 DKIM_VALID MIME_HTML_ONLY MISSING_DATE
     5024 DKIM_VALID_AU MIME_HTML_ONLY MISSING_DATE
[snip..]

HAM RULES:

   132983 DKIM_SIGNED MAILTO_LINK RDNS_DYNAMIC
   132558 DKIM_VALID MAILTO_LINK RDNS_DYNAMIC
   131916 DKIM_VALID_AU MAILTO_LINK RDNS_DYNAMIC
[snip..]
    80056 HTML_MESSAGE
    78472 DKIM_SIGNED MAILTO_LINK UNPARSEABLE_RELAY
    77994 DKIM_VALID MAILTO_LINK UNPARSEABLE_RELAY
    77635 DKIM_VALID_AU MAILTO_LINK UNPARSEABLE_RELAY
    76959 HTML_MESSAGE RDNS_DYNAMIC UNPARSEABLE_RELAY
    72949 MAILTO_LINK RDNS_DYNAMIC UNPARSEABLE_RELAY
    59189 DKIM_SIGNED
    56792 DKIM_VALID
[snip..]

Marc,

Maybe I'm misunderstanding your list, but it looks like you've got HTML_MESSAGE by itself in the HAM RULES (i.e. zero spam hits on HTML_MESSAGE), yet you've also got the rule combo HTML_MESSAGE RAZOR2_CF_RANGE_51_100 SUBJ_GROUP at the top of the SPAM RULES (which implies that there is spam that hits HTML_MESSAGE too).

Similar situation for DKIM_SIGNED & DKIM_VALID

Also how can you have 132983 hits on the combo of DKIM_SIGNED MAILTO_LINK RDNS_DYNAMIC
but only 59189 hits on DKIM_SIGNED by itself?
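
A check for this kind of inconsistency could be sketched as below: every message that hits a combination also hits each rule in it, so a combination should never have more hits than any of its subsets. The function name find_count_violations and the dict-of-tuples input format are assumptions for illustration.

```python
from itertools import combinations


def find_count_violations(hits):
    """Return (subset, subset_hits, combo, combo_hits) where combo_hits > subset_hits.

    hits: dict mapping a tuple of rule names to its hit count (hypothetical format).
    Only subsets that are themselves present in `hits` are checked.
    """
    table = {frozenset(combo): n for combo, n in hits.items()}
    violations = []
    for combo, n in table.items():
        for size in range(1, len(combo)):  # proper, non-empty subsets only
            for sub in combinations(sorted(combo), size):
                sub_n = table.get(frozenset(sub))
                if sub_n is not None and n > sub_n:
                    violations.append((sub, sub_n, tuple(sorted(combo)), n))
    return violations


# Figures taken from the HAM RULES list above.
ham_hits = {
    ("DKIM_SIGNED", "MAILTO_LINK", "RDNS_DYNAMIC"): 132983,
    ("DKIM_SIGNED",): 59189,
    ("HTML_MESSAGE", "RDNS_DYNAMIC", "UNPARSEABLE_RELAY"): 76959,
    ("HTML_MESSAGE",): 80056,
}
for sub, sub_n, combo, n in find_count_violations(ham_hits):
    print(" ".join(combo), n, "exceeds", " ".join(sub), sub_n)
```

On these four entries only the DKIM_SIGNED pair is flagged; the HTML_MESSAGE combination (76959) stays below HTML_MESSAGE alone (80056).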


That's a valid observation. In the learner I'm working on, I'm experimenting with an interesting forgetter that wipes out and restarts some of the keys. Part of the process of getting rid of bad data takes some good data with it, and usually the good data recovers over time. This is still very experimental. I'm applying my new filter to just the rule names coming out of SA, completely ignoring the scoring or even whether it's a spam or ham rule. I just wanted to see what the result would be, and whether I can generate SA rules from my data.

So far - crude at best.

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
