perl -p -i -e 's/__/T_/g' /usr/share/spamassassin/updates_spamassassin_org/*
This converts the rules. I'm doing something very interesting. It's
going to take a few days to see if it works.
I'm applying the same techniques as my evolution filter to the SA rule
names.
I extract the names and then feed them into a program that creates all
combinations up to 4 levels and learns those combos as either spam or ham.
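
A minimal sketch of that combination step, not the actual program. It
assumes "up to 4 levels" means every group of 1 to 4 rule names from one
message, and that the learned sets are plain in-memory Python sets. The
function names and the sample rule lists are made up for illustration.

```python
from itertools import combinations

MAX_LEVEL = 4  # assumption: "levels" = number of rule names in one combo


def rule_combos(rule_names, max_level=MAX_LEVEL):
    """Return every combination of 1..max_level rule names as a set of tuples."""
    names = sorted(set(rule_names))  # sorted so each combo has one canonical order
    combos = set()
    for size in range(1, min(max_level, len(names)) + 1):
        combos.update(combinations(names, size))
    return combos


def learn(corpus, rule_names):
    """Add one message's combos to a ham or spam set (hypothetical store)."""
    corpus.update(rule_combos(rule_names))


ham_combos, spam_combos = set(), set()
learn(spam_combos, ["T_CT_DYING", "T_BENEFICIARY", "T_CT_BEGGER"])
learn(ham_combos, ["T_HAS_FROM", "T_MIME_VERSION"])
print(len(spam_combos))  # 3 singles + 3 pairs + 1 triple = 7
```

The number of combos grows quickly with the number of rules a message
hits, which is presumably why the depth is capped.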
Then, after building the ham and spam corpus sets, I take the test
message, create its set of rule combinations, and then do set compares
against the two ham and spam sets.
What I'm looking for is combos matching ham and NOT matching spam - or -
combinations matching spam and NOT matching ham.
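
The set compares can be sketched like this. It is only an illustration:
the combos are assumed to be tuples of rule names held in Python sets,
and the "whichever side has more exclusive matches wins" verdict is an
assumption, since the scoring is not spelled out here.

```python
def exclusive_hits(message_combos, ham_combos, spam_combos):
    """Split a message's combos into ham-only and spam-only matches."""
    ham_only = (message_combos & ham_combos) - spam_combos
    spam_only = (message_combos & spam_combos) - ham_combos
    return ham_only, spam_only


def verdict(message_combos, ham_combos, spam_combos):
    """Hypothetical decision rule: compare the counts of exclusive matches."""
    ham_only, spam_only = exclusive_hits(message_combos, ham_combos, spam_combos)
    if len(spam_only) > len(ham_only):
        return "spam"
    if len(ham_only) > len(spam_only):
        return "ham"
    return "unsure"


# Toy data: rule "B" shows up in both corpora, so it carries no signal alone.
ham = {("A",), ("B",), ("A", "B")}
spam = {("B",), ("C",), ("B", "C")}
message = {("B",), ("C",), ("B", "C")}
print(verdict(message, ham, spam))  # spam
```

Combos that appear in both sets drop out on both sides, so they never
affect the result.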
In theory I should be able to create thousands of combination rules for
both ham and spam that all have a very high probability of being accurate.
It's just an interesting experiment to see how well it works.
Right now I have 151728 ham combinations and 113632 spam combinations. Of
those only 22933 are in both sets. It's only been learning for one day.
I want to see where it is after a week.
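
Working through those counts (plain subtraction on the numbers quoted
above, nothing else assumed) shows how many combinations are usable on
each side:

```python
ham_total = 151728   # ham combinations learned so far
spam_total = 113632  # spam combinations learned so far
in_both = 22933      # combinations seen in both sets

ham_only = ham_total - in_both    # combos seen only in ham
spam_only = spam_total - in_both  # combos seen only in spam
print(ham_only, spam_only)  # 128795 90699
```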
By changing the rules from __ to T_ I exposed a lot more rule names.
The way this works is that I don't need to know what rules are ham rules
or spam rules in advance. And I don't need to score them. The filter
figures it all out on its own. So the rule names are just information.
I think this trick will make SA far more accurate. We'll see. I want to
give it till at least Friday for the system to learn. I'm also storing
hit counts so that I could pick out maybe the best 1000 rules and
publish them.
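
One way the hit counts could be kept and the top combos turned into
publishable rules, as a sketch only. The Counter store, the function
names and the COMBO_SPAM_0001 rule name are all hypothetical. The only
thing taken from SA itself is the "meta NAME A && B" rule syntax.

```python
from collections import Counter

# combo (tuple of rule names) -> number of messages it fired on
hit_counts = Counter()


def record_hits(combos):
    """Count one hit for each combo that fired on a message."""
    hit_counts.update(combos)


def best_combos(limit=1000):
    """Return the most frequently hit combos as (combo, count) pairs."""
    return hit_counts.most_common(limit)


def as_meta_rule(name, combo):
    """Render a combo as a SpamAssassin meta rule line."""
    return "meta " + name + " " + " && ".join(combo)


record_hits([("T_BENEFICIARY", "T_CT_DYING"), ("T_CT_BEGGER",)])
record_hits([("T_BENEFICIARY", "T_CT_DYING")])
top_combo, count = best_combos(limit=1)[0]
print(as_meta_rule("COMBO_SPAM_0001", top_combo))
# meta COMBO_SPAM_0001 T_BENEFICIARY && T_CT_DYING
```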
Anyhow - that's what I'm up to and so far results are good. But because
it's early in the learning cycle most messages are not yet producing
significant scores. The ones that are producing scores are making the
right call however.
On 02/02/16 20:19, Dave Funk wrote:
You can do that but it requires editing all your rule files, although
then you see those matches in all your reports.
If you just want to test one particular message, just use the -D
option to spamassassin and grep for ' got hit: '
Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TO_HEADER_EXISTS ======> got hit: "<"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __TOCC_EXISTS ======> got hit: "<YES>"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ======> got hit: "negative match"
Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_JURY3 ======> got hit: "negative match"
Mar 11 21:51:44.205 [5074] dbg: rules: ran header rule __HAS_FROM ======> got hit: "<YES>"
(Yes, Marc, you probably already know this, this is for the other
people who might be following this thread ;)
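
For anyone who wants the rule names as data rather than grep output,
here is a rough sketch that pulls them out of saved debug output. It
assumes each debug record sits on a single line in the format shown
above. The function name is made up, and the third sample line is
invented just to show that non-matching lines are skipped.

```python
import re

# Matches lines such as:
#   ... dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"
HIT_RE = re.compile(r'dbg: rules: ran (\w+) rule (\S+)\s+=+>\s+got hit: "(.*)"')


def hit_rules(debug_lines):
    """Return (rule_type, rule_name, matched_text) for every 'got hit' line."""
    hits = []
    for line in debug_lines:
        m = HIT_RE.search(line)
        if m:
            hits.append(m.groups())
    return hits


sample = [
    'Mar 11 21:51:44.203 [5074] dbg: rules: ran header rule __MIME_VERSION ======> got hit: "<YES>"',
    'Mar 11 21:51:44.204 [5074] dbg: rules: ran header rule __KAM_UPS2 ======> got hit: "negative match"',
    'Mar 11 21:51:44.204 [5074] dbg: config: an unrelated line',  # invented filler
]
for rule_type, name, text in hit_rules(sample):
    print(rule_type, name, text)
```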
On Tue, 2 Feb 2016, Marc Perkel wrote:
Never mind ....
I found that if I change __ to T_ that it does what I want.
On 02/02/16 18:05, Marc Perkel wrote:
On 02/02/16 17:55, Marc Perkel wrote:
Normally SA creates a header that has a list of the names of rules
that matched. It skips the listing of hidden rules that start with
__ .
Is there a command where I can easily tell SA to include the hidden
rules in the report in the headers so I can see all of it?
I suppose I'm also asking it to list rules that match but produce
no scores.
body __LATE_RICH_RELATIVE /\blate .{0,15}(?:father|wife|widow|husband|general|president|daughter|son|minister|client)/i
body __CT_CLICK /\b(click(ing)? (here|now|this|on|below|.{0,9}(hyper)?link))|visit(ing)?this link\b/i
body __BENEFICIARY /\bbeneficiary\b/i
body __CT_BEGGER /\b(kind assist[ae]nce|feed my family|need (of )?your help|donat(e|ion))\b/i
body __CT_CONTACT /\b((contact(?:ing) you|contact (information|me|email|number|us)|your contact))|to (inform|email) you/i
body __CT_REPLY_TO_ME /\b(reply to me|please reply|my email address|private email|contact me|prompt response|reply from you|hearing from you|assist me)/i
body __CT_DYING /\b(diagnosed with|months to live|dying of|transplant)\b/i
body __CT_UNITED_NATIONS /\bUnited Nations?\b/i
meta __CT_STRANGER CT_MY_NAME_IS || CT_DEAR_FRIEND || CT_DEAR_SOMETHING || CT_SIR_MADAM || CT_INTRODUCE
meta __CT_MONEY CT_TRANSFER_MONEY || CT_THE_SUM_OF || CT_EARN_MONEY || LOTS_OF_MONEY || MILLION_USD || FUZZY_MILLION || GIVE_YOU_MONEY || __CT_BANK || BILLION_DOLLARS || US_DOLLARS_2 || ADVA$
meta __CT_VICTIM __BENEFICIARY || CT_LATE_PRESIDENT || CT_LATE_RICH_RELATIVE || __CT_DYING
meta __CT_FORM FILL_THIS_FORM || FILL_THIS_FORM_LONG || T_FILL_THIS_FORM_SHORT
meta __CT_CONFIDENTIAL CT_PRIVATE_EMAIL || CT_PRIVATE_PHONE || CONFIDENTIAL_SCAM1 || CONFIDENTIAL_SCAM2
meta __CT_NOW CT_ACT_NOW || CT_DO_IT_TODAY || CT_URGENT_RESPOND
meta CT_GOD_BENEFICIARY __CT_GOD && __CT_VICTIM
describe CT_GOD_BENEFICIARY God and Beneficiary
score CT_GOD_BENEFICIARY 4
meta CT_GOD_BEGGER __CT_GOD && __CT_BEGGER
describe CT_GOD_BEGGER Begging in Religious Language
score CT_GOD_BEGGER 3
--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400