Tom,

Let me know if you are still interested in setting up a masschecker.  That goes for anyone on this list as well.  I have worked out the sorting issue pretty well now, and my ena-weekX masscheckers are now the largest contributors to the RuleQA corpus.  They have kept the nightly rule scoring updating regularly over the past year.

http://ruleqa.spamassassin.org/ (see the ena-weekX in the green box)

New/more masscheckers are always welcome, and running one will help you learn the best way to tune your SA platform to get every last drop of accuracy from your local meta rules.  We could really use masscheckers whose primary language is not English to add to and improve the core SA rules.

Here's my setup:

- I have an iRedMail server to which I send copies of most of my email, using an internal-only email domain, "sa.ena.net".

- The iRedMail server has Sieve rules (easily managed in RoundCube) that use certain rule hits and scores from my main Internet edge MailScanner filtering to move messages into Ham and Spam folders as unread.  Mail scoring in the middle (not high enough for obvious Spam or low enough for obvious Ham) is left in the main Inbox.

- I spend a few minutes each day visually scanning the Subjects of the unread email, then mark it as Read.

- If I find a zero-hour spam email in the main Inbox, I move it to a SpamCop folder.  A script runs every 5 minutes, checks the SpamCop folder, strips off some extra Received headers added by my internal hops, then submits each message as an attachment to my SpamCop account.

- A script moves the Maildir email to 4 other masschecker VMs to split out the load so they will be able to submit their results quickly.  ena-week0 is the most recent week of ham/spam, which is still on the iRedMail server.  ena-week1-4 run on the other 4 masschecker VMs, giving a total of 5 weeks of recent corpus.  I currently have 100,939 Ham and 292,001 Spam in ena-week0-4.

- I run local Bayes training on the ena-week0 Ham and Spam folders, feeding my Redis-based Bayes storage, which is shared across my 8 MailScanner nodes and my iRedMail/amavis server.  This method has proven to keep my Bayes scores very accurate.
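
For anyone wanting to try the header clean-up step before reporting, here is a minimal Python sketch.  It is an illustration only, not the script described above.  It assumes the message uses LF line endings (typical for Maildir files) and that the internal hops can be recognised by the "sa.ena.net" domain appearing in their Received headers; the function name strip_internal_received and the marker tuple are hypothetical.

```python
def strip_internal_received(raw: bytes, markers=(b"sa.ena.net",)) -> bytes:
    """Drop Received headers that mention any of the given markers.

    Assumptions (not from the original setup): LF line endings, and
    internal hops identifiable by a marker substring such as the
    internal-only domain.
    """
    # Split the header block from the body at the first blank line.
    head, sep, body = raw.partition(b"\n\n")

    # Re-join folded headers: a line starting with space or tab
    # continues the previous header field.
    fields = []
    for line in head.split(b"\n"):
        if line[:1] in (b" ", b"\t") and fields:
            fields[-1] += b"\n" + line
        else:
            fields.append(line)

    # Keep everything except Received fields that match a marker.
    kept = [
        f for f in fields
        if not (f.lower().startswith(b"received:")
                and any(m in f for m in markers))
    ]
    return b"\n".join(kept) + sep + body
```

The body is left untouched, so a mention of the internal domain inside the message text does not cause anything to be removed.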

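The weekly split into ena-week0 through ena-week4 could be sketched roughly as follows.  Again, this is an illustration under assumptions, not the actual script: it assumes each Maildir file's modification time approximates its delivery time, and the names week_bucket and plan_moves are hypothetical.  It only works out which bucket each message belongs to; copying to the other VMs is left out.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Iterator, Optional, Tuple


def week_bucket(received: datetime, now: datetime,
                max_weeks: int = 5) -> Optional[str]:
    """Return 'ena-week0'..'ena-week4' by message age, or None if older."""
    age = now - received
    if age < timedelta(0):
        # Clock skew: treat future-dated mail as belonging to this week.
        return "ena-week0"
    index = age.days // 7
    if index >= max_weeks:
        return None  # older than the 5-week window, so not part of the corpus
    return f"ena-week{index}"


def plan_moves(maildir: Path,
               now: datetime) -> Iterator[Tuple[Path, Optional[str]]]:
    """Yield (message_path, bucket) for each message in cur/ and new/.

    Assumption: file mtime is a good enough stand-in for delivery time.
    """
    for sub in ("cur", "new"):
        folder = maildir / sub
        if not folder.is_dir():
            continue
        for msg in folder.iterdir():
            if msg.is_file():
                received = datetime.fromtimestamp(msg.stat().st_mtime)
                yield msg, week_bucket(received, now)
```

Messages that come back with a bucket of None have aged out of the window and can be expired from the corpus.
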
Hope someone finds this information helpful.

Dave


On 01/20/2017 01:02 PM, Tom Hendrikx wrote:
On 20-01-17 19:46, David Jones wrote:
From: Kevin Golding <k...@caomhin.org>
Sent: Friday, January 20, 2017 11:59 AM
To: users@spamassassin.apache.org
Subject: Re: No rule updates since 1/1/17
On Fri, 20 Jan 2017 17:26:01 -0000, Bill Keenan
<developerli...@wjkeenan.org> wrote:
What is the fix needed so /usr/bin/sa-update starts getting updates? I
too have not received an update from updates.spamassassin.org
<http://updates.spamassassin.org/> since 1-Jan-17.

Besides updates.spamassassin.org <http://updates.spamassassin.org/>,
what other rule sets are commonly used? Hundreds of spam messages are
getting through with only updates.spamassassin.org
<http://updates.spamassassin.org/> rules.
This seems like a good time to mention
https://wiki.apache.org/spamassassin/NightlyMassCheck
If more people can contribute, even just a small corpus of mail, then
updates will be published more frequently. At the moment a very small
number of people provide data, meaning there is very little margin for
error.
I would like to help with the nightly masscheck but I don't have the
resources to manually check ham and spam.  This also gets into the
grey area of how people define spam.  I also have a very good MTA
setup with RBLs and DNS checks that block most of the spam before
it reaches SA in MailScanner.  My SA only has to block a very small
percentage of my definition of spam so I am not sure how helpful
my mail filtering platform can be even though it's very accurate.

Dave

I think I can say the same about my platform, but since this issue keeps
popping up I just applied for an account just to find out if my
contribution could help. I can't speculate so I'm just gonna try if it
helps :)

Kind regards,
        Tom


--
David Jones
