In general, please stop worrying about your corpus being ideal.  Our sample
size right now is so small that even non-ideal corpora would be helpful.
Get started with nightly masschecks from cron, then work on improving your
corpus later.

I personally include:
* The last 4 weeks of spam.  I use logrotate to automatically rotate one
week at a time, so I don't have to worry about it.  I receive LOTS of spam,
so this is a good quantity.  IMHO, spam older than a month is far less
useful for testing SpamAssassin's rules.
* The last 2 years of ham.  If we had 10x as many contributors to the
nightly masscheck, I might reduce this to the last year of ham.
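If you'd rather filter by message date than by rotation, the two windows
above boil down to a simple age check.  This is only a sketch: the helper
names (select_corpus, in_window) and the (kind, received, msg_id) tuple
layout are made up for illustration and are not part of SpamAssassin or
its masscheck tooling, and "2 years" is approximated as 730 days.

```python
from datetime import datetime, timedelta

# Hypothetical retention windows matching the policy above (assumptions).
SPAM_WINDOW = timedelta(weeks=4)    # last 4 weeks of spam
HAM_WINDOW = timedelta(days=730)    # roughly the last 2 years of ham


def in_window(kind: str, received: datetime, now: datetime) -> bool:
    """Return True if a message of this kind is recent enough to keep."""
    if kind == "spam":
        window = SPAM_WINDOW
    elif kind == "ham":
        window = HAM_WINDOW
    else:
        raise ValueError(f"unknown message kind: {kind!r}")
    return now - received <= window


def select_corpus(messages, now):
    """Keep only the (kind, received, msg_id) tuples inside their window."""
    return [m for m in messages if in_window(m[0], m[1], now)]


if __name__ == "__main__":
    now = datetime(2024, 6, 1)
    msgs = [
        ("spam", datetime(2024, 5, 20), "s1"),  # 12 days old: kept
        ("spam", datetime(2024, 4, 1), "s2"),   # about 2 months old: dropped
        ("ham", datetime(2023, 1, 1), "h1"),    # about 17 months old: kept
        ("ham", datetime(2021, 1, 1), "h2"),    # over 3 years old: dropped
    ]
    print([m[2] for m in select_corpus(msgs, now)])
```

The point is just that spam ages out much faster than ham; pick whatever
windows suit your own mail volume.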

Warren