Re: mass-check submissions Re: My attempt at re-calculating test scores

2010-12-25 Thread Warren Togami Jr.
I thought a bit more about the --reuse problem. While there are pros and cons to reuse, I guess there is more benefit to using --reuse than not. So I now recommend it in all cases of masscheck. On Fri, Dec 24, 2010 at 1:58 PM, Warren Togami Jr. wtog...@gmail.com wrote: This does remind me

Re: mass-check submissions Re: My attempt at re-calculating test scores

2010-12-25 Thread John Hardin
On Fri, 24 Dec 2010, Warren Togami Jr. wrote: Also, "current" refers to the nightly masscheck snapshot of svn trunk, including the latest rules. Sorry, I realize now that was unclear. What does "current" in "current emails" mean? What time window? Since the last masscheck? A week? Six months?

Re: mass-check submissions Re: My attempt at re-calculating test scores

2010-12-25 Thread Darxus
On 12/25, John Hardin wrote: Sorry, I realize now that was unclear. What does "current" in "current emails" mean? What time window? Since the last masscheck? A week? Six months? Since the last mass check of that type (network / nightly), yes. And how do you ensure a sufficiently large corpus

Re: mass-check submissions Re: My attempt at re-calculating test scores

2010-12-25 Thread Warren Togami Jr.
In general, please stop worrying about your corpus being ideal. Our sample size right now is so small that even non-ideal corpora would be helpful. Get started with cron nightly masschecks then work on improving your corpus later. I personally include: * The last 4 weeks of spam. I use

mass-check submissions Re: My attempt at re-calculating test scores

2010-12-24 Thread Darxus
I am one of the editors of the dnswl.org database, and while it is tempting to participate in the mass-checks, considering the effects that would have on the dnswl tests or not, I think it's better to not have that skew. I like having the QA test results to independently evaluate dnswl. I wonder

Re: mass-check submissions Re: My attempt at re-calculating test scores

2010-12-24 Thread John Hardin
On Fri, 24 Dec 2010, dar...@chaosreigns.com wrote: And it still disturbs me that mass checks use anything but the test results at the time the email is originally scored (for example, from the tests= value of the X-Spam-Status header). Since I'm sure the time variance improves the accuracy of things

Re: mass-check submissions Re: My attempt at re-calculating test scores

2010-12-24 Thread Darxus
On 12/24, John Hardin wrote: If there was some way to capture the score of RBL tests separately from non-RBL tests and use them in place of the current RBL results I might agree you have a point; but if the mass checks ignore the scores that the current ruleset generates against historical

Re: mass-check submissions Re: My attempt at re-calculating test scores

2010-12-24 Thread Warren Togami Jr.
http://www.mail-archive.com/users@spamassassin.apache.org/msg69546.html Whitelists have almost zero impact on spamassassin's determination of ham vs spam. Believe me. This is not harmful. If you have any ham corpus it would be extremely useful to spamassassin. We have a severe lack of variety

Re: mass-check submissions Re: My attempt at re-calculating test scores

2010-12-24 Thread John Hardin
On Fri, 24 Dec 2010, dar...@chaosreigns.com wrote: On 12/24, John Hardin wrote: If there was some way to capture the score of RBL tests separately from non-RBL tests and use them in place of the current RBL results I might agree you have a point; but if the mass checks ignore the scores that

Re: mass-check submissions Re: My attempt at re-calculating test scores

2010-12-24 Thread Warren Togami Jr.
I think what he is failing to understand is that the scores are irrelevant, as the masscheck only determines yes or no for each rule across a corpus. Also, "current" refers to the nightly masscheck snapshot of svn trunk, including the latest rules. This does remind me, however, that there is a
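
The "yes or no for each rule" point can be illustrated with a small sketch. This is not the actual mass-check or score-generation code. It assumes a hypothetical input of (is_spam, rule names that hit) pairs and simply tallies, per rule, the fraction of spam and of ham it hit. No scores appear anywhere in the calculation.

```python
from collections import Counter

def rule_hit_rates(results):
    """Map each rule to (fraction of spam hit, fraction of ham hit).

    results: iterable of (is_spam, rules) pairs, where rules is the
    collection of rule names that fired on one message (hypothetical format).
    """
    totals = Counter()
    hits = {True: Counter(), False: Counter()}
    for is_spam, rules in results:
        totals[is_spam] += 1
        # A rule either hit a message or it did not; count it once.
        hits[is_spam].update(set(rules))
    all_rules = set(hits[True]) | set(hits[False])
    return {
        rule: (
            hits[True][rule] / totals[True] if totals[True] else 0.0,
            hits[False][rule] / totals[False] if totals[False] else 0.0,
        )
        for rule in sorted(all_rules)
    }

# Made-up example corpus: two spam and two ham messages.
corpus = [
    (True, {"RULE_A", "RULE_B"}),
    (True, {"RULE_A"}),
    (False, {"RULE_B"}),
    (False, set()),
]
rates = rule_hit_rates(corpus)
print(rates)
```

In this made-up corpus RULE_A hits only spam while RULE_B hits spam and ham equally, which is the kind of per-rule signal that matters regardless of what score either rule carried when the mail was first received.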