From: "Charles Gregory" <cgreg...@hwcn.org>
Sent: Sunday, 2009/December/20 06:20


On Sat, 19 Dec 2009, Daryl C. W. O'Shea wrote:
More unfortunately, privacy concerns prevent me from building a useful
corpus of ham. Sigh....
But otherwise such a good idea....
Can you not trust yourself to use your own ham?  You don't need to
provide us with your mail.  You can scan your own mail locally on your
own machine(s).

I run an ISP. The corpus I would so love to build is the hundreds of messages per day that all our clients receive. It's *their* privacy that
is the concern.

Do you think that my own private collection of saved mail (perhaps 1100 ham) would really be of benefit? I'd have to start saving my spam as well....

And it would always be skewed by the fact that I SMTP reject anything caught by Zen.

I'm just a touch naive here, but it seems to me it should be possible,
somehow, to run two spamd daemons, one with the regular rules and one
with the mass check rules. The second one is fed the email in parallel
with the first but deletes the mail once the scores are logged.

The downside is that this is not "confirmed ham" and "confirmed spam".
It is a way to safely test new rule sets, though.
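
A rough sketch of that idea, in Python, in case it helps make it concrete.
It assumes two spamd instances are already running, one on the default
port 783 with the regular rules and one on port 784 with the test rules
(the second port and the names "live" and "test" are made up for
illustration). It uses "spamc -c", which only checks the message and
prints "score/threshold". The message is piped through and never written
to disk, so only the scores survive. Untested against a real setup, so
treat it as a starting point and not a finished tool.

```python
import subprocess


def parse_score(output: str) -> tuple[float, float]:
    """Parse the 'score/threshold' line printed by `spamc -c`."""
    score, threshold = output.strip().split("/", 1)
    return float(score), float(threshold)


def shadow_check(message: bytes, ports: dict[str, int], run=subprocess.run):
    """Feed one message to each spamd instance and return only the scores.

    `ports` maps a label (hypothetical, e.g. "live"/"test") to the TCP port
    of a spamd instance. `run` is injectable so the function can be
    exercised without a real spamd.
    """
    results = {}
    for name, port in ports.items():
        # -c: check only, print score/threshold; -p: spamd port to contact.
        proc = run(
            ["spamc", "-c", "-p", str(port)],
            input=message,
            capture_output=True,
        )
        results[name] = parse_score(proc.stdout.decode())
    # The message itself is not stored anywhere; the caller logs the scores.
    return results
```

Something like shadow_check(raw_mail, {"live": 783, "test": 784}) could
be called from whatever hands mail to the first daemon, with the returned
scores appended to a log. Note that spamc -c prints 0/0 when it cannot
reach spamd, so such lines in the log mean "no result", not "clean".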

I must admit that the vast majority of email I receive is not hand
checked for ham/spam. I simply read headers on several lists to see
what the current buzz is. I read threads that look interesting and
toss the rest. So it'd be hard to mass check validly with that as a
corpus. (Besides, I suspect animal husbandry companies would hardly
be interested in passing things that look like typical LKML mailings,
would they?)

I wonder how much companies would pay for a part-time SpamAssassin
honcho who can be trusted (bonded?) and can write SARE-ish rules
tailored to the company's email. Is there a job opportunity for
somebody here? (And, yes, I do suspect the burnout time would be
rather short.)

{^_^}
