On 12/20/2009 09:20 AM, Charles Gregory wrote:
On Sat, 19 Dec 2009, Daryl C. W. O'Shea wrote:
More unfortunately, privacy concerns prevent me from building a useful
corpus of ham. Sigh....
But otherwise such a good idea....
Can you not trust yourself to use your own ham? You don't need to
provide us with your mail. You can scan your own mail locally on your
own machine(s).

I run an ISP. The corpus I would so love to build is the hundreds of
messages per day that all our clients receive. It's *their* privacy that
is the concern.

Right, they would need to opt in, and the manual sorting requirements are a bit too difficult and time-consuming for all but the most dedicated to this cause.


Do you think that my own private collection of saved mail (perhaps 1100
ham) would really be of benefit? I'd have to start saving my spam as
well....

A ham-only corpus is still useful, as long as it contains mail from a variety of sources. (Mail from mailing lists is not very useful.)


And it would always be skewed by the fact that I SMTP reject anything
caught by Zen.


Not a problem.

Warren
