Robert Menschel <[EMAIL PROTECTED]> wrote on 17.01.2006 03:41:39:

> sad> I'm currently trying to build up a new bayes DB here, ...
> sad> ... yet it poses a problem to build up the ham part.
> sad> ... Much of the inbound smtp mail either contains private or
> sad> confidential information, so I cannot use them as I keep the
> sad> source of the bayes messages in a Notes DB serverside - I'd run
> sad> into privacy issues.
> 
> If you keep the source of your bayes messages in a Notes DB, then you
> should have had enough ham to retrain your bayes with, no?

Uhm, no? If you reread my message, you'll see that I used
autolearning before instead of manual training. I have just ditched
the old bayes DB and disabled autolearning, and am now building up
a new bayes DB.

I'm keeping the full corpus of both ham and spam to have more
control over the bayes DB. Keeping its sources enables me to
reproduce the DB at any time and, in particular, to remove selected
messages containing tokens that prove to be problematic in the
future. Of course I could do that by relearning wrongly tagged
messages as ham - but 1 message as ham usually doesn't make much
of a difference for bayes.
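
To illustrate the "reproduce the DB from its sources" idea, here is a
minimal Python sketch. It assumes the corpus has been exported from the
Notes DB into a directory with ham/ and spam/ subfolders of .eml files;
that layout, the function name rebuild_commands and the excluded
parameter are hypothetical. It only builds the sa-learn command lines
(--clear, --ham, --spam, --no-sync and --sync are standard sa-learn
options) and leaves running them to the caller.

```python
from pathlib import Path


def rebuild_commands(corpus_dir, excluded=()):
    """Return the sa-learn invocations that rebuild the bayes DB from scratch.

    corpus_dir is assumed to contain ham/ and spam/ subfolders of .eml files
    (a hypothetical export layout). excluded lists file names to leave out,
    e.g. messages whose tokens turned out to be problematic.
    """
    corpus = Path(corpus_dir)
    skip = set(excluded)

    # Start from an empty database so the result depends only on the corpus.
    commands = [["sa-learn", "--clear"]]

    for kind in ("ham", "spam"):
        for msg in sorted((corpus / kind).glob("*.eml")):
            if msg.name in skip:
                continue  # deliberately not relearned
            # --no-sync defers the journal sync until all messages are learned.
            commands.append(["sa-learn", "--no-sync", f"--{kind}", str(msg)])

    # One final sync writes the journal into the database.
    commands.append(["sa-learn", "--sync"])
    return commands
```

Each returned entry can be handed to subprocess.run(cmd, check=True).
Dropping a message is then just a matter of adding its file name to
excluded and rebuilding.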

> Bigger problem: bayes can only learn what it's taught.  If you have
> ham that really should be trained, and because of privacy issues it
> should not be kept after training, then you really should develop a
> system which will enable you to train without retaining.  Bayes works
> best when properly and fully trained, not just trained on "those
> unimportant non-private emails are ham".

Yes, I might forgo storing ham mails in a Notes DB for that,
BUT... I really doubt that management would even give permission to
feed those messages into SA.
When I say "confidential" it is really one of those few times where
it means "confidential" ;) Our customers are mostly big banks, big
insurance companies and the German government. Even the slightest
risk of leaking _any_ kind of information could
get us into problems no one even wants to imagine here...

> I can't make recommendations on how to do so in your system, but
> you'll get better results from bayes if you figure out how to manage
> it.

That's to be expected. I just wanted to know how badly it will hit me ;)

regards
        sash

--------------------------------------------------
Sascha Runschke
Network Administration
IT-Services

ABIT AG
Robert-Bosch-Str. 1
40668 Meerbusch

Tel.:+49 (0) 2150.9153.226
Mobil:+49 (0) 173.5419665
mailto:[EMAIL PROTECTED]

http://www.abit.net
http://www.abit-epos.net
---------------------------------
Sicherheitshinweis zur E-Mail Kommunikation /
  Security note regarding email communication:
http://www.abit.net/sicherheitshinweis.html
