Mark A. DeMichele <[EMAIL PROTECTED]> wrote:
> This may be slightly off-topic but I think I have a related
> question.
>
> If spammers start putting a bunch of "good" words at the end
> of the spam, which some of them seem to be doing, then when
> you "learn" them, won't that screw things up a bit and defeat
> the whole process?

There are also other 'tidbits' in those messages that are useful
indicators, though.

> In this case the rules-based checks would still work, but
> the Bayes checks may offset them.

I am not an expert but... if the spammers are using *truly random*
words, there should still be a large number of words that are NOT
normally present in ham. And although a random assortment might
contain some "good" words, statistically they won't be significant,
so -- if I've got it right -- they won't break things at all. So I
don't think a random smattering of non-spam words will have much
impact.

> Please tell me if I'm misunderstanding this.

Any enlightenment welcome here too!

FWIW: I've been feeding random-word spams to Bayes, and it is still
working well for me (admittedly in a non-heavy production use
setting).

- Bob
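
For illustration, the argument above can be sketched numerically. This is a minimal sketch under stated assumptions, not SpamAssassin's actual Bayes implementation: it uses a plain naive-Bayes log-odds combination, and every token probability below is a made-up, hypothetical value. It only aims to show why padding words that sit near 0.5 barely move the result, while words that are strongly associated with ham would.

```python
import math

def combine(probs):
    """Combine per-token spam probabilities with a naive-Bayes log-odds sum.

    Illustrative only; real filters use more elaborate combining schemes.
    """
    log_odds = sum(math.log(p / (1.0 - p)) for p in probs)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical probabilities for tokens the filter has learned as spammy.
spam_tokens = [0.99, 0.98, 0.97, 0.95]

# Hypothetical probabilities for randomly chosen dictionary words:
# most have little history in either corpus, so they sit near 0.5.
random_words = [0.5, 0.45, 0.5, 0.4, 0.55, 0.5, 0.3, 0.5, 0.45, 0.6]

# Hypothetical contrast case: words that are strongly typical of ham.
hammy_words = [0.05] * 10

base = combine(spam_tokens)
padded = combine(spam_tokens + random_words)
targeted = combine(spam_tokens + hammy_words)

print(f"spam tokens only:        {base:.6f}")
print(f"plus random words:       {padded:.6f}")
print(f"plus strongly hammy ones: {targeted:.6f}")
```

With these assumed numbers the random padding leaves the score essentially unchanged, which matches the point that a random smattering is not statistically significant. The contrast case suggests the padding would only matter if the added words were strongly typical of the recipient's own ham, which random words generally are not.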
