J Anjen <scruple532 at ...> writes:

> The signup process for the devl and tech mailing lists
> have been screwed up lately so I'm not sure if this
> will make it or not.
>
> Come to #Frost on TOR if you want to coordinate.
>
> Here's a copy of the script I've been working on, it's
> not done yet and it doesn't have Bayesian although
> Reverned might work if I can get more documentation.

--snip--

Thanks for making the effort, but please realise that Bayesian filtering, or any other kind of body-content filtering, can never work against completely freeform spam (at least not on its own), so I wouldn't spend too much time on this approach if I were you.

It does work on email spam, because email spam needs some structure to serve its purpose: there's usually a site URL; the headers are virtually always forged in detectable ways; if there's an originating IP you can see whether it's in cable modem zombie land; phishing is generally disguised in a few common ways; there are often symbols or unusual letter patterns meant to get past naive string filters; viruses and worms generally use a few static attachment names or are all the same size; uncommon words are often randomly appended as hash busters; and so on.

The only purpose of Frost spam, on the other hand, is to annoy you, so it can be absolutely anything. So far we've had racist jokes, blank posts, random single symbols and one-liners from some story or other. But it could be anything else. How do you propose to filter text from random blog searches, or other people's valid posts being reposted, for example? How about text generated by Markov chains? You can't, so don't bother trying.

The *only* viable approach to the spam problem I have seen is a full web of trust system. When identities must behave for a significant time before any number of people will read them, and become ignored across whole trust webs as soon as they misbehave, a trusted identity effectively becomes expensive, and thus spamming becomes a lot more effort and much less effective.

Bob
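
To make the web of trust idea above a bit more concrete, here is a minimal Python sketch. It is not how Frost or any existing system does it; the class `TrustWeb`, its methods and the `min_good_posts` threshold are all hypothetical names chosen for illustration. It assumes each reader keeps a list of peers whose ratings they accept, that a new identity only becomes readable once that circle has seen enough acceptable posts from it, and that a single spam flag from anyone in the circle hides it again.

```python
class TrustWeb:
    """Toy web of trust: identities earn readership slowly and lose it at once."""

    def __init__(self, min_good_posts=20):
        # Assumed threshold: how many acceptable posts an identity needs
        # before a reader's trust circle will show it.
        self.min_good_posts = min_good_posts
        self.peers = {}       # reader -> set of raters the reader trusts
        self.good_posts = {}  # (rater, author) -> acceptable posts seen by rater
        self.flags = set()    # (rater, author) pairs marked as spam

    def trust_peer(self, reader, peer):
        self.peers.setdefault(reader, set()).add(peer)

    def record_good_post(self, rater, author):
        key = (rater, author)
        self.good_posts[key] = self.good_posts.get(key, 0) + 1

    def flag_spammer(self, rater, author):
        self.flags.add((rater, author))

    def is_readable(self, reader, author):
        # The circle is the reader plus everyone the reader directly trusts.
        circle = {reader} | self.peers.get(reader, set())
        # One flag inside the circle is enough to hide the author.
        if any((rater, author) in self.flags for rater in circle):
            return False
        # Otherwise the author must have built up enough history.
        seen = sum(self.good_posts.get((rater, author), 0) for rater in circle)
        return seen >= self.min_good_posts


web = TrustWeb(min_good_posts=3)
web.trust_peer("alice", "carol")
for _ in range(3):
    web.record_good_post("carol", "newcomer")
print(web.is_readable("alice", "newcomer"))  # True

web.flag_spammer("carol", "newcomer")
print(web.is_readable("alice", "newcomer"))  # False
```

The asymmetry is the point: building up history takes many posts over time, while losing readership takes one flag, so a throwaway spam identity costs far more to create than to discard. A real system would also need signed identities and trust that propagates beyond direct peers, neither of which is modelled here.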
