Hi,

I am a developer on a fairly large community site (30,000-50,000 active users) with blogs, photo albums and forums.

I spent yesterday tinkering with a spam prevention system which runs each new comment on a blog post or on an image in a photo album through SpamAssassin. I take the submitted comment, assemble an RFC 822-compliant message from the user's IP address and the sender's and receiver's registered email addresses, and then run it through Mail::SpamAssassin (the Perl module) with default settings.
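As a rough sketch of the assembly step (written in Python purely for illustration, not the actual Perl code): the function name, the community.example host name, the default subject and the layout of the Received header are all assumptions made for this example, not something SpamAssassin prescribes.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid

# Hypothetical host name used for the Message-ID and Received headers.
SITE_HOST = "community.example"


def build_comment_message(comment_text, sender_addr, recipient_addr,
                          client_ip, subject="Comment"):
    """Wrap a site comment in a mail-like message for a spam filter."""
    msg = EmailMessage()
    msg["From"] = sender_addr        # commenter's registered address
    msg["To"] = recipient_addr       # owner of the blog post or photo
    msg["Subject"] = subject
    msg["Date"] = formatdate(usegmt=True)
    msg["Message-ID"] = make_msgid(domain=SITE_HOST)
    # Assumed layout: expose the commenter's IP the way a mail relay would.
    msg["Received"] = (
        f"from [{client_ip}] by {SITE_HOST} with HTTP; {formatdate(usegmt=True)}"
    )
    msg.set_content(comment_text)
    return msg


if __name__ == "__main__":
    example = build_comment_message(
        "Nice photo!", "alice@example.org", "bob@example.org", "203.0.113.7"
    )
    print(example.as_string())
```

The resulting text is what would be handed to the filter in place of a real email.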

This seems to work; at least it intercepts the test message provided in the SpamAssassin documentation.

This system requires a utility where people can mark a comment as ham when SpamAssassin has wrongly identified a valid comment as spam. I was planning to have this utility train the Bayesian filter on a community-wide basis, i.e. for all users. Therefore, people cannot mark their own messages as ham. This is to guard against spammers mistraining the filter.
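The rule about not marking your own messages could be sketched like this (Python for illustration only; the function names and the shape of a report are hypothetical):

```python
def accept_ham_report(reporter_id, comment_author_id):
    """A ham report counts only if it does not come from the comment's author."""
    return reporter_id != comment_author_id


def collect_training_ham(reports):
    """Keep the comment texts whose ham reports may train the shared filter.

    `reports` is an iterable of (reporter_id, comment_author_id, comment_text).
    """
    return [
        text
        for reporter_id, author_id, text in reports
        if accept_ham_report(reporter_id, author_id)
    ]
```

Only the comments returned here would be passed on to the Bayesian learner.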

- Is learning a good idea at all in this setting?
- If so, what are the advantages and, more importantly, the disadvantages of community-wide learning?
   - Should I use autolearning?
- Is there anything else I should be aware of when implementing SpamAssassin in this setting?
   - Settings
   - Thresholds
   - &c?


After testing this a bit on comments, I hope to expand it to blog posts and forum posts as well, so that moderators get a heads-up when people post spam.

--
Ole Kasper Olsen
Information Systems Developer
Opera Software ASA
