SpamBayes doesn't follow links (see http://spambayes.sourceforge.net/faq.html#will-show-spam-clues-notify-a-spammer-that-i-opened-their-message for a tangentially related discussion), but it does process message headers. There's a lot of good information in there, and clues drawn from the headers can easily look as though they came from a Web site.
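To illustrate how header fields alone can produce clues that never show up in the body, here is a minimal sketch. It is not SpamBayes's tokenizer: the two regexes, the `header_tokens` helper, and the sample message (including its addresses and hosts) are all made up for illustration. It also shows one plausible reason an underscore-joined subject might never be split into words: in Python's `re` module, `\w` matches the underscore.

```python
import re
from email import message_from_string

# Hypothetical sample message; hosts and addresses are placeholders.
RAW = """\
Received: from mail.example.net (unknown [203.0.113.7])
From: "Deals" <promo@example.net>
Subject: ***Discount_Viagra_VXPL_Percocet*_Adderall****
Content-Type: text/plain

http://example.com/
"""

WORD = re.compile(r"\w+")        # note: \w also matches "_"
PUNCT = re.compile(r"[^\w\s]+")  # runs of punctuation such as "****"


def header_tokens(msg):
    """Return illustrative 'header:piece' tokens for every header field."""
    tokens = []
    for name, value in msg.items():
        prefix = name.lower()
        for piece in WORD.findall(value) + PUNCT.findall(value):
            tokens.append(f"{prefix}:{piece}")
    return tokens


msg = message_from_string(RAW)
tokens = header_tokens(msg)

for token in tokens:
    print(token)
```

Under these assumptions, the subject yields a punctuation token `subject:****` and one long word token containing the underscores, but no standalone `subject:viagra`. The `Received` and `From` headers contribute tokens that appear nowhere in the visible message.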
Unless you're willing to dive into the code and the math, I'd caution against trying to second-guess SpamBayes. You're going to want it to behave rationally, and it doesn't (at least at the level you're looking at); it behaves statistically. That's why the FAQ (http://spambayes.svn.sourceforge.net/viewvc/spambayes/trunk/spambayes/Outlook2000/docs/troubleshooting.html#Messages_have_incorrect_or_unexpected) suggests sending all the Spam clues to the list when trying to understand why a given message isn't classified as expected.

-----Original Message-----
From: [email protected] on behalf of Ocean
Sent: Thu 2/4/2010 8:58 AM
To: [email protected]
Subject: [Spambayes] Problems with classifying as spam

In addition to the startup problems, Spambayes is having problems marking messages as spam. As an example, I received this email:

------------------------------
Subject: ***Discount_Viagra_VXPL_Percocet*_Adderall****
Body: <URL Link>***Discount_Viagra_VXPL_Percocet*_Adderall****!
<Links to:> http://kashertqdum17.com/
------------------------------

That's it. The only text in the body of the message is that URL link. There are two issues I see showing up:

1. The subject and link text isn't being parsed properly. Nowhere in the spam clues are the words "viagra", "percocet", or "adderall" showing up. The spam token involving the subject is "'subject:****'". So, not only is SpamBayes not treating the underscores as word separators, but it's not even getting to the words, because it looks like it's getting choked up on the asterisks.

2. I've got a *lot* of tokens showing up in the Spam Clues that are nowhere in the email itself. I'm guessing that Spambayes is actually going to that link and processing what's on the page, but if so, that's a big problem. First of all, it gives the spammers more flexibility in trying to bypass spambayes. And second, if it's following links, then it's confirming to the spammers that my email address is valid. That's a huge no-no.

Spambayes should not be following links at all, but should only look in the message itself.

_______________________________________________
[email protected]
http://mail.python.org/mailman/listinfo/spambayes
Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html
