http://bugzilla.spamassassin.org/show_bug.cgi?id=2933
Summary: Lack of stop-words should be a trigger
Product: Spamassassin
Version: unspecified
Platform: Other
OS/Version: other
Status: NEW
Severity: enhancement
Priority: P5
Component: Rules
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]
In addition to SpamAssassin, which my ISP recently installed, I have been
running my own spam filter for about a year now, and it has some rules that
SpamAssassin is lacking. Some of these are specific to me, but one I added
recently, which has proven very effective, is an analysis of stop-word
frequency. It does two things:
After cleaning up the message body, it computes the percentage of words that
are stop words (e.g. "a", "the", "you", "it"). If that percentage is too
low (currently 5% is the threshold), it increments the spam score.
It also finds the longest run of consecutive words containing no stop words.
If that run is too long (currently 20 words, which is extremely conservative),
it also increments the spam score.
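
The two checks above could be sketched roughly as follows. This is a minimal
Python illustration, not the code of the filter described here: the stop-word
list, the tokenizer, the function names, and the one-point-per-check scoring
are all assumptions made for the example. Only the 5% and 20-word thresholds
come from the description, and whether each comparison should be strict is
also a guess.

```python
import re

# Illustrative English stop-word list (an assumption; the report gives only
# "a", "the", "you", "it" as examples).
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "i", "if",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "was",
    "with", "you",
})

MIN_STOP_WORD_RATIO = 0.05       # the 5% threshold from the report
MAX_RUN_WITHOUT_STOP_WORDS = 20  # the 20-word threshold from the report


def tokenize(body):
    """Lowercase the body and split it into words (hypothetical cleanup step)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", body.lower())


def stop_word_ratio(words):
    """Return the fraction of words that are stop words (0.0 for no words)."""
    if not words:
        return 0.0
    return sum(w in STOP_WORDS for w in words) / len(words)


def longest_run_without_stop_words(words):
    """Return the length of the longest run of consecutive non-stop words."""
    longest = current = 0
    for w in words:
        if w in STOP_WORDS:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


def stop_word_score(body):
    """Add one point per triggered check; an empty body scores zero."""
    words = tokenize(body)
    score = 0
    if words and stop_word_ratio(words) < MIN_STOP_WORD_RATIO:
        score += 1
    if longest_run_without_stop_words(words) > MAX_RUN_WITHOUT_STOP_WORDS:
        score += 1
    return score
```

With these assumptions, ordinary prose such as "the cat sat on the mat" scores
0, while a body consisting of one non-stop word repeated 25 times triggers both
checks and scores 2.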
I think this would be a useful feature for SpamAssassin, although it should be
enabled only for messages in the language(s) it's implemented for, since
stop-word lists are language-specific.
If I find the time, I may implement it, unless someone beats me to it.