http://bugzilla.spamassassin.org/show_bug.cgi?id=2933

           Summary: Lack of stop-words should be a trigger
           Product: Spamassassin
           Version: unspecified
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: Rules
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


In addition to SpamAssassin, which my ISP recently installed, I have been
running my own spam filter for about a year now, and it has some rules that
SpamAssassin is lacking. Some of these are specific to me, but one I added
recently, and which has proven very effective, is an analysis of the frequency
of stop words. It does two things:

After cleaning up the message body, it computes the percentage of words that
are stop words (e.g. "a", "the", "you", "it"). If that percentage is too
low (currently 5% is the threshold), it increments the spam score.
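
As a rough illustration, the percentage check could be sketched in Python as
follows. The stop-word list, the tokenizer, the function names, and the choice
not to flag an empty body are all assumptions made for this sketch; they are
not taken from any existing filter or from SpamAssassin. Only the 5% threshold
comes from the description above.

```python
import re

# Hypothetical, deliberately tiny English stop-word list (an assumption for
# this sketch; a real filter would use a much longer list).
STOP_WORDS = frozenset({"a", "an", "the", "you", "it", "is", "of", "to", "and", "in"})

def tokenize(body):
    """Lowercase the cleaned-up body and split it into words."""
    return re.findall(r"[a-z']+", body.lower())

def stop_word_ratio(body):
    """Return the fraction of words that are stop words (0.0 for an empty body)."""
    words = tokenize(body)
    if not words:
        return 0.0
    return sum(w in STOP_WORDS for w in words) / len(words)

def too_few_stop_words(body, threshold=0.05):
    """True when a non-empty body has a stop-word fraction below the threshold."""
    # Assumption: an empty body is left to other rules rather than flagged here.
    return bool(tokenize(body)) and stop_word_ratio(body) < threshold
```

A caller would add to the spam score whenever too_few_stop_words(body) returns
True.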

It also finds the longest string of words containing no stop words. If that is
too long (currently 20 words, which is extremely conservative), it also
increments the spam score.
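
The longest-run check can be sketched the same way. Again, the stop-word list,
the tokenizer, and the function names are hypothetical, and treating "too
long" as strictly more than 20 words is an assumption; the description above
gives only the figure of 20.

```python
import re

# Hypothetical, deliberately tiny English stop-word list (an assumption for
# this sketch; a real filter would use a much longer list).
STOP_WORDS = frozenset({"a", "an", "the", "you", "it", "is", "of", "to", "and", "in"})

def longest_run_without_stop_words(body):
    """Return the length of the longest run of consecutive non-stop words."""
    longest = current = 0
    for word in re.findall(r"[a-z']+", body.lower()):
        if word in STOP_WORDS:
            current = 0          # a stop word ends the current run
        else:
            current += 1
            longest = max(longest, current)
    return longest

def run_too_long(body, max_run=20):
    """True when the longest stop-word-free run exceeds max_run words."""
    return longest_run_without_stop_words(body) > max_run
```

For example, in "buy cheap pills the best price now" the word "the" splits the
text into two runs of three words each, so the longest run is 3.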

I think this would be a useful feature for SpamAssassin, although it should be
enabled only for messages in the language(s) for which it is implemented.

If I find the time, I may implement it, unless someone beats me to it.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
