On Mon, 2009-12-07 at 08:55 -0800, Marc Perkel wrote:
>
> Except for very short messages I would think that if you spell checked
> the message in several languages and found that 80% was spelled
> correctly that you have a match. You wouldn't have to check every
> language, just start with some common ones and if you don't match them
> go to less common ones. 
> 
It might work better if you inverted the test: if the textual content
appears to be badly misspelled in all the languages you accept, then
it's spam.

This should be fairly easy to do: configure SA with the language(s) you
will accept and the ratio of misspellings to total words above which a
message counts as 'unwanted language', with numbers and HTML tags
excluded from the check. Apply the test to the whole body of a
non-MIME message, or to every MIME part whose type is text/*.
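
As a rough sketch of the inverted test in Python (this is not SA plugin
code; the toy word lists, the function names and the 0.5 threshold are
all made up for illustration, and a real check would load full
dictionaries for each accepted language):

```python
import re

# Hypothetical toy word lists standing in for real spelling dictionaries.
DICTIONARIES = {
    "en": {"the", "this", "message", "is", "in", "english", "and",
           "should", "pass"},
    "de": {"die", "diese", "nachricht", "ist", "auf", "deutsch", "und"},
}

TAG_RE = re.compile(r"<[^>]+>")      # crude HTML tag stripper
WORD_RE = re.compile(r"[^\W\d_]+")   # runs of letters only, so numbers are skipped


def misspelled_ratio(text, words):
    """Fraction of words in text that are not in the given word list."""
    tokens = [t.lower() for t in WORD_RE.findall(TAG_RE.sub(" ", text))]
    if not tokens:
        return 0.0  # nothing to check, so don't count it as misspelled
    missing = sum(1 for t in tokens if t not in words)
    return missing / len(tokens)


def is_unwanted_language(text, accepted, threshold=0.5):
    """True only if the text is badly misspelled in every accepted language."""
    return all(
        misspelled_ratio(text, DICTIONARIES[lang]) > threshold
        for lang in accepted
    )
```

With the toy lists above, is_unwanted_language("<p>This message is in
English</p>", ["en", "de"]) comes out False because the English check
passes, while text that matches neither list is flagged. Very short
messages would still need a minimum word count before the ratio means
anything.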


Martin
