On Mon, 2009-12-07 at 08:55 -0800, Marc Perkel wrote:
>
> Except for very short messages I would think that if you spell checked
> the message in several languages and found that 80% was spelled
> correctly that you have a match. You wouldn't have to check every
> language, just start with some common ones and if you don't match them
> go to less common ones.
>
It might work better if you inverted the test: if the textual content
appears to be badly misspelled in all the languages you accept, then it's
spam.
This should be fairly easy to do: configure SA with the language(s) you
will accept and the maximum ratio of misspelled words to total words you
will tolerate. Above that ratio, the message counts as being in an
'unwanted language'. Exclude numbers and HTML tags before counting words.
Apply the test to the whole body of a non-MIME message, or to all MIME
parts whose Content-Type is text/*.

Martin
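A minimal Python sketch of the inverted test, for illustration only. It assumes small hypothetical word lists (`DICTIONARIES`) in place of real spelling dictionaries and an assumed 60% misspelling threshold. All function names are made up and are not part of SA, whose actual rules would have to be written as a Perl plugin. The sketch pools the words from the whole body of a non-MIME message, or from every text/* MIME part, drops HTML tags and numbers, and reports an unwanted language only when the ratio exceeds the threshold in every accepted language.

```python
import re
from email import message_from_string
from email.message import Message

# Hypothetical word lists standing in for real spelling dictionaries.
DICTIONARIES = {
    "en": {"the", "meeting", "is", "at", "noon", "please", "reply", "to", "this", "message"},
    "de": {"das", "treffen", "ist", "um", "mittag", "bitte", "antworten"},
}

# Assumed threshold: more than 60% misspelled words means "not this language".
MAX_MISSPELT_RATIO = 0.6

TAG_RE = re.compile(r"<[^>]*>")      # crude HTML tag stripper
WORD_RE = re.compile(r"[^\W\d_]+")   # runs of letters only, so numbers are excluded


def words_in(text: str) -> list[str]:
    """Remove HTML tags, then return the lower-cased alphabetic words."""
    return [w.lower() for w in WORD_RE.findall(TAG_RE.sub(" ", text))]


def text_parts(msg: Message) -> list[str]:
    """Body of a non-MIME message (defaults to text/plain) or every text/* MIME part."""
    texts = []
    for part in msg.walk():  # yields the message itself when it is not multipart
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and non-text attachments
        raw = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            texts.append(raw.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            texts.append(raw.decode("utf-8", errors="replace"))
    return texts


def misspelt_ratio(words: list[str], dictionary: set[str]) -> float:
    """Fraction of words not found in the given dictionary."""
    misses = sum(1 for w in words if w not in dictionary)
    return misses / len(words)


def is_unwanted_language(msg: Message, dictionaries=DICTIONARIES,
                         threshold=MAX_MISSPELT_RATIO) -> bool:
    """True only if the text looks badly misspelled in ALL accepted languages."""
    words = [w for text in text_parts(msg) for w in words_in(text)]
    if not words:
        return False  # nothing to judge
    return all(misspelt_ratio(words, d) > threshold for d in dictionaries.values())


english = message_from_string(
    "Subject: lunch\n\nPlease reply to this message: the meeting is at noon, room 12.\n"
)
foreign = message_from_string(
    "Subject: offer\n\n<p>Kup teraz tanie zegarki 99</p>\n"
)
print(is_unwanted_language(english))  # False
print(is_unwanted_language(foreign))  # True
```

Because the check uses `all()`, a message that reads correctly in any one accepted language passes. Marc's caveat about very short messages still applies: a handful of words gives a noisy ratio.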