-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Murty --

It should be easy enough to write a plugin which

  - registers an eval rule function

  - calls $permsgstatus->get_decoded_stripped_body_text_array() in that,
    to get the array of decoded lines in the message (HTML stripped,
    MIME decoded etc.)

  - splits those line strings into words and analyzes them.

The results would be interesting, I think.

--j.

Murty Rompalli writes:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi
> Any ideas on how to implement this are appreciated:
>
> Frequency Analysis of English Vocabulary and Grammar: Based on
> the LOB Corpus by Stig Johansson and Knut Hofland (OUP, 1989, ISBN
> 0-19-8242212-2) gives the top eighteen words and their frequencies
> as:
>
>  1. the   68315
>  2. of    35716
>  3. and   27856
>  4. to    26760
>  5. a     22744
>  6. in    21108
>  7. that  11188
>  8. is    10978
>  9. was   10499
> 10. it    10010
> 11. for    9299
> 12. he     8776
> 13. as     7337
> 14. with   7197
> 15. be     7186
> 16. on     7027
> 17. I      6696
> 18. his    6266
>
> If the body contains an http:, ftp: or https: link, I want to test it
> further; otherwise, skip this test. The test is as follows:
>
> Check each paragraph that does not contain any of the above 18 words
> (paragraphs separated by \n).
>
> 1. For each para without common English words, assign a score.
> 2. For each para containing words with 0-9, ', " (anywhere), : and ~
>    (middle), assign score based on number of matches
>
> Thanks
> Murty Rompalli
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.3 (GNU/Linux)
>
> iD8DBQFB1h6bqbgVhXQ+7mURAtafAKC++FtF6OZIkHC2hVD90509VTgFVwCfZPSw
> wVqnkz5XYQOG8ZBGa8Pvow4=
> =oON4
> -----END PGP SIGNATURE-----

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFB2YxqMJF5cimLx9ARAhBSAKCzzwH4F3DfgIn+P0loaQoUmn6BswCfeHH8
zunJqn4IuEwD6jp1WKHH8jQ=
=MLTc
-----END PGP SIGNATURE-----
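
For illustration only, the paragraph test Murty describes can be sketched
outside SpamAssassin in plain Python. This is not plugin code: a real
plugin would be written in Perl and would take its lines from
$permsgstatus->get_decoded_stripped_body_text_array(). The function
names, the scores (10 per paragraph, 1 per suspicious word) and the
reading that step 2 applies only to paragraphs lacking the common words
are all assumptions, not anything stated in the thread.

```python
import re

# The eighteen most frequent LOB Corpus words quoted in the message.
COMMON_WORDS = {
    "the", "of", "and", "to", "a", "in", "that", "is", "was",
    "it", "for", "he", "as", "with", "be", "on", "i", "his",
}

LINK_RE = re.compile(r"\b(?:https?|ftp):", re.IGNORECASE)
WORD_RE = re.compile(r"[A-Za-z]+")        # assumption: a word is a run of ASCII letters
ANYWHERE_RE = re.compile(r"[0-9'\"]")     # digit or quote anywhere in a token
MIDDLE_RE = re.compile(r".[:~].")         # ':' or '~' with a character on each side


def has_link(body):
    """True if the body mentions an http:, https: or ftp: link."""
    return LINK_RE.search(body) is not None


def lacks_common_words(para):
    """True if none of the eighteen common words occurs (case-insensitive)."""
    words = {w.lower() for w in WORD_RE.findall(para)}
    return words.isdisjoint(COMMON_WORDS)


def odd_token_count(para):
    """Count whitespace-separated tokens that match either suspicious pattern."""
    return sum(
        1
        for tok in para.split()
        if ANYWHERE_RE.search(tok) or MIDDLE_RE.search(tok)
    )


def score_body(body, para_score=10, token_score=1):
    """Score a message body; both score values are hypothetical placeholders."""
    if not has_link(body):
        return 0                          # no link: skip the test entirely
    total = 0
    for para in body.split("\n"):         # paragraphs are separated by \n
        if not para.strip():
            continue                      # ignore empty paragraphs
        if lacks_common_words(para):
            total += para_score + token_score * odd_token_count(para)
    return total
```

For example, score_body("The cat is on the mat.\nCheap pills http://x.example/buy now")
skips the first paragraph and scores the second. Note that words inside
a URL are counted too, so a link such as http://a.example/ would make
its paragraph look like ordinary English; stripping links before the
word check may be worth trying.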