-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Murty --

It should be easy enough to write a plugin which

- - registers an eval rule function
- - calls $permsgstatus->get_decoded_stripped_body_text_array() in that, to
  get the array of decoded lines in the message (HTML stripped, MIME
  decoded etc.)
- - splits those line strings into words and analyzes them.

The results would be interesting, I think.
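As a rough illustration of the analysis step only (not the plugin itself, which would be written in Perl against the SpamAssassin plugin API), here is a minimal Python sketch of the test Murty describes. It assumes each element of the decoded body array is one paragraph, and the function names, the per-paragraph weight and the exact reading of "middle" are my own placeholders rather than anything SpamAssassin defines.

```python
import re

# The eighteen most frequent LOB Corpus words quoted in the original message.
COMMON_WORDS = frozenset(
    "the of and to a in that is was it for he as with be on i his".split()
)

# Gate: only messages containing an http:, https: or ftp: link are tested.
URL_SCHEME = re.compile(r"\b(?:https?|ftp):", re.IGNORECASE)
# A "word" here is a run of ASCII letters (an assumption for this sketch).
WORD = re.compile(r"[A-Za-z]+")
# Rule 2: a digit or quote anywhere in a token ...
ANYWHERE = re.compile(r"[0-9'\"]")
# ... or ':' / '~' with at least one character on each side.
MIDDLE = re.compile(r".[:~].")


def paragraphs_without_common_words(paragraphs):
    """Return the non-empty paragraphs containing none of COMMON_WORDS."""
    hits = []
    for para in paragraphs:
        words = {w.lower() for w in WORD.findall(para)}
        if para.strip() and not (words & COMMON_WORDS):
            hits.append(para)
    return hits


def count_suspicious_words(para):
    """Count whitespace-separated tokens matching the rule 2 characters."""
    return sum(
        1 for tok in para.split()
        if ANYWHERE.search(tok) or MIDDLE.search(tok)
    )


def score_message(paragraphs, per_paragraph=1.0):
    """Return 0.0 unless a link is present; otherwise score rule 1 hits.

    per_paragraph is a hypothetical weight, not a tuned value.
    """
    if not any(URL_SCHEME.search(p) for p in paragraphs):
        return 0.0
    return per_paragraph * len(paragraphs_without_common_words(paragraphs))
```

Note that a token-level check like this also counts the URL itself under rule 2 (it contains a ':' in the middle), so a real rule would probably want to strip links before counting.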

- --j.

Murty Rompalli writes:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hi
> Any ideas on how to implement this are appreciated:
> 
> Frequency Analysis of English Vocabulary and Grammar: Based on
> the LOB Corpus by Stig Johansson and Knut Hofland (OUP, 1989, ISBN
> 0-19-8242212-2) gives the top eighteen words and their frequencies
> as:
> 
>       1.  the       68315
>       2.  of        35716
>       3.  and       27856
>       4.  to        26760
>       5.  a         22744
>       6.  in        21108
>       7.  that      11188
>       8.  is        10978
>       9.  was       10499
>      10.  it        10010
>      11.  for        9299
>      12.  he         8776
>      13.  as         7337
>      14.  with       7197
>      15.  be         7186
>      16.  on         7027
>      17.  I          6696
>      18.  his        6266
> 
> If the body contains an http:, ftp: or https: link, I want to test it
> further; otherwise, skip this test. The test is as follows:
> 
> Check each paragraph that does not contain any of the above 18 words
> (paragraphs separated by \n).
> 
> 1. For each para without common English words, assign a score.
> 2. For each para containing words with 0-9, ', " (anywhere), : and ~
> (middle), assign a score based on the number of matches.
> 
> Thanks
> Murty Rompalli
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.3 (GNU/Linux)
> 
> iD8DBQFB1h6bqbgVhXQ+7mURAtafAKC++FtF6OZIkHC2hVD90509VTgFVwCfZPSw
> wVqnkz5XYQOG8ZBGa8Pvow4=
> =oON4
> -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFB2YxqMJF5cimLx9ARAhBSAKCzzwH4F3DfgIn+P0loaQoUmn6BswCfeHH8
zunJqn4IuEwD6jp1WKHH8jQ=
=MLTc
-----END PGP SIGNATURE-----
