
One week ago I sent a mail asking for collaboration on some 
research into the distribution of ham and spam mail among users on the Web. To 
make it easier for people to trust what I am saying, I 
uploaded a page to the web server of my research institution. The page is 
available at http://www.l3s.de/~olmedilla/projects/mail/mail.html

There I explain what an auto-whitelist contains and how its content can be 
extracted. In addition, I show an example of the information I am 
interested in. Please read the page if you would like to help me.

So far I have received only 5 e-mails from people interested in helping me. I would 
need much more data to be able to draw any conclusions.

I would like to ask again whether anyone else would be willing to help. The script 
takes at most 5 to 10 seconds to run. As I said in one of my previous 
e-mails, you can check my publications at 
and check my public key (this message is signed) at
in order to verify that I am who I say I am :-).

To each person who has sent me his/her data I have sent back a small 
report with some information in the following form:

Total number of e-mail addresses: 21176
Total number of e-mails: 126719
Classification of e-mail addresses:
        Ham: 59.1802%
        Spam: 32.1165%
        Unknown: 8.70325%
Classification of e-mails:
        Ham: 89.3662%
        Spam: 8.89212%
        Unknown: 1.74165%
Frequencies of number of e-mails sent by an e-mail address:
                1 mail sent:   36.6901%
                2 mails sent:  16.4698%
                3 mails sent:  9.36802%
                >3 mails sent: 37.4721%
                1 mail sent:   95.1037%
                2 mails sent:  3.47008%
                3 mails sent:  0.558741%
                >3 mails sent: 0.867519%
                1 mail sent:   91.6441%
                2 mails sent:  5.26316%
                3 mails sent:  1.35648%
                >3 mails sent: 1.7363%

and I will continue to do so.
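
For anyone curious how a report of this kind can be derived, here is a minimal sketch in Python. It is not the actual script (which is in Perl), only an illustration. It assumes each record is a tuple of (hashed address, number of e-mails, average score); the two thresholds and all function names are hypothetical, since the rule that separates ham, spam and unknown is not stated in this message.

```python
from collections import Counter

# Hypothetical thresholds: the real classification rule is not given here.
SPAM_THRESHOLD = 5.0
HAM_THRESHOLD = 2.0


def classify(avg_score):
    """Label an address by its average score (assumed rule)."""
    if avg_score >= SPAM_THRESHOLD:
        return "Spam"
    if avg_score <= HAM_THRESHOLD:
        return "Ham"
    return "Unknown"


def bucket(count):
    """Group addresses by number of mails sent: 1, 2, 3 or more than 3."""
    return str(count) if count <= 3 else ">3"


def percentages(counter, total):
    """Turn absolute counts into percentages of the given total."""
    if not total:
        return {}
    return {key: 100.0 * value / total for key, value in counter.items()}


def summarize(records):
    """records: iterable of (hashed_address, mail_count, average_score)."""
    addresses = Counter()  # number of addresses per class
    mails = Counter()      # number of e-mails per class
    freq = Counter()       # number of addresses per mail-count bucket
    for _hashed, count, avg in records:
        label = classify(avg)
        addresses[label] += 1
        mails[label] += count
        freq[bucket(count)] += 1
    total_addresses = sum(addresses.values())
    total_mails = sum(mails.values())
    return {
        "total_addresses": total_addresses,
        "total_mails": total_mails,
        "address_pct": percentages(addresses, total_addresses),
        "mail_pct": percentages(mails, total_mails),
        "frequency_pct": percentages(freq, total_addresses),
    }
```

Calling summarize() on the list of records gives the totals and the three percentage tables; the frequency table here is computed over all addresses together rather than per class.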

I hope you will kindly help me.

Best regards,


On Friday 15 October 2004 17:09, Daniel Olmedilla wrote:
> Dear all,
> I am a Ph.D. student working in Hanover (Germany). I am currently
> studying the distribution of spam mails and e-mail addresses. For that I am
> gathering some information from e-mail distributions of institutions and
> also individuals. I developed a script that gathers the information from
> auto-whitelists and, because of privacy issues, hashes the e-mail
> addresses. Therefore I get only the distribution of the auto-whitelist, and
> there is no privacy problem because the e-mail addresses are not available.
> The information I get is how many e-mails were sent and what the
> average score is. With this information I get, for example, the distribution
> of how much spam that institution/university receives.
> I would like to ask whether you could provide me with
> that information from your institution and/or individual machine. I attach
> the script, which takes only a few seconds to run. It is written in Perl, so
> it is easy to see what it does; you can check that it does exactly what I say
> and that no security is compromised. It will create two files: one with
> the e-mail addresses in plain text and another one with them hashed. I am
> interested in the hashed one.
> Please don't send your answer to the mailing list, because of the
> attachments. I promise to keep you informed of the results I gather from
> that information and to post the statistics here (e.g. the average
> percentage of spam received) so that you all know about them.
> I would like to thank you all for your attention and, hopefully, your help
> too.
> Best regards,
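
As a rough illustration of the hashing step described in the quoted message, here is a minimal sketch in Python. It is not the attached Perl script. SHA-256, the lower-casing of addresses, the line layout and the function names are all assumptions made for the example; the input is assumed to be tuples of (address, number of e-mails, average score) already extracted from the auto-whitelist.

```python
import hashlib


def hash_address(address):
    """Return a hex digest for an address (SHA-256 is an assumed choice)."""
    normalized = address.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def export_lines(entries):
    """entries: iterable of (address, mail_count, average_score).

    Returns (plain_lines, hashed_lines), mirroring the two output files:
    one with the addresses in plain text and one with them hashed.
    """
    plain, hashed = [], []
    for address, count, avg in entries:
        plain.append(f"{address} {count} {avg:.3f}")
        hashed.append(f"{hash_address(address)} {count} {avg:.3f}")
    return plain, hashed
```

Only the hashed lines would need to be shared. Note that an unsalted hash can still be tested against a list of candidate addresses, so a sketch like this hides addresses from casual reading rather than from a determined guesser.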

     Daniel Olmedilla
     Learning Lab Lower Saxony (L3S)
     Deutscher Pavillon
     Expo plaza 1
     D - 30539 Hannover

     Phone: +49 (0)511 762.9741 / +49 (0)511 7621.9714
     Fax:     +49 (0)511 762.9779 / +49 (0)511-7621.9712


Attachment: pgpI1WJoY314W.pgp
Description: PGP signature
