Hi,

One week ago I sent a mail asking for collaboration on some research into the distribution of ham and spam mail among users on the Web. To make it easier for people to trust what I am saying, I have uploaded a page to the web server of my research institution. The page is available at http://www.l3s.de/~olmedilla/projects/mail/mail.html
There I explain what an auto-whitelist contains and how its content can be extracted. In addition, I show an example of the information I am interested in. Please read the page if you would like to help me.

So far I have received only 5 e-mails from people interested in helping me. I would need much more information to be able to draw any conclusions, so I would like to ask again whether anyone else would be willing to help. The script takes at most 5 to 10 seconds to run.

As I said in one of my previous e-mails, you can check my publications at http://www.l3s.de/~olmedilla/pub/publications.html and check my public key (this message is signed) at http://www.l3s.de/~olmedilla/contact/contact.html in order to see that I am who I say I am :-).

To each person who has sent me his/her data I have sent a small report of the following form:

Total number of e-mail addresses: 21176
Total number of e-mails: 126719

Classification of e-mail addresses:
    Ham:     59.1802%
    Spam:    32.1165%
    Unknown: 8.70325%

Classification of e-mails:
    Ham:     89.3662%
    Spam:    8.89212%
    Unknown: 1.74165%

Frequencies of number of e-mails sent by an e-mail address:
    Ham:
        1 mail sent:   36.6901%
        2 mails sent:  16.4698%
        3 mails sent:  9.36802%
        >3 mails sent: 37.4721%
    Spam:
        1 mail sent:   95.1037%
        2 mails sent:  3.47008%
        3 mails sent:  0.558741%
        >3 mails sent: 0.867519%
    Unknown:
        1 mail sent:   91.6441%
        2 mails sent:  5.26316%
        3 mails sent:  1.35648%
        >3 mails sent: 1.7363%

I will continue to do so. I hope you will kindly help me.

Best regards,
DOC

On Friday 15 October 2004 17:09, Daniel Olmedilla wrote:
> Dear all,
>
> I am a Ph.D. student working in Hanover (Germany). I am currently
> studying the distribution of spam mails and e-mail addresses. For that I am
> gathering some information from e-mail distributions of institutions and
> also individuals. I developed a script that gathers the information from
> auto-whitelists and, because of privacy issues, hashes the e-mail
> addresses.
> Therefore I get only the distribution of the auto-whitelist, and
> there is no problem with privacy because the e-mail addresses are not
> available.
>
> The information I get is how many e-mails were sent and what the
> average score is. With this information I get the distribution of, for
> example, how much spam that institution/university receives.
>
> I would like to ask whether you could provide me with that information
> from your institution and/or individual machine. I attach the script,
> which takes only a few seconds to run. It is written in Perl, so it is
> easy to see what it does; you can check that it does exactly what I say
> and that no security would be compromised. It will create two files: one
> with the e-mail addresses in plain text and another one with them hashed.
> I am interested in the hashed one.
>
> Please don't send your answer to the mailing list, because of the
> attachments. I promise to keep you informed of the results I gather from
> that information and to post the statistics here (e.g. the average
> percentage of spam received) so you will all know about it.
>
> I would like to thank you all for your attention and, hopefully, your
> help too.
>
> Best regards,

--
Daniel Olmedilla
Learning Lab Lower Saxony (L3S)
Deutscher Pavillon
Expo plaza 1
D - 30539 Hannover
Phone: +49 (0)511 762.9741 / +49 (0)511 7621.9714
Fax: +49 (0)511 762.9779 / +49 (0)511-7621.9712
http://www.l3s.de/~olmedilla/
E-Mail: [EMAIL PROTECTED]
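
As an illustration of the kind of processing described in the quoted message (hashing the addresses, then summarizing mail counts and average scores), here is a minimal Python sketch. It is not the Perl script mentioned above. The entry layout `(address, mail_count, total_score)`, the use of SHA-256, the two score thresholds and all function names are assumptions made for this example; the message does not say how ham, spam and unknown are separated.

```python
import hashlib
from collections import Counter

# Hypothetical thresholds on the average score; not taken from the message.
HAM_BELOW = 2.0
SPAM_ABOVE = 5.0


def hash_address(address: str) -> str:
    """Return a SHA-256 hex digest of the lower-cased address (assumed hash)."""
    return hashlib.sha256(address.strip().lower().encode("utf-8")).hexdigest()


def classify(mean_score: float) -> str:
    """Map an average score to 'ham', 'spam' or 'unknown' (assumed rule)."""
    if mean_score < HAM_BELOW:
        return "ham"
    if mean_score > SPAM_ABOVE:
        return "spam"
    return "unknown"


def summarize(entries):
    """Hash each address and count addresses and mails per class.

    entries: iterable of (address, mail_count, total_score) tuples.
    Returns (hashed_entries, addresses_per_class, mails_per_class).
    """
    addresses = Counter()
    mails = Counter()
    hashed = []
    for address, count, total in entries:
        if count <= 0:
            continue  # skip entries with no recorded mail
        label = classify(total / count)
        addresses[label] += 1
        mails[label] += count
        hashed.append((hash_address(address), count, total))
    return hashed, addresses, mails


def percentages(counter):
    """Convert per-class counts into percentages of the overall total."""
    total = sum(counter.values())
    return {label: 100.0 * n / total for label, n in counter.items()}
```

Only the hashed entries would need to leave the machine; the per-class percentages correspond to the "Classification of e-mail addresses" and "Classification of e-mails" parts of the report shown earlier.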