Here is my scenario. I am using SpamAssassin (SA) as an oracle/ground truth for a research project. It is hard to get hold of a real-time mail corpus, so I opted for a service provided by Mailinator. Mailinator is a company that provides users with disposable email IDs and offers an API for retrieving the mail sent to those IDs. Unfortunately, the API returns mail as JSON, whereas SA expects messages in RFC 2822 format.
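A JSON-to-RFC 2822 conversion of the kind described here can be sketched with Python's standard email package. This is a minimal illustration under stated assumptions, not the actual script from this thread: the JSON field names ("from", "to", "subject", "time", "body") are hypothetical placeholders rather than Mailinator's real API schema, and "time" is assumed to be a Unix timestamp in seconds.

```python
import json
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def json_to_rfc2822(raw_json: str) -> str:
    """Build an RFC 2822-style message from one JSON mail record.

    ASSUMPTION: the keys "from", "to", "subject", "time" and "body" are
    hypothetical; map them to whatever the real API actually returns.
    """
    record = json.loads(raw_json)
    msg = EmailMessage()
    # Placeholder addresses (hypothetical) are used when a field is missing.
    msg["From"] = record.get("from", "unknown@example.invalid")
    msg["To"] = record.get("to", "unknown@example.invalid")
    msg["Subject"] = record.get("subject", "")
    # ASSUMPTION: "time" is Unix seconds; formatdate(None) uses the current time.
    msg["Date"] = formatdate(record.get("time"), localtime=False)
    # An explicit domain avoids a DNS lookup for the local host name.
    msg["Message-ID"] = make_msgid(domain="localhost")
    # set_content also adds MIME-Version and Content-Type headers.
    msg.set_content(record.get("body", ""))
    return msg.as_string()
```

A converted message carries only the headers the JSON provides, so SpamAssassin rules that inspect transport headers such as Received may behave differently than they would on the original mail; copying over any original headers the API exposes keeps the test input closer to real traffic.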
I have written a script that converts the JSON to RFC 2822. RFC 2822 has many requirements, but I managed to cover most of them, enough that SA has something to work with. I have also trained SA with sa-learn on well-known public corpora such as Enron. I then use SA to classify the converted Mailinator mails as HAM or SPAM. For example, if a mail is stored in the text file mail.txt, I run

    spamassassin mail.txt

which gives me the score I need to decide whether it is SPAM or not. What would you suggest I do in this case? Is there a better way to do it?

On Tue, May 31, 2016 at 1:48 AM, Reindl Harald <h.rei...@thelounge.net> wrote:
>
> On 31.05.2016 at 08:18, Shivram Krishnan wrote:
>> It is not on production. I am using this to evaluate spamassassin.
>
> how will you evaluate something when you slay your setup that way?
>
> On Mon, May 30, 2016 at 10:38 PM, @lbutlr <krem...@kreme.com> wrote:
>>
>> On May 30, 2016, at 11:06 PM, Shivram Krishnan <rorryk...@gmail.com> wrote:
>> > 2) I have set a threshold of -10 to see how spamassassin assigns a
>> > score for every mail.
>>
>> No. Do not do this
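On the threshold question in the quoted thread: in its default configuration SpamAssassin writes the computed score into an X-Spam-Status header on every processed message, ham or spam, so the score can be read without lowering the threshold to -10. The sketch below runs the spamassassin command on one file and parses that header. The function names are hypothetical, and it assumes the default header configuration has not been changed.

```python
import re
import subprocess
from email import message_from_string
from typing import Optional, Tuple

# Captures the numbers in "score=<n> required=<n>"; \s+ also tolerates a
# folded header where "required=" lands on a continuation line.
_STATUS_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)")


def parse_spam_status(processed_mail: str) -> Optional[Tuple[float, float]]:
    """Return (score, required) from X-Spam-Status, or None if it is missing."""
    msg = message_from_string(processed_mail)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    match = _STATUS_RE.search(str(status))
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def score_file(path: str) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: run `spamassassin <path>` and parse the result."""
    # The CLI prints the processed message, including its X-Spam-* headers,
    # to stdout.
    result = subprocess.run(
        ["spamassassin", path], capture_output=True, text=True, check=True
    )
    return parse_spam_status(result.stdout)
```

With the raw score in hand, the evaluation can apply any cut-off it likes (or compare score distributions for known ham and spam) while SA's own required score is left alone.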