Here is my scenario. I am using SpamAssassin (SA) as an oracle/ground truth for a
research project. It is generally hard to get hold of a real-time mail corpus, so I
opted for a service provided by Mailinator. Mailinator is a company that provides
users with disposable email addresses and offers an API to retrieve the mails sent
to those addresses. Unfortunately, the API returns each mail as JSON, whereas SA
expects an RFC 2822 message.

I have written a script that converts the JSON to RFC 2822. The RFC 2822
specification is extensive, but I managed to capture enough of it that SA has
something to work with.
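A minimal sketch of such a conversion, using only Python's standard `email` package. The JSON keys (`from`, `to`, `subject`, `body`, `timestamp`) are hypothetical placeholders, not Mailinator's actual API schema, and only a plain-text body is handled. The sketch always emits `Date` and `Message-ID`, since SA has rules that score messages missing those headers.

```python
import json
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def json_to_rfc2822(raw_json: str) -> str:
    """Convert one JSON-encoded mail into an RFC 2822-style message string.

    ASSUMPTION: the keys "from", "to", "subject", "body" and "timestamp"
    (Unix seconds) are hypothetical placeholders, not Mailinator's real schema.
    """
    data = json.loads(raw_json)
    msg = EmailMessage()
    msg["From"] = data.get("from", "unknown@example.invalid")
    msg["To"] = data.get("to", "unknown@example.invalid")
    msg["Subject"] = data.get("subject", "")
    # Always emit Date and Message-ID: SA has rules that score their absence.
    # formatdate(None) falls back to the current time.
    msg["Date"] = formatdate(data.get("timestamp"))
    # Hypothetical domain; passing one avoids a hostname lookup in make_msgid.
    msg["Message-ID"] = make_msgid(domain="converted.invalid")
    # Plain-text body only; HTML or multipart mail would need add_alternative().
    msg.set_content(data.get("body", "") + "\n")
    return msg.as_string()


if __name__ == "__main__":
    sample = {"from": "alice@example.com", "to": "box@example.com",
              "subject": "Hello", "body": "Hi there", "timestamp": 1464681600}
    print(json_to_rfc2822(json.dumps(sample)))
```

Writing the returned string to a file such as mail.txt gives SA a message with the headers it normally expects to see.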

I have also trained SA with sa-learn on known public corpora such as Enron.
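For reference, the training step can be scripted roughly as follows. This is a sketch: the corpus paths are hypothetical, and it only wraps the standard `sa-learn --ham`, `--spam`, `--mbox` and `--dump magic` options.

```python
import subprocess
from typing import List


def sa_learn_cmd(kind: str, path: str, mbox: bool = False) -> List[str]:
    """Build an sa-learn command that trains on a directory (or mbox) of mail."""
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    cmd = ["sa-learn", f"--{kind}"]
    if mbox:
        # Treat `path` as a single mbox file rather than one message per file.
        cmd.append("--mbox")
    cmd.append(path)
    return cmd


def train(ham_path: str, spam_path: str) -> None:
    """Train the Bayes database on both corpora (requires sa-learn on PATH)."""
    for cmd in (sa_learn_cmd("ham", ham_path), sa_learn_cmd("spam", spam_path)):
        subprocess.run(cmd, check=True)
    # Print the Bayes DB counters (nham/nspam) to confirm the training landed.
    subprocess.run(["sa-learn", "--dump", "magic"], check=True)
```

Usage would be something like `train("enron/ham", "enron/spam")` (hypothetical paths). Two things are worth checking. First, run sa-learn as the same user that later runs `spamassassin`, so both use the same Bayes database. Second, by default Bayes is not consulted until at least 200 ham and 200 spam messages have been learned (`bayes_min_ham_num` / `bayes_min_spam_num`).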

I use SA to classify the converted Mailinator mails as HAM or SPAM.

For example, if a mail is stored in the text file mail.txt, I run

spamassassin mail.txt

This gives me the score I need to decide whether it is SPAM or not.
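Rather than lowering the spam threshold (the -10 setting discussed further down the thread), the numeric score can be read directly from the `X-Spam-Status` header that `spamassassin` adds to its output. The score is the sum of the rule hits and does not depend on `required_score`. The threshold only affects the Yes/No verdict. A sketch, assuming the default header layout (`No, score=... required=... tests=...`):

```python
import re
import subprocess
from email import message_from_string
from typing import Tuple

# Matches the default X-Spam-Status layout, e.g.
# "No, score=-1.2 required=5.0 tests=... autolearn=... version=..."
_STATUS_RE = re.compile(
    r"^\s*(Yes|No),\s*score=(-?\d+(?:\.\d+)?)\s+required=(-?\d+(?:\.\d+)?)"
)


def parse_spam_status(value: str) -> Tuple[bool, float, float]:
    """Return (flagged_as_spam, score, required_score) from an X-Spam-Status value."""
    m = _STATUS_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognised X-Spam-Status: {value!r}")
    return m.group(1) == "Yes", float(m.group(2)), float(m.group(3))


def score_file(path: str) -> Tuple[bool, float, float]:
    """Run `spamassassin <path>` and parse the X-Spam-Status header it adds."""
    result = subprocess.run(
        ["spamassassin", path], capture_output=True, text=True, check=True
    )
    status = message_from_string(result.stdout)["X-Spam-Status"]
    if status is None:
        raise RuntimeError("spamassassin output has no X-Spam-Status header")
    return parse_spam_status(status)
```

With the threshold left at its default, `score_file("mail.txt")` still returns the raw score for every message. Any HAM/SPAM cut-off for the evaluation can then be applied afterwards.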

What would you suggest in this case? Is there a better way to do
it?




On Tue, May 31, 2016 at 1:48 AM, Reindl Harald <h.rei...@thelounge.net> wrote:

>
>
> On 31.05.2016 at 08:18, Shivram Krishnan wrote:
>
>> It is not in production. I am using this to evaluate SpamAssassin.
>>
>
> How will you evaluate something when you cripple your setup that way?
>
> On Mon, May 30, 2016 at 10:38 PM, @lbutlr <krem...@kreme.com> wrote:
>>
>>     On May 30, 2016, at 11:06 PM, Shivram Krishnan <rorryk...@gmail.com> wrote:
>>     > 2) I have set a threshold of -10 to see how SpamAssassin assigns a score to every mail.
>>
>>     No. Do not do this.
>>
>
>
