Thanks Eric,
In the motivation section, I mention that this ML approach will improve
filtering accuracy and the efficiency of the predictive model building
process.

I have already added the data preparation process to the proposal.
Vicki

On Fri, Apr 8, 2011 at 10:47 AM, Eric Charles (JIRA)
<server-dev@james.apache.org> wrote:
>
>    [ 
> https://issues.apache.org/jira/browse/JAMES-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017477#comment-13017477
>  ]
>
> Eric Charles commented on JAMES-1216:
> -------------------------------------
>
> Vicki,
> From your mail
> "It still needs to get the training dataset from manually judged data
> first, because this machine learning algorithm still needs to learn what kind
> of email is spam, do the feature analysis and build the predictive model.
> The new approach can share the spam/non-spam training dataset with the naive
> Bayesian filter.
> "
> I would make that point clear in your application (it may be obvious to you
> or to people used to such matters, but better said than not).
>
> Also, please make clear directly in the preamble the added value of your new
> solution compared to the existing implementation (e.g. better identification of
> spam? shorter learning period? fewer false positives? open to other
> categorization? ...).
>
>
>> [gsoc2011] Design and implement machine learning filters and categorization 
>> for mail
>> ------------------------------------------------------------------------------------
>>
>>                 Key: JAMES-1216
>>                 URL: https://issues.apache.org/jira/browse/JAMES-1216
>>             Project: JAMES Server
>>          Issue Type: New Feature
>>            Reporter: Eric Charles
>>            Assignee: Eric Charles
>>              Labels: gsoc2011
>>
>> Context: Anti-spam functionality based on SpamAssassin is available in James
>> (based on mailets http://james.apache.org/mailet). Bayesian mailets are also
>> available, but not completely integrated/documented. Nothing is available to
>> automatically categorize mail traffic per user.
>> Task: We are willing to align the existing implementation with any modern
>> anti-spam solution based on a powerful machine learning implementation (such
>> as Apache Mahout). We are also willing to extend the machine learning usage
>> to some mail categorization (spam vs not-spam is a first category, we can 
>> extend it to any additional category we can imagine). The implementation can 
>> partially occur while spooling the mails and/or when mail is stored in 
>> mailbox.
>> Related discussions: See also discussions on mail intelligent mining on 
>> http://markmail.org/message/2bodrwvdvtfq3f2v (mahout related) and 
>> http://markmail.org/thread/pksl6csyvoeo27yh (hama related).
>> Mentor: eric at apache dot org & [fill in mentor]
>> Complexity: high
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
> For additional commands, e-mail: server-dev-h...@james.apache.org
>
>



-- 
Yu Fu
yu...@umbc.edu
443-388-6654
