[ 
https://issues.apache.org/jira/browse/JAMES-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017368#comment-13017368
 ] 

Robert Burrell Donkin commented on JAMES-1216:
----------------------------------------------

Emails are a semi-structured data source. This makes text mining interesting :-)

In particular, mutual information (MI) may be applicable to some classes of 
feature and not to others: MI is only (directly) useful for categorical 
features of low arity (for some value of low).
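
To make the arity point concrete, here is a minimal sketch of a plug-in MI estimate between one categorical feature and the class label. It is written in Python for brevity; the function name and the toy data are hypothetical, and nothing here is James or Mahout API.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Plug-in estimate of I(X; Y) in bits from paired samples.

    feature_values: one categorical value per message (hypothetical input shape).
    labels: the class of each message, e.g. "spam" or "ham".
    """
    if len(feature_values) != len(labels):
        raise ValueError("feature_values and labels must have the same length")
    n = len(labels)
    if n == 0:
        return 0.0

    joint = Counter(zip(feature_values, labels))  # counts of (x, y) pairs
    count_x = Counter(feature_values)             # marginal counts of x
    count_y = Counter(labels)                     # marginal counts of y

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), expressed with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


labels = ["spam", "spam", "ham", "ham"]
predictive = ["html", "html", "plain", "plain"]   # low arity, tracks the label
unrelated = ["html", "plain", "html", "plain"]    # low arity, independent of the label
unique_ids = ["m1", "m2", "m3", "m4"]             # one distinct value per message

print(mutual_information(predictive, labels))
print(mutual_information(unrelated, labels))
print(mutual_information(unique_ids, labels))
```

On this toy sample the predictive feature and the unique-per-message feature both score 1 bit, while the unrelated feature scores 0. A value that is distinct for every message always reaches the label entropy on the training sample, yet says nothing about unseen mail, which is one way to see why the raw estimate is only directly useful at low arity.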

This suggests that we might need to experiment to see whether MI is useful on 
real email data, but be open to other approaches. So, I think that feature 
selection needs to be factored to allow pluggable extensibility.
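
One possible shape for that factoring, as a rough sketch only: the selection loop takes the scoring function as a parameter, so MI or any alternative can be swapped in without touching the loop. All names below (select_features, purity_score, the message-as-dict representation) are assumptions made for illustration and are not existing James or Mahout API.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical representation: a message is a dict of feature name -> categorical value.
Message = Dict[str, str]
# A scorer rates one feature column against the labels; higher means more useful.
Scorer = Callable[[Sequence[str], Sequence[str]], float]


def purity_score(values: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of messages whose label is the majority label for their feature value."""
    if not labels:
        return 0.0
    by_value: Dict[str, Counter] = {}
    for value, label in zip(values, labels):
        by_value.setdefault(value, Counter())[label] += 1
    majority_hits = sum(c.most_common(1)[0][1] for c in by_value.values())
    return majority_hits / len(labels)


def select_features(messages: Sequence[Message], labels: Sequence[str],
                    scorer: Scorer, k: int) -> List[str]:
    """Return the k feature names with the highest score under the given scorer."""
    names = sorted({name for message in messages for name in message})
    scored = []
    for name in names:
        # A missing feature is treated as the empty string (an assumption of this sketch).
        column = [message.get(name, "") for message in messages]
        scored.append((scorer(column, labels), name))
    # Highest score first; ties are broken by feature name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


messages = [
    {"has_html": "yes", "list": "a"},
    {"has_html": "yes", "list": "b"},
    {"has_html": "no", "list": "a"},
    {"has_html": "no", "list": "b"},
]
labels = ["spam", "spam", "ham", "ham"]

print(select_features(messages, labels, purity_score, 1))
```

Here purity_score is just a stand-in. An MI-based scorer, or anything else with the same two-argument signature, could be passed in its place, which keeps the experiment on real email data cheap to run.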


> [gsoc2011] Design and implement machine learning filters and categorization 
> for mail
> ------------------------------------------------------------------------------------
>
>                 Key: JAMES-1216
>                 URL: https://issues.apache.org/jira/browse/JAMES-1216
>             Project: JAMES Server
>          Issue Type: New Feature
>            Reporter: Eric Charles
>            Assignee: Eric Charles
>              Labels: gsoc2011
>
> Context: Anti-spam functionality based on SpamAssassin is available in James 
> (based on mailets http://james.apache.org/mailet). Bayesian mailets are also 
> available, but not completely integrated/documented. Nothing is available to 
> automatically categorize mail traffic per user.
> Task: We are willing to align the existing implementation with any modern 
> anti-spam solution based on a powerful machine learning implementation (such 
> as Apache Mahout). We are also willing to extend the machine learning usage 
> to some mail categorization (spam vs not-spam is a first category, we can 
> extend it to any additional category we can imagine). The implementation can 
> partially occur while spooling the mails and/or when mail is stored in 
> mailbox.
> Related discussions: See also discussions on mail intelligent mining on 
> http://markmail.org/message/2bodrwvdvtfq3f2v (mahout related) and 
> http://markmail.org/thread/pksl6csyvoeo27yh (hama related).
> Mentor: eric at apache dot org & [fill in mentor]
> Complexity: high 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

