[jira] [Commented] (MAILBOX-44) [gsoc2011] Design and implement a distributed mailbox using Hadoop

Eric Charles (JIRA) Wed, 27 Apr 2011 08:44:43 -0700

    [ 
https://issues.apache.org/jira/browse/MAILBOX-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025848#comment-13025848
 ]


Eric Charles commented on MAILBOX-44:
-------------------------------------

So you build and run james, you've got hadoop setup. Cool!
The next step would be to create a maven project, declare the needed 
dependencies to james mailbox and hadoop libraries, and make a few attemps:
1. Access the mailbox (create session,...) from a java test case (see [1] and 
[2] for inspiration, I will try to commit more focused examples tomorrow)
2. Access a hadoop cluster based on Mini(MR)Cluster : these are the classes 
hadoop uses for testing without having to deploy a real cluster.

Also have a look at gora documentation. This will be useful when we will have 
to decide on how to access the hdfs files,... and don't forget to subscribe to 
hadoop and gora mailing lists.

[1] 
https://svn.apache.org/repos/asf/james/mailbox/trunk/jpa/src/test/java/org/apache/james/mailbox/jpa/JPAMailboxManagerTest.java
[2] 
https://svn.apache.org/repos/asf/james/server/trunk/container-spring/src/main/java/org/apache/james/container/spring/tool/James23Importer.java


> [gsoc2011] Design and implement a distributed mailbox using Hadoop
> ------------------------------------------------------------------
>
>                 Key: MAILBOX-44
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-44
>             Project: James Mailbox
>          Issue Type: New Feature
>            Reporter: Eric Charles
>            Assignee: Norman Maurer
>              Labels: gsoc2011
>
> Context: The mailbox subproject (http://james.apache.org/mailbox/) supports 
> maildir, SQL database (via JPA) and Java Content Repository (JCR) as 
> technology for mail storage. This flexibility is achieved thanks to a API 
> design that abstracts mail storage from the mail protocols.
> Task: We need to implement mailbox storage as a distributed system on top of 
> Hadoop HDFS. The James mailbox API will be used. A first step is to design 
> how to interact with Hadoop (native api, gora incubator at apache,...) and 
> deal with specific performance questions related to mail loading/parsing in a 
> distributed system (use map/reduce or not, use existing local lucene indexes 
> for search,...). The second step is to implement the HDFS mailbox (maildir 
> mailbox is similar because is stores mails as a file and can be an 
> inspiration). A single James server will still be deployed because we don't 
> have any distributed UID generation.
> Mentor: eric at apache dot org
> Complexity: medium 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org

[jira] [Commented] (MAILBOX-44) [gsoc2011] Design and implement a distributed mailbox using Hadoop

Reply via email to