[ https://issues.apache.org/jira/browse/MAILBOX-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049898#comment-13049898 ]

Eric Charles commented on MAILBOX-44:
-------------------------------------

Hi there, and thanks to Stack for joining and helping us with this design.

I've added some food for thought on MAILBOX-72.

You can see the interfaces that the HBase store will have to implement at
https://issues.apache.org/jira/secure/attachment/12482691/Datamodel-mailbox-0.2.png
There's no choice on the interfaces themselves, but the implementation is
completely free to realize them however it wants.

First, the tables:
- Looking at the classes, we could have Mailbox, Subscription and Message
tables.
- One row per mailbox, subscription and message.
- The open questions are: 1. What structure should the rowkey have? 2. Should
Header and Property be separate tables, or additional columns in the message row?
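To make question 1 concrete, here is a minimal sketch of one possible rowkey, assuming a hypothetical layout of an 8-byte mailbox id followed by an 8-byte message UID (the class RowKeys and its methods are illustrative names, not part of the James API). HBase sorts rows by unsigned byte order, so with big-endian encoding all messages of a mailbox become contiguous and ordered by UID.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical rowkey helper: [mailboxId (8 bytes)][uid (8 bytes)].
final class RowKeys {
    // ByteBuffer is big-endian by default, so for non-negative ids the
    // unsigned lexicographic byte order matches numeric order.
    static byte[] messageRowKey(long mailboxId, long uid) {
        return ByteBuffer.allocate(16).putLong(mailboxId).putLong(uid).array();
    }

    // Prefix shared by every message row of one mailbox; a scan starting
    // here visits that mailbox's messages in UID order.
    static byte[] mailboxPrefix(long mailboxId) {
        return ByteBuffer.allocate(8).putLong(mailboxId).array();
    }

    // Unsigned lexicographic comparison, the order HBase sorts rowkeys in.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    static boolean hasPrefix(byte[] key, byte[] prefix) {
        return key.length >= prefix.length
            && Arrays.equals(key, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) {
        byte[] k1 = messageRowKey(42L, 9L);
        byte[] k2 = messageRowKey(42L, 10L);
        byte[] k3 = messageRowKey(43L, 1L);
        System.out.println(compare(k1, k2) < 0);              // true
        System.out.println(compare(k2, k3) < 0);              // true
        System.out.println(hasPrefix(k2, mailboxPrefix(42L))); // true
    }
}
```

With such a layout, question 2 stays open: headers and properties could live as extra columns in the same message row, or in their own table keyed the same way.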

Second the queries:
- The implemented SQL queries are on 
https://issues.apache.org/jira/browse/MAILBOX-72?focusedCommentId=13049883&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13049883
- Some are simple Get (efficient), some not.
- We will need for to use the HBase scanners (existing one, maybe also specific 
one we will have to implement).
- For the IMAP built queries (especially for search), this can lead to a full 
scan of the table (see following point)
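The cost difference between these query shapes can be sketched without an HBase cluster. This is a toy model, assuming a java.util.TreeMap stands in for an HBase table (both keep rows sorted by key) and a hypothetical zero-padded "mailbox/uid" string rowkey; TableModel and its methods are illustrative names, not the HBase client API. A Get touches one row, a prefix scan touches only the rows of one mailbox, and a predicate on a non-key column has to visit every row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy stand-in for an HBase table: rowkey -> subject, sorted by rowkey.
final class TableModel {
    private final NavigableMap<String, String> rows = new TreeMap<>();
    private int rowsExamined;

    void put(String rowKey, String subject) { rows.put(rowKey, subject); }

    // Like a Get: a single lookup by exact rowkey.
    String get(String rowKey) {
        rowsExamined = 1;
        return rows.get(rowKey);
    }

    // Like a Scan with start/stop rows: the stop row is the prefix followed
    // by the highest char, so only keys starting with the prefix are visited
    // (assumes keys never contain Character.MAX_VALUE).
    List<String> prefixScan(String prefix) {
        NavigableMap<String, String> range =
            rows.subMap(prefix, true, prefix + Character.MAX_VALUE, false);
        rowsExamined = range.size();
        return new ArrayList<>(range.keySet());
    }

    // A search on a non-key column: every row must be examined.
    List<String> fullScan(Predicate<String> subjectMatches) {
        List<String> hits = new ArrayList<>();
        rowsExamined = 0;
        for (Map.Entry<String, String> e : rows.entrySet()) {
            rowsExamined++;
            if (subjectMatches.test(e.getValue())) hits.add(e.getKey());
        }
        return hits;
    }

    int rowsExamined() { return rowsExamined; }

    public static void main(String[] args) {
        TableModel t = new TableModel();
        t.put("0001/0001", "Hello");
        t.put("0001/0002", "Meeting notes");
        t.put("0002/0001", "Hello again");
        t.put("0002/0002", "Invoice");
        System.out.println(t.get("0001/0002"));                  // Meeting notes
        System.out.println(t.prefixScan("0001/"));               // [0001/0001, 0001/0002]
        System.out.println(t.fullScan(s -> s.contains("Hello"))); // [0001/0001, 0002/0001]
        System.out.println(t.rowsExamined());                    // 4
    }
}
```

The rowsExamined counter is what grows with mailbox size for IMAP SEARCH style queries, which is why the index discussed next matters.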

Finally, the index to help optimize search:
- Solr could come to the rescue.
- I like the ongoing Lucene-on-HBase work, especially once it is done :)
- In the meantime, we could rely on custom HBase scanners (inefficient, because
they mean a full table scan).
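The reason an index helps can be shown with a toy inverted index, the core data structure behind Lucene and Solr (this is only the idea, not their API). It assumes subjects are tokenized into lower-cased words and that each posting stores the message rowkey; SubjectIndex is a hypothetical name. A search then returns candidate rowkeys directly, and each hit can be fetched with a Get instead of scanning the table.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical inverted index: term -> sorted set of message rowkeys.
final class SubjectIndex {
    private final Map<String, SortedSet<String>> postings = new HashMap<>();

    // Tokenize on non-word characters and record the rowkey under each term.
    void add(String rowKey, String subject) {
        for (String term : subject.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue; // leading separators yield an empty token
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(rowKey);
        }
    }

    // Rowkeys whose subject contains the term, as a read-only view.
    SortedSet<String> search(String term) {
        SortedSet<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null
            ? Collections.emptySortedSet()
            : Collections.unmodifiableSortedSet(hits);
    }

    public static void main(String[] args) {
        SubjectIndex idx = new SubjectIndex();
        idx.add("0001/0001", "Hello");
        idx.add("0001/0002", "Meeting notes");
        idx.add("0002/0001", "Re: Hello again");
        System.out.println(idx.search("hello"));   // [0001/0001, 0002/0001]
        System.out.println(idx.search("invoice")); // []
    }
}
```

The trade-off is that the index must be updated on every append, flag change and expunge, which is the part Solr or Lucene-on-HBase would take off our hands.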

Looking forward to your feedback.
Thanks,
- Eric

> [gsoc2011] Design and implement a distributed mailbox using Hadoop
> ------------------------------------------------------------------
>
>                 Key: MAILBOX-44
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-44
>             Project: James Mailbox
>          Issue Type: New Feature
>            Reporter: Eric Charles
>            Assignee: Norman Maurer
>              Labels: gsoc2011
>             Fix For: 0.3
>
>
> Context: The mailbox subproject (http://james.apache.org/mailbox/) supports 
> maildir, SQL database (via JPA) and Java Content Repository (JCR) as 
> technology for mail storage. This flexibility is achieved thanks to an API
> design that abstracts mail storage from the mail protocols.
> Task: We need to implement mailbox storage as a distributed system on top of 
> Hadoop HDFS. The James mailbox API will be used. A first step is to design 
> how to interact with Hadoop (native api, gora incubator at apache,...) and 
> deal with specific performance questions related to mail loading/parsing in a 
> distributed system (use map/reduce or not, use existing local lucene indexes 
> for search,...). The second step is to implement the HDFS mailbox (maildir 
> mailbox is similar because it stores mails as files and can be an
> inspiration). A single James server will still be deployed because we don't 
> have any distributed UID generation.
> Mentor: eric at apache dot org
> Complexity: medium 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
