[ https://issues.apache.org/jira/browse/MAILBOX-44?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049898#comment-13049898 ]
Eric Charles commented on MAILBOX-44:
-------------------------------------

Hi there, and thanks to Stack for joining and helping us with this design.

I've added some food for thought on MAILBOX-72. On https://issues.apache.org/jira/secure/attachment/12482691/Datamodel-mailbox-0.2.png you can see the interfaces that the HBase store will have to implement. Those interfaces are fixed, but the store is free to implement them however it wants.

First, the tables:
- If you look at the classes, we could have Mailbox, Subscription and Message tables.
- One row per mailbox, subscription and message.
- The open questions are:
  1. What is the structure of the rowkey?
  2. Should Header and Property be separate tables, or additional columns on the message row?

Second, the queries:
- The SQL queries currently implemented are listed at https://issues.apache.org/jira/browse/MAILBOX-72?focusedCommentId=13049883&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13049883
- Some map to a simple Get (efficient), some do not.
- We will need to use HBase scanners (the existing ones, and maybe specific ones we will have to implement).
- For the queries built from IMAP commands (especially search), this can lead to a full scan of the table (see the next point).

Finally, the index to help optimize search:
- Solr can come to the rescue.
- I like the ongoing Lucene-on-HBase work, especially once it is done :)
- In the meantime, we could rely on custom HBase scanners (inefficient because of the full table scan).

Looking forward to your feedback.

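To make the rowkey question concrete, here is a minimal sketch of one possible layout for the Message table. It is an assumption for discussion, not a decision: the class name MessageRowKey, its methods, and the choice of a numeric mailbox id followed by the message UID are all hypothetical. The idea is that HBase sorts rows by unsigned byte order, so a fixed-width big-endian key of the form [mailboxId][uid] keeps all messages of a mailbox contiguous. A single message then becomes a Get on the full key, and listing a mailbox becomes a scan bounded by a start and stop row instead of a full table scan. The sketch uses only the JDK and leaves the HBase client calls out.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical composite rowkey for a Message table: [mailboxId][uid]. */
public final class MessageRowKey {

    private static final int KEY_LENGTH = 2 * Long.BYTES;

    private MessageRowKey() {
    }

    /**
     * Encodes mailboxId and uid as 16 big-endian bytes.
     * Both values must be non-negative so that unsigned byte order
     * matches numeric order.
     */
    public static byte[] of(long mailboxId, long uid) {
        if (mailboxId < 0 || uid < 0) {
            throw new IllegalArgumentException("mailboxId and uid must be non-negative");
        }
        // ByteBuffer is big-endian by default.
        return ByteBuffer.allocate(KEY_LENGTH).putLong(mailboxId).putLong(uid).array();
    }

    /** Inclusive start row for scanning every message of one mailbox. */
    public static byte[] startOf(long mailboxId) {
        return of(mailboxId, 0L);
    }

    /** Exclusive stop row: the first possible key of the next mailbox id. */
    public static byte[] stopOf(long mailboxId) {
        return of(mailboxId + 1, 0L);
    }

    public static long mailboxIdOf(byte[] key) {
        return ByteBuffer.wrap(key).getLong(0);
    }

    public static long uidOf(byte[] key) {
        return ByteBuffer.wrap(key).getLong(Long.BYTES);
    }

    public static void main(String[] args) {
        byte[] key = of(42L, 7L);
        System.out.println(mailboxIdOf(key) + "/" + uidOf(key)); // prints 42/7

        // The key falls inside the [start, stop) range of its mailbox.
        boolean inRange = Arrays.compareUnsigned(startOf(42L), key) <= 0
                && Arrays.compareUnsigned(key, stopOf(42L)) < 0;
        System.out.println(inRange); // prints true
    }
}
```

A layout like this only helps the queries that are bounded by mailbox. Search on headers or body would still need a scanner with a filter or an external index, and sequential UIDs in a busy mailbox would concentrate writes on one region, so this is a starting point for the discussion rather than an answer.
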
Thanks,
- Eric

> [gsoc2011] Design and implement a distributed mailbox using Hadoop
> ------------------------------------------------------------------
>
>                 Key: MAILBOX-44
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-44
>             Project: James Mailbox
>          Issue Type: New Feature
>            Reporter: Eric Charles
>            Assignee: Norman Maurer
>              Labels: gsoc2011
>             Fix For: 0.3
>
>
> Context: The mailbox subproject (http://james.apache.org/mailbox/) supports
> maildir, SQL database (via JPA) and Java Content Repository (JCR) as
> technologies for mail storage. This flexibility is achieved thanks to an API
> design that abstracts mail storage from the mail protocols.
> Task: We need to implement mailbox storage as a distributed system on top of
> Hadoop HDFS. The James mailbox API will be used. A first step is to design
> how to interact with Hadoop (native api, gora incubator at apache,...) and
> deal with specific performance questions related to mail loading/parsing in a
> distributed system (use map/reduce or not, use existing local lucene indexes
> for search,...). The second step is to implement the HDFS mailbox (the maildir
> mailbox is similar because it stores mails as files and can be an
> inspiration). A single James server will still be deployed because we don't
> have any distributed UID generation.
> Mentor: eric at apache dot org
> Complexity: medium