[ 
https://issues.apache.org/jira/browse/MAILBOX-103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194605#comment-13194605
 ] 

Ioan Eugen Stan commented on MAILBOX-103:
-----------------------------------------

We can use ZooKeeper to implement this. Full thread: 
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201201.mbox/%3CCAFvdMiCeMRxJaRg56zAFMRQSMB_oxRMzAYJ7e%3DJOQVf94Wscdg%40mail.gmail.com%3E

Use plain ZooKeeper and rely on the znode version for sequence generation,
for both UIDs and ModSeq.
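
A rough sketch of the idea in Java, assuming one counter znode per mailbox for
UIDs and another for ModSeq. With the real client the increment would be a
write such as ZooKeeper.setData(path, data, -1), whose returned Stat carries
the new data version. Here a hypothetical in-memory VersionedNode stands in for
the znode so the example runs without an ensemble; VersionedNode, InMemoryNode,
MailboxSequences and SequenceDemo are illustrative names, not part of James or
ZooKeeper.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a znode: every write bumps its data version by one. */
interface VersionedNode {
    /** Performs a write and returns the data version after that write. */
    int writeAndGetVersion();
}

/** In-memory substitute, used only so this sketch runs without ZooKeeper. */
final class InMemoryNode implements VersionedNode {
    // A freshly created znode has data version 0.
    private final AtomicInteger version = new AtomicInteger(0);

    @Override
    public int writeAndGetVersion() {
        // With the real client this would be roughly:
        //   zk.setData(path, new byte[0], -1).getVersion()
        return version.incrementAndGet();
    }
}

/** Uses the node version as the sequence value (assumed layout: one node per sequence). */
final class MailboxSequences {
    private final VersionedNode uidNode;
    private final VersionedNode modSeqNode;

    MailboxSequences(VersionedNode uidNode, VersionedNode modSeqNode) {
        this.uidNode = uidNode;
        this.modSeqNode = modSeqNode;
    }

    long nextUid() {
        return uidNode.writeAndGetVersion();
    }

    long nextModSeq() {
        return modSeqNode.writeAndGetVersion();
    }
}

final class SequenceDemo {
    public static void main(String[] args) {
        MailboxSequences seq = new MailboxSequences(new InMemoryNode(), new InMemoryNode());
        System.out.println("uid=" + seq.nextUid());
        System.out.println("uid=" + seq.nextUid());
        System.out.println("modseq=" + seq.nextModSeq());
    }
}
```

The point of the design is that the server assigns the version atomically, so
two clients writing to the same node never receive the same value.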

This should scale well with a single ZK ensemble, up to millions of
mailboxes. Beyond that we can use multiple ZK ensembles, where each
ensemble manages a shard of the mailboxes.

The first thing that comes to mind is the way Debian stores packages
[3]: the first letter of the package name is used as a directory, so
all packages that start with the same letter are grouped together.

In the same way, we can make one ensemble handle all mailboxes that
start with 0-4 and another handle those that start with 5-9. Assuming
the mailboxes are generated uniformly, this splits the load in half
and gives us horizontal scalability.
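
The 0-4 / 5-9 split could be sketched in Java as follows. The connect strings
and the class name EnsembleRouter are made up for illustration, and the sketch
assumes every mailbox identifier begins with a decimal digit, as in the
example above.

```java
/** Picks a ZooKeeper ensemble from the first character of a mailbox identifier. */
final class EnsembleRouter {
    // Hypothetical connect strings, one per ensemble.
    static final String LOW_ENSEMBLE = "zk-a1:2181,zk-a2:2181,zk-a3:2181";
    static final String HIGH_ENSEMBLE = "zk-b1:2181,zk-b2:2181,zk-b3:2181";

    private EnsembleRouter() {
    }

    /** Returns the connect string of the ensemble responsible for the given mailbox. */
    static String ensembleFor(String mailboxId) {
        if (mailboxId == null || mailboxId.isEmpty()) {
            throw new IllegalArgumentException("mailbox identifier must not be empty");
        }
        char first = mailboxId.charAt(0);
        if (first >= '0' && first <= '4') {
            return LOW_ENSEMBLE;   // shard for identifiers starting with 0-4
        }
        if (first >= '5' && first <= '9') {
            return HIGH_ENSEMBLE;  // shard for identifiers starting with 5-9
        }
        // Anything else is outside the assumed identifier scheme.
        throw new IllegalArgumentException("unsupported leading character: " + first);
    }
}

final class RouterDemo {
    public static void main(String[] args) {
        System.out.println(EnsembleRouter.ensembleFor("3f2a"));
        System.out.println(EnsembleRouter.ensembleFor("9c10"));
    }
}
```

Because the mapping depends only on the identifier, every client routes a
given mailbox to the same ensemble without any extra coordination.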

[1] http://zookeeper.apache.org/doc/current/zookeeperOver.html#fg_zkPerfReliability
[2] http://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
[3] ftp://ftp.be.debian.org/debian/pool/main

                
> [gsoc2011] Design and implement Distributed UID generation
> ----------------------------------------------------------
>
>                 Key: MAILBOX-103
>                 URL: https://issues.apache.org/jira/browse/MAILBOX-103
>             Project: James Mailbox
>          Issue Type: New Feature
>          Components: hbase
>    Affects Versions: 0.4
>            Reporter: Eric Charles
>             Fix For: 0.4
>
>
> Context: IMAP4rev1 (RFC 3501) requires that every message be identified by a 
> stable 32-bit Unique Identifier (UID) assigned in incremental sequence. This 
> is currently achieved in the James IMAP subproject (http://james.apache.org/imap) 
> with a UidProvider interface implemented in memory. This implementation does 
> not work in a distributed deployment.
> Task: A DistributedUidProvider must be designed and implemented. The design 
> can rely on a distributed memory cache such as Hazelcast, or on any other 
> solution (Hadoop, HBase, Cassandra, ...).
> Mentor: eric at apache dot org
> Complexity: medium 


        
