Re: mailbox over HDFS/HBase

2011-05-25 Thread Eric Charles
Yes, let's go this way (I will put some links in the initial GSoC JIRA). Your exams first! You will have plenty of time once they are finished :) As a first step, I think you need a good grasp of the James mailbox implementations and the HBase API. mailbox-hbase implementation

Re: mailbox over HDFS/HBase

2011-05-25 Thread Robert Burrell Donkin
On Wed, May 25, 2011 at 1:10 PM, Eric Charles e...@apache.org wrote: Your exams first! You will have plenty of time when they will be finished :) +1 Robert

Re: mailbox over HDFS/HBase

2011-05-25 Thread Robert Burrell Donkin
On Tue, May 24, 2011 at 8:07 PM, Ioan Eugen Stan stan.ieu...@gmail.com wrote: (my observation) Kind of. You often see an IMAP client do a big FETCH when it first connects, to see if there are changes in the mailbox, such as FETCH 1:* (FLAGS). This will hopefully get improved when Apache
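To make the cost concrete, here is a small Python sketch of what a client effectively does after a full FETCH 1:* (FLAGS): it receives the flags of every message and diffs them against its local cache. The function name full_flag_resync and the dict-based inputs are hypothetical, and messages are keyed by UID for simplicity (the real command addresses sequence numbers).

```python
# Hypothetical sketch: what a client learns from FETCH 1:* (FLAGS)
# without CONDSTORE. It must look at the flags of every message and
# diff them against its local cache to find what changed.

def full_flag_resync(server_flags, cached_flags):
    """Return (changed_or_new, expunged, examined).

    Both arguments map UID -> frozenset of flag names.
    """
    changed_or_new = {}
    examined = 0
    for uid, flags in server_flags.items():  # one entry per message in the mailbox
        examined += 1
        if cached_flags.get(uid) != flags:
            changed_or_new[uid] = flags
    # UIDs the client still caches but the server no longer reports
    expunged = set(cached_flags) - set(server_flags)
    return changed_or_new, expunged, examined


server = {1: frozenset({"\\Seen"}), 2: frozenset(), 3: frozenset({"\\Flagged"})}
cache = {1: frozenset({"\\Seen"}), 2: frozenset({"\\Seen"}), 4: frozenset()}
changed_or_new, expunged, examined = full_flag_resync(server, cache)
```

The work grows with the size of the mailbox, not with the number of changes, which is why this first-connect FETCH hurts on large mailboxes.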

Re: mailbox over HDFS/HBase

2011-05-25 Thread Robert Burrell Donkin
On Tue, May 24, 2011 at 12:24 PM, Eric Charles e...@apache.org wrote: snip On 24/05/2011 07:44, Norman wrote: snip - users usually access the last 50-100 emails (my observation) Kind of. You often see an IMAP client do a big FETCH when it first connects, to see if there are changes in

Re: mailbox over HDFS/HBase

2011-05-25 Thread Eric Charles
Yes, for now network data transfer is the main cause of latency. However, there are still some optimizations to implement for large queries. Currently, results are batched (batches of 100, I think), but a solution like the one you propose would be better (depending on the mailbox implementation to support
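A minimal sketch of the batching Eric mentions, assuming a batch size of 100 (his own estimate) and a caller-supplied load_batch function standing in for whatever the mailbox implementation uses to read one batch. Both names are hypothetical, not the James API.

```python
from typing import Callable, Dict, Iterator, List, Sequence

BATCH_SIZE = 100  # "batch of 100 I think": approximate, per the thread


def batched(uids: Sequence[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` UIDs."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(uids), size):
        yield list(uids[start:start + size])


def fetch_batched(
    uids: Sequence[int],
    load_batch: Callable[[List[int]], Dict[int, object]],  # hypothetical loader
    size: int = BATCH_SIZE,
):
    """Issue one load_batch call per chunk instead of one per message."""
    results: Dict[int, object] = {}
    calls = 0
    for chunk in batched(uids, size):
        results.update(load_batch(chunk))
        calls += 1
    return results, calls
```

With 250 UIDs this makes 3 round trips instead of 250; a smarter scheme would avoid fetching unchanged data at all.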

Re: mailbox over HDFS/HBase

2011-05-24 Thread Eric Charles
On 24/05/2011 07:51, Norman wrote: 2. If we store each folder in a file, we may have fewer performance issues on read (larger file), but we face the issue that we cannot alter the content (only append!!). So that does not sound like an option. Well, we could just have some kind of info which mails
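Norman's message is cut off here, so the following is only one possible reading of the idea: keep one append-only file per folder, record deleted mails in a separate structure instead of editing the file, and periodically rewrite (compact) the live mails into a new file. In this hypothetical Python sketch an in-memory io.BytesIO stands in for the HDFS file.

```python
import io


class AppendOnlyFolder:
    """Hypothetical sketch: one append-only file per folder.

    Mails are only ever appended. Since the file cannot be edited in place,
    deletions are recorded in a separate set of offsets and applied when
    the folder is compacted into a new file.
    """

    def __init__(self):
        self.data = io.BytesIO()   # stands in for the HDFS file
        self.index = []            # (offset, length) of each appended mail
        self.deleted = set()       # offsets of mails marked as deleted

    def append(self, raw: bytes) -> int:
        offset = self.data.seek(0, io.SEEK_END)  # seek returns the new position
        self.data.write(raw)
        self.index.append((offset, len(raw)))
        return offset

    def delete(self, offset: int) -> None:
        self.deleted.add(offset)   # the file itself is left untouched

    def read_all(self):
        live = []
        for offset, length in self.index:
            if offset in self.deleted:
                continue
            self.data.seek(offset)
            live.append(self.data.read(length))
        return live

    def compact(self) -> "AppendOnlyFolder":
        """Rewrite only the live mails into a fresh folder file."""
        fresh = AppendOnlyFolder()
        for raw in self.read_all():
            fresh.append(raw)
        return fresh
```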

Re: mailbox over HDFS/HBase

2011-05-24 Thread Eric Charles
See my comments inline. Thanks, - Eric On 24/05/2011 07:44, Norman wrote: snip First, about email: - emails are essentially immutable. Once created, they are not modified. - meta information is read/write (like the read/unread status), and maybe other things; I still have to get up to date. The only
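The split described above (immutable message content, mutable metadata) can be modeled with two separate records. This is a hypothetical Python sketch, not the James mailbox model: a frozen dataclass rejects modification after creation, while the metadata record holds the flags that do change, such as \Seen for read/unread.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass(frozen=True)
class Message:
    """Content written once at delivery; assigning a field raises an error."""
    uid: int
    headers: Tuple[Tuple[str, str], ...]  # (name, value) pairs
    body: bytes


@dataclass
class MessageMeta:
    """Read/write metadata kept apart from the immutable content."""
    uid: int
    flags: Set[str] = field(default_factory=set)

    def mark_read(self) -> None:
        self.flags.add("\\Seen")

    def mark_unread(self) -> None:
        self.flags.discard("\\Seen")
```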

Re: mailbox over HDFS/HBase

2011-05-24 Thread Eric Charles
On 24/05/2011 07:44, Norman wrote: I wrote a prototype which uses Cassandra for Apache James Mailbox; it is not open source (yet?). It works quite well but suffers from the lack of locking, so you need some distributed locking service like Hazelcast [e]. So using NoSQL should work without problems; you
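Locking comes up because some mailbox operations are read-modify-write steps on shared state; assigning the next UID is the classic example. The sketch below is hypothetical and uses Python's threading.Lock as a stand-in for a distributed lock: it only protects threads inside one process, whereas a service such as Hazelcast would be needed across several James nodes.

```python
import threading
from collections import defaultdict


class MailboxUidAllocator:
    """Hypothetical sketch: a lock per mailbox around UID assignment.

    Without the lock, two concurrent deliveries could read the same
    "last UID" and hand out duplicate UIDs.
    """

    def __init__(self):
        self._last_uid = defaultdict(int)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects creation of per-mailbox locks

    def _lock_for(self, mailbox):
        with self._guard:
            return self._locks[mailbox]

    def next_uid(self, mailbox):
        with self._lock_for(mailbox):  # read-modify-write done atomically
            uid = self._last_uid[mailbox] + 1
            self._last_uid[mailbox] = uid
            return uid
```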

Re: mailbox over HDFS/HBase

2011-05-24 Thread Ioan Eugen Stan
(my observation) Kind of. You often see an IMAP client do a big FETCH when it first connects, to see if there are changes in the mailbox, such as FETCH 1:* (FLAGS). This will hopefully get improved when Apache James IMAP supports the CONDSTORE[a] and QRESYNC[b] extensions. But that's on
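CONDSTORE (RFC 4551) lets a client ask only for changes newer than a modification sequence it already knows, for example FETCH 1:* (FLAGS) (CHANGEDSINCE 12345); QRESYNC (RFC 5162) builds on it, and both were later consolidated in RFC 7162. The toy Python model below only illustrates the server-side idea of a per-message modseq that is bumped on every change; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class StoredMessage:
    uid: int
    flags: Set[str] = field(default_factory=set)
    modseq: int = 0


class CondstoreMailbox:
    """Toy model: every change stamps the message with a new, higher modseq."""

    def __init__(self) -> None:
        self.messages: Dict[int, StoredMessage] = {}
        self.highest_modseq = 0

    def _next_modseq(self) -> int:
        self.highest_modseq += 1
        return self.highest_modseq

    def add(self, uid: int) -> None:
        self.messages[uid] = StoredMessage(uid, set(), self._next_modseq())

    def set_flags(self, uid: int, flags: Set[str]) -> None:
        msg = self.messages[uid]
        msg.flags = set(flags)
        msg.modseq = self._next_modseq()

    def changed_since(self, modseq: int) -> Dict[int, Set[str]]:
        """Analogue of FETCH 1:* (FLAGS) (CHANGEDSINCE modseq)."""
        return {m.uid: set(m.flags) for m in self.messages.values() if m.modseq > modseq}
```

Expunged messages are not covered by this model; reporting them efficiently is part of what QRESYNC adds.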

Re: mailbox over HDFS/HBase

2011-05-24 Thread Ioan Eugen Stan
So:
- mailbox (immutable: create/read/delete/query)
- message (immutable: create/read/delete/query)
- message flags (create/read/update/delete/query)
- subscriptions (create/read/update/delete/query)
The mailbox and message data model is defined in [1] (please note the need Header and
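For an HBase-backed layout, one common approach is to encode the row key so that all messages of a mailbox sort together and in UID order, since HBase orders rows by the raw bytes of the key. The sketch below assumes a hypothetical key made of an 8-byte mailbox id followed by an 8-byte UID, both big-endian, and a hypothetical split of immutable message data and mutable flags into separate column families; it is not the mailbox-hbase schema.

```python
import struct


def message_row_key(mailbox_id: int, uid: int) -> bytes:
    """Hypothetical row key: 8-byte mailbox id + 8-byte UID, big-endian.

    Fixed-width big-endian unsigned integers sort bytewise in numeric order,
    so all messages of one mailbox are adjacent and ordered by UID.
    """
    return struct.pack(">QQ", mailbox_id, uid)


def mailbox_prefix(mailbox_id: int) -> bytes:
    """Key prefix shared by every message of one mailbox (for a prefix scan)."""
    return struct.pack(">Q", mailbox_id)


# Hypothetical column families: each family is stored in its own files,
# so reading or updating flags does not touch the message bodies.
COLUMN_FAMILIES = {
    "message": ["headers", "body"],                                # written once
    "flags": ["seen", "answered", "flagged", "deleted", "draft"],  # updated often
}
```

Encoding UIDs as decimal strings instead would break the ordering, because "10" sorts before "2".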

mailbox over HDFS/HBase

2011-05-23 Thread Ioan Eugen Stan
Hello, I had some discussions with Eric about the best way to implement the mailbox over HDFS, and we agreed that it's better to inform the list about the situation. The project idea I applied for is to implement James mailbox storage over Hadoop HDFS, and one of the first steps

Re: mailbox over HDFS/HBase

2011-05-23 Thread Eric Charles
Hi, For the immutable mails: 1. If we store each mail in a file, we don't have to alter it, but we face a performance issue because reading small files in Hadoop seems expensive. 2. If we store each folder in a file, we may have fewer performance issues on read (larger
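A back-of-the-envelope way to see the trade-off between options 1 and 2: every file opened in HDFS pays a fixed overhead (a NameNode lookup and a DataNode connection), which a one-file-per-mail layout pays once per mail. The Python function below is a toy model; per_open_ms and mb_per_s are placeholders you would have to measure on a real cluster, not figures from this thread.

```python
def estimated_read_ms(num_mails, avg_size_bytes, per_open_ms, mb_per_s, packed):
    """Toy cost model; every parameter is an assumption supplied by the caller.

    per_open_ms: fixed overhead for opening one file.
    mb_per_s:    sequential read throughput in decimal megabytes per second.
    packed:      True for one file per folder, False for one file per mail.
    """
    opens = 1 if packed else num_mails
    # Bytes to transfer are the same in both layouts; only the opens differ.
    transfer_ms = num_mails * avg_size_bytes * 1000 / (mb_per_s * 1_000_000)
    return opens * per_open_ms + transfer_ms
```

For instance, with 1,000 mails of 10,000 bytes, an assumed 10 ms per open and 100 MB/s, the model gives 10,100 ms for one file per mail versus 110 ms for one file per folder; the numbers only show the shape of the trade-off.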

Re: mailbox over HDFS/HBase

2011-05-23 Thread Norman
Hi there, comments inside... On 24.05.2011 00:01, Ioan Eugen Stan wrote: Hello, I had some discussions with Eric about the best way to implement the mailbox over HDFS, and we agreed that it's better to inform the list about the situation. The project idea I applied for is

Re: mailbox over HDFS/HBase

2011-05-23 Thread Norman
Hi Eric, comments inside... On 24.05.2011 06:08, Eric Charles wrote: Hi, For the immutable mails: 1. If we store each mail in a file, we don't have to alter it, but we face a performance issue because reading small files in Hadoop seems expensive. Seems like this,