some guidance needed

2011-05-18 Thread Ioan Eugen Stan
Hello everybody, I'm a GSoC student for this year and I will be working on James [1]. My project is to implement email storage over HDFS. I am quite new to Hadoop and related projects, and I am looking for some hints on how to get started on the right track. I have installed a single-node Hadoop instance on

Re: some guidance needed

2011-05-18 Thread Todd Lipcon
Hi Ioan, I would encourage you to look at a system like HBase for your mail backend. HDFS doesn't work well with lots of little files, and it also doesn't support random updates, so existing formats like Maildir wouldn't be a good fit. -Todd On Wed, May 18, 2011 at 4:02 PM, Ioan Eugen Stan wrote: >

Re: some guidance needed

2011-05-18 Thread Mark Kerzner
Ioan, I second what Todd said. Even with FuseHDFS, which mounts HDFS as a regular file system, it won't give you the immediate response about the file status that you need. I believe Google implemented Gmail with HBase. Here is an example of implementing a mail store with Cassandra: http://ewh.ieee.or

Re: some guidance needed

2011-05-19 Thread Ioan Eugen Stan
I have forwarded this discussion to my mentors so they are informed and I hope they will provide better input regarding email storage. > I second what Todd said, even with FuseHDFS, mounting HDFS as a regular file > system, it won't give you the immediate response about the file status that > you

Re: some guidance needed

2011-05-19 Thread Robert Burrell Donkin
On Thu, May 19, 2011 at 12:04 PM, Ioan Eugen Stan wrote: > I have forwarded this discussion to my mentors so they are informed (I've hopped onto this list so no need to remember to copy me into the thread ;-) > Eric, one of my mentors, suggested I use Gora for > this and after a quick look at

Re: some guidance needed

2011-05-23 Thread Eric Charles
Hi, Yes, we need to store immutable mails and their associated r/w metadata. I was wondering in which way a solution like the one presented on [1] can help. Twitter seems to use Protocol Buffers to store tweets. Would a solution based on Avro be a better fit for our needs (mail storage)? In