Distributed cluster filesystem on EC2

2011-08-29 Thread Dmitry Pushkarev
Dear Hadoop users, Sorry for the off-topic post. We're slowly migrating our Hadoop cluster to EC2, and one thing I'm trying to explore is whether we can use alternative scheduling systems like SGE with a shared FS for non-data-intensive tasks, since they are easier for lay users to work with. One p

Re: Distributed cluster filesystem on EC2

2011-08-31 Thread Robert Evans
Dmitry, It sounds like an interesting idea, but I have not really heard of anyone doing it before. It would make for a good feature to have tiered file systems all mapped into the same namespace, but that would be a lot of work and complexity. The quick solution would be to know what data you

Re: Distributed cluster filesystem on EC2

2011-08-31 Thread Tom White
You might consider Apache Whirr (http://whirr.apache.org/) for bringing up Hadoop clusters on EC2.

Cheers,
Tom

On Wed, Aug 31, 2011 at 8:22 AM, Robert Evans wrote:
> Dmitry,
>
> It sounds like an interesting idea, but I have not really heard of anyone
> doing it before.  It would make for a goo

Re: Distributed cluster filesystem on EC2

2011-08-31 Thread Dmitry Pushkarev
Thank you for your suggestion. I have years of experience with HDFS, and if I could I'd gladly use it as the filesystem; however, our developers require fast random access and symlinks, so I was wondering if there are other options. I'm not too sensitive to data locality, but we need to eliminate ne