Dear Hadoop users,
Sorry for the off-topic post. We're slowly migrating our Hadoop cluster to EC2,
and one thing that I'm trying to explore is whether we can use alternative
scheduling systems like SGE with a shared FS for non-data-intensive tasks,
since they are easier for lay users to work with.
One p
Dmitry,
It sounds like an interesting idea, but I have not really heard of anyone doing
it before. It would make for a good feature to have tiered file systems all
mapped into the same namespace, but that would be a lot of work and complexity.
The quick solution would be to know what data you
You might consider Apache Whirr (http://whirr.apache.org/) for
bringing up Hadoop clusters on EC2.
Cheers,
Tom
On Wed, Aug 31, 2011 at 8:22 AM, Robert Evans wrote:
> Dmitry,
>
> It sounds like an interesting idea, but I have not really heard of anyone
> doing it before. It would make for a goo
Thank you for your suggestion. I have years of experience with HDFS, and if I
could I'd gladly use it as the filesystem. However, our developers require fast
random access and symlinks, so I was wondering if there are other options.
I'm not too sensitive to data locality, but we need to eliminate
ne