Re: HDFS and Openstack - avoiding excessive redundancy

2011-11-14 Thread Dieter Plaetinck
Or, more generally: isn't using virtualized I/O counterproductive when dealing with Hadoop M/R? I would think that for running Hadoop M/R you'd want predictable and consistent I/O on each node, not to mention that your bottlenecks are usually disk I/O (and maybe CPU), so using virtualisation makes things

Re: HDFS and Openstack - avoiding excessive redundancy

2011-11-11 Thread Graeme Seaton
One advantage of using Hadoop replication, though, is that it provides a greater pool of potential servers for M/R jobs to execute on. If you simply use Openstack replication, it will appear to the JobTracker that a particular block exists on only a single server and should only be executed on t
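
To make the locality point concrete, here is a toy model (not Hadoop code) of how the HDFS replication factor bounds the set of nodes on which a map task can run data-locally. The block IDs, node names, and placements are all hypothetical, chosen only for illustration.

```python
# Toy model: which nodes can run a map task data-locally for each block.
# Block IDs, node names, and placements below are hypothetical.

def local_candidates(placement):
    """Return, per block, the sorted list of nodes that hold a replica."""
    return {block: sorted(nodes) for block, nodes in placement.items()}

# HDFS replication factor 3: three replica holders are visible per block.
hdfs_r3 = {
    "blk_1": {"node-a", "node-b", "node-c"},
    "blk_2": {"node-b", "node-c", "node-d"},
}

# HDFS replication factor 1: redundancy is handled below HDFS by the storage
# layer, so HDFS itself sees only one holder per block.
hdfs_r1 = {
    "blk_1": {"node-a"},
    "blk_2": {"node-b"},
}

for label, placement in (("replication=3", hdfs_r3), ("replication=1", hdfs_r1)):
    for block, nodes in local_candidates(placement).items():
        print(label, block, len(nodes), nodes)
```

With a factor of 1, every block has exactly one data-local candidate, so a busy node forces the scheduler either to wait or to read the block over the network.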

Re: HDFS and Openstack - avoiding excessive redundancy

2011-11-11 Thread Dejan Menges
The replication factor for HDFS can easily be changed to 1 in hdfs-site.xml if you don't need its redundancy. Regards, Dejo On 12. 11. 2011., at 03:58, Edmon Begoli wrote: > A question related to standing up cloud infrastructure for running > Hadoop/HDFS. > > We are building
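
As a minimal sketch of the hdfs-site.xml change mentioned above, the following generates a configuration fragment that sets `dfs.replication`, the standard HDFS property for the default replication factor. The helper name `build_hdfs_site` is made up for this example, and a real hdfs-site.xml would normally carry other properties as well.

```python
import xml.etree.ElementTree as ET

def build_hdfs_site(replication):
    """Build a minimal hdfs-site.xml document that sets dfs.replication."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "dfs.replication"
    ET.SubElement(prop, "value").text = str(replication)
    return ET.tostring(configuration, encoding="unicode")

# Replication factor 1: HDFS keeps a single copy of each block.
xml_text = build_hdfs_site(1)
print(xml_text)
```

Note that `dfs.replication` is the default applied to files written after the change; files already in HDFS keep the replication factor they were created with.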

HDFS and Openstack - avoiding excessive redundancy

2011-11-11 Thread Edmon Begoli
A question related to standing up cloud infrastructure for running Hadoop/HDFS. We are building up an infrastructure using Openstack which has its own storage management redundancy. We are planning to use Openstack to instantiate Hadoop nodes (HDFS, M/R tasks, Hive, HBase) on demand. The problem