Or, more generally:
isn't using virtualized I/O counterproductive when dealing with Hadoop M/R?
I would think that for running Hadoop M/R you'd want predictable and
consistent I/O on each node, not to mention that your bottlenecks are
usually disk I/O (and maybe CPU), so using virtualisation makes things
One advantage of using Hadoop replication, though, is that it provides a
greater pool of potential servers for M/R jobs to execute on. If you
simply use OpenStack replication, it will appear to the JobTracker that a
particular block only exists on a single server and should only be
executed on t
The replication factor for HDFS can easily be changed to 1 in
hdfs-site.xml if you don't need its redundancy.
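As a rough sketch, assuming the stock dfs.replication property and leaving the rest of your hdfs-site.xml as it is, that would look something like:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Keep a single copy of each block; redundancy is left to the
       underlying storage layer instead of HDFS. -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

Note that this is only the default for newly written files. Files already in HDFS keep the replication factor they were created with unless you change it, for example with hadoop fs -setrep.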
Regards,
Dejo
On 12. 11. 2011., at 03:58, Edmon Begoli wrote:
> A question related to standing up cloud infrastructure for running
> Hadoop/HDFS.
>
> We are building
A question related to standing up cloud infrastructure for running Hadoop/HDFS.
We are building up an infrastructure using OpenStack, which has its own
storage management redundancy.
We are planning to use OpenStack to instantiate Hadoop nodes (HDFS,
M/R tasks, Hive, HBase) on demand.
The problem