On 05-Jun-2013, at 12:13 AM, David Ortiz <[email protected]> wrote:
> Hello,
>
> Has anyone tried running a Hadoop cluster in a CloudStack environment? I have set one up, but I am seeing IO contention between the slave nodes on each host, since they all share one local storage pool. As I understand it, there is currently no way to use multiple local storage pools with VMs through CloudStack. Has anyone found a workaround for this by any chance?

Hi David,

Have you seen Seb's slides yet? http://www.slideshare.net/sebastiengoasguen/cloudstack-and-bigdata

In my experience running Hadoop (100+ nodes) on traditional servers, it's going to be really hard to scale up Hadoop workloads using local storage and HDFS on a cloud. I ran out of IOPS very quickly: there was enough CPU headroom, but I could not add more slots because disk became the bottleneck. Every time there was a node or disk failure, rebalancing was a nightmare with an HDFS replication factor of 3.

If I were to run Hadoop on an IaaS cloud, I would do it very much like Amazon AWS EMR: instances backed by a "Storage as a Service" layer (S3) for big data instead of HDFS. The system would work as follows:

- Create a dedicated big data storage tier using a distributed filesystem like Gluster/Ceph/Isilon. Most of the vendors now provide S3-compatible connectors for Hadoop:
  http://ceph.com/docs/master/cephfs/hadoop/
  http://gluster.org/community/documentation/index.php/Hadoop
  http://www.emc.com/big-data/scale-out-storage-hadoop.htm
- Hadoop instances are spun up on bare metal or on hypervisors. The service offerings for "big data" instances will run on dedicated hypervisors (via host tags) with high-bandwidth network connectivity to the storage service.
- Hadoop instances use local storage for runtime data.
- Hadoop VMs connect to the storage tier via connectors for permanent storage.

Benefits:

- Spinning VMs up or down doesn't cause HDFS rebalancing, as there is no HDFS anywhere.
- VMs scale out independently of storage. Add more spindles/nodes to the storage cluster to scale out IOPS and capacity.
- Easy upgrade of Hadoop releases without risk to data.

Regards.

@shankerbalan
--
Shanker Balan
Managing Consultant
M: +91 98860 60539
[email protected] | www.shapeblue.com | Twitter: @shapeblue
ShapeBlue India, 22nd floor, Unit 2201A, World Trade Centre, Bangalore - 560 055
