RE: Hadoop cluster running in cloudstack

David Ortiz Thu, 06 Jun 2013 11:16:05 -0700

Chiradeep,
     Currently I am working with KVM hypervisor nodes.  The use case of having 
4 spindles and assigning one to each node is exactly what I would like to do.  
For the moment I have all four spindles configured in a RAID with the 
cloudstack local storage pointed at it.

Shanker,
      I had not seen that slideshow yet, so thank you for pointing me to it.  
As of now, the hadoop resources I am using are statically allocated between 4 
hosts.  As it stands now, I am constrained to those resources without the 
ability to add any additional storage cluster (or additional storage to my 
current shared storage appliance), or additional nodes.  Fortunately, my use 
cases don't require any kind of reallocation of the hadoop nodes.  What gets 
dynamically spun up and down are the clients for the cluster and the web 
service nodes that run clients.  I have found that I can get through my jobs 
all right; they just take a lot of extra time to run because the storage is 
acting as a bottleneck right now.
Thanks,
David Ortiz


> From: [email protected]
> Subject: Re: Hadoop cluster running in cloudstack
> Date: Thu, 6 Jun 2013 10:23:50 -0400
> To: [email protected]
> 
> 
> On Jun 6, 2013, at 4:05 AM, Shanker Balan <[email protected]> wrote:
> 
> > On 05-Jun-2013, at 12:13 AM, David Ortiz <[email protected]> wrote:
> > 
> >> Hello,
> >>    Has anyone tried running a hadoop cluster in a cloudstack environment?  
> >> I have set one up, but I am finding that I am having some IO contention 
> >> between slave nodes on each host since they all share one local storage 
> >> pool.  As I understand it, there is not currently a method for using 
> >> multiple local storage pools with VMs through cloudstack.  Has anyone 
> >> found a workaround for this by any chance?
> > 
> > 
> > Hi David,
> > 
> > Have you seen Seb's 
> > http://www.slideshare.net/sebastiengoasguen/cloudstack-and-bigdata slides 
> > yet?
> 
> As a quick disclaimer, the various configurations I highlight in this deck 
> are a bit hand wavy and I did not test them. I just made a guess about how 
> one might want to use the baremetal functionality in cloudstack. The main 
> distinction being between using a "big data" store as storage backends of 
> cloudstack and using cloudstack to provision a bigdata store on-demand.
> 
> -sebastien
> 
> > 
> > In my experience running Hadoop (100+ nodes) on traditional servers, it's 
> > going to be really hard to scale up Hadoop workloads using local storage 
> > and HDFS on a cloud.
> > 
> > I ran out of IOPS very quickly. There was enough CPU headroom, but I could 
> > not add more slots as disk became the bottleneck. Every time there was a 
> > node/disk failure, rebalancing was a nightmare with a 3x HDFS replication 
> > factor. 
> > 
> > If I were to run Hadoop on an IaaS cloud, I would do it very similar to 
> > Amazon AWS EMR - instances backed by a "Storage As A Service" layer (S3) 
> > for big data instead of HDFS.
> > 
> > The system would work as below:
> > 
> > - Create a dedicated big data storage tier using a distributed filesystem 
> > like Gluster/Ceph/Isilon. Most of the vendors now provide S3-compatible 
> > connectors for Hadoop.
> > 
> > http://ceph.com/docs/master/cephfs/hadoop/
> > http://gluster.org/community/documentation/index.php/Hadoop
> > http://www.emc.com/big-data/scale-out-storage-hadoop.htm
> > 
> > - Hadoop instances are spun up on bare metal or on hypervisors. The service 
> > offerings for "big data" instances would run on dedicated hypervisors 
> > (via tags) with high bandwidth network connectivity to the storage service.
> > 
> > - Hadoop instances use local storage for runtime data.
> > 
> > - Hadoop VMs connect to the storage tier via connectors for permanent 
> > storage
> > 
> > Benefits:
> > 
> > - Spinning VMs up/down doesn't cause HDFS rebalancing as there is no HDFS 
> > anywhere.
> > 
> > - Scale out VMs independently of storage. Add more spindles / nodes to the 
> > storage cluster to scale out IOPS and capacity
> > 
> > - Easy upgrade of Hadoop releases without risk to data
> > 
> > Regards.
> > @shankerbalan
> > 
> > -- 
> > Shanker Balan
> > Managing Consultant
> > 
> > 
> > 
> > M: +91 98860 60539
> > [email protected] | www.shapeblue.com | Twitter:@shapeblue
> > ShapeBlue India, 22nd floor, Unit 2201A, World Trade Centre, Bangalore - 
> > 560 055
> > 
>
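
For reference, the "Hadoop VMs connect to the storage tier via connectors" step that Shanker describes might look roughly like the sketch below. It is only an illustration under several assumptions: it uses Hadoop's S3A connector (the fs.s3a.* properties), which shipped in Hadoop releases later than the ones current when this thread was written, and the endpoint URL and credentials are hypothetical placeholders. The vendor connectors linked in the quoted message (CephFS, Gluster, Isilon) each have their own settings, which are not shown here.

```python
import xml.etree.ElementTree as ET


def build_core_site(endpoint, access_key, secret_key):
    """Return core-site.xml text that points Hadoop's S3A connector
    at an S3-compatible storage tier instead of HDFS."""
    props = {
        # Address of the S3-compatible storage service (placeholder value).
        "fs.s3a.endpoint": endpoint,
        # Credentials for that service (placeholder values).
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Non-AWS stores commonly need path-style bucket addressing.
        "fs.s3a.path.style.access": "true",
    }
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical endpoint and credentials, for illustration only.
core_site_xml = build_core_site(
    "http://storage.example.internal:7480",
    "EXAMPLE_ACCESS_KEY",
    "EXAMPLE_SECRET_KEY",
)
print(core_site_xml)
```

With a configuration along these lines, jobs would read and write s3a://bucket/path URIs for permanent data, while the instances keep using local storage only for runtime scratch data.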
