Re: recommendation on HDDs

2011-02-18 Thread Shrinivas Joshi
There seems to be a wiki page already intended for capturing information on disks in a Hadoop environment: http://wiki.apache.org/hadoop/DiskSetup Do we just want to link the thread on HDD recommendations from this wiki page? -Shrinivas On Tue, Feb 15, 2011 at 11:48 AM, zGreenfelder

Re: recommendation on HDDs

2011-02-15 Thread Shrinivas Joshi
Thanks much to all who shared their inputs. This really helps. It would be nice to have a wiki page collecting all this good information. I will check on that. We are definitely going with large capacity disks (>= 1TB). -Shrinivas On Sat, Feb 12, 2011 at 1:22 PM, Ted Dunning

Re: recommendation on HDDs

2011-02-15 Thread Ted Dunning
Good idea! Would you like to create the nucleus of such a page? (There might already be something like that.) On Tue, Feb 15, 2011 at 8:49 AM, Shrinivas Joshi jshrini...@gmail.com wrote: It would be nice to have a wiki page collecting all this good information.

Re: recommendation on HDDs

2011-02-15 Thread zGreenfelder
Un-top-posting everything. Since the OP believes that their requirement is 1TB per node... a single 2TB would be the best choice. It allows for additional space and you really shouldn't be too worried about disk I/O being your bottleneck. The original poster also seemed somewhat interested

Re: recommendation on HDDs

2011-02-14 Thread Steve Loughran
On 10/02/11 22:25, Michael Segel wrote: Shrinivas, Assuming you're in the US, I'd recommend the following: Go with 2TB 7200 RPM SATA hard drives. (Not sure what type of hardware you have.) What we've found is that in the data nodes, there's an optimal configuration that balances price versus

Re: recommendation on HDDs

2011-02-14 Thread Steve Loughran
On 12/02/11 16:26, Michael Segel wrote: All, I'd like to clarify some things... First the concept is to build out a cluster of commodity hardware. So when you do your shopping you want to get the most bang for your buck. That is the 'sweet spot' that I'm talking about. When you look at your

RE: recommendation on HDDs

2011-02-14 Thread Michael Segel
and go get my first cup of coffee. :-) -Mike Date: Mon, 14 Feb 2011 11:23:13 + From: ste...@apache.org To: common-user@hadoop.apache.org Subject: Re: recommendation on HDDs On 12/02/11 16:26, Michael Segel wrote: All, I'd like to clarify some things... First the concept

Re: recommendation on HDDs

2011-02-12 Thread Edward Capriolo
of the drives, then add to cluster as 'new' node. Just my $0.02. HTH -Mike Date: Thu, 10 Feb 2011 15:47:16 -0600 Subject: Re: recommendation on HDDs From: jshrini...@gmail.com To: common-user@hadoop.apache.org Hi Ted, Chris, Much appreciate your quick

RE: recommendation on HDDs

2011-02-12 Thread Michael Segel
be the best choice. It allows for additional space and you really shouldn't be too worried about disk I/O being your bottleneck. HTH -Mike Date: Sat, 12 Feb 2011 10:42:50 -0500 Subject: Re: recommendation on HDDs From: edlinuxg...@gmail.com To: common-user@hadoop.apache.org On Fri, Feb 11

Re: recommendation on HDDs

2011-02-12 Thread James Seigel
-Mike Date: Thu, 10 Feb 2011 15:47:16 -0600 Subject: Re: recommendation on HDDs From: jshrini...@gmail.com To: common-user@hadoop.apache.org Hi Ted, Chris, Much appreciate your quick reply. The reason why we are looking for smaller capacity drives is because we are not anticipating

Re: recommendation on HDDs

2011-02-12 Thread Ted Dunning
The original poster also seemed somewhat interested in disk bandwidth. That is facilitated by having more than one disk in the box. On Sat, Feb 12, 2011 at 8:26 AM, Michael Segel michael_se...@hotmail.com wrote: Since the OP believes that their requirement is 1TB per node... a single 2TB would

Re: recommendation on HDDs

2011-02-11 Thread Shrinivas Joshi
as 'new' node. Just my $0.02. HTH -Mike Date: Thu, 10 Feb 2011 15:47:16 -0600 Subject: Re: recommendation on HDDs From: jshrini...@gmail.com To: common-user@hadoop.apache.org Hi Ted, Chris, Much appreciate your quick reply. The reason why we are looking for smaller capacity

recommendation on HDDs

2011-02-10 Thread Shrinivas Joshi
What would be a good hard drive for a 7-node cluster which is targeted to run a mix of I/O- and CPU-intensive Hadoop workloads? We are looking for around 1 TB of storage on each node distributed amongst 4 or 5 disks. So either 250GB * 4 disks or 160GB * 5 disks. Also it should be less than $100 each
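As a rough sanity check on the two layouts mentioned in this post, the arithmetic can be sketched in Python. The node count and disk sizes come from the post; the replication factor of 3 is an assumption (it is the HDFS default, but the cluster's actual setting is not stated in the thread), and the function name is purely illustrative.

```python
NODES = 7          # cluster size, from the post
REPLICATION = 3    # ASSUMPTION: HDFS default replication; not stated in the thread


def summarize(disk_gb, disks, nodes=NODES, replication=REPLICATION):
    """Return (GB per node, raw cluster GB, approx. GB left after replication)."""
    per_node = disk_gb * disks
    raw = per_node * nodes
    usable = raw / replication
    return per_node, raw, usable


# The two layouts proposed in the post.
layouts = {"4 x 250GB": (250, 4), "5 x 160GB": (160, 5)}

for name, (size_gb, count) in layouts.items():
    per_node, raw, usable = summarize(size_gb, count)
    print(f"{name}: {per_node} GB/node, {raw} GB raw, "
          f"about {usable:.0f} GB after {REPLICATION}x replication")
```

Worth noting: 5 x 160GB comes to 800 GB per node, short of the stated 1 TB target, while 4 x 250GB reaches it (in decimal units). The figure after replication ignores space needed for intermediate MapReduce output and the OS, so it is an upper bound.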

Re: recommendation on HDDs

2011-02-10 Thread Ted Dunning
Get bigger disks. Data only grows and having extra is always good. You can get 2TB drives for $100 and 1TB for $75. As far as transfer rates are concerned, any 3Gb/s SATA drive is going to be about the same (ish). Seek times will vary a bit with rotation speed, but with Hadoop, you will be
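The price argument in this reply can be made concrete with a small cost-per-terabyte comparison. This is only a sketch: the two prices are the ones quoted in the message (early 2011), and the helper name is hypothetical.

```python
def cost_per_tb(price_usd, capacity_tb):
    """Price in USD per terabyte of raw capacity."""
    return price_usd / capacity_tb


# Drive options as quoted in the message: (capacity in TB, price in USD).
drives = {"2TB": (2, 100), "1TB": (1, 75)}

for name, (capacity_tb, price_usd) in drives.items():
    print(f"{name}: ${cost_per_tb(price_usd, capacity_tb):.2f} per TB")
```

At those quoted prices the 2TB drive works out to $50 per TB against $75 per TB for the 1TB drive, which is the basis for the "get bigger disks" advice.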