Re: Hbase Hardware requirement

2011-06-09 Thread Ted Dunning
On Fri, Jun 10, 2011 at 3:58 AM, Michel Segel wrote: > Not to mention you don't get a linear boost with port bonding. Well, you don't get a linear boost with switch-level bonding. You can get it, however. > You have to be careful on hardware recommendations because there are > pricing sweet spots

Re: Hbase Hardware requirement

2011-06-09 Thread Michel Segel
Expensive is relative, and with the latest Intel hardware release you're starting to see 10GbE on the motherboard. Not to mention you don't get a linear boost with port bonding. You have to be careful on hardware recommendations because there are pricing sweet spots and technology changes.
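The port-bonding point above can be made concrete. A minimal sketch of bonding two on-board 1GigE NICs on Linux with iproute2; the interface names (eth0/eth1) and the 802.3ad (LACP) mode are assumptions, and the real-world gain depends on the bonding mode, hashing policy, and switch support, which is part of why the boost is not linear:

```shell
# Hedged sketch: bond two assumed on-board NICs (eth0/eth1) into bond0.
# 802.3ad (LACP) requires matching configuration on the switch side.
ip link add bond0 type bond mode 802.3ad miimon 100
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0
ip link set bond0 up
```

Note that even with a working bond, a single TCP flow still hashes onto one physical link, so per-connection throughput stays at 1GigE.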

Re: Hbase Hardware requirement

2011-06-09 Thread M. C. Srivas
Ensure enough networking bandwidth to match your drive-bandwidth, otherwise your compaction rates are going to be abysmal. 10 GigE ports are expensive, so consider 2 x 1GigE per box (or even 4 x 1GigE if you can get that many on-board NICs). On Wed, Jun 8, 2011 at 8:49 AM, Andrew Purtell wrote:
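The bandwidth-matching advice above can be sanity-checked with a back-of-envelope calculation. The throughput figures below (~100 MB/s sequential per SATA spindle, ~125 MB/s raw per 1GigE port) are assumed round numbers, not measurements from this thread:

```python
# Rough check: does aggregate NIC bandwidth keep up with aggregate
# disk bandwidth? If not, compaction and replication traffic will
# bottleneck on the network, as the message above warns.

def aggregate_disk_mb_s(spindles, per_disk_mb_s=100):
    """Approximate sequential read/write bandwidth across all spindles."""
    return spindles * per_disk_mb_s

def nic_mb_s(gige_ports, per_port_mb_s=125):
    """Approximate raw bandwidth of N bonded 1GigE ports."""
    return gige_ports * per_port_mb_s

disks = aggregate_disk_mb_s(4)  # 4 spindles -> 400 MB/s
net = nic_mb_s(2)               # 2 x 1GigE  -> 250 MB/s
print(disks, net, disks > net)  # here the network is the bottleneck
```

With these assumed figures, even 4 x 1GigE (~500 MB/s) barely covers a 4-spindle box, which is the motivation for the on-board-NIC suggestion.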

Re: Hbase Hardware requirement

2011-06-08 Thread Andrew Purtell
> From: Ted Dunning > Lots of people are moving towards more spindles per box to > increase IOP/s > > This is particularly important for cases where the working > set gets pushed out of memory. Indeed. Our spec is more like 12x 500 GB SATA disks, to push IOPS and more evenly balance CPUs (fast du
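The spindles-for-IOPS argument behind the 12x 500 GB spec above is simple arithmetic. The per-disk figure below (~100 random IOPS for a 7200 rpm SATA drive) is an assumed rule of thumb:

```python
# Back-of-envelope: random IOPS scale with spindle count, so many
# small disks beat few large ones once the working set no longer
# fits in memory and reads hit disk.
IOPS_PER_SATA_DISK = 100  # assumed rough figure for 7200 rpm SATA

def node_iops(spindles, iops_per_disk=IOPS_PER_SATA_DISK):
    """Approximate aggregate random IOPS for one node."""
    return spindles * iops_per_disk

print(node_iops(4))   # 4 x 1TB    -> ~400 IOPS
print(node_iops(12))  # 12 x 500GB -> ~1200 IOPS, same total capacity
```

Same 4 TB of raw storage either way; the 12-spindle layout triples the random-read budget.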

RE: Hbase Hardware requirement

2011-06-07 Thread Buttler, David
07, 2011 1:11 AM To: hbase-u...@hadoop.apache.org Subject: Hbase Hardware requirement Dear friends, Please suggest a standard hardware configuration for hbase cluster which is going to be used to pull and store a lot of data. -- Thanks, Shah

Re: Hbase Hardware requirement

2011-06-07 Thread Jack Levin
Sandy Bridge and SolarFlare are changing some of the design > considerations. > >> Date: Tue, 7 Jun 2011 10:32:58 +0200 >> Subject: Re: Hbase Hardware requirement >> From: timrobertson...@gmail.com >> To: user@hbase.apache.org >> >> http://www.cloudera.com/blo

RE: Hbase Hardware requirement

2011-06-07 Thread Michael Segel
And even that recommendation isn't right. ;-) I think Sandy Bridge and SolarFlare are changing some of the design considerations. > Date: Tue, 7 Jun 2011 10:32:58 +0200 > Subject: Re: Hbase Hardware requirement > From: timrobertson...@gmail.com > To: user@hbase.apa

Re: Hbase Hardware requirement

2011-06-07 Thread Ted Dunning
Lots of people are moving towards more spindles per box to increase IOP/s. This is particularly important for cases where the working set gets pushed out of memory. On Tue, Jun 7, 2011 at 1:32 AM, Tim Robertson wrote: > > http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic

Re: Hbase Hardware requirement

2011-06-07 Thread Tim Robertson
http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/ "4 1TB hard disks in a JBOD (Just a Bunch Of Disks) configuration 2 quad core CPUs, running at least 2-2.5GHz 16-24GBs of RAM (24-32GBs if you’re considering HBase) Gigabit Ethernet" HTH, Tim

Hbase Hardware requirement

2011-06-07 Thread Shahnawaz Saifi
Dear friends, Please suggest a standard hardware configuration for hbase cluster which is going to be used to pull and store a lot of data. -- Thanks, Shah

RE: Hadoop/HBase hardware requirement

2010-11-22 Thread Michael Segel
task and then you have 1GB for datanode and 1GB for Task Tracker. Then you'd have some head room to run 1024MB per task, or more if you end up with larger jobs. HTH -Mike > Date: Mon, 22 Nov 2010 15:50:05 +0200 > Subject: Re: Hadoop/HBase hardware requirement > From: li...@inf
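The memory arithmetic in the message above can be sketched out. The 1 GB DataNode and 1 GB TaskTracker reservations come from the message; the 2 GB OS reserve is an added assumption:

```python
# Hedged sketch of per-node memory budgeting for MapReduce task slots:
# subtract fixed daemon and OS reservations, then divide the remainder
# by the per-task JVM heap.

def task_slots(total_ram_mb, task_heap_mb=1024,
               datanode_mb=1024, tasktracker_mb=1024, os_mb=2048):
    """How many task JVMs of task_heap_mb fit after fixed overheads."""
    free_mb = total_ram_mb - datanode_mb - tasktracker_mb - os_mb
    return max(free_mb // task_heap_mb, 0)

print(task_slots(16 * 1024))  # a 16 GB node -> 12 slots of 1024 MB
```

Running an HBase RegionServer on the same node would claim several more GB, shrinking the slot count further.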

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Lior Schachter
I believe our bottleneck is currently with memory. Most of the CPU is dedicated to parsing and working with gzip files (we don't have heavy computational tasks). But on the other hand, more memory and disk mean we can run more M/R and scans, which need more CPU. Lior On Mon, Nov 22, 2010 at 3

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Lars George
Hi Lior, Depends on your load, is it IO or CPU bound? Sounds like IO or disk from the above, right? I would opt for more machines! This will spread the load better across the cluster. And you can always add more disks in v2 of your setup. Lars On Mon, Nov 22, 2010 at 1:56 PM, Lior Schachter

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Lior Schachter
Hi Lars, I agree with every sentence you wrote (and that's why we chose HBase). However, from a managerial point-of-view the question of the initial investment is very important (especially when considering a new technology). Lior p.s. The price is in USD On Mon, Nov 22, 2010 at 2:43 PM, La

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Lior Schachter
And another more concrete question: let's say that for every machine with two quad-core CPUs, 4TB and 16GB, I can buy 2 machines with one quad-core CPU, 2TB and 16GB each. Which configuration should I choose? Lior On Mon, Nov 22, 2010 at 2:27 PM, Lior Schachter wrote: > Hi all, Thanks for your input and assistance

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Lars George
Hi Lior, I can only hope you state this in Shekel! But 20 nodes with Hadoop can do quite a lot and you cannot compare a single Oracle box with a 20 node Hadoop cluster as they serve slightly different use-cases. You need to make a commitment to what you want to achieve with HBase and that growth

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Lior Schachter
Hi all, Thanks for your input and assistance. From your answers I understand that: 1. more is better but our configuration might work. 2. there are small tweaks we can do that will improve our configuration (like having 4x500GB disks). 3. use monitoring (like Ganglia) to find the bottlenecks. F

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Lars George
Oleg, Do you have Ganglia or some other graphing tool running against the cluster? It gives you metrics that are crucial here, for example the load on Hadoop and its DataNodes as well as insertion rates etc. on HBase. What is also interesting is the compaction queue to see if the cluster is going

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Oleg Ruchovets
On Sun, Nov 21, 2010 at 10:39 PM, Krishna Sankar wrote: > Oleg & Lior, > > Couple of questions & couple of suggestions to ponder: > A) When you say 20 Name Servers, I assume you are talking about 20 Task > Servers > Yes > B) What type are your M/R jobs ? Compute Intensive vs. storage intensiv

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Oleg Ruchovets
Yes, I fully agree with that, but as I understand it there is a limitation: bulk load works only with one column family, which is not my case. Can you advise a workaround for using bulk load with multiple column families? Thanks Oleg. > Are these insertions the output of the MR jobs? > > If s
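One commonly suggested workaround (not spelled out in this thread) is simply to run the bulk-load pipeline once per column family and then load each family's HFiles separately. The job class, jar name, and paths below are hypothetical placeholders; `LoadIncrementalHFiles` is the standard HBase completebulkload tool:

```shell
# Hedged sketch: one bulk-load pass per column family. The MR job
# (com.example.BulkLoad) is a placeholder for a job that writes HFiles
# for a single family into its own staging directory.
for FAMILY in cf1 cf2 cf3; do
  hadoop jar my-bulkload-job.jar com.example.BulkLoad \
      -Dfamily="$FAMILY" /input/logs /staging/"$FAMILY"
  hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
      /staging/"$FAMILY" mytable
done
```

The cost is one MapReduce pass over the input per family, but each load step remains atomic per family.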

Re: Hadoop/HBase hardware requirement

2010-11-22 Thread Oleg Ruchovets
On Sun, Nov 21, 2010 at 9:55 PM, Jean-Daniel Cryans wrote: I'm unclear about the 2TB disk thing, is it 1x2TB or 2x1TB or 4x500GB? I hope it's the last one, as you want to have as many spindles as possible. We have 2x1TB. I would prefer 24GB to 16, this is what we run on and it works like a char

Re: Hadoop/HBase hardware requirement

2010-11-21 Thread Todd Lipcon
On Sun, Nov 21, 2010 at 5:53 AM, Oleg Ruchovets wrote: > Hi all, > After testing HBase for a few months with very light configurations (5 > machines, 2 TB disk, 8 GB RAM), we are now planning for production. > Our Load - > 1) 50GB log files to process per day by Map/Reduce jobs. > 2) Insert 4-5GB t

Re: Hadoop/HBase hardware requirement

2010-11-21 Thread Krishna Sankar
Oleg & Lior, Couple of questions & couple of suggestions to ponder: A) When you say 20 Name Servers, I assume you are talking about 20 Task Servers B) What type are your M/R jobs? Compute intensive vs. storage intensive? C) What is your data growth? D) With the current jobs, are you saturat

Re: Hadoop/HBase hardware requirement

2010-11-21 Thread Jean-Daniel Cryans
I'm unclear about the 2TB disk thing, is it 1x2TB or 2x1TB or 4x500GB? I hope it's the last one, as you want to have as many spindles as possible. I would prefer 24GB to 16, this is what we run on and it works like a charm, and gives more room for those memory hungry jobs. What kind of stability i

Hadoop/HBase hardware requirement

2010-11-21 Thread Oleg Ruchovets
Hi all, After testing HBase for a few months with very light configurations (5 machines, 2 TB disk, 8 GB RAM), we are now planning for production. Our Load - 1) 50GB log files to process per day by Map/Reduce jobs. 2) Insert 4-5GB to 3 tables in hbase. 3) Run 10-20 scans per day (scanning about 20 r