We've also recently updated http://hbase.apache.org/book/ops.capacity.html, which contains similar numbers and some more details on the items to consider for sizing.
Enis

On Sat, Feb 8, 2014 at 10:12 PM, Ramu M S <ramu.ma...@gmail.com> wrote:
> Thanks Lars.
>
> We were in the process of building our HBase cluster, though at a much smaller
> size. This discussion helped us a lot as well.
>
> Regards,
> Ramu
> On Feb 9, 2014 11:06 AM, "lars hofhansl" <la...@apache.org> wrote:
> >
> > In a year or two you won't be able to buy 1T or even 2T disks cheaply.
> > More spindles are good; more cores are good too. This is a fuzzy art.
> >
> > A hard fact is that HBase cannot (at the moment) handle more than 8-10T
> > per server; beyond that, you'd just have extra disks for IOPS.
> > You won't be happy if you expect each server to store 24T.
> >
> > I would go with more and smaller servers. Some people run two
> > RegionServers on a single machine, but that is not a well-explored option
> > at this point (until recently it needed an HBase patch to work).
> >
> > You *definitely* have to do some benchmarking with your use case. You might
> > be able to get away with fewer servers; you need to test for that.
> >
> > -- Lars
> >
> > ________________________________
> > From: Ramu M S <ramu.ma...@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Saturday, February 8, 2014 12:10 AM
> > Subject: Re: Regarding Hardware configuration for HBase cluster
> >
> > Lars,
> >
> > What about high-density storage servers that have capacity for up to 24
> > drives? There were also some recommendations in a few blogs about having 1
> > core per disk.
> >
> > There is only a slight price difference between 1TB and 600GB disks; with
> > negotiation it'll be as low as $50. The price difference between 8-core
> > and 12-core processors is also very small, $200-300.
> >
> > Do you think having 20-24 cores and 24 1TB disks would also be an option?
> >
> > Regards,
> > Ramu
> >
> > On Feb 8, 2014 11:19 AM, "lars hofhansl" <la...@apache.org> wrote:
> > >
> > > Let's not refer to our users in the third person. It's not polite :)
> > >
> > > Suresh,
> > >
> > > I wrote something up about RegionServer sizing here:
> > >
> > > http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html
> > >
> > > For your load I would guess that you'd need about 100 servers.
> > >
> > > That would mean:
> > > 1. 8TB/server
> > > 2. 30M rows/day/server
> > > 3. 30GB/day/server
> > >
> > > You should not expect a single server to be able to absorb more than
> > > 10,000 rows/s or 40MB/s, whichever is less.
> > >
> > > I'd size the machines as follows:
> > > 12-16 cores, HT, 1.8GHz-2.4GHz (more is better)
> > > 32-96GB RAM
> > > 6-12 drives (more spindles are better to absorb the write load)
> > > 10GbE NICs and top-of-rack switches
> > >
> > > Now, this is only a *rough guideline*. Obviously you'd have to perform
> > > your own tests, and this would only scale across the machines if your
> > > keys are sufficiently distributed.
> > > The details also depend on how compressible your data is and on your exact
> > > access patterns (read patterns, spiky write load, etc.).
> > > Start with 10 data nodes and an appropriately scaled-down load, and see how
> > > it works.
> > >
> > > Vladimir is right here; you probably want to seek professional help.
> > >
> > > -- Lars
> > >
> > > ________________________________
> > > From: Vladimir Rodionov <vrodio...@carrieriq.com>
> > > To: "user@hbase.apache.org" <user@hbase.apache.org>
> > > Sent: Friday, February 7, 2014 10:29 AM
> > > Subject: RE: Regarding Hardware configuration for HBase cluster
> > >
> > > This guy is building a system at Yahoo's scale and asking the user group
> > > how to size the cluster.
> > > Few people here can give him advice based on their experience, and I am
> > > not one of them. I can only speculate on "how many nodes will we need to
> > > consume 3TB/3B records daily".
> > >
> > > For a system of this scale it's better to go to Cloudera/IBM/HW and not
> > > try to build it yourself, especially when you are asking questions on the
> > > user group (not answering them).
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: vrodio...@carrieriq.com
> > >
> > > ________________________________________
> > > From: Ted Yu [yuzhih...@gmail.com]
> > > Sent: Friday, February 07, 2014 6:27 AM
> > > To: user@hbase.apache.org
> > > Cc: user@hbase.apache.org
> > > Subject: Re: Regarding Hardware configuration for HBase cluster
> > >
> > > Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes?
> > >
> > > Cheers
> > >
> > > On Feb 6, 2014, at 8:47 PM, suresh babu <bigdatac...@gmail.com> wrote:
> > >
> > > > Hi Stana,
> > > >
> > > > We are trying to find out how many data nodes (including hardware
> > > > configuration details) should be configured or set up for this
> > > > requirement.
> > > >
> > > > -suresh
> > > >
> > > > On Friday, February 7, 2014, stana <st...@is-land.com.tw> wrote:
> > > >
> > > >> Hi suresh babu,
> > > >>
> > > >> How many data nodes do you have?
> > > >>
> > > >> 2014-02-07 suresh babu <bigdatac...@gmail.com>:
> > > >>
> > > >>> Refreshing the thread.
> > > >>>
> > > >>> Can you please suggest any input on the hardware configuration (for the
> > > >>> use case described below)?
> > > >>>
> > > >>> On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <bigdatac...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> Please find the data requirements for our use case below:
> > > >>>>
> > > >>>> Raw data processing
> > > >>>> ----------------------------------
> > > >>>> 1. Data is populated into HDFS; after ETL, around 3 billion puts per
> > > >>>> day go into HBase.
> > > >>>>
> > > >>>> 2. Data older than X days is to be deleted from HBase.
> > > >>>>
> > > >>>> Aggregates processing
> > > >>>> ----------------------------------
> > > >>>> 3 billion reads per day ... Large scans or reads
> > > >>>>
> > > >>>> KV size around 1 KB
> > > >>>> Daily processing, raw and aggregates, via M/R jobs
> > > >>>> Hive queries in future, but not of immediate focus
> > > >>>> On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <vrodio...@carrieriq.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Yes,
> > > >>>>>
> > > >>>>> 1. What is the expected avg and peak load in writes/updates/deletes/reads?
> > > >>>>> 2. What is the average size of a KV?
> > > >>>>> 3. What is the % split between reads/small scans/medium scans/large scans?
> > > >>>>> 4. Do you plan to run M/R jobs or Hive queries?
> > > >>>>>
> > > >>>>> Best regards,
> > > >>>>> Vladimir Rodionov
> > > >>>>> Principal Platform Engineer
> > > >>>>> Carrier IQ, www.carrieriq.com
> > > >>>>> e-mail: vrodio...@carrieriq.com
> > > >>>>>
> > > >>>>> ________________________________________
> > > >>>>> From: Nick Xie [nick.xie.had...@gmail.com]
> > > >>>>> Sent: Tuesday, February 04, 2014 10:02 AM
> > > >>>>> To: user@hbase.apache.org
> > > >>>>> Subject: Re: Regarding Hardware configuration for HBase cluster
> > > >>>>>
> > > >>>>> I guess you'd better describe your applications in a bit more detail.
> > > >>>>> Does the data grow over time at all?
> > > >>>>>
> > > >>>>> Nick
> > > >>>>>
> > > >>>>> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <bigdatac...@gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi folks,
> > > >>>>>>
> > > >>>>>> We are trying to set up an HBase cluster for the following requirement:
> > > >>>>>>
> > > >>>>>> We have to maintain around 800TB of data.
> > > >>>>>>
> > > >>>>>> For the above requirement, please suggest the best hardware
> > > >>>>>> configuration details, such as:
> > > >>>>>>
> > > >>>>>> 1) How many disks to consider per machine and the capacity of each
> > > >>>>>> disk; for example, 16/24 disks per node with 1/2TB capacity per disk.
> > > >>>>>>
> > > >>>>>> 2) Which compression method is suited for a production environment?
> > > >>>>>> Space is not a major limitation, but speed is of prime concern for my
> > > >>>>>> use case.
> > > >>>>>>
> > > >>>>>> 3) How many CPU cores should be configured for each node/machine? Or
> > > >>>>>> what is the ideal ratio of cores to disks, for example
> > > >>>>>> 1 core/1 disk?
> > > >>>>>>
> > > >>>>>> Regards,
> > > >>>>>> Kaushik
> > > >>>>>
> > > >>>>> Confidentiality Notice: The information contained in this message,
> > > >>>>> including any attachments hereto, may be confidential and is intended
> > > >>>>> to be read only by the individual or entity to whom this message is
> > > >>>>> addressed. If the reader of this message is not the intended recipient
> > > >>>>> or an agent or designee of the intended recipient, please note that any
> > > >>>>> review, use, disclosure or distribution of this message or its
> > > >>>>> attachments, in any form, is strictly prohibited. If you have received
> > > >>>>> this message in error, please immediately notify the sender and/or
> > > >>>>> notificati...@carrieriq.com and delete or destroy any copy of this
> > > >>>>> message and its attachments.
> > > >>
> > > >> --
> > > >> Best Regards
> > > >>
> > > >> 亦思科技 is-land Systems Inc.
> > > >> Tel:03-5630345 Ext.14
> > > >> Fax:03-5631345
> > > >> e-MAIL:st...@is-land.com.tw
> > > >>
> > > >> 何永安 Yung An He
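The arithmetic behind Lars's estimate of about 100 servers can be reproduced in a few lines of Python. This is a minimal sketch and was not posted in the thread. Its assumptions:

- The names Workload, ServerLimits, servers_needed, and daily_load_per_server are hypothetical.
- It uses decimal units (1 KB = 1,000 bytes).
- It divides the 800TB figure by a per-server ceiling the same way Lars did, without separately modeling HDFS replication or compression.
- It uses average rather than peak ingest rates.

The server count is taken as the larger of a storage bound and an ingest bound.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    # Figures from the original request in this thread (decimal units assumed).
    total_data_tb: float = 800.0    # data to keep in HBase
    puts_per_day: float = 3e9       # ~3 billion puts/day after ETL
    kv_size_bytes: float = 1_000.0  # ~1 KB per KeyValue


@dataclass
class ServerLimits:
    # Lars's rough per-RegionServer ceilings; rules of thumb, not hard limits.
    max_data_tb: float = 8.0            # 8-10 TB per server (lower end used)
    max_rows_per_sec: float = 10_000.0  # ~10,000 rows/s ...
    max_mb_per_sec: float = 40.0        # ... or 40 MB/s, whichever is less


def servers_needed(w: Workload, s: ServerLimits) -> tuple[int, int, int]:
    """Return (servers, storage_bound, ingest_bound) using average ingest rates."""
    storage_bound = math.ceil(w.total_data_tb / s.max_data_tb)
    # Express the byte ceiling as rows/s; the lower of the two caps applies.
    rows_cap = min(s.max_rows_per_sec, s.max_mb_per_sec * 1e6 / w.kv_size_bytes)
    avg_rows_per_sec = w.puts_per_day / SECONDS_PER_DAY
    ingest_bound = math.ceil(avg_rows_per_sec / rows_cap)
    return max(storage_bound, ingest_bound), storage_bound, ingest_bound


def daily_load_per_server(w: Workload, servers: int) -> tuple[float, float]:
    """Return (rows/day, GB/day) per server, assuming keys are evenly distributed."""
    rows = w.puts_per_day / servers
    gb = w.puts_per_day * w.kv_size_bytes / servers / 1e9
    return rows, gb


if __name__ == "__main__":
    w = Workload()
    n, storage, ingest = servers_needed(w, ServerLimits())
    rows, gb = daily_load_per_server(w, n)
    print(f"servers={n} (storage-bound={storage}, ingest-bound={ingest})")
    print(f"per server: {rows / 1e6:.0f}M rows/day, {gb:.0f} GB/day")
```

With the thread's figures this prints 100 servers (storage-bound 100, ingest-bound 4), which works out to 30M rows/day and 30 GB/day per server. That matches Lars's numbers. The ingest bound is small because the average cluster-wide rate is only about 34,700 puts/s. This suggests that storage per server, rather than average write throughput, drives the estimate. Peak load, replication, and compression are not modeled and would still need to be measured, as Lars says.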