We've also recently updated
http://hbase.apache.org/book/ops.capacity.html, which contains similar
numbers, and some more details on the items to
consider for sizing.

Enis



On Sat, Feb 8, 2014 at 10:12 PM, Ramu M S <ramu.ma...@gmail.com> wrote:

> Thanks Lars.
>
> We were in the process of building our HBase cluster. Much smaller size,
> though. This discussion helped us a lot as well.
>
> Regards,
> Ramu
> On Feb 9, 2014 11:06 AM, "lars hofhansl" <la...@apache.org> wrote:
>
> > In a year or two you won't be able to buy 1T or even 2T disks cheaply.
> > More spindles are good, more cores are good too. This is a fuzzy art.
> >
> > A hard fact is that HBase cannot (at the moment) handle more than 8-10T
> > per server. With HBase, you'd just have extra disks for IOPS.
> > You won't be happy if you expect each server to store 24T.
> >
> > I would go with more and smaller servers. Some people run two
> > RegionServers on a single machine, but that is not a well explored option
> > at this point (up to recently it needed an HBase patch to work).
> >
> > You *definitely* have to do some benchmarking with your use case. You might
> > be able to get away with fewer servers, but you need to test for that.
> >
> > -- Lars
> >
> >
> >
> >
> > ________________________________
> >  From: Ramu M S <ramu.ma...@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Saturday, February 8, 2014 12:10 AM
> > Subject: Re: Regarding Hardware configuration for HBase cluster
> >
> >
> > Lars,
> >
> > What about high-density storage servers that have a capacity of up to 24
> > drives? There were also some recommendations in a few blogs about having 1
> > core per disk.
> >
> > 1TB disks have only a slight price difference compared to 600 GB. With
> > negotiations it'll be as low as $50. Also, the price difference between 8-core
> > and 12-core processors is very small, $200-300.
> >
> > Do you think having 20-24 cores and 24 1TB disks will also be an option?
> >
> > Regards,
> > Ramu
> >
> > On Feb 8, 2014 11:19 AM, "lars hofhansl" <la...@apache.org> wrote:
> >
> > > Let's not refer to our users in the third person. It's not polite :)
> > >
> > > Suresh,
> > >
> > > I wrote something up about RegionServer sizing here:
> > > http://hadoop-hbase.blogspot.com/2013/01/hbase-region-server-memory-sizing.html
> > >
> > > For your load I would guess that you'd need about 100 servers.
> > >
> > > That would:
> > > 1. have 8TB/server
> > > 2. 30m rows/day/server
> > > 3. 30GB/day/server
> > >
> > > You should not expect a single server to be able to absorb more than
> > > 10000 rows/s or 40mb/s, whichever is less.
> > >
> > > The machines I'd size as follows:
> > > 12-16 cores, HT, 1.8GHz-2.4GHz (more is better)
> > > 32-96GB ram
> > > 6-12 drives (more spindles are better to absorb the write load)
> > > 10ge NICs and TopOfRack switches
> > >
> > > Now, this is only a *rough guideline* and obviously you'd have to perform
> > > your own tests, and this would only scale across the machines if your
> > > keys are sufficiently distributed.
> > > The details also depend on how compressible your data is and your exact
> > > access patterns (read patterns, spiky write load, etc).
> > > Start with 10 data nodes and an appropriately scaled-down load and see how it
> > > works.
> > >
> > > Vladimir is right here, you probably want to seek professional help.
> > >
> > > -- Lars
> > >
> > >
> > >
> > >
> > > ________________________________
> > >  From: Vladimir Rodionov <vrodio...@carrieriq.com>
> > > To: "user@hbase.apache.org" <user@hbase.apache.org>
> > > Sent: Friday, February 7, 2014 10:29 AM
> > > Subject: RE: Regarding Hardware configuration for HBase cluster
> > >
> > >
> > > This guy is building a system at the scale of Yahoo and asking the user group how
> > > to size the cluster.
> > > Few people here can give him advice based on their experience, and I am not
> > > one of them. I can
> > > only speculate on "how many nodes will we need to consume 3TB/3B records
> > > daily".
> > >
> > > For a system of this scale it's better to go to Cloudera/IBM/HW, and not to
> > > try to build it yourself,
> > > especially when you ask questions on the user group (not answer them).
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: vrodio...@carrieriq.com
> > >
> > > ________________________________________
> > >
> > > From: Ted Yu [yuzhih...@gmail.com]
> > > Sent: Friday, February 07, 2014 6:27 AM
> > > To: user@hbase.apache.org
> > > Cc: user@hbase.apache.org
> > > Subject: Re: Regarding Hardware configuration for HBase cluster
> > >
> > > Have you read http://www.slideshare.net/larsgeorge/hbase-sizing-notes?
> > >
> > > Cheers
> > >
> > > On Feb 6, 2014, at 8:47 PM, suresh babu <bigdatac...@gmail.com> wrote:
> > >
> > > > Hi Stana,
> > > >
> > > > We are trying to find out how many data nodes (including hardware
> > > > configuration details) should be configured or set up for this requirement.
> > > >
> > > > -suresh
> > > >
> > > > On Friday, February 7, 2014, stana <st...@is-land.com.tw> wrote:
> > > >
> > > >> Hi suresh babu:
> > > >>
> > > >> how many data nodes do you have?
> > > >>
> > > >>
> > > >> 2014-02-07 suresh babu <bigdatac...@gmail.com>:
> > > >>
> > > >>> refreshing the thread,
> > > >>>
> > > >>> Can you please suggest any inputs for the hardware configuration (for the
> > > >>> below mentioned use case).
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Wed, Feb 5, 2014 at 10:31 AM, suresh babu <bigdatac...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> Please find the data requirements for our use case below :
> > > >>>>
> > > >>>> Raw data processing
> > > >>>> ----------------------------------
> > > >>>> 1. Data is populated into HDFS; after ETL, around 3 billion puts per day
> > > >>>> into HBase
> > > >>>>
> > > >>>> 2. Oldest data after X days to be deleted from hbase
> > > >>>>
> > > >>>> Aggregates processing
> > > >>>> ----------------------------------
> > > >>>> 3 billion reads per day ... Large scan or reads
> > > >>>>
> > > >>>> KV size around 1 KB
> > > >>>> Daily processing, raw and aggregates, via M/R jobs
> > > >>>> Hive queries in future, but not of immediate focus
> > > >>>> On Feb 5, 2014 12:48 AM, "Vladimir Rodionov" <vrodio...@carrieriq.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Yes,
> > > >>>>>
> > > >>>>> 1. What is the expected avg and peak load in writes/updates/deletes/reads?
> > > >>>>> 2. What is the average size of a KV?
> > > >>>>> 3. Reads/small scans/medium/large scan %%
> > > >>>>> 4. Do you plan M/R jobs, Hive query?
> > > >>>>>
> > > >>>>>
> > > >>>>> Best regards,
> > > >>>>> Vladimir Rodionov
> > > >>>>> Principal Platform Engineer
> > > >>>>> Carrier IQ, www.carrieriq.com
> > > >>>>> e-mail: vrodio...@carrieriq.com
> > > >>>>>
> > > >>>>> ________________________________________
> > > >>>>> From: Nick Xie [nick.xie.had...@gmail.com]
> > > >>>>> Sent: Tuesday, February 04, 2014 10:02 AM
> > > >>>>> To: user@hbase.apache.org
> > > >>>>> Subject: Re: Regarding Hardware configuration for HBase cluster
> > > >>>>>
> > > >>>>> I guess you'd better describe a little bit more about your applications.
> > > >>>>> Does the data increase over time at all?
> > > >>>>>
> > > >>>>> Nick
> > > >>>>>
> > > >>>>>
> > > >>>>> On Tue, Feb 4, 2014 at 5:22 AM, suresh babu <bigdatac...@gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi folks,
> > > >>>>>>
> > > >>>>>> We are trying to set up an HBase cluster for the following requirement:
> > > >>>>>>
> > > >>>>>> We have to maintain data of size around 800TB.
> > > >>>>>>
> > > >>>>>> For the above requirement, please suggest the best hardware configuration
> > > >>>>>> details, like:
> > > >>>>>>
> > > >>>>>> 1) How many disks to consider per machine, and the capacity of the disks,
> > > >>>>>> for example, 16/24 disks per node with 1/2TB capacity per disk?
> > > >>>>>>
> > > >>>>>> 2) Which compression method is suited for a production environment? Space is
> > > >>>>>> not a major limitation, but speed is of prime concern for my use case.
> > > >>>>>>
> > > >>>>>> 3) How many CPU cores should be configured for each node/machine? Or what is
> > > >>>>>> the ideal ratio of number of cores to number of disks, for example
> > > >>>>>> 1 core/1 disk?
> > > >>>>>>
> > > >>>>>> Regards,
> > > >>>>>> Kaushik
> > > >>>>>
> > > >>>>> Confidentiality Notice:  The information contained in this message,
> > > >>>>> including any attachments hereto, may be confidential and is intended to be
> > > >>>>> read only by the individual or entity to whom this message is addressed. If
> > > >>>>> the reader of this message is not the intended recipient or an agent or
> > > >>>>> designee of the intended recipient, please note that any review, use,
> > > >>>>> disclosure or distribution of this message or its attachments, in any form,
> > > >>>>> is strictly prohibited.  If you have received this message in error, please
> > > >>>>> immediat--
> > > >> Best Regards
> > > >>
> > > >> 亦思科技  is-land Systems Inc.
> > > >> Tel:03-5630345 Ext.14
> > > >> Fax:03-5631345
> > > >> e-MAIL:st...@is-land.com.tw
> > > >>
> > > >> 何永安 Yung An He
> > > >>
> > >
> > > Confidentiality Notice:  The information contained in this message,
> > > including any attachments hereto, may be confidential and is intended to be
> > > read only by the individual or entity to whom this message is addressed. If
> > > the reader of this message is not the intended recipient or an agent or
> > > designee of the intended recipient, please note that any review, use,
> > > disclosure or distribution of this message or its attachments, in any form,
> > > is strictly prohibited.  If you have received this message in error, please
> > > immediately notify the sender and/or notificati...@carrieriq.com and
> > > delete or destroy any copy of this message and its attachments.
>
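
A rough sketch, in Python, of the arithmetic behind the estimate quoted above (about 100 servers at 8TB/server, 30m rows/day/server, 30GB/day/server). The names `estimate_sizing` and `within_ceiling` are hypothetical, and the inputs are assumptions read off the thread: one ~1 KB KV per put (taken as 1000 bytes), "40mb/s" read as megabytes per second, and no allowance for HDFS replication, compression, or peak load. It only reproduces the back-of-the-envelope numbers and is no substitute for benchmarking.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Sizing:
    servers: int
    rows_per_day_per_server: float
    gb_per_day_per_server: float
    avg_rows_per_sec_per_server: float
    avg_mb_per_sec_per_server: float


def estimate_sizing(total_tb, puts_per_day, kv_bytes, tb_per_server):
    """Derive per-server figures from the totals given in the thread.

    Assumes the server count is driven purely by data volume per server
    and that the daily load is spread evenly across servers and time.
    """
    servers = math.ceil(total_tb / tb_per_server)
    rows = puts_per_day / servers              # rows per day per server
    gb = rows * kv_bytes / 1e9                 # GB per day per server (decimal GB)
    return Sizing(
        servers=servers,
        rows_per_day_per_server=rows,
        gb_per_day_per_server=gb,
        avg_rows_per_sec_per_server=rows / SECONDS_PER_DAY,
        avg_mb_per_sec_per_server=gb * 1000 / SECONDS_PER_DAY,
    )


def within_ceiling(sizing, max_rows_per_sec=10_000, max_mb_per_sec=40):
    """Check the daily *average* against the quoted per-server ceiling.

    The ceiling values come from the thread; treating "mb" as megabytes
    is an assumption. Spiky loads can exceed the average by a wide margin.
    """
    return (
        sizing.avg_rows_per_sec_per_server <= max_rows_per_sec
        and sizing.avg_mb_per_sec_per_server <= max_mb_per_sec
    )


if __name__ == "__main__":
    # 800TB total, 3 billion puts/day, ~1 KB per KV, 8TB per server.
    s = estimate_sizing(800, 3_000_000_000, 1_000, 8)
    print(s)
    print("average load within ceiling:", within_ceiling(s))
```

With the thread's numbers, the daily average comes to roughly 350 rows/s per server, well under the quoted 10000 rows/s ceiling, so in this sketch it is the data volume per server, not average ingest, that sets the server count. Spiky write loads and large scans would still need to be tested separately.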
