The i2.8xlarge and hs1.8xlarge EC2 instance types would provide an opportunity to test what really happens today when you attempt a high-density storage architecture with HDFS and HBase. The hs1 type has 24 spinning disks. I think the i2.8xlarge better represents near-future challenges in effective utilization: it has 8 x 800 GB SSDs and 244 GB of RAM. They would be hard to get ahold of and very expensive to operate, though.
> On Jul 19, 2014, at 1:32 AM, lars hofhansl <la...@apache.org> wrote:
>
> Yeah. Right direction. Correct on 3 counts. Should have read all email before
> I replied to your earlier one.
>
>
> ________________________________
> From: Amandeep Khurana <ama...@gmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Thursday, July 17, 2014 11:36 AM
> Subject: Re: Cluster sizing guidelines
>
>> On Wed, Jul 16, 2014 at 2:32 PM, Andrew Purtell <apurt...@apache.org> wrote:
>>
>> Those questions don't have pat answers. HBase has a few interesting
>> load-dependent tunables, and the ceiling you'll encounter depends as much
>> on the characteristics of the nodes (particularly, block devices) and the
>> network as on the software.
>>
>> We can certainly, through experimentation, establish upper bounds on
>> performance, optimizing either for throughput at a given payload size or
>> latency within a given bound (your questions #1 and #2). I.e., using
>> now-typical systems with 32 cores, 64-128 GB of RAM (a fair amount of it
>> allocated to bucket cache), 2-4 solid state volumes, and a 10GbE network,
>> here are plots of the measured upper bound of metric M on the y-axis over
>> the number of slave cluster nodes on the x-axis.
>
> Agreed. I'm trying to figure out what guidelines we can establish for a
> given hardware profile.
>
> From what I've seen and understood so far, it's a balancing act between the
> following factors for any given type of hardware:
>
> 1. Write throughput. You are basically bottlenecked on the WAL in this case.
> 2. Read latency. You want to keep as much in memory as possible if the
> requirements demand low latency. How does off-heap cache play in here, and
> what are our experiences using it in production?
> 3. Total storage requirement. What's the amount of data you can store per
> node? 12 x 3TB drives are becoming more common, but can HBase leverage that
> level of storage density? 40GB regions * 100 regions per server (max) gets
> you to 4TB. Replicated, that becomes 12TB. This is pretty much the max load
> you want to put on a single server from a memory standpoint to achieve
> high write throughput or low read latency (factors #1 and #2).
>
> Am I thinking in the right direction here?
>
>>
>> Open questions:
>> 1. Which measurement tool and test automation?
>> 2. Where can we get ~100 decent nodes for a realistic assessment?
>> 3. Who's going to fund the test dev and testbed?
>>
>>
>> On Wed, Jul 16, 2014 at 1:41 PM, Amandeep Khurana <ama...@gmail.com>
>> wrote:
>>
>>> Thanks Lars.
>>>
>>> I'm curious how we'd answer questions like:
>>> 1. How many nodes do I need to sustain a write throughput of N reqs/sec
>>> with a payload of size M KB?
>>> 2. How many nodes do I need to sustain a read throughput of N reqs/sec
>>> with a payload of size M KB and a latency of X ms per read?
>>> 3. How many nodes do I need to store N TB of total data with one of the
>>> above constraints?
>>>
>>> This goes into looking at the bottlenecks that need to be taken into
>>> account at write and read time, and also the max number of regions and
>>> the max region size that a single region server can host.
>>>
>>> What are your thoughts on this?
>>>
>>> -Amandeep
>>>
>>>
>>>> On Wed, Jul 16, 2014 at 9:06 AM, lars hofhansl <la...@apache.org> wrote:
>>>>
>>>> This is a somewhat fuzzy art.
>>>>
>>>> Some points to consider:
>>>> 1. All data is replicated three ways. Or in other words, if you run three
>>>> RegionServer/DataNodes, each machine will get 100% of the writes. If you
>>>> run 6, each gets 50% of the writes. From that aspect, HBase clusters with
>>>> fewer than 9 RegionServers are not really useful.
>>>> 2. As for the machines themselves: just go with any reasonable machine,
>>>> and pick the cheapest you can find. At least 8 cores, at least 32GB of
>>>> RAM, at least 6 disks, no RAID needed. (We have machines with 12 cores in
>>>> 2 sockets, 96GB of RAM, 6 x 4TB drives, no HW RAID.) HBase is not yet
>>>> well tuned for SSDs.
>>>>
>>>> You also need to carefully consider your network topology. With HBase
>>>> you'll see quite some east-west traffic (i.e. between racks). 10GbE is
>>>> good if you have it. We have 1GbE everywhere so far, and we found it is
>>>> the single biggest bottleneck for write performance.
>>>>
>>>> Also see this blog post about HBase memory sizing (shameless plug):
>>>> http://hadoop-hbase.blogspot.de/2013/01/hbase-region-server-memory-sizing.html
>>>>
>>>> I'm planning a blog post about this topic with more details.
>>>>
>>>> -- Lars
>>>>
>>>>
>>>> ________________________________
>>>> From: Amandeep Khurana <ama...@gmail.com>
>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>> Sent: Tuesday, July 15, 2014 10:48 PM
>>>> Subject: Cluster sizing guidelines
>>>>
>>>> Hi
>>>>
>>>> How do users usually go about sizing HBase clusters? What are the factors
>>>> you take into account? What are typical hardware profiles you run with?
>>>> Any data points you can share would help.
>>>>
>>>> Thanks
>>>> Amandeep
>>
>>
>> --
>> Best regards,
>>
>>   - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
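
The arithmetic in the thread is easy to check. Below is a minimal back-of-the-envelope sketch in Python: the 40 GB region size, the ~100 regions-per-RegionServer ceiling, and three-way HDFS replication are the figures quoted above, while the helper functions and their names are purely illustrative, not anything from HBase itself.

#!/usr/bin/env python
# Back-of-the-envelope HBase cluster sizing using the figures quoted in
# the thread above. The constants mirror the discussion; they are rough
# rules of thumb, not HBase-enforced limits.

REPLICATION = 3            # HDFS three-way block replication
REGION_SIZE_GB = 40        # region size cited by Amandeep
MAX_REGIONS_PER_RS = 100   # rough per-RegionServer region ceiling

def usable_tb_per_node():
    """Logical (pre-replication) data one RegionServer can host."""
    return REGION_SIZE_GB * MAX_REGIONS_PER_RS / 1024.0

def raw_tb_per_node():
    """Disk consumed per node once HDFS replicates every block."""
    return usable_tb_per_node() * REPLICATION

def write_share(num_nodes):
    """Fraction of the cluster-wide write volume each DataNode absorbs.

    Every write lands on REPLICATION nodes, so N nodes split a total of
    REPLICATION x the client write volume: 3 nodes each see 100%,
    6 nodes each see 50%, and so on.
    """
    return min(1.0, REPLICATION / float(num_nodes))

if __name__ == "__main__":
    print("usable per node: %.1f TB" % usable_tb_per_node())
    print("raw per node:    %.1f TB" % raw_tb_per_node())
    for n in (3, 6, 9, 12):
        print("%2d nodes -> each sees %3.0f%% of writes"
              % (n, 100 * write_share(n)))

Run as-is, this reproduces the numbers from the thread: about 3.9 TB of logical data (~11.7 TB raw, i.e. the "4TB / 12TB" figures) per node, and a per-node write share of 100% at 3 nodes falling to 33% at 9, which is why Lars suggests clusters with fewer than 9 RegionServers are not really useful.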