The i2.8xlarge and hs1.8xlarge EC2 instance types would provide an opportunity to test what really happens today when you attempt a high-density storage architecture with HDFS and HBase. The hs1 type has 24 spinning disks. I think the i2.8xlarge better represents near-future challenges in effective utilization: it has 8 x 800 GB SSDs and 244 GB of RAM. They would be hard to get ahold of and very expensive to operate, though.
> On Jul 19, 2014, at 1:32 AM, lars hofhansl <la...@apache.org> wrote:
>
> Yeah. Right direction. Correct on 3 counts. Should have read all email before
> I replied to your earlier one.
>
>
> ________________________________
> From: Amandeep Khurana <ama...@gmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Thursday, July 17, 2014 11:36 AM
> Subject: Re: Cluster sizing guidelines
>
>> On Wed, Jul 16, 2014 at 2:32 PM, Andrew Purtell <apurt...@apache.org> wrote:
>>
>> Those questions don't have pat answers. HBase has a few interesting
>> load-dependent tunables, and the ceiling you'll encounter depends as much
>> on the characteristics of the nodes (particularly, block devices) and the
>> network as on the software.
>>
>> We can certainly, through experimentation, establish upper bounds on
>> performance, optimizing either for throughput at a given payload size or
>> latency within a given bound (your questions #1 and #2). I.e., using
>> now-typical systems with 32 cores, 64-128 GB of RAM (a fair amount of it
>> allocated to bucket cache), 2-4 solid state volumes, and a 10GbE network,
>> here are plots of the measured upper bound of metric M on the y-axis over
>> the number of slave cluster nodes on the x-axis.
>
> Agreed. I'm trying to figure out what guidelines we can establish for a
> given hardware profile.
>
> From what I've seen and understood so far, it's a balancing act between the
> following factors for any given type of hardware:
>
> 1. Write throughput. You are basically bottlenecked on the WAL in this case.
> 2. Read latency. You want to keep as much in memory as possible if the
> requirements demand low latency. How does off-heap cache play in here, and
> what are our experiences using it in production?
> 3. Total storage requirement. What's the amount of data you can store per
> node? 12 x 3TB drives are becoming more common, but can HBase leverage that
> level of storage density? 40GB regions * 100 regions per server (max) gets
> you to 4TB. Replicated, that becomes 12TB. This is pretty much the max load
> you want to put on a single server from a memory standpoint to achieve
> high write throughput or low read latency (factors #1 and #2).
>
> Am I thinking in the right direction here?
>
>>
>> Open questions:
>> 1. Which measurement tool and test automation?
>> 2. Where can we get ~100 decent nodes for a realistic assessment?
>> 3. Who's going to fund the test dev and testbed?
>>
>>
>> On Wed, Jul 16, 2014 at 1:41 PM, Amandeep Khurana <ama...@gmail.com>
>> wrote:
>>
>>> Thanks Lars.
>>>
>>> I'm curious how we'd answer questions like:
>>> 1. How many nodes do I need to sustain a write throughput of N reqs/sec
>>> with a payload of size M KB?
>>> 2. How many nodes do I need to sustain a read throughput of N reqs/sec
>>> with a payload of size M KB and a latency of X ms per read?
>>> 3. How many nodes do I need to store N TB of total data with one of the
>>> above constraints?
>>>
>>> This goes into looking at the bottlenecks that need to be taken into
>>> account at write and read time, and also the max number of regions and
>>> the max region size that a single region server can host.
>>>
>>> What are your thoughts on this?
>>>
>>> -Amandeep
>>>
>>>
>>>> On Wed, Jul 16, 2014 at 9:06 AM, lars hofhansl <la...@apache.org> wrote:
>>>>
>>>> This is a somewhat fuzzy art.
>>>>
>>>> Some points to consider:
>>>> 1. All data is replicated three ways. Or in other words, if you run three
>>>> RegionServer/DataNodes, each machine will get 100% of the writes. If you
>>>> run 6, each gets 50% of the writes. From that aspect, HBase clusters with
>>>> fewer than 9 RegionServers are not really useful.
>>>> 2. As for the machines themselves: just go with any reasonable machine,
>>>> and pick the cheapest you can find. At least 8 cores, at least 32GB of
>>>> RAM, at least 6 disks, no RAID needed. (We have machines with 12 cores in
>>>> 2 sockets, 96GB of RAM, 6 x 4TB drives, no HW RAID.) HBase is not yet
>>>> well tuned for SSDs.
>>>>
>>>> You also need to carefully consider your network topology. With HBase
>>>> you'll see quite some east-west traffic (i.e. between racks). 10GbE is
>>>> good if you have it. We have 1GbE everywhere so far, and we found it is
>>>> the single biggest bottleneck for write performance.
>>>>
>>>> Also see this blog post about HBase memory sizing (shameless plug):
>>>> http://hadoop-hbase.blogspot.de/2013/01/hbase-region-server-memory-sizing.html
>>>>
>>>> I'm planning a blog post about this topic with more details.
>>>>
>>>> -- Lars
>>>>
>>>>
>>>> ________________________________
>>>> From: Amandeep Khurana <ama...@gmail.com>
>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>> Sent: Tuesday, July 15, 2014 10:48 PM
>>>> Subject: Cluster sizing guidelines
>>>>
>>>> Hi
>>>>
>>>> How do users usually go about sizing HBase clusters? What are the factors
>>>> you take into account? What are typical hardware profiles you run with?
>>>> Any data points you can share would help.
>>>>
>>>> Thanks
>>>> Amandeep
>>
>>
>> --
>> Best regards,
>>
>>   - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
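
The arithmetic in the thread is easy to check. Below is a minimal back-of-the-envelope sketch in Python: the 40 GB region size, the ~100 regions-per-RegionServer ceiling, and three-way HDFS replication are the figures quoted above, while the helper functions and their names are purely illustrative, not anything from HBase itself.

#!/usr/bin/env python
# Back-of-the-envelope HBase cluster sizing using the figures quoted in
# the thread above. The constants mirror the discussion; they are rough
# rules of thumb, not HBase-enforced limits.

REPLICATION = 3            # HDFS three-way block replication
REGION_SIZE_GB = 40        # region size cited by Amandeep
MAX_REGIONS_PER_RS = 100   # rough per-RegionServer region ceiling

def usable_tb_per_node():
    """Logical (pre-replication) data one RegionServer can host."""
    return REGION_SIZE_GB * MAX_REGIONS_PER_RS / 1024.0

def raw_tb_per_node():
    """Disk consumed per node once HDFS replicates every block."""
    return usable_tb_per_node() * REPLICATION

def write_share(num_nodes):
    """Fraction of the cluster-wide write volume each DataNode absorbs.

    Every write lands on REPLICATION nodes, so N nodes split a total of
    REPLICATION x the client write volume: 3 nodes each see 100%,
    6 nodes each see 50%, and so on.
    """
    return min(1.0, REPLICATION / float(num_nodes))

if __name__ == "__main__":
    print("usable per node: %.1f TB" % usable_tb_per_node())
    print("raw per node:    %.1f TB" % raw_tb_per_node())
    for n in (3, 6, 9, 12):
        print("%2d nodes -> each sees %3.0f%% of writes"
              % (n, 100 * write_share(n)))

Run as-is, this reproduces the numbers from the thread: about 3.9 TB of logical data (~11.7 TB raw, i.e. the "4TB / 12TB" figures) per node, and a per-node write share of 100% at 3 nodes falling to 33% at 9, which is why Lars suggests clusters with fewer than 9 RegionServers are not really useful.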