We are evaluating HBase as a possible replacement for a different NoSQL solution, and performance is of course an important part of that evaluation. We are a Python shop and are very worried that we cannot get any real performance out of HBase using Thrift (and would have to drop down to Java). We are aware of the various lower-level options, such as bulk inserts or Java-based inserts with the WAL turned off, but none of these are available to us in Python, so they are not part of our evaluation.

We have a 10-node cluster (24 GB RAM, 6 x 1 TB, 16 cores) that we are setting up as data/region nodes, and we are looking for suggestions on configuration as well as benchmarks that show what performance to expect. Below are some specific questions. I realize there are a million factors that determine specific performance numbers, so any numbers from running clusters would be great as examples of what can be done. Again, Thrift seems to be our "problem", so non-Java solutions are preferred (do any non-Java shops run large-scale HBase clusters?). Our total production cluster size is estimated to be 50 TB.
Our data model is 3 CFs: one primary and 2 secondary indexes. All writes go to all 3 CFs and are grouped as a batch of row mutations, which should avoid row-locking issues.

What heap size is recommended for the master and for the region servers (24 GB RAM)?

What other settings can/should be tweaked in HBase to optimize performance (we have looked at the wiki page)?

What is a good batch size for writes? We will start with 10k values/batch.

How many concurrent writers/readers can a single data node handle with evenly distributed load? Are there settings specific to this?

What is "very good" read/write latency for a single put/get in HBase using Thrift?

What is "very good" read/write throughput per node in HBase using Thrift?

We are looking to get performance numbers in the range of 10k aggregate inserts/sec/node and read latency < 30 ms/read with 3-4 concurrent readers/node. Can our expectations be met with HBase through Thrift? Can they be met with HBase through Java?

Thanks in advance for any help, examples, or recommendations that you can provide!

Wayne
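P.S. To make any numbers comparable, here is a minimal sketch of how batched write throughput could be timed from Python. Everything in it is an assumption for illustration: the mutation shape, the column family names (`primary`, `idx1`, `idx2`), the helper names `batched` and `measure_writes`, and above all `fake_send`, which is only a stand-in for whatever batch-mutate call the Thrift client actually exposes. It shows the batching and timing logic, not a real HBase client.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical mutation shape: (row_key, {"cf:qualifier": value}).
Mutation = Tuple[bytes, dict]


def batched(mutations: Iterable[Mutation], batch_size: int) -> Iterator[List[Mutation]]:
    """Yield consecutive lists of at most batch_size mutations."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(mutations)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def measure_writes(
    mutations: Iterable[Mutation],
    send_batch: Callable[[List[Mutation]], None],
    batch_size: int = 10_000,
) -> Tuple[int, float, List[float]]:
    """Send mutations in batches; return (rows sent, total seconds, per-batch seconds)."""
    latencies: List[float] = []
    rows = 0
    start = time.perf_counter()
    for batch in batched(mutations, batch_size):
        t0 = time.perf_counter()
        send_batch(batch)  # one round trip per batch
        latencies.append(time.perf_counter() - t0)
        rows += len(batch)
    return rows, time.perf_counter() - start, latencies


def fake_send(batch: List[Mutation]) -> None:
    """Placeholder only: replace with the real Thrift batch-mutate call."""


if __name__ == "__main__":
    # Each row writes to all 3 CFs, as in the data model described above.
    rows = (
        (b"row-%d" % i, {"primary:v": b"x", "idx1:v": b"x", "idx2:v": b"x"})
        for i in range(25_000)
    )
    sent, elapsed, per_batch = measure_writes(rows, fake_send)
    print(f"{sent} rows in {len(per_batch)} batches, {elapsed:.4f}s total")
```

Dividing rows sent by elapsed seconds gives inserts/sec for one writer, which can be compared against the 10k inserts/sec/node target. Changing `batch_size` from the initial 10k shows how much of the cost is per-round-trip overhead.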
