We have been testing random reads on a 6-node cluster (1 NN, 5 DN, 1 HM, 5 RS; each node has 48G RAM and 5 disks) and are currently seeing a throughput of about 1100 reads/sec per node. Most of the configs are default, except a 4G heap for the RS, *handler.count, and the GC/MSLAB settings from http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/. The actual data is 300-byte records, each of which can have upwards of 1000 versions, keyed on an id that is random/UUID based. We are using LZO compression. There are 300M records, distributed almost evenly across the RS, pre-created and bulk loaded.
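To make the access pattern concrete, each read in the test is essentially a single Get keyed on a random/UUID-based id, along the lines of the sketch below (table name "records" and family "d" are placeholders, not our actual schema; in the real test the keys are drawn from the pre-loaded id set rather than fresh UUIDs):

    import java.util.UUID;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RandomReadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "records");   // placeholder table name
        try {
          // One random-read iteration: Get a row by a random/UUID-based key.
          String rowKey = UUID.randomUUID().toString();
          Get get = new Get(Bytes.toBytes(rowKey));
          get.addFamily(Bytes.toBytes("d"));          // placeholder column family
          get.setMaxVersions(1);                      // latest version only
          Result result = table.get(get);             // ~300-byte record per version
        } finally {
          table.close();
        }
      }
    }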
I have read the related posts below, but want to see if there is any new input regarding:
- improving random read performance in general
- whether adding more disks (say, doubling them) would give a much higher throughput per node

We tried to drive more concurrent reads through the current 5 nodes/5 disks but don't see much change; the RS queue appears to grow and CPU usage climbs with it (a sketch of the kind of concurrent driver we mean is after the links below). Is there anything we can tweak on the RS to let more reads through, or is this the best we can get? Thanks.

Related threads in one place:
http://hbase.apache.org/book.html#performance
http://wiki.apache.org/hadoop/PerformanceTuning
http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
http://search-hadoop.com/m/Umugv1srG951&subj=Re+How+to+improve+random+read+latency+
https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/e6024f24a54d6aed
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/11539
http://www.mail-archive.com/[email protected]/msg06714.html
http://hstack.org/hbase-performance-testing/
http://groups.google.com/group/nosql-databases/browse_thread/thread/dc5b30a6863fe42d/b1d54a5b7f99c3fb
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/8653
http://cloudepr.blogspot.com/2009/08/hbase-0200-performance-evaluation.html
http://apache-hbase.679495.n3.nabble.com/Re-jira-Created-HBASE-3303-Lower-hbase-regionserver-handler-count-from-25-back-to-10-td2008741.html
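For reference, by "drive more concurrent reads" we mean roughly the following kind of driver (a simplified sketch, not our exact harness; thread count, per-thread workload, table and family names are placeholders, and one HTable is created per thread since HTable is not thread-safe):

    import java.util.UUID;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ConcurrentReadDriver {
      public static void main(String[] args) throws Exception {
        final Configuration conf = HBaseConfiguration.create();
        final int threads = 50;           // placeholder client concurrency
        final int getsPerThread = 10000;  // placeholder per-thread workload
        final AtomicLong completed = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.currentTimeMillis();
        for (int i = 0; i < threads; i++) {
          pool.submit(new Runnable() {
            public void run() {
              try {
                // One HTable per thread: HTable instances are not thread-safe.
                HTable table = new HTable(conf, "records");  // placeholder table name
                try {
                  for (int j = 0; j < getsPerThread; j++) {
                    // Real keys come from the pre-loaded id set; random UUIDs
                    // here just illustrate the uniform random access pattern.
                    Get get = new Get(Bytes.toBytes(UUID.randomUUID().toString()));
                    get.addFamily(Bytes.toBytes("d"));       // placeholder family
                    table.get(get);
                    completed.incrementAndGet();
                  }
                } finally {
                  table.close();
                }
              } catch (Exception e) {
                e.printStackTrace();
              }
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        long elapsedMs = System.currentTimeMillis() - start;
        System.out.println(completed.get() * 1000L / Math.max(elapsedMs, 1) + " reads/sec overall");
      }
    }

Part of the question is how far a client thread count like this can be pushed relative to hbase.regionserver.handler.count before the RS queue just backs up.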
