We have been testing random reads from a 6-node cluster (1 NN, 5 DN, 1 HM,
5 RS, each with 48G RAM and 5 disks) and are currently seeing a throughput of
about 1,100 reads/sec per node. Most of the configs are default, except a 4G
heap for the RS, *handler.count, and GC/MSLAB (
http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/).
The actual data is 300-byte records, each of which can have upwards of 1,000
versions, keyed on a random/UUID-based id, stored with LZO compression. There
are 300M records, distributed almost evenly across the RS (the regions were
pre-created and the data bulk loaded).
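
For reference, the non-default settings boil down to roughly the following
hbase-site.xml fragment. This is only a sketch: the handler-count value is
illustrative, since only *handler.count is named above, and the MSLAB flag is
the one described in the linked Cloudera article:

  <!-- hbase-site.xml: sketch of the non-default settings; values illustrative -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>50</value> <!-- assumed value; only the property is named above -->
  </property>
  <property>
    <name>hbase.hregion.memstore.mslab.enabled</name>
    <value>true</value> <!-- MSLAB, per the linked Cloudera post -->
  </property>

plus the 4G RS heap via HBASE_HEAPSIZE (or -Xmx4g in HBASE_REGIONSERVER_OPTS)
in hbase-env.sh.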


I have read the related posts below, but want to see if there is any new
input regarding:

- improving random read performance in general

- if we add more disks, or say double them, can we expect a much higher
throughput per node? We tried to drive more concurrent reads through the
current 5 nodes / 5 disks but don't see much change. It looks like the RS call
queue is growing, and so is CPU usage... is there anything we can tweak in the
RS to let more reads through, or is this the best we can get? (A sketch of the
kind of concurrent-read client we mean follows this list.)
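
To make the concurrency question concrete, here is a minimal sketch of a
multi-threaded random-Get client (the table name, key format, and thread count
are placeholders, not our actual schema or settings; one HTable per thread
because HTable instances are not thread-safe):

  import java.util.Random;
  import java.util.UUID;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.TimeUnit;
  import java.util.concurrent.atomic.AtomicLong;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class RandomReadLoad {
    public static void main(String[] args) throws Exception {
      final int threads = 50;            // client-side concurrency knob to vary
      final int getsPerThread = 100000;
      final AtomicLong hits = new AtomicLong();
      final Configuration conf = HBaseConfiguration.create();

      ExecutorService pool = Executors.newFixedThreadPool(threads);
      long start = System.currentTimeMillis();
      for (int t = 0; t < threads; t++) {
        pool.submit(new Runnable() {
          public void run() {
            try {
              // one HTable per thread: HTable is not thread-safe
              HTable table = new HTable(conf, "testtable"); // placeholder table name
              Random rnd = new Random();
              for (int i = 0; i < getsPerThread; i++) {
                // random/UUID-style row key, standing in for the real key format
                byte[] row = Bytes.toBytes(new UUID(rnd.nextLong(), rnd.nextLong()).toString());
                Get get = new Get(row);
                get.setMaxVersions(1);   // fetch only the latest version
                Result r = table.get(get);
                if (!r.isEmpty()) hits.incrementAndGet();
              }
              table.close();
            } catch (Exception e) {
              e.printStackTrace();
            }
          }
        });
      }
      pool.shutdown();
      pool.awaitTermination(1, TimeUnit.HOURS);
      long ms = System.currentTimeMillis() - start;
      System.out.println("hits=" + hits.get() + ", overall gets/sec="
          + (threads * (long) getsPerThread * 1000L / ms));
    }
  }

The thread count is just the client-side concurrency knob; it is meant to be
varied against the RS handler count, not a recommended value.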


Thanks.



Related threads in one place:

http://hbase.apache.org/book.html#performance

http://wiki.apache.org/hadoop/PerformanceTuning

http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation

http://search-hadoop.com/m/Umugv1srG951&subj=Re+How+to+improve+random+read+latency+

https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/e6024f24a54d6aed

http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/11539

http://www.mail-archive.com/[email protected]/msg06714.html

http://hstack.org/hbase-performance-testing/

http://groups.google.com/group/nosql-databases/browse_thread/thread/dc5b30a6863fe42d/b1d54a5b7f99c3fb

http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/8653

http://cloudepr.blogspot.com/2009/08/hbase-0200-performance-evaluation.html

http://apache-hbase.679495.n3.nabble.com/Re-jira-Created-HBASE-3303-Lower-hbase-regionserver-handler-count-from-25-back-to-10-td2008741.html
