Hi Mohit,

1. When talking about a particular table:

To see how rows are distributed, check how the table's regions are distributed. Each region is defined by its start/stop keys, so, depending on your key format, you can tell which records go into each region. You can see the region distribution in the web UI, as Adrien mentioned. It may also be handy to query the .META. table [1], which holds region info.

If you use random keys, or you are just not sure how the data is distributed across key buckets (i.e. regions), you may also want to look at the HBase data on HDFS [2]. Since data is stored separately for each region, you can see how much space each one occupies on HDFS.

2. When talking about the whole cluster:

It makes sense to use a cluster monitoring tool [3] to find out more about overall load distribution, how the regions of multiple tables are distributed, request counts, and much more.

And of course, you can use the HBase Java API to fetch cluster state data as well. I guess you should start by looking at the HBaseAdmin class.

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

[1]

hbase(main):001:0> scan '.META.', {LIMIT=>1, STARTROW=>"mytable,,"}
ROW                                                       COLUMN+CELL
 mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845. column=info:regioninfo, timestamp=1341279432625, value=REGION => {NAME => 'mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.', STARTKEY => 'chicago', ENDKEY => 'new_york', ENCODED => fd61cd7ef426d2f233a4cd7e8b73845, TABLE => {{NAME => 'mytable', FAMILIES => [{NAME => 'job', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845. column=info:server, timestamp=1341279432673, value=myserver:60020
 mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.
column=info:serverstartcode, timestamp=1341279432673, value=1341267474257
1 row(s) in 0.1980 seconds

[2]

ubuntu@ip-10-80-47-73:~$ sudo -u hdfs hadoop fs -du /hbase/mytable
Found 130 items
3397        hdfs://hbase.master/hbase/mytable/02925d3c335bff7e273f392324f16dca
2682163424  hdfs://hbase.master/hbase/mytable/03231b8ae2b73317c4858b1a85c09ad2
1038862956  hdfs://hbase.master/hbase/mytable/04f911571593e931a9a3d9e2a6616236
1039181555  hdfs://hbase.master/hbase/mytable/0a177633196cae7b158836181d69dc0f
1076888812  hdfs://hbase.master/hbase/mytable/0d52fc477c41a9a236803234d44c7c06

[3] You can get data from JMX directly using any tool you like, or use:
* Ganglia
* SPM monitoring (http://sematext.com/spm/hbase-performance-monitoring/index.html)
* others

On Wed, Jul 25, 2012 at 1:59 AM, Adrien Mogenet <adrien.moge...@gmail.com> wrote:

> From the web-interface, you can have such statistics when viewing the
> details of a table.
> You can also develop your own "balance viewer" through the HBase API (list
> of RS, regions, storeFiles, their size, etc.)
>
> On Wed, Jul 25, 2012 at 7:32 AM, Mohit Anchlia <mohitanch...@gmail.com>
> wrote:
>
> > Is there an easy way to tell how my nodes are balanced and how the rows
> > are distributed in the cluster?
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
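
A note on reading the per-region sizes in [2]: with 130 regions it is easier to sort the `hadoop fs -du` output and look at each region's share of the total than to eyeball the raw listing. Below is a minimal Python sketch of that idea, not anything HBase ships. The function names (`parse_region_sizes`, `summarize`) are made up for illustration, the sample text is just the five lines shown in [2], and it assumes the size is the first column and the region directory is the last path component.

```python
import posixpath

# The five sample lines from [2]; a real listing would have one line per region.
SAMPLE_DU_OUTPUT = """\
Found 130 items
3397        hdfs://hbase.master/hbase/mytable/02925d3c335bff7e273f392324f16dca
2682163424  hdfs://hbase.master/hbase/mytable/03231b8ae2b73317c4858b1a85c09ad2
1038862956  hdfs://hbase.master/hbase/mytable/04f911571593e931a9a3d9e2a6616236
1039181555  hdfs://hbase.master/hbase/mytable/0a177633196cae7b158836181d69dc0f
1076888812  hdfs://hbase.master/hbase/mytable/0d52fc477c41a9a236803234d44c7c06
"""


def parse_region_sizes(du_output):
    """Map each region directory name to its size in bytes.

    Assumes the first field is the size and the last field ends with the
    region directory name; lines that do not start with a number (such as
    the "Found N items" header) are skipped.
    """
    sizes = {}
    for line in du_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        region = posixpath.basename(fields[-1])
        sizes[region] = int(fields[0])
    return sizes


def summarize(sizes):
    """Return (region, bytes, share_of_total) tuples, largest region first."""
    total = sum(sizes.values())
    ordered = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return [(region, size, size / total if total else 0.0)
            for region, size in ordered]


if __name__ == "__main__":
    for region, size, share in summarize(parse_region_sizes(SAMPLE_DU_OUTPUT)):
        print(f"{region}  {size:>12}  {share:6.1%}")
```

To use it on a real table, pipe the output of the command in [2] into a file and pass its contents to `parse_region_sizes`. A region whose share is far above the others (like the ~2.6 GB one in the sample) is a hint that keys are not evenly spread, or that the region is due for a split.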