Re: Row distribution

Alex Baranau Wed, 25 Jul 2012 06:53:43 -0700

Hi Mohit,

1. For a particular table:

To see how rows are distributed, check how the table's regions are
distributed. Each region is defined by its start and end keys, so depending
on your key format you can tell which records go into each region. You can
see the region distribution in the web UI, as Adrien mentioned. It may also
be handy to query the .META. table [1], which holds the region info.
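A table's regions are sorted by start key, and each region holds the keys from its start key (inclusive) up to its end key (exclusive). Finding the region for a row is therefore a "greatest start key not above the row key" lookup. The Java sketch below only illustrates that rule; it is not HBase client code. The class name, the region labels and the boundaries are hypothetical (only `chicago` and `new_york` are borrowed from the sample output in [1]). It also assumes plain ASCII keys, so that Java String ordering matches the unsigned byte ordering HBase uses. In practice you would take the real boundaries from .META. or the web UI.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: maps row keys to regions the way HBase partitions a
// table, i.e. a region covers [its start key, the next region's start key).
public class RegionLookup {
    // start key -> region label; the first region has the empty start key ""
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String label) {
        regionsByStartKey.put(startKey, label);
    }

    // floorEntry returns the greatest start key <= rowKey, i.e. the region
    // whose range contains rowKey. Assumes ASCII keys, where String order
    // matches HBase's unsigned byte order.
    public String regionFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        RegionLookup table = new RegionLookup();
        // Hypothetical boundaries for illustration only
        table.addRegion("", "region-1");          // ["", "chicago")
        table.addRegion("chicago", "region-2");   // ["chicago", "new_york")
        table.addRegion("new_york", "region-3");  // ["new_york", end of table)

        String[] keys = {"austin", "chicago", "denver", "new_york", "seattle"};
        for (String key : keys) {
            System.out.println(key + " -> " + table.regionFor(key));
        }
    }
}
```

With these boundaries, `austin` falls into the first region (the one with the empty start key), `chicago` and `denver` into the second, and `new_york` and `seattle` into the third.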

If you use random keys, or are simply not sure how data is distributed
across key buckets (i.e. regions), you may also want to look at the HBase
data on HDFS [2]. Since each region's data is stored in its own directory,
you can see how much space each region occupies on HDFS.
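Turning the `-du` listing into a quick imbalance figure is easy to script. The Java sketch below is one possible approach, not an HBase tool. It assumes each line has the form `<bytes> <path>` as in [2] (the exact `du` output format differs between Hadoop versions), treats the last path component as the region directory name, and skips entries starting with `.` on the assumption that they are table metadata rather than regions. The max/mean ratio it prints is a rough, hypothetical measure of skew, not an HBase metric.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: summarize per-region sizes from `hadoop fs -du /hbase/<table>` output.
// Assumes one "<bytes> <path>" entry per line, as in the sample in [2].
public class RegionSizeSummary {

    // Parses du lines into region directory name -> size in bytes.
    public static Map<String, Long> parse(List<String> duLines) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        for (String line : duLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2 || !parts[0].matches("\\d+")) {
                continue; // skip "Found N items" and any other unexpected line
            }
            String path = parts[1];
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (name.startsWith(".")) {
                continue; // assumed non-region entries (e.g. table metadata)
            }
            sizes.put(name, Long.parseLong(parts[0]));
        }
        return sizes;
    }

    // Ratio of the largest region to the mean region size; 1.0 means perfectly even.
    public static double skew(Map<String, Long> sizes) {
        if (sizes.isEmpty()) {
            throw new IllegalArgumentException("no regions found");
        }
        long total = 0;
        long max = 0;
        for (long s : sizes.values()) {
            total += s;
            max = Math.max(max, s);
        }
        double mean = (double) total / sizes.size();
        return max / mean;
    }

    public static void main(String[] args) {
        // The five entries shown in [2], with the wrapped paths joined back together
        List<String> lines = Arrays.asList(
            "Found 130 items",
            "3397        hdfs://hbase.master/hbase/mytable/02925d3c335bff7e273f392324f16dca",
            "2682163424  hdfs://hbase.master/hbase/mytable/03231b8ae2b73317c4858b1a85c09ad2",
            "1038862956  hdfs://hbase.master/hbase/mytable/04f911571593e931a9a3d9e2a6616236",
            "1039181555  hdfs://hbase.master/hbase/mytable/0a177633196cae7b158836181d69dc0f",
            "1076888812  hdfs://hbase.master/hbase/mytable/0d52fc477c41a9a236803234d44c7c06");

        Map<String, Long> sizes = parse(lines);
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            System.out.printf("%s %,d bytes%n", e.getKey(), e.getValue());
        }
        System.out.printf("regions=%d, max/mean=%.2f%n", sizes.size(), skew(sizes));
    }
}
```

For the five lines shown in [2] this reports a max/mean ratio of about 2.3: one region holds roughly 2.7 GB, three others about 1 GB each, and one is nearly empty.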

2. For the whole cluster, it makes sense to use a cluster monitoring tool
[3] to learn about the overall load distribution, how the regions of
multiple tables are distributed, request counts, and more.

And of course, you can also use the HBase Java API to fetch cluster state.
I'd start by looking at the HBaseAdmin class.

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

[1]

hbase(main):001:0> scan '.META.', {LIMIT=>1, STARTROW=>"mytable,,"}
ROW                                                        COLUMN+CELL
 mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.
   column=info:regioninfo, timestamp=1341279432625, value=REGION => {NAME =>
   'mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.', STARTKEY =>
   'chicago', ENDKEY => 'new_york', ENCODED =>
   fd61cd7ef426d2f233a4cd7e8b73845, TABLE => {{NAME => 'mytable', FAMILIES =>
   [{NAME => 'job', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
   COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE =>
   '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
 mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.
   column=info:server, timestamp=1341279432673, value=myserver:60020
 mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.
   column=info:serverstartcode, timestamp=1341279432673, value=1341267474257
1 row(s) in 0.1980 seconds

[2]

ubuntu@ip-10-80-47-73:~$ sudo -u hdfs hadoop fs -du /hbase/mytable
Found 130 items
3397        hdfs://hbase.master/hbase/mytable/02925d3c335bff7e273f392324f16dca
2682163424  hdfs://hbase.master/hbase/mytable/03231b8ae2b73317c4858b1a85c09ad2
1038862956  hdfs://hbase.master/hbase/mytable/04f911571593e931a9a3d9e2a6616236
1039181555  hdfs://hbase.master/hbase/mytable/0a177633196cae7b158836181d69dc0f
1076888812  hdfs://hbase.master/hbase/mytable/0d52fc477c41a9a236803234d44c7c06

[3]
You can read the metrics directly from JMX with any tool you like, or use:
* Ganglia
* SPM monitoring (http://sematext.com/spm/hbase-performance-monitoring/index.html)
* others

On Wed, Jul 25, 2012 at 1:59 AM, Adrien Mogenet <adrien.moge...@gmail.com> wrote:

> From the web interface, you can get such statistics when viewing the
> details of a table.
> You can also develop your own "balance viewer" through the HBase API (list
> of region servers, regions, store files, their sizes, etc.)
>
> On Wed, Jul 25, 2012 at 7:32 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>
> > Is there an easy way to tell how my nodes are balanced and how the rows
> > are distributed in the cluster?
> >
>
>
>
> --
> Adrien Mogenet
> 06.59.16.64.22
> http://www.mogenet.me
>
