For #1, please look at the following method in HTable.java :

  public NavigableMap<HRegionInfo, ServerName> getRegionLocations() throws IOException {
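
Once you have that map, combining it with the sizes from the "hdfs dfs -du" listing Rahul suggested gives bytes per host. Below is a rough sketch in plain Java, not tested against a real cluster. The class and method names (RegionSkew, parseDu, bytesPerHost) and the sample paths and hostnames are made up for illustration. It assumes each du line starts with the size in bytes and ends with the path, and that the last path component is the encoded region name. The regionToHost map would be filled from getRegionLocations(), e.g. using HRegionInfo.getEncodedName() and ServerName.getHostname().

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class RegionSkew {

    /** Parses "hdfs dfs -du" output lines into encodedRegionName -> size in bytes. */
    public static Map<String, Long> parseDu(List<String> duLines) {
        Map<String, Long> sizes = new HashMap<>();
        for (String line : duLines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) {
                continue; // not a "size ... path" line
            }
            // Assumption: first column is the size, last column is the path.
            String path = parts[parts.length - 1];
            String name = path.substring(path.lastIndexOf('/') + 1);
            // Skip entries that are not region directories (e.g. .tabledesc, .tmp).
            if (name.isEmpty() || name.startsWith(".")) {
                continue;
            }
            try {
                sizes.put(name, Long.parseLong(parts[0]));
            } catch (NumberFormatException e) {
                // Ignore lines whose first column is not a number.
            }
        }
        return sizes;
    }

    /** Sums region sizes per host; regions without a known host go under "unknown". */
    public static Map<String, Long> bytesPerHost(Map<String, Long> sizes,
                                                 Map<String, String> regionToHost) {
        Map<String, Long> perHost = new TreeMap<>();
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            String host = regionToHost.getOrDefault(e.getKey(), "unknown");
            perHost.merge(host, e.getValue(), Long::sum);
        }
        return perHost;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; real input would come from the du command
        // and from getRegionLocations().
        List<String> du = Arrays.asList(
            "1024  /hbase/data/default/table_x/region_a",
            "4096  /hbase/data/default/table_x/region_b",
            "512  /hbase/data/default/table_x/.tabledesc");
        Map<String, String> regionToHost = new HashMap<>();
        regionToHost.put("region_a", "host1.example.com");
        regionToHost.put("region_b", "host2.example.com");

        Map<String, Long> perHost = bytesPerHost(parseDu(du), regionToHost);
        for (Map.Entry<String, Long> e : perHost.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Keep in mind the region-to-host mapping is only a snapshot, since regions can move after you read it.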

Cheers

On Fri, Aug 26, 2016 at 3:06 PM, Manish Maheshwari <mylogi...@gmail.com>
wrote:

> Thanks Rahul.
>
> 1 - I understand the idea of listing the usage on each of the disks that we
> have HBase running on for that table. However how do I map the Nodes to
> Regions. I looked at RegionLocator - getStartEndKeys. But these just give
> me the values and not the Hostnames where each region is currently running.
> Is there a way to map the Region to the Node?
>
> 2 - Some of our row sizes vary quite a bit depending on the number of
> updates to the row. This will give us a rough idea of the size of the
> Region, but not the number of Rows. Is there a way to get both.. Apologies
> if I am bothering too much..
>
> Thanks,
> Manish
>
>
>
>
>
> On Fri, Aug 26, 2016 at 12:21 PM, rahul gidwani <rahul.gidw...@gmail.com>
> wrote:
>
> > If you want to see which regionservers are currently hot, then jmx would be
> > the best way to get that data.
> >
> > If you want to see overall what is hot, you can do this without the use of
> > a scan (it will be a pretty decent estimate)
> >
> > you can do:
> >
> > hdfs dfs -du /hbase/data/default/<table_you_care_about>/
> >
> > with that data you can create a Map<EncodedRegionName, SizeInBytes>
> >
> > Then you can use the RegionLocator to find which region resides on which
> > machine.
> >
> > That will tell you the overall skew of your data in terms of raw bytes.
> >
> > Should be a pretty decent estimate and a lot faster than scanning your
> > table provided your table / cluster is sufficiently large.
> >
> > hope that helps.
> > rahul
> >
> > On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Have you looked at /jmx endpoint on the servers ?
> > > Below is a sample w.r.t. the metrics that would be of interest to you:
> > >
> > >
> > > "Namespace_default_table_x_region_6659ba3fe42b4a196daaba9306b50551_metric_appendCount" : 0,
> > >
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_num_ops" : 0,
> > >
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_min" : 0,
> > >
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_max" : 0,
> > >
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_mean" : 0.0,
> > >
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_median" : 0.0,
> > >
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_75th_percentile" : 0.0,
> > >
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_95th_percentile" : 0.0,
> > >
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_99th_percentile" : 0.0,
> > >
> > >
> > > "Namespace_default_table_x_region_823a39a250e81f45e5ef493740d936ab_metric_deleteCount" : 0,
> > >
> > > "Namespace_default_table_x_region_30b82db17b64a83d4aeda9dbd40d6215_metric_deleteCount" : 0,
> > >
> > > "Namespace_default_table_x_region_c6db2e650b3025aa82032b0e0aa8b715_metric_appendCount" : 0,
> > >
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_num_ops" : 0,
> > >
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_min" : 0,
> > >
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_max" : 0,
> > >
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_mean" : 0.0,
> > >
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_median" : 0.0,
> > >
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_75th_percentile" : 0.0,
> > >
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_95th_percentile" : 0.0,
> > >
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_99th_percentile" : 0.0,
> > >
> > >
> > > "Namespace_default_table_x_region_5a1fe60f6267c98055b334784e6d76d2_metric_mutateCount" : 0,
> > >
> > > "Namespace_default_table_x_region_66bbec5f7e136b226a19b5fdf9f17cbe_metric_incrementCount" : 0,
> > >
> > > On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari <mylogi...@gmail.com>
> > > wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > I understand the region crash/migration/splitting impact. Currently we have
> > > > hotspotting on a few region servers. I am trying to collect the row stats at
> > > > region server and region levels to see how bad the skew of the data is.
> > > > Manish
> > > >
> > > > On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > Can you elaborate on your use case ?
> > > > >
> > > > > Suppose row A is on server B, after you retrieve row A, the region for row
> > > > > A gets moved to server C (load balancer or server crash). Server B would no
> > > > > longer be relevant.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari <mylogi...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I looked at the HBase Count functionality to count rows in a Table. Is
> > > > > > there a way that we can count the number of rows in Regions & Region
> > > > > > Servers? When we use an HBase scan, we don't get the Region ID or Region
> > > > > > Server of the row. Is there a way to do this via Scans?
> > > > > >
> > > > > > Thanks,
> > > > > > Manish
> > > > > >
> > > > >
> > > >
> > >
> >
>
