For #1, please look at the following method in HTable.java:

  public NavigableMap<HRegionInfo, ServerName> getRegionLocations() throws IOException
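This builds on rahul's `hdfs dfs -du` suggestion further down the thread. Each entry of the map returned by getRegionLocations() can be reduced to an encoded region name (HRegionInfo.getEncodedName()) and a host (ServerName.getHostname()). The encoded name is also the name of the region's directory under /hbase/data/<namespace>/<table>/, so the two sources can be joined to estimate bytes served per host. RegionLocator.getAllRegionLocations() returns the same information as a List<HRegionLocation>.

Below is a minimal plain-Java sketch of that join. It assumes the du lines and the region-to-host map have already been collected, and it leaves out the HBase and HDFS calls. The class RegionSkew, its methods, and the sample data are hypothetical. The sizes cover only files on HDFS, so edits still in the memstore are not counted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: estimates per-host byte skew for one table by joining
// `hdfs dfs -du` output with an encoded-region-name -> host map.
public class RegionSkew {

    // Parses `hdfs dfs -du` lines into encoded region name -> size in bytes.
    static Map<String, Long> parseDu(List<String> duLines) {
        Map<String, Long> sizes = new HashMap<>();
        for (String line : duLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) {
                continue; // blank or malformed line
            }
            String path = cols[cols.length - 1];
            String name = path.substring(path.lastIndexOf('/') + 1);
            if (name.startsWith(".")) {
                continue; // skip .tabledesc, .tmp and other non-region entries
            }
            try {
                // First column is the total size of the region directory; a
                // second column (replicated disk usage on some Hadoop
                // versions) is ignored.
                sizes.put(name, Long.parseLong(cols[0]));
            } catch (NumberFormatException ignored) {
                // not a size line
            }
        }
        return sizes;
    }

    // Sums region sizes per host. regionToHost would be filled by looping over
    // getRegionLocations(): key.getEncodedName() -> value.getHostname().
    // Regions missing from the map are reported under "unknown".
    static Map<String, Long> bytesPerHost(Map<String, Long> regionSizes,
                                          Map<String, String> regionToHost) {
        Map<String, Long> perHost = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, Long> e : regionSizes.entrySet()) {
            String host = regionToHost.getOrDefault(e.getKey(), "unknown");
            perHost.merge(host, e.getValue(), Long::sum);
        }
        return perHost;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for real du output and
        // region locations.
        List<String> du = Arrays.asList(
            "287  861  /hbase/data/default/x/.tabledesc",
            "1000  3000  /hbase/data/default/x/enc1",
            "4000  12000  /hbase/data/default/x/enc2",
            "500  1500  /hbase/data/default/x/enc3");
        Map<String, String> regionToHost = new HashMap<>();
        regionToHost.put("enc1", "rs1.example.com");
        regionToHost.put("enc2", "rs2.example.com");
        regionToHost.put("enc3", "rs1.example.com");
        System.out.println(bytesPerHost(parseDu(du), regionToHost));
        // prints {rs1.example.com=1500, rs2.example.com=4000}
    }
}
```

A host whose total is far above the others is a candidate for the skew described in the thread. Remember that region assignments can change after you collect them, as Ted points out further down.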
Cheers

On Fri, Aug 26, 2016 at 3:06 PM, Manish Maheshwari <mylogi...@gmail.com> wrote:
> Thanks Rahul.
>
> 1 - I understand the idea of listing the usage on each of the disks that we
> have HBase running on for that table. However, how do I map the Nodes to
> Regions? I looked at RegionLocator - getStartEndKeys. But these just give
> me the values and not the Hostnames where each region is currently running.
> Is there a way to map the Region to the Node?
>
> 2 - Some of our row sizes vary quite a bit depending on the number of
> updates to the row. This will give us a rough idea of the size of the
> Region, but not the number of Rows. Is there a way to get both? Apologies
> if I am bothering too much.
>
> Thanks,
> Manish
>
> On Fri, Aug 26, 2016 at 12:21 PM, rahul gidwani <rahul.gidw...@gmail.com> wrote:
>
> > If you want to see which regionservers are currently hot, then jmx would be
> > the best way to get that data.
> >
> > If you want to see overall what is hot, you can do this without the use of
> > a scan (it will be a pretty decent estimate).
> >
> > You can do:
> >
> > hdfs dfs -du /hbase/data/default/<table_you_care_about>/
> >
> > With that data you can create a Map<EncodedRegionName, SizeInBytes>.
> >
> > Then you can use the RegionLocator to find which region resides on which
> > machine.
> >
> > That will tell you the overall skew of your data in terms of raw bytes.
> >
> > Should be a pretty decent estimate and a lot faster than scanning your
> > table provided your table / cluster is sufficiently large.
> >
> > hope that helps.
> > rahul
> >
> > On Fri, Aug 26, 2016 at 12:11 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Have you looked at the /jmx endpoint on the servers?
> > > Below is a sample w.r.t. the metrics that would be of interest to you:
> > >
> > > "Namespace_default_table_x_region_6659ba3fe42b4a196daaba9306b50551_metric_appendCount" : 0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_num_ops" : 0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_min" : 0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_max" : 0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_mean" : 0.0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_median" : 0.0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_75th_percentile" : 0.0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_95th_percentile" : 0.0,
> > > "Namespace_default_table_x_region_f9965e20458e7dbf3d4d5b439ae576ad_metric_scanNext_99th_percentile" : 0.0,
> > >
> > > "Namespace_default_table_x_region_823a39a250e81f45e5ef493740d936ab_metric_deleteCount" : 0,
> > > "Namespace_default_table_x_region_30b82db17b64a83d4aeda9dbd40d6215_metric_deleteCount" : 0,
> > > "Namespace_default_table_x_region_c6db2e650b3025aa82032b0e0aa8b715_metric_appendCount" : 0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_num_ops" : 0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_min" : 0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_max" : 0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_mean" : 0.0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_median" : 0.0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_75th_percentile" : 0.0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_95th_percentile" : 0.0,
> > > "Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_99th_percentile" : 0.0,
> > >
> > > "Namespace_default_table_x_region_5a1fe60f6267c98055b334784e6d76d2_metric_mutateCount" : 0,
> > > "Namespace_default_table_x_region_66bbec5f7e136b226a19b5fdf9f17cbe_metric_incrementCount" : 0,
> > >
> > > On Fri, Aug 26, 2016 at 11:59 AM, Manish Maheshwari <mylogi...@gmail.com> wrote:
> > >
> > > > Hi Ted,
> > > >
> > > > I understand the region crash/migration/splitting impact. Currently we
> > > > have hotspotting on a few region servers. I am trying to collect the row
> > > > stats at region server and region levels to see how bad the skew of the
> > > > data is.
> > > >
> > > > Manish
> > > >
> > > > On Fri, Aug 26, 2016 at 10:19 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > Can you elaborate on your use case?
> > > > >
> > > > > Suppose row A is on server B. After you retrieve row A, the region for
> > > > > row A gets moved to server C (load balancer or server crash). Server B
> > > > > would no longer be relevant.
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Fri, Aug 26, 2016 at 10:07 AM, Manish Maheshwari <mylogi...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I looked at the HBase Count functionality to count rows in a Table.
> > > > > > Is there a way that we can count the number of rows in Regions &
> > > > > > Region Servers? When we use an HBase scan, we don't get the Region
> > > > > > ID or Region Server of the row. Is there a way to do this via Scans?
> > > > > >
> > > > > > Thanks,
> > > > > > Manish
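For the /jmx route that Ted suggests earlier in this thread, you can rank regions by activity without scanning the table. Group the per-region metric keys by encoded region name and sum the operation counters.

Below is a minimal plain-Java sketch. It rests on a few assumptions:
- The JSON from one RegionServer's /jmx endpoint has already been fetched and parsed into a Map<String, Number> of attribute name to value. The fetching and parsing are left out.
- The keys follow the layout in the sample above: Namespace_<ns>_table_<table>_region_<encodedName>_metric_<name>.

The class RegionRequestCounts and its method are hypothetical. Summing gets, scans and mutations into a single number is just one crude way to compare regions. Each RegionServer reports only the regions it currently hosts, so you would repeat this per server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: ranks regions by request activity using the
// per-region metrics a RegionServer exposes on /jmx.
public class RegionRequestCounts {

    // Assumed key layout, taken from the sample in this thread, e.g.
    // Namespace_default_table_x_region_<32 hex chars>_metric_get_num_ops
    private static final Pattern KEY = Pattern.compile(
        "^Namespace_(.+?)_table_(.+)_region_([0-9a-f]{32})_metric_(.+)$");

    // Returns encoded region name -> sum of its "*Count" and "*_num_ops" metrics.
    static Map<String, Long> requestsPerRegion(Map<String, ? extends Number> metrics) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, ? extends Number> e : metrics.entrySet()) {
            Matcher m = KEY.matcher(e.getKey());
            if (!m.matches()) {
                continue; // not a per-region metric
            }
            String metric = m.group(4);
            // Only operation counters are summed; latency statistics such as
            // get_mean or scanNext_99th_percentile are skipped.
            if (metric.endsWith("Count") || metric.endsWith("_num_ops")) {
                totals.merge(m.group(3), e.getValue().longValue(), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical attribute values; a real map would be built from the
        // JSON served by the RegionServer's /jmx endpoint.
        Map<String, Number> sample = new HashMap<>();
        sample.put("Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_num_ops", 120L);
        sample.put("Namespace_default_table_x_region_94db4fcd7cabc28c406681f172df2186_metric_get_mean", 0.4);
        sample.put("Namespace_default_table_x_region_5a1fe60f6267c98055b334784e6d76d2_metric_mutateCount", 35L);
        sample.put("Namespace_default_table_x_region_66bbec5f7e136b226a19b5fdf9f17cbe_metric_incrementCount", 8L);
        System.out.println(requestsPerRegion(sample));
    }
}
```

Because the result is keyed by encoded region name, it can be joined with getRegionLocations() in the same way as the du sizes to see which hosts carry the busiest regions.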