Cycling old bits: http://search-hadoop.com/m/YGbb3E2a71UVLBK&subj=Re+HBase+Count+Rows+in+Regions+and+Region+Servers
You can use /jmx to inspect regions and find the hotspot.

On Mon, Aug 29, 2016 at 7:29 AM, Manish Maheshwari <mylogi...@gmail.com> wrote:

> Hi Dima,
>
> Thanks for the suggestion. We can load the data in heap, but HBase
> makes it easier for one process to write and another to read. With
> heap we need to build a process to handle both and also write to a
> log so as not to lose the updates in case of process failure.
>
> Thanks
> Manish
>
> On Aug 29, 2016 2:18 PM, "Dima Spivak" <dspi...@cloudera.com> wrote:
>
> > (Though if it is only 7 GB, why not just store it in memory?)
> >
> > On Sunday, August 28, 2016, Dima Spivak <dspi...@cloudera.com> wrote:
> >
> > > If your data can all fit on one machine, HBase is not the best
> > > choice. I think you'd be better off using a simpler solution for
> > > small data and leave HBase for use cases that require proper
> > > clusters.
> > >
> > > On Sunday, August 28, 2016, Manish Maheshwari <mylogi...@gmail.com>
> > > wrote:
> > >
> > >> We don't want to invest in another DB like Dynamo or Cassandra,
> > >> and we are already in the Hadoop stack. Managing another DB
> > >> would be a pain. Why HBase over an RDBMS: because we call HBase
> > >> via Spark Streaming to look up the keys.
> > >>
> > >> Manish
> > >>
> > >> On Mon, Aug 29, 2016 at 1:47 PM, Dima Spivak <dspi...@cloudera.com>
> > >> wrote:
> > >>
> > >> > Hey Manish,
> > >> >
> > >> > Just to ask the naive question, why use HBase if the data fits
> > >> > into such a small table?
> > >> >
> > >> > On Sunday, August 28, 2016, Manish Maheshwari
> > >> > <mylogi...@gmail.com> wrote:
> > >> >
> > >> > > Hi,
> > >> > >
> > >> > > We have a scenario where HBase is used like a Key Value
> > >> > > Database to map Keys to Regions. We have over 5 Million
> > >> > > Keys, but the table size is less than 7 GB. The read volume
> > >> > > is pretty high - about 50x of the put/delete volume. This
> > >> > > causes hot spotting on the Data Node and the region is not
> > >> > > split. We cannot change the maxregionsize parameter as that
> > >> > > will impact other tables too.
> > >> > >
> > >> > > Our idea is to manually inspect the row key ranges and then
> > >> > > split the region manually and assign the resulting regions
> > >> > > to different region servers. We will then continue to
> > >> > > monitor the rows in one region to see if it needs to be
> > >> > > split.
> > >> > >
> > >> > > Any experience of doing this on HBase? Is this a recommended
> > >> > > approach?
> > >> > >
> > >> > > Thanks,
> > >> > > Manish
> > >> >
> > >> > --
> > >> > -Dima
> > >
> > > --
> > > -Dima
> >
> > --
> > -Dima
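
For the /jmx suggestion at the top of this reply, here is a minimal Python sketch of how one might rank a table's regions by read count on a single region server. Several things in it are assumptions and not from the thread: the info port 16030 (the usual default on HBase 1.x), the "sub=Regions" bean, the per-region key layout Namespace_<ns>_table_<table>_region_<encodedName>_metric_<metricName>, and the metric name readRequestCount. The helper names fetch_jmx and read_counts are made up for illustration. Check the actual /jmx output on your version before relying on any of it.

```python
import json
import re
from urllib.request import urlopen

# Assumed key layout inside the "sub=Regions" bean (verify on your version):
#   Namespace_<ns>_table_<table>_region_<encodedName>_metric_<metricName>
REGION_KEY = re.compile(
    r"^Namespace_(?P<ns>.+?)_table_(?P<table>.+?)"
    r"_region_(?P<region>[0-9a-f]+)_metric_(?P<metric>\w+)$"
)


def fetch_jmx(host, port=16030):
    """Fetch and parse the /jmx JSON of one region server.

    The default port is an assumption (HBase 1.x info port).
    """
    with urlopen(f"http://{host}:{port}/jmx") as resp:
        return json.load(resp)


def read_counts(jmx, table, metric="readRequestCount"):
    """Return [(encoded_region_name, count)] for `table`, busiest first."""
    counts = {}
    for bean in jmx.get("beans", []):
        # Only the per-region metrics bean is of interest here.
        if not bean.get("name", "").endswith("sub=Regions"):
            continue
        for key, value in bean.items():
            m = REGION_KEY.match(key)
            if m and m["table"] == table and m["metric"] == metric:
                counts[m["region"]] = value
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```

Calling read_counts(fetch_jmx("rs1.example.com"), "mytable") on each region server (host and table name are placeholders) and comparing the top entries should show whether one region takes most of the reads. That region is the candidate for a manual split.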