Re: HBase for Small Key Value Tables

2016-08-30 Thread Ted Yu
You don't need to rebuild HBase. Just add an entry in hbase-site.xml for the following config: hbase.master.balancer.stochastic.tableSkewCost. Restart the master after the addition. Cheers > On Aug 30, 2016, at 12:10 AM, Manish Maheshwari wrote: > > Hi Ted, > > Where do we
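Adding an entry in hbase-site.xml means adding a <property> element, with a <name> and a <value>, under the file's <configuration> root. Below is a minimal Python sketch (standard library only) that adds or updates one such property. The helper set_property is hypothetical and not part of HBase. The value "500" in the usage line is purely illustrative, not a tuned recommendation: any value above the default of 35 weights table skew more heavily.

```python
import xml.etree.ElementTree as ET


def set_property(path, name, value):
    """Add or update <property><name>name</name><value>value</value></property>
    under the <configuration> root of a Hadoop-style XML config file.
    Hypothetical helper; not part of HBase."""
    # Keep comments and processing instructions found inside <configuration>
    # (insert_comments/insert_pis need Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True, insert_pis=True))
    tree = ET.parse(path, parser=parser)
    root = tree.getroot()  # expected to be <configuration>
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry with this name: append a new <property>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

Example: set_property("/etc/hbase/conf/hbase-site.xml", "hbase.master.balancer.stochastic.tableSkewCost", "500"), then restart the master. The path is an assumption and varies by distribution. ElementTree rewrites the whole file. Comments inside <configuration> survive, but anything before the root element (such as a license header) is dropped, so keep a backup.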

Re: HBase for Small Key Value Tables

2016-08-30 Thread Manish Maheshwari
Hi Ted, Where do we set this value, DEFAULT_TABLE_SKEW_COST = 35? I see it only in StochasticLoadBalancer.java; we don't find it in any of the HBase config files. Do we need to rebuild HBase from source for this? Thanks, Manish On Tue, Aug 30, 2016 at 6:44 AM, Ted Yu

Re: HBase for Small Key Value Tables

2016-08-29 Thread Ted Yu
By default, StochasticLoadBalancer balances regions evenly across the cluster as a whole, so the regions of a particular table may not be evenly distributed. Increase the value for the following config: private static final String TABLE_SKEW_COST_KEY =
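The following deliberately simplified Python model shows why raising this weight can change the outcome. It assumes the balancer scores a placement with a weighted sum of costs and compares only two made-up costs: table skew and the fraction of regions moved. Only the default weight of 35 comes from the thread. The move weight of 50, both cost formulas, and the region and server names are invented for illustration. HBase's real StochasticLoadBalancer uses its own cost functions and explores randomly generated region moves rather than comparing two fixed plans.

```python
from collections import Counter

# Toy model only: NOT HBase's actual cost functions or normalization.
# A plan maps a region name ("table,id") to the server hosting it.


def table_skew(plan, num_servers):
    """Average, over tables, of how far the busiest server exceeds an even
    share of that table's regions, divided by the table's region count."""
    per_table = {}
    for region, server in plan.items():
        table = region.split(",")[0]
        per_table.setdefault(table, Counter())[server] += 1
    skews = []
    for counts in per_table.values():
        n = sum(counts.values())
        even_share = -(-n // num_servers)  # integer ceiling of n / num_servers
        skews.append((max(counts.values()) - even_share) / n)
    return sum(skews) / len(skews)


def move_fraction(plan, current):
    """Fraction of regions whose server differs from the current assignment."""
    return sum(plan[r] != current[r] for r in plan) / len(plan)


def weighted_cost(plan, current, num_servers, table_weight, move_weight):
    return (table_weight * table_skew(plan, num_servers)
            + move_weight * move_fraction(plan, current))


def preferred(current, candidate, num_servers, table_weight, move_weight):
    """Return "candidate" if its weighted cost is strictly lower, else "current"."""
    cost_now = weighted_cost(current, current, num_servers, table_weight, move_weight)
    cost_new = weighted_cost(candidate, current, num_servers, table_weight, move_weight)
    return "candidate" if cost_new < cost_now else "current"


# Hot table "kv" sits entirely on rs1; table "big" sits entirely on rs2.
# Both servers host 4 regions, so the overall region count is already even.
current = {**{f"kv,{i}": "rs1" for i in range(4)},
           **{f"big,{i}": "rs2" for i in range(4)}}
# Candidate swaps two regions of each table: 4 of the 8 regions move.
candidate = {**current, "kv,2": "rs2", "kv,3": "rs2", "big,0": "rs1", "big,1": "rs1"}
```

With the table weight at 35, the toy model keeps the skewed placement (cost 17.5 versus 25). Raising it to 100 makes the rebalanced plan cheaper (50 versus 25). This mirrors the advice to increase the table skew weight, although the real numbers in HBase will differ.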

Re: HBase for Small Key Value Tables

2016-08-29 Thread Manish Maheshwari
Thanks, Ted, for the per-table max region size idea. We will try keeping it around 1-2 GB and see how it goes. Will this also ensure that the region migrates to another region server, or do we still need to move it manually? On JMX: since the environment is production, we are not yet able to use

Re: HBase for Small Key Value Tables

2016-08-29 Thread Ted Yu
bq. We cannot change the maxregionsize parameter. The region size can be changed on a per-table basis: hbase> alter 't1', MAX_FILESIZE => '134217728' See the beginning of hbase-shell/src/main/ruby/shell/commands/alter.rb for more details. FYI On Sun, Aug 28, 2016 at 10:44 PM, Manish Maheshwari
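MAX_FILESIZE is given in bytes, so 134217728 is 128 MiB. The following Python sketch is a back-of-the-envelope check of roughly how many regions a given setting implies for this table, including the 1-2 GB range mentioned in the thread. It assumes the "less than 7 GB" table is about 7 GiB on disk, has a single column family, and that no region grows past MAX_FILESIZE. It says nothing about which servers end up hosting those regions.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def min_regions(table_bytes, max_filesize):
    """Rough lower bound on the region count if no region exceeds max_filesize.
    Assumes one column family; ignores compression and split timing."""
    return -(-table_bytes // max_filesize)  # integer ceiling


table = 7 * GiB  # "less than 7 GB" from the thread, treated as 7 GiB here
for label, size in [("128 MiB", 128 * MiB), ("1 GiB", 1 * GiB), ("2 GiB", 2 * GiB)]:
    # 128 MiB -> 134217728 bytes, 56 regions; 1 GiB -> 7; 2 GiB -> 4
    print(f"MAX_FILESIZE => '{size}' ({label}): at least {min_regions(table, size)} regions")
```

A 1-2 GB setting therefore gives only about 4 to 7 regions for a 7 GB table. The 128 MiB value in the example command gives around 56.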

Re: HBase for Small Key Value Tables

2016-08-29 Thread Ted Yu
Recycling an old thread: http://search-hadoop.com/m/YGbb3E2a71UVLBK=Re+HBase+Count+Rows+in+Regions+and+Region+Servers You can use the /jmx endpoint to inspect regions and find the hotspot. On Mon, Aug 29, 2016 at 7:29 AM, Manish Maheshwari wrote: > Hi Dima, > > Thanks for the suggestion. We
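Here is a minimal Python sketch of using /jmx to find the hotspot. It assumes HBase 1.x-style naming: a region server's JMX JSON contains a bean named Hadoop:service=HBase,name=RegionServer,sub=Regions, with per-region attributes ending in _metric_readRequestCount. Check these names against your own /jmx output. The host and port are placeholders, and both helper functions are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed bean and attribute naming; verify against your HBase version.
REGIONS_BEAN = "Hadoop:service=HBase,name=RegionServer,sub=Regions"
SUFFIX = "_metric_readRequestCount"


def fetch_region_metrics(host_port):
    """Fetch only the per-region bean from a region server's /jmx servlet.
    host_port is a placeholder such as "rs1.example.com:16030"."""
    url = f"http://{host_port}/jmx?qry={quote(REGIONS_BEAN, safe='')}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def hottest_regions(jmx, top=5):
    """Return (region metric prefix, readRequestCount) pairs, highest first."""
    counts = {}
    for bean in jmx.get("beans", []):
        if bean.get("name") != REGIONS_BEAN:
            continue
        for key, value in bean.items():
            if key.endswith(SUFFIX):
                counts[key[: -len(SUFFIX)]] = value
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:top]
```

For example, hottest_regions(fetch_region_metrics("rs1.example.com:16030")) lists that server's busiest regions. The counters are cumulative since each region was opened, so comparing two snapshots taken a few minutes apart gives a better picture of current load. Repeat the check for each region server.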

Re: HBase for Small Key Value Tables

2016-08-29 Thread Manish Maheshwari
Hi Dima, Thanks for the suggestion. We could load the data into the heap, but HBase makes it easier for one process to write and another to read. With the heap approach, we would need to build a process that handles both and also writes to a log so that updates are not lost if the process fails. Thanks Manish On Aug
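To show what the heap-based alternative involves, here is a minimal Python sketch of the "in-heap map plus log" idea. Every put or delete is appended to a log and fsynced before the in-memory dict changes, and the log is replayed on restart. The LoggedMap class and its JSON-lines log format are hypothetical. The sketch deliberately leaves out log compaction and any way for a separate reader process to see the writer's updates, which is the part Manish says HBase already provides.

```python
import json
import os


class LoggedMap:
    """Hypothetical sketch: an in-memory dict whose updates are appended to a
    log before being applied, so the map can be rebuilt after a crash.
    Single process only; no log compaction; no separate reader process."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        if os.path.exists(log_path):
            # Replay the log in order to rebuild the in-memory state.
            with open(log_path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        self._apply(json.loads(line))

    def _apply(self, record):
        if record["op"] == "put":
            self.data[record["key"]] = record["value"]
        else:  # "delete"
            self.data.pop(record["key"], None)

    def _append(self, record):
        # Opening the file per write keeps the sketch simple but is slow.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def put(self, key, value):
        record = {"op": "put", "key": key, "value": value}
        self._append(record)  # log first, then update memory
        self._apply(record)

    def delete(self, key):
        record = {"op": "delete", "key": key}
        self._append(record)
        self._apply(record)

    def get(self, key):
        return self.data.get(key)
```

Even this toy version needs durability, replay, and eventually compaction. Sharing the data with a second reading process would require still more machinery, which is essentially the trade-off being weighed against HBase here.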

Re: HBase for Small Key Value Tables

2016-08-29 Thread Dima Spivak
(Though if it is only 7 GB, why not just store it in memory?) On Sunday, August 28, 2016, Dima Spivak wrote: > If your data can all fit on one machine, HBase is not the best choice. I > think you'd be better off using a simpler solution for small data and leave > HBase for

Re: HBase for Small Key Value Tables

2016-08-29 Thread Dima Spivak
If your data can all fit on one machine, HBase is not the best choice. I think you'd be better off using a simpler solution for small data and leave HBase for use cases that require proper clusters. On Sunday, August 28, 2016, Manish Maheshwari wrote: > We don't want to

Re: HBase for Small Key Value Tables

2016-08-28 Thread Manish Maheshwari
We don't want to invest in another DB like Dynamo or Cassandra, and we are already on the Hadoop stack. Managing another DB would be a pain. We chose HBase over an RDBMS because we call HBase from Spark Streaming to look up the keys. Manish On Mon, Aug 29, 2016 at 1:47 PM, Dima Spivak

Re: HBase for Small Key Value Tables

2016-08-28 Thread Dima Spivak
Hey Manish, Just to ask the naive question, why use HBase if the data fits into such a small table? On Sunday, August 28, 2016, Manish Maheshwari wrote: > Hi, > > We have a scenario where HBase is used like a Key Value Database to map > Keys to Regions. We have over 5

HBase for Small Key Value Tables

2016-08-28 Thread Manish Maheshwari
Hi, We have a scenario where HBase is used as a key-value database to map keys to regions. We have over 5 million keys, but the table size is less than 7 GB. The read volume is pretty high, about 50x the put/delete volume. This causes hotspotting on the data node and the region is not