Re: Major compaction

2016-04-04 Thread Sumit Nigam
Hello all, Thanks a lot for your replies. @Frank - I will try the compactor you wrote and let you know how it goes. @Esteban - I am trying to understand how to reduce major compaction load. In my cluster, whenever it happens, it takes down a region server or two. My settings are - disable time

Re: hbase custom scan

2016-04-04 Thread Shushant Arora
table will have ~100 regions. I didn't get the advantage of top rows coming from the same vs. different regions? They will come from different regions. On Tue, Apr 5, 2016 at 9:10 AM, Ted Yu wrote: > How many regions does your table have ? > > After sorting, is there a chance that

Re: hbase custom scan

2016-04-04 Thread Ted Yu
How many regions does your table have? After sorting, is there a chance that the top N rows come from distinct regions? On Mon, Apr 4, 2016 at 8:27 PM, Shushant Arora wrote: > Hi > > I have a requirement to scan an HBase table based on insertion timestamp. > I need

hbase custom scan

2016-04-04 Thread Shushant Arora
Hi I have a requirement to scan an HBase table based on insertion timestamp. I need to fetch the keys sorted by insertion timestamp, not by key. I can't make the timestamp a prefix of the key, to avoid hotspotting. Is there any efficient way to meet this requirement? Thanks!
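A common pattern for this kind of requirement (a hedged illustration, not something proposed in this thread) is to keep the main table's well-distributed row keys and maintain a salted secondary index table keyed by timestamp, which time-ordered reads then scan bucket by bucket. The table names, column names, and bucket count below are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TimestampIndexWriter {
    private static final int SALT_BUCKETS = 16; // hypothetical bucket count

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table data = conn.getTable(TableName.valueOf("data"));         // hypothetical
             Table index = conn.getTable(TableName.valueOf("data_by_ts"))) { // hypothetical
            byte[] rowKey = Bytes.toBytes("some-well-distributed-key");
            long insertTs = System.currentTimeMillis();

            // 1. Write the real row as usual; its key stays well distributed.
            Put dataPut = new Put(rowKey);
            dataPut.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            data.put(dataPut);

            // 2. Write an index row keyed by salt + timestamp + original key.
            //    The salt spreads index writes over SALT_BUCKETS key ranges so the
            //    time-ordered index does not hotspot; a time-range read scans each
            //    bucket for the wanted interval and merges results client-side.
            byte salt = (byte) ((Arrays.hashCode(rowKey) & 0x7fffffff) % SALT_BUCKETS);
            byte[] indexKey = Bytes.add(new byte[] { salt }, Bytes.toBytes(insertTs), rowKey);
            Put indexPut = new Put(indexKey);
            indexPut.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("k"), rowKey);
            index.put(indexPut);
        }
    }
}
```

The trade-off is an extra write per row and a scatter-gather read over the salt buckets, which is usually acceptable when the alternative is a hotspotted time-prefixed key.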

Reply: Major compaction

2016-04-04 Thread Liu, Ming (Ming)
Thanks Frank, this is something I am looking for. Would like to have a try with it. Thanks, Ming -----Original Message----- From: Frank Luo [mailto:j...@merkleinc.com] Sent: April 5, 2016 1:38 To: user@hbase.apache.org Cc: Sumit Nigam Subject: RE: Major compaction I wrote a small program to

Re: HBase table map to hive

2016-04-04 Thread Wojciech Indyk
Hi! You can use a Hive MAP on your column family or on a prefix of the column qualifier. https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveMAPtoHBaseColumnFamily -- Kind regards/ Pozdrawiam, Wojciech Indyk http://datacentric.pl 2016-04-04 14:13 GMT+02:00

RE: Major compaction

2016-04-04 Thread Frank Luo
I wrote a small program to do MC in a "smart" way here: https://github.com/jinyeluo/smarthbasecompactor/ Instead of blindly running MC on a table level, the program finds a non-hot region with the most store files on a per-region-server basis, and runs MC on it. Once done, it finds the next
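The linked project contains the real implementation; the block below is only a minimal sketch of the idea as described in the mail (per region server, pick the region with the most store files and major-compact it), written against the HBase 1.x Admin API. The hot-region filtering and the looping/waiting logic of the actual tool are omitted.

```java
import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SelectiveMajorCompactor {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            ClusterStatus status = admin.getClusterStatus();
            for (ServerName server : status.getServers()) {
                byte[] worstRegion = null;
                int maxStoreFiles = -1;
                // Pick the region with the most store files on this region server.
                Map<byte[], RegionLoad> loads = status.getLoad(server).getRegionsLoad();
                for (Map.Entry<byte[], RegionLoad> e : loads.entrySet()) {
                    int storeFiles = e.getValue().getStorefiles();
                    if (storeFiles > maxStoreFiles) {
                        maxStoreFiles = storeFiles;
                        worstRegion = e.getKey();
                    }
                }
                if (worstRegion != null) {
                    // Asynchronous request; a real tool would wait for completion
                    // and skip regions that are currently serving heavy traffic.
                    admin.majorCompactRegion(worstRegion);
                }
            }
        }
    }
}
```

Compacting one region per server at a time keeps the extra disk and network I/O bounded, which is the point of doing it "smart" rather than per table.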

Re: Major compaction

2016-04-04 Thread Vladimir Rodionov
>> Why I am trying to understand this is because HBase also sets it to a 24-hour default (for time-based compaction) and I am looking to lower it to say >> 20 mins to reduce stress by spreading the load. The more frequently you run major compaction, the more IO (disk/network) you consume. Usually,
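A hedged sketch of the pattern both replies point at: leave routine store-file merging to minor compactions, and if time-based major compactions cause trouble, control the period via hbase.hregion.majorcompaction (0 disables time-based majors; set in hbase-site.xml on the servers) and trigger them explicitly during off-peak hours. The table name below is hypothetical.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class OffPeakMajorCompaction {
    public static void main(String[] args) throws IOException {
        // Server-side (hbase-site.xml), not set here:
        //   hbase.hregion.majorcompaction = 0  disables time-based major compactions,
        //   so they only run when requested explicitly, e.g. from a nightly job.
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Request a major compaction of the whole table. The call is
            // asynchronous; the region servers perform the actual work.
            admin.majorCompact(TableName.valueOf("my_table")); // hypothetical table name
        }
    }
}
```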

Re: Connecting to hbase 1.0.3 via java client stuck at zookeeper.ClientCnxn: Session establishment complete on server

2016-04-04 Thread Sachin Mittal
There is additional information I would like to share with you which points to the region server dying, or to the client connecting to/resolving a wrong region server. Here is the log when trying to connect to the server: [main-EventThread] zookeeper.ZooKeeperWatcher: hconnection-0x1e67b872-0x153e135af570008

Re: Major compaction

2016-04-04 Thread Esteban Gutierrez
Hello Sumit, Ideally you shouldn't be triggering major compactions that frequently since minor compactions should be taking care of reducing the number of store files. The caveat of doing it more frequently is the additional disk/network I/O. Can you please elaborate more on "reduce stress by

Major compaction

2016-04-04 Thread Sumit Nigam
Hi, Are there major overheads to running major compaction frequently? As much as I know, it produces one HFile per region and processes delete markers and version-related drops. So, if this process has happened once, say, a few minutes back, then another major compaction should ideally not cause

Re: Retiring empty regions

2016-04-04 Thread Nick Dimiduk
> Crazy idea, but you might be able to take stripped down version of region > normalizer code and make a Tool to run? Requesting split or merge is done > through the client API, and the only weighing information you need is > whether region empty or not, that you could find out too? Yeah, that's
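Not the normalizer code itself, just a hedged sketch of what such a Tool could do with the HBase 1.x client API, along the lines suggested in the quoted mail: weigh regions by store file size from the cluster status and request a merge of an empty region with its neighbor. Error handling, waiting for merges to complete, and the actual Tool wiring are left out; the table name is hypothetical.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class EmptyRegionMerger {
    public static void main(String[] args) throws IOException {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Collect per-region store file sizes from the cluster status.
            Map<byte[], Long> sizeMb = new TreeMap<>(Bytes.BYTES_COMPARATOR);
            ClusterStatus status = admin.getClusterStatus();
            for (ServerName sn : status.getServers()) {
                for (RegionLoad rl : status.getLoad(sn).getRegionsLoad().values()) {
                    sizeMb.put(rl.getName(), (long) rl.getStorefileSizeMB());
                }
            }
            // Walk the table's regions in key order and merge an empty region
            // with the region that follows it. The merge request is asynchronous;
            // a real tool would wait for it and re-read the region list before
            // continuing.
            TableName table = TableName.valueOf("my_table"); // hypothetical
            List<HRegionInfo> regions = admin.getTableRegions(table);
            for (int i = 0; i + 1 < regions.size(); i++) {
                Long mb = sizeMb.get(regions.get(i).getRegionName());
                if (mb != null && mb == 0L) {
                    admin.mergeRegions(regions.get(i).getEncodedNameAsBytes(),
                                       regions.get(i + 1).getEncodedNameAsBytes(),
                                       false);
                    break; // one merge per run in this sketch
                }
            }
        }
    }
}
```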

HBase table map to hive

2016-04-04 Thread ram kumar
Hi, I have an HBase table whose column names change (increase) over time. Is there a way to map such an HBase table to a Hive table, inferring the schema from the HBase table? Thanks