RE: Is it necessary to set MD5 on rowkey?

2013-12-17 Thread bigdata
Hello, @Alex Baranau, thanks for your salt solution. In my understanding, the salt solution divides the data into several partitions (with 2 hex characters, 00~FF, 256 parts will be created). My question is: when I want to scan data, do I need to scan 256 times for the following situation: rowkey: salt
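
The salting scheme under discussion can be sketched as follows. This is a hypothetical illustration, not Alex Baranau's exact code: the hash function, separator, and bucket count are assumptions.

```java
public class SaltedKeys {
    static final int BUCKETS = 256;

    // Derive a stable two-hex-char salt ("00".."ff") from the original key,
    // so writes spread across 256 lexicographic partitions.
    static String salt(String originalKey) {
        int bucket = Math.floorMod(originalKey.hashCode(), BUCKETS);
        return String.format("%02x", bucket);
    }

    static String saltedKey(String originalKey) {
        return salt(originalKey) + "-" + originalKey;
    }

    public static void main(String[] args) {
        // The salt is deterministic, so a point read (Get) can recompute it.
        // But a range scan over the original key order now touches every
        // bucket, since consecutive keys land in different partitions.
        System.out.println(saltedKey("user12345"));
    }
}
```

This determinism is exactly why point reads stay cheap while range scans fan out to all 256 buckets.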

Thrift Error in HBase

2013-12-17 Thread Ramon Wang
Hi Folks, we upgraded our cluster to CDH4.5.0 recently; the HBase version is 0.94.6-cdh4.5.0 now. Our client program (written in Python) cannot save data using Thrift. Errors happen when we try to save data with many columns (more than 7 or 8), and here is the error log: 2013-12-17

Re: Is it necessary to set MD5 on rowkey?

2013-12-17 Thread Damien Hardy
Hello, yes, you need 256 range scans, or a (nearly) full scan with a combination of filters covering each of the 256 ranges (https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.Operator.html#MUST_PASS_ONE). For MapReduce, the getSplits() method of TableInputFormatBase should be overridden
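
Damien's first option can be sketched by enumerating the 256 salt-bucket ranges; each (startRow, stopRow) pair would seed one Scan, or one prefix filter in a MUST_PASS_ONE FilterList. The key layout (salt, dash, user key) is an assumption carried over from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class SaltRanges {
    // Build one (startRow, stopRow) pair per two-hex-char salt bucket:
    // startRow = "<salt>-<userStart>", stopRow = "<salt>-<userStop>".
    static List<String[]> bucketRanges(String userStart, String userStop) {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < 256; b++) {
            String salt = String.format("%02x", b);
            ranges.add(new String[] { salt + "-" + userStart, salt + "-" + userStop });
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String[]> ranges = bucketRanges("2013-12-01", "2013-12-17");
        // 256 scans, one per bucket; alternatively wrap 256 prefix filters in
        // new FilterList(FilterList.Operator.MUST_PASS_ONE, filters) and run
        // a single (almost full) scan.
        System.out.println(ranges.size());      // 256
        System.out.println(ranges.get(0)[0]);   // 00-2013-12-01
        System.out.println(ranges.get(255)[1]); // ff-2013-12-17
    }
}
```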

Re: Thrift Error in HBase

2013-12-17 Thread ramkrishna vasudevan
For some reason the row that is created inside the BatchMutation is null. Can you check your Thrift client code where the BatchMutation is created? On Tue, Dec 17, 2013 at 2:45 PM, Ramon Wang ra...@appannie.com wrote: Hi Folks We upgraded our cluster to CDH4.5.0 recently, HBase version is

Re: Bulk load moving HFiles to the wrong region

2013-12-17 Thread Amit Sela
As I mentioned before, running with all reducers works fine. Running with the extension of HFileOutputFormat fails sometimes, on some tables. The .META. encoded qualifier points to different directories for the different regions the files are supposedly loaded into. The directories actually do exist,

Re: Thrift Error in HBase

2013-12-17 Thread Anoop John
Per the line number, it comes from byte[][] famAndQf = KeyValue.parseColumn(getBytes(m.column)); so the column inside the Mutation comes as null. Can you check the client code? -Anoop- On Tue, Dec 17, 2013 at 2:59 PM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Due to some reason the

RE: Is it necessary to set MD5 on rowkey?

2013-12-17 Thread bigdata
Thanks for your reply, Damien. So this solution still uses one Scan object and sends it to initTableMapperJob? Does the modified getSplits() function set the number of mappers to the salt bucket count? If I set 256 salt buckets, the mapper count will be 256, right? Another question is, can this

Re: Bulk load moving HFiles to the wrong region

2013-12-17 Thread Amit Sela
The logs on the region servers that were supposed to receive the loaded data show that they get the request to open the (correct) region, and they open it. But only the region server where the data is actually loaded into has the move in its log, for all files. The log actually shows it copies

Re: Thrift Error in HBase

2013-12-17 Thread Ramon Wang
Thanks guys, we have fixed it by reinstalling the Thrift library our Python client depends on. Cheers Ramon

Re: Is it necessary to set MD5 on rowkey?

2013-12-17 Thread Damien Hardy
Using a custom InputFormat with a dedicated getSplits() allows you to use a single Scan object when initiating the job. It is cloned later by each mapper, which sets startrow and stoprow according to the list returned by getSplits(). getSplits() would return a list of (startrow, stoprow) pairs calculated based on
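
One way such a custom InputFormat might look, written against the 0.94-era API; this is a sketch only, not tested on a cluster, and the class name and two-hex-char bucket layout are assumptions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

// One split per salt bucket: each mapper clones the job's single Scan and
// bounds it with the split's start/stop row, so 256 buckets -> 256 mappers.
public class SaltedTableInputFormat extends TableInputFormatBase {
  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    byte[] tableName = getHTable().getTableName();
    List<InputSplit> splits = new ArrayList<InputSplit>(256);
    for (int bucket = 0; bucket < 256; bucket++) {
      byte[] start = Bytes.toBytes(String.format("%02x", bucket));
      byte[] stop = bucket == 255
          ? HConstants.EMPTY_END_ROW                     // last bucket is open-ended
          : Bytes.toBytes(String.format("%02x", bucket + 1));
      splits.add(new TableSplit(tableName, start, stop, "")); // no locality hint
    }
    return splits;
  }
}
```

A narrower scan range per bucket (salt plus user startrow/stoprow, as in the thread) would be set the same way, just with longer start/stop keys.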

RE: Bulk load moving HFiles to the wrong region

2013-12-17 Thread Bijieshan
"The previous last region is not supposed to be deleted. I'm just adding new regions (always following lexicographically) so that the last region before the pre-split is not the last anymore." You mean you added the new regions into META? Sorry if I misunderstood you here. But can you tell me

Re: Bulk load moving HFiles to the wrong region

2013-12-17 Thread Amit Sela
Indeed there are more than 2 split points; there are 4 split points for the 5 new regions added each day. The new data bulk loaded each day belongs to the new regions. It seems like the partitions read are from the previous insertion, and if that is the case, the comparator will surely indicate that

RE: Why so many unexpected files like partitions_xxxx are created?

2013-12-17 Thread Bijieshan
Yes, it should be cleaned up, but that is not included in the current code, in my understanding. Jieshan. -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Tuesday, December 17, 2013 10:55 AM To: user@hbase.apache.org Subject: Re: Why so many unexpected files like partitions_

RE: Bulk load moving HFiles to the wrong region

2013-12-17 Thread Bijieshan
"Where does the RegionServer save the partitions file written to DistributedCache?" There's no need for the RegionServer to save the partitions file. It seems you added the new regions directly into META and didn't change the endkey of the previous last region? Jieshan. -Original Message-

Re: Newbie question: Rowkey design

2013-12-17 Thread yonghu
In my opinion, it really depends on your queries. The first one achieves data locality: there is no additional data transfer between different nodes. But this strategy sacrifices parallelism, and the node which stores A will become a hot node if too many applications try to access A. The second

scan table using multi row prefix match

2013-12-17 Thread fateme Abiri
Hi friends, I want to use a filter in HBase to return rows with different prefixes. For example, my row structure is id+URLString, so I want to return rows matching 4 id prefixes: 12234, 4534, 134, 4234. How can I do that? I use FilterList rowFilterList; rowFilterList = new

Re: scan table using multi row prefix match

2013-12-17 Thread Ted Yu
An 'E' is missing from MUST_PASS_ON. The for loop has 5 iterations instead of 4. Cheers On Dec 17, 2013, at 6:14 AM, fateme Abiri fateme.ab...@yahoo.com wrote: hi friends I want to use a filter in hbase to return rows with different prefix... for eg. my rows structure are id+URLStrings

Re: scan table using multi row prefix match

2013-12-17 Thread fateme Abiri
Hi my friend, thanks for your feedback. I'm sorry, I made a mistake when writing my email; it is written correctly in my IDE, but the rows that were returned only match one of the id-prefix filters! On Tuesday, December 17, 2013 5:57 PM, Ted Yu yuzhih...@gmail.com wrote: An 'E' is

Re: scan table using multi row prefix match

2013-12-17 Thread Ted Yu
Have you looked at this filter? src/main/java/org/apache/hadoop/hbase/filter/PrefixFilter.java Cheers On Tue, Dec 17, 2013 at 7:14 AM, fateme Abiri fateme.ab...@yahoo.com wrote: hi my freiend... tanx for your feedback... i m sorry, i take a mistake when i wrote in my email, i write
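
A sketch of what Ted is pointing at, untested against a cluster: OR together one PrefixFilter per id prefix. The helper name is made up; the key point is that FilterList defaults to MUST_PASS_ALL (AND), under which a row can satisfy at most one prefix, so MUST_PASS_ONE (OR) is required for a multi-prefix scan.

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiPrefixScan {
  // Build a Scan that returns any row starting with one of the given prefixes.
  static Scan buildScan(String... prefixes) {
    FilterList filters = new FilterList(FilterList.Operator.MUST_PASS_ONE);
    for (String prefix : prefixes) {
      filters.addFilter(new PrefixFilter(Bytes.toBytes(prefix)));
    }
    Scan scan = new Scan();
    scan.setFilter(filters);
    return scan;
  }
}
```

For the thread's example: buildScan("12234", "4534", "134", "4234"). Note that "134" is itself a prefix of nothing here, but "4234" rows would also match a hypothetical "4" prefix; non-overlapping prefixes avoid surprises.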

Re: Newbie question: Rowkey design

2013-12-17 Thread Wilm Schumacher
I was afraid of this answer and suspected it ;). I knew that the answer would depend on the actual setting, but I hoped that there was a little hint. Thanks a lot for your time and the answer. I will try it out with test data (and a simple table design) and will share my experiments when they

[96.x] replacement for HConnectionManager:deleteConnection() and deleteAllConnection()

2013-12-17 Thread Demai Ni
Hi folks, we are currently using both calls. They are being deprecated. Wondering what APIs should be used to replace them? Many thanks, Demai

Re: [96.x] replacement for HConnectionManager:deleteConnection() and deleteAllConnection()

2013-12-17 Thread Ted Yu
See: HBASE-7626 Backport client connection cleanup from HBASE-7460 Cheers On Tue, Dec 17, 2013 at 1:17 PM, Demai Ni nid...@gmail.com wrote: hi, folks, we are currently using both calls. They are being deprecated. Wondering what APIs should be used to replacement them? many thanks Demai

Re: [96.x] replacement for HConnectionManager:deleteConnection() and deleteAllConnection()

2013-12-17 Thread Demai Ni
Ted, thanks. These two patches marked deleteConnection(Configuration conf, boolean stopProxy) as deprecated, and left deleteConnection(Configuration conf) as the API for 94. However, deleteConnection(Configuration conf) is marked as deprecated in 96.0 now. Is there a way to search which patch
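
Under the unmanaged-connection model that HBASE-7460 introduced, the usual replacement is to own the connection lifecycle yourself and close it, rather than calling deleteConnection. A sketch against the 0.96 client API, untested; the table name is a placeholder:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;

public class ConnectionLifecycle {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // Unmanaged connection: the caller, not HConnectionManager, owns it.
    HConnection connection = HConnectionManager.createConnection(conf);
    try {
      HTableInterface table = connection.getTable("MyTable");
      try {
        // ... Gets/Puts against the table ...
      } finally {
        table.close();
      }
    } finally {
      connection.close(); // instead of HConnectionManager.deleteConnection(conf)
    }
  }
}
```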

Re: Why so many unexpected files like partitions_xxxx are created?

2013-12-17 Thread Tao Xiao
BTW, I noticed another problem. I bulk load data into HBase every five minutes, but I found that whenever the following command was executed: hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles HFiles-Dir MyTable, a new process called LoadIncrementalHFiles appears. I can see many

Re: Why so many unexpected files like partitions_xxxx are created?

2013-12-17 Thread Ted Yu
Tao: Can you jstack one such process next time you see them hanging ? Thanks On Tue, Dec 17, 2013 at 6:31 PM, Tao Xiao xiaotao.cs@gmail.com wrote: BTW, I noticed another problem. I bulk load data into HBase every five minutes, but I found that whenever the following command was executed
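
Ted's suggestion can be scripted; a sketch that assumes the JDK's jps and jstack are on the PATH, with a made-up output file naming:

```shell
#!/bin/sh
# Print the pids of LoadIncrementalHFiles JVMs from `jps`-style output.
filter_pids() {
  awk '/LoadIncrementalHFiles/ {print $1}'
}

# Dump one thread stack per lingering loader process.
if command -v jps >/dev/null 2>&1; then
  jps | filter_pids | while read -r pid; do
    jstack "$pid" > "loadincr-$pid.jstack"
  done
fi
```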

Re: Problems with hbase.hregion.max.filesize

2013-12-17 Thread Timo Schaepe
Hey, sorry for the delayed answer; I had a flight to San Francisco and am fighting the jet lag. I am here on vacation; maybe I can visit some interesting talks about HBase/Hadoop :). On 14.12.2013 at 13:14, lars hofhansl la...@apache.org wrote: Did you observe anything interesting with such

Re: Problems with hbase.hregion.max.filesize

2013-12-17 Thread Timo Schaepe
Hey Azuryy Yu, yep, checked the GC log, nothing there. I think, there is no special JVM configuration: export HBASE_OPTS=-XX:+UseConcMarkSweepGC export SERVER_GC_OPTS=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1

Re: Problems with hbase.hregion.max.filesize

2013-12-17 Thread Timo Schaepe
Hey Ted Yu, I have been digging through the NameNode log and so far I've found nothing special: no Exception, FATAL, or ERROR messages, nor any other peculiarities. I only see a lot of messages like this: 2013-12-12 13:53:22,541 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on

Re: Why so many unexpected files like partitions_xxxx are created?

2013-12-17 Thread Tao Xiao
I did jstack one such process and saw the following output in the terminal; I guess this info tells us that the processes started by the LoadIncrementalHFiles command never exit. Why didn't they exit after they finished running? ... ... ... ...

RE: Why so many unexpected files like partitions_xxxx are created?

2013-12-17 Thread Bijieshan
It seems LoadIncrementalHFiles is still running. Can you run jstack on one RegionServer process as well? Which version are you using? Jieshan. -Original Message- From: Tao Xiao [mailto:xiaotao.cs@gmail.com] Sent: Wednesday, December 18, 2013 1:49 PM To: user@hbase.apache.org Subject: