Re: Filters for hbase scans require reboot.

2012-11-01 Thread Jonathan Bishop
OK. On Thu, Nov 1, 2012 at 9:06 PM, Anoop Sam John wrote: > > Yes Jonathan as of now we need a reboot.. Take a look at HBASE-1936. This > is not completed. You can give your thoughts there and have a look at the > patch/discussion... > > -Anoop- > > Fro

RE: Filters for hbase scans require reboot.

2012-11-01 Thread Anoop Sam John
Yes Jonathan, as of now we need a reboot. Take a look at HBASE-1936. This is not completed. You can give your thoughts there and have a look at the patch/discussion... -Anoop- From: Jonathan Bishop [jbishop@gmail.com] Sent: Friday, November 02, 2012

RE: Bulk Loading - LoadIncrementalHFiles

2012-11-01 Thread Anoop Sam John
Hi, Yes, while doing the bulk load the table can be presplit. The job will have the same number of reducers as the table has regions, one per region. Each HFile that a reducer generates will have a maximum size set by the HFile max size configuration. You can see that while bulk loading also there will b

Re: how to "copy" oracle to HBASE, just like goldengate

2012-11-01 Thread Shumin Wu
Have you taken a look at the Sqoop (http://sqoop.apache.org/) tool? Shumin On Thu, Nov 1, 2012 at 6:44 PM, Xiang Hua wrote: > Hi, >IS there any tool to 'copy' whole oracle data of an instance into > 'hbase'. > > > Best R. >huaxiang >

Re: hbase 0.94.0 failed to individually run test case with org.apache.hadoop.hbase.TestZookeeper

2012-11-01 Thread Sergey Shelukhin
http://hbase.apache.org/book.html#hbase.unittests.cmds has the description of the various commands that you can use to run various categories. Maven docs tell you how to skip some test via config (project file): http://maven.apache.org/plugins/maven-surefire-plugin/examples/inclusion-exclusion.htm

how to "copy" oracle to HBASE, just like goldengate

2012-11-01 Thread Xiang Hua
Hi, is there any tool to 'copy' the whole data of an Oracle instance into HBase? Best R. huaxiang

Re: Does hbase.hregion.max.filesize have a limit?

2012-11-01 Thread Cheng Su
Thank you all guys. I found out that I misunderstood the "size of a region" and "size of a region server". I found this property: hbase.regionserver.regionSplitLimit, value 2147483647, description: Limit for the number of regions after which no more region splitting should take pl

Re: Struggling with Region Servers Running out of Memory

2012-11-01 Thread Jeff Whiting
Also can any of the other call queues just fill up forever and cause OOME as well? I don't see any code that limits the queue size based on the amount of memory they are using, so it seems like any of them (priorityCallQueue, the replicationQueue or the callQueue which are all in the HBaseSer

Re: Struggling with Region Servers Running out of Memory

2012-11-01 Thread Jeff Whiting
Ok so I'm looking through the code. It looks like in HBaseServer.java it will create a replicationQueue if hbase.regionserver.replication.handler.count > 0. We haven't changed that so the default is 3. The replicationQueue is then shared with handlers. Then in processData(byte[] buf) if it i

Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread Ameya Kantikar
Hi Kevin, I was trying to pre-split the table from the shell, but either compression or splitting did not work. I tried the following: create 'test1', { NAME => 'cf1', SPLITS => ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p' 'q', 'r', 's', 't', 'u', 'v', 'w', 'z'] } disa
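
A note on the shell command above, offered as a sketch rather than a confirmed diagnosis: the split list has no comma between 'p' and 'q', and the HBase shell runs on JRuby, where adjacent string literals are concatenated, so that pair is read as the single key 'pq'. Generating the list programmatically avoids that kind of typo. The helper name build_create_command is hypothetical, and putting SPLITS in its own dictionary after the column family is an assumption based on the form shown in the shell's help for create; check it against your HBase version.

```python
import string


def build_create_command(table, family, split_keys):
    """Return an HBase shell `create` line with table-level split keys.

    Hypothetical helper: it only builds a string, it does not talk to HBase.
    """
    # Join with explicit commas so no two keys can fuse together.
    keys = ", ".join("'{}'".format(k) for k in split_keys)
    return "create '{}', {{NAME => '{}'}}, {{SPLITS => [{}]}}".format(
        table, family, keys
    )


# One split point per letter 'a'..'z' (the list in the message skips 'x' and 'y').
split_keys = list(string.ascii_lowercase)
command = build_create_command("test1", "cf1", split_keys)
print(command)
```

The printed line can be pasted into the shell; whether single-letter split points suit the table depends on how the row keys are actually distributed.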

Re: Struggling with Region Servers Running out of Memory

2012-11-01 Thread Jeff Whiting
So this is some of what I'm seeing as I go through the profiles: (a) 2GB - org.apache.hadoop.hbase.io.hfile.LruBlockCache This looks like it is the block cache and we aren't having any problems with that... (b) 1.4GB - org.apache.hadoop.hbase.regionserver.HRegionServer -- java.util.concurr

Re: Hbase thrift connections to Zookeeper

2012-11-01 Thread Jean-Daniel Cryans
Should be only 1 if you're using 0.92 and later. J-D On Wed, Oct 31, 2012 at 3:36 PM, Varun Sharma wrote: > Hi, > > Currently, does the hbase thrift server maintain a fixed size pool of > connections to talk to the zookeeper or does it create an additional > zookeeper connection for each incomin

Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread Kevin O'dell
Ameya, If your new table goes well (did you presplit this time?), then here is what we can do for the old one: (1) rm /hbase/tablename, (2) hbck -fixMeta -fixAssignments, (3) restart HBase if it is still present. All should be well. Please let us know how it goes. On Thu, Nov 1, 2012 at 2:44 PM, Ameya Kantikar wrote

Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread Ameya Kantikar
Thanks Kevin & Ram. Please find my answers below: Did you presplit your table? - NO You are on .92, might as well take advantage of HFilev2 and use 10GB region sizes - - I have put my region size now at 10GB and running another load in a separate table, but my existing table is still in bad sha

Re: Region server heap used memory usage?

2012-11-01 Thread Lashing
Thanks for the detailed explanation. BTW, is there any rule of thumb for how much I should reserve for that "in flight" memory? > So are you looking at the total process size (e.g. top) or are you looking at > the amount of memory the jvm says it is using (e.g. jmap -heap )? > > If you are

Re: hbase 0.94.0 failed to individually run test case with org.apache.hadoop.hbase.TestZookeeper

2012-11-01 Thread Liping Zhang
Hello Nicolas, Oh, it is a typo. :) Thanks very much! By the way, can you also help to answer the following question? HBase unit tests (command `mvn test`) are *separated into two parts* in *HBase 0.94.0*; do you know how to let it run only the first part, but not the second part, with `mvn test

Re: Struggling with Region Servers Running out of Memory

2012-11-01 Thread Jeff Whiting
Good to know. It's nice they finally got that in. We aren't on u36 right now in production but I'm going to push on getting us there. Thanks, ~Jeff On 11/1/2012 11:07 AM, Jeremy Carroll wrote: Java 6 update 34 can rotate GC Logs. -XX:+UseGCLogFileRotation http://stackoverflow.com/questions/3

Re: Hbase cluster for serving real time site traffic

2012-11-01 Thread Patrick Angeles
I should have added that if you have one host for all the master roles (NN, JT, HMaster) then you may as well go with a single ZK node (quorum = 1) on that same server. On Thu, Nov 1, 2012 at 3:11 PM, Patrick Angeles wrote: > > > On Thu, Nov 1, 2012 at 1:09 PM, Leonid Fedotov > wrote: > >> Var

Re: Hbase cluster for serving real time site traffic

2012-11-01 Thread Patrick Angeles
On Thu, Nov 1, 2012 at 1:09 PM, Leonid Fedotov wrote: > Varun, > for HA NameNode you may want to look at Hortonworks HDP 1.1 release. It > supported on vSphere and on RedHat HA cluster. > HDP 1.1 based on Hadoop 1.0.3 and fully certified for production > environments. > Do not forget, Hadoop 2.0

Re: Hbase cluster for serving real time site traffic

2012-11-01 Thread Stack
On Thu, Nov 1, 2012 at 10:09 AM, Leonid Fedotov wrote: > Varun, > for HA NameNode you may want to look at Hortonworks HDP 1.1 release. It > supported on vSphere and on RedHat HA cluster. > HDP 1.1 based on Hadoop 1.0.3 and fully certified for production > environments. > Do not forget, Hadoop 2.0

Re: SingleColumnValueFilter for empty column qualifier

2012-11-01 Thread Jonathan Bishop
Hi, Thanks for the help. I am going forward with writing my own filter - but having some trouble running it. What I did was simply copy SingleColumnValueFilter, rename it, and run it. But I got... Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after a

Re: Hbase cluster for serving real time site traffic

2012-11-01 Thread Leonid Fedotov
Varun, for HA NameNode you may want to look at the Hortonworks HDP 1.1 release. It is supported on vSphere and on RedHat HA cluster. HDP 1.1 is based on Hadoop 1.0.3 and fully certified for production environments. Do not forget, Hadoop 2.0 is still in alpha testing stage and cannot be recommended for

Re: Struggling with Region Servers Running out of Memory

2012-11-01 Thread Jeremy Carroll
Java 6 update 34 can rotate GC Logs. -XX:+UseGCLogFileRotation http://stackoverflow.com/questions/3822097/rolling-garbage-collector-logs-in-java As for profiling memory dumps: JProfiler 7, YourKit, etc. YMMV. On Thu, Nov 1, 2012 at 10:01 AM, Jeff Whiting wrote: > We don't have GC logging ena

Re: Region server heap used memory usage?

2012-11-01 Thread Jeff Whiting
So are you looking at the total process size (e.g. top) or are you looking at the amount of memory the jvm says it is using (e.g. jmap -heap )? If you are looking at the process size (top) jvm doesn't ever give memory back so over time the process will just get bigger and bigger as it will spik

Bulk Loading - LoadIncrementalHFiles

2012-11-01 Thread Amit Sela
Hi everyone, I'm using MR to bulk load into HBase by using HFileOutputFormat.configureIncrementalLoad and after the job is complete I use LoadIncrementalHFiles.doBulkLoad. From what I see, the MR outputs a file for each CF written and to my understanding these files are loaded as store files into

Re: Struggling with Region Servers Running out of Memory

2012-11-01 Thread Jeff Whiting
We don't have GC logging enabled (we did but gc.log would begin filling up the hdd and there was no way to clear it out without restarting the region server). Any way to enable gc.log and keep it to a reasonable size? I have two separate jmap dumps of a region server before it dies. I haven't

Re: Does hbase.hregion.max.filesize have a limit?

2012-11-01 Thread Jeremy Carroll
To speak to 'if it's possible', yes it is. We have some tables over here at Klout during testing where we set the max region size to 100Gb, and actually had tables of that size during a MR job that created HFileV2's for us to import. So I can say that I have seen 100Gb regions that still work. As

Re: Hbase cluster for serving real time site traffic

2012-11-01 Thread Jeremy Carroll
In production you would want 3, 5, 7, etc. ZK servers (an odd number) for quorum reasons. Each should run on a dedicated machine, but it does not have to be a very big one. Updates to ZK are applied to disk before they are in memory for recoverability, so having faster disks helps once you start getting
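
The odd-number advice follows from majority voting, which a few lines of arithmetic make concrete. This is only a sketch of the counting argument, not ZooKeeper code, and tolerated_failures is a hypothetical helper name: an ensemble stays available while a strict majority of its servers is up.

```python
def tolerated_failures(ensemble_size):
    """Servers that can fail while a strict majority stays up."""
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority


for n in (1, 3, 4, 5, 6, 7):
    print(n, "servers ->", tolerated_failures(n), "failures tolerated")
```

Going from 3 to 4 servers, or from 5 to 6, adds a machine without raising the number of failures the ensemble can survive, which is why even sizes are usually skipped.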

Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread Kevin O'dell
Michael, I am not sure, I recommend it as a solid middle ground so that you have room to scale in your cluster. Once you get to 20GB+ from what I understand there are some adverse performance issues. It is the same as recommending 2GB for HFilev1, it is a good middle ground or a 4 max. With

Re: Struggling with Region Servers Running out of Memory

2012-11-01 Thread Jeff Whiting
No fat rows. We have kept the default hbase client limit of 10mb. And most values are quite small < 5k. We haven't tried raising the memory limit and we can try raising one of the servers and see how it does. However looking at the graphs I don't think it will help...but it is worth a try.

Region server heap used memory usage?

2012-11-01 Thread 徐歷盛
What's the usage of region server heap used memory ? I originally thought it's roughly the sum of current memstore + blockcache + storefileindex. But in our environment, I have configured max heap to 20g , heap used memory is sometimes up to 19g. At that moment, storefileindex was 8g, blockcache w

Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread Michael Segel
Just out of curiosity... What's the impact on having regions of 10GB or larger? What does that do to your footprint in memory and the time it takes to split or compact a region? -Mike On Nov 1, 2012, at 8:35 AM, Kevin O'dell wrote: > Couple thoughts(it is still early here so bear with me)

Re: Does hbase.hregion.max.filesize have a limit?

2012-11-01 Thread Kevin O'dell
There are two trains of thought here. The first is manually splitting your own regions. In this case you would not want your regions over 20GB for HFilev2 or 4GB for HFilev1, but you would set your maxfile size to something like 100GB so you can split when you want to and the system won't automag

Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread Kevin O'dell
Couple thoughts(it is still early here so bear with me): Did you presplit your table? You are on .92, might as well take advantage of HFilev2 and use 10GB region sizes Loading over MR, I am assuming puts? Did you tune your memstore and Hlog size? You aren't using a different client version or

Re: Does hbase.hregion.max.filesize have a limit?

2012-11-01 Thread Doug Meil
Hi there- re: "The max file size the whole cluster can store for one CF is 60G, right?" No, the max file-size for a region, in your example, is 60GB. When the data exceeds that the region will split - and then you'll have 2 regions with 60GB limit. Check out this section of the RefGuide: h

Re: Hbase cluster for serving real time site traffic

2012-11-01 Thread Marcos Ortiz Valmaseda
Regards, Varun. 1- I think that you should take a look at Cloudera Manager for CDH 4.1 to create an HA HDFS environment. Remember that version 2.0.x is not ready for production yet. The stable version is Hadoop 1.0.4 with HBase 0.94.2. 2- Yes, a recommended practice is to have a separate Zo

Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread ramkrishna vasudevan
Can you try restarting the cluster, I mean the master and RS? Also, if this persists, try to clear the ZK data and restart. Regards Ram On Thu, Nov 1, 2012 at 2:46 PM, Cheng Su wrote: > Sorry, my mistake. Ignore about the "max store size of a single CF" please. > > m(_ _)m > > On Thu, Nov 1

Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread Cheng Su
Sorry, my mistake. Ignore about the "max store size of a single CF" please. m(_ _)m On Thu, Nov 1, 2012 at 4:43 PM, Ameya Kantikar wrote: > Thanks Cheng. I'll try increasing my max region size limit. > > However I am not clear with this math: > > "Since you set the max file size to 2G, you can o

Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread Ameya Kantikar
Thanks Cheng. I'll try increasing my max region size limit. However I am not clear with this math: "Since you set the max file size to 2G, you can only store 2XN G data into a single CF." Why is that? My assumption is, even though single region can only be 2 GB, I can still have hundreds of regi

Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread Cheng Su
I met the same problem these days. I'm not sure the error log is exactly the same, but I do have the same exception org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: NotServingRegionException: 1 time, servers with issues: smartdeals-hbase8-snc1.snc1:60020, and the

Re: Does hbase.hregion.max.filesize have a limit?

2012-11-01 Thread Cheng Su
Thank you for your answer. The max file size the whole cluster can store for one CF is 60G, right? Maybe the only way is to split the large table into small tables... On Thu, Nov 1, 2012 at 3:05 PM, ramkrishna vasudevan wrote: > Can multiple region servers runs on one real machine? > (I guess not

Re: Hbase cluster for serving real time site traffic

2012-11-01 Thread Varun Sharma
Thanks all for the helpful comments. I read up on HA and was wondering if there are good tools for setting up a HA HDFS + Hbase cluster on EC2 quickly. From my reading, it appears that tools like Whirr still have issues with bringing up the secondary NN on a different machine etc. Also for availabi

Re: Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread Ameya Kantikar
One more thing, the Hbase table in question is neither enabled, nor disabled: hbase(main):006:0> is_disabled 'userTable1' false 0 row(s) in 0.0040 seconds hbase(main):007:0> is_enabled 'userTable1' false 0 row(s) in 0.0040 seconds Ameya On Thu, Nov 1, 2012 at 12:02 AM, Ameya Kantikar wrote:

Re: Does hbase.hregion.max.filesize have a limit?

2012-11-01 Thread ramkrishna vasudevan
Can multiple region servers runs on one real machine? (I guess not though) No. Every RS runs on a different physical machine. max.file.size actually applies per region. Suppose you create a table and then insert 20G of data; that will get split into further regions. Yes all 60G of data

Table in Inconsistent State; Perpetually pending region server transitions while loading lot of data into Hbase via MR

2012-11-01 Thread Ameya Kantikar
Hi, I am trying to load a lot of data (around 1.5 TB) into a single HBase table. I have set the region size to 2 GB. I also set hbase.regionserver.handler.count to 30. When I start loading data via MR, after a while, tasks start failing with the following error: org.apache.hadoop.hbase.client.RetriesExh
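
For a sense of scale, the figures in this thread can be turned into a rough region count. This is a back-of-the-envelope sketch only: estimated_regions is a hypothetical helper, it assumes binary units and a single column family, and it ignores compression. It is best read as a lower bound, since a region that splits leaves two regions well below the maximum size.

```python
import math

GB = 1024 ** 3
TB = 1024 ** 4


def estimated_regions(total_bytes, max_region_bytes):
    """Fewest regions that can hold the data if every region were full."""
    return math.ceil(total_bytes / max_region_bytes)


total = 3 * TB // 2  # about 1.5 TB, as in the message above

for region_gb in (2, 10):
    count = estimated_regions(total, region_gb * GB)
    print("{} GB regions -> at least {} regions".format(region_gb, count))
```

Under these assumptions the 2 GB setting implies at least 768 regions for the load, while the 10 GB size suggested elsewhere in the thread brings that down to at least 154.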