how to do parallel scanning in map reduce using hbase as input?

2014-06-26 Thread Li Li
2014-06-26 Thread Li Li
My table has about 700 million rows and about 80 regions. Each TaskTracker is configured to run 4 mappers and 4 reducers concurrently. The Hadoop/HBase cluster has 5 nodes, so 20 mappers run at the same time. It takes more than an hour to finish the mapper stage. The HBase cluster's load
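For reference, a minimal sketch of the standard scan-as-input setup (the table name "my_table", the mapper body, and the caching value of 500 are placeholders, not from the thread). TableInputFormat already creates one map task per region, so with ~80 regions and 20 concurrent map slots the scan is parallel; raising scan caching and keeping regions reasonably sized are the usual knobs.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class ParallelScanJob {

  // One map task is created per region; each map() call processes a single row.
  static class RowCountMapper extends TableMapper<Text, LongWritable> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws java.io.IOException, InterruptedException {
      context.write(new Text("rows"), new LongWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "parallel-scan");
    job.setJarByClass(ParallelScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fetch 500 rows per RPC instead of the default
    scan.setCacheBlocks(false);  // a full MR scan should not churn the block cache

    TableMapReduceUtil.initTableMapperJob(
        "my_table", scan, RowCountMapper.class,
        Text.class, LongWritable.class, job);
    job.setNumReduceTasks(0);    // map-only unless aggregation is needed
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}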

all regionservers : numberOfOnlineRegions=0

2014-06-26 Thread lilibiao2014
2014-06-26 Thread lilibiao2014
Hey guys, yesterday 4 of the 11 regionservers in our HBase cluster stopped working properly, with numberOfOnlineRegions=0. And when we restarted the cluster, this happened not only on those 4 but on all of our regionservers. Here is the HBase master's log. Besides the exceptions in that log, we also found a few ZooKeeper exceptions

Re: Disk space leak when using HBase and HDFS ShortCircuit

2014-06-26 Thread Giuseppe Reina
2014-06-26 Thread Giuseppe Reina
On Thu, Jun 26, 2014 at 1:02 AM, Enis Söztutar wrote: > Agreed, this seems like an hdfs issue unless hbase itself does not close > the hfiles properly. But judging from the fact that you were able to > circumvent the problem by reducing the cache size, it does seem > unlikely. > Well, w

Re: all regionservers : numberOfOnlineRegions=0

2014-06-26 Thread Ted Yu
2014-06-26 Thread Ted Yu
Which HBase release do you use? Have you checked the region server logs to see why log splitting had issues? Cheers On Jun 26, 2014, at 1:55 AM, "lilibiao2014" wrote: > Hey guys, > > Yesterday 4 of the 11 regionservers in our HBase cluster stopped working > properly, with numberOfOnlineRegions=0. > And

Re: how to do parallel scanning in map reduce using hbase as input?

2014-06-26 Thread Ted Yu
2014-06-26 Thread Ted Yu
80 regions over 5 nodes - that's 16 per server. How big is the average region size? Have you considered splitting the existing regions? Cheers On Jun 26, 2014, at 12:34 AM, Li Li wrote: > My table has about 700 million rows and about 80 regions. Each task > tracker is configured with 4 mappers and
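If splitting is the route taken, a minimal sketch with the 0.94-era admin API (the table name is a placeholder; given a table name, split() asks for a midpoint split of its regions and returns asynchronously):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SplitTableRegions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Requests a split of the table's regions at their midpoints.
      // The call is asynchronous; daughter regions come online once the
      // parents finish compacting.
      admin.split("my_table");
    } finally {
      admin.close();
    }
  }
}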

Store data in HBase with a MapReduce.

2014-06-26 Thread Guillermo Ortiz
2014-06-26 Thread Guillermo Ortiz
I have a question. I want to execute a MapReduce job and the output of my reduce is going to be stored in HBase. So, it's a MapReduce job whose output is going to be stored in HBase. I can do a Map and use HFileOutputFormat.configureIncrementalLoad(pJob, table); but I don't know how I could do i
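One common shape of that bulk-load path, sketched under assumed names (the table "my_table", column family "cf", and tab-separated input layout are placeholders): a map-only job emits (rowkey, Put); configureIncrementalLoad() installs the sorting reducer, partitioner and HFileOutputFormat for the table's regions, and LoadIncrementalHFiles moves the result into the live table.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadJob {

  // Turns each tab-separated input line into a Put keyed by its rowkey.
  static class PutMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text line, Context context)
        throws java.io.IOException, InterruptedException {
      String[] f = line.toString().split("\t");
      Put put = new Put(Bytes.toBytes(f[0]));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(f[1]));
      context.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "bulk-load");
    job.setJarByClass(BulkLoadJob.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(PutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    HTable table = new HTable(conf, "my_table");
    // Wires up the sort reducer, partitioner and HFileOutputFormat for this table.
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // Move the generated HFiles into the live table's regions.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
    }
    table.close();
  }
}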

Re: HBase slow data load

2014-06-26 Thread Ted Yu
2014-06-26 Thread Ted Yu
What HBase release are you using? Do you use HTable.batch() to insert records? Cheers On Thu, Jun 26, 2014 at 1:08 AM, adelin.ghanayem wrote: > I have a problem with loading a large amount of data from a MySQL database into a > small HBase cluster. The cluster configuration is as follows > > Machine(1):
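For comparison, a minimal client-side sketch of batched inserts (the table name, column family, row count, and batch size of 1000 are placeholders); an alternative is setAutoFlush(false) plus individual put() calls against a sized write buffer.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchedInsert {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");
    List<Row> actions = new ArrayList<Row>();
    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes(String.format("row-%08d", i)));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes("v" + i));
      actions.add(put);
      if (actions.size() == 1000) {   // ship 1000 operations per round trip
        table.batch(actions);
        actions.clear();
      }
    }
    if (!actions.isEmpty()) {
      table.batch(actions);
    }
    table.close();
  }
}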

Re: Store data in HBase with a MapReduce.

2014-06-26 Thread Ted Yu
2014-06-26 Thread Ted Yu
Depending on the MapOutputValueClass, you can override the corresponding XXXSortReducer so that your custom logic is added. Cheers On Thu, Jun 26, 2014 at 8:24 AM, Guillermo Ortiz wrote: > I have a question. > I want to execute a MapReduce job and the output of my reduce is going to > be stored in HBas

HBase slow data load

2014-06-26 Thread adelin.ghanayem
2014-06-26 Thread adelin.ghanayem
I have a problem with loading a large amount of data from a MySQL database into a small HBase cluster. The cluster configuration is as follows: Machine(1): HDFS / primary HDFS node / YARN ResourceManager / YARN NodeManager / MapReduce / History Server / ZooKeeper / Region Server; Machine(2): YARN NodeManager / Se

Bulk load to multiple tables

2014-06-26 Thread Kevin
2014-06-26 Thread Kevin
I am reading data off HDFS that doesn't all get loaded into a single table. With the current way of bulk loading I can load into the table that most of the data will end up in, and I can use the client API (i.e., Put) to load the rest of the data from the file into the other tables. The current bulk load
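A sketch of that split-path approach (the table and column names, the routing field, and the input layout are all assumptions, not from the thread): rows destined for the main table flow through the normal bulk-load output, while the remainder is written to a second table with the client API from inside the mapper.

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RoutingMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
  private HTable otherTable;

  @Override
  protected void setup(Context context) throws IOException {
    // Direct client connection for the rows that do not belong to the bulk-loaded table.
    otherTable = new HTable(HBaseConfiguration.create(context.getConfiguration()), "other_table");
    otherTable.setAutoFlush(false);   // buffer the direct puts
  }

  @Override
  protected void map(LongWritable key, Text line, Context context)
      throws IOException, InterruptedException {
    String[] f = line.toString().split("\t");
    Put put = new Put(Bytes.toBytes(f[0]));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(f[2]));
    if ("main".equals(f[1])) {
      context.write(new ImmutableBytesWritable(put.getRow()), put);  // bulk-load path
    } else {
      otherTable.put(put);                                           // client API path
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    otherTable.close();   // flushes any buffered puts
  }
}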

Re: Store data in HBase with a MapReduce.

2014-06-26 Thread Wellington Chevreuil
2014-06-26 Thread Wellington Chevreuil
Hi Guillermo, You can use TableOutputFormat as the output format for your job; then in your reducer you just need to write Put objects. In your driver: Job job = new Job(conf); … job.setOutputFormatClass(TableOutputFormat.class); job.setReducerClass(AverageReducer.class); job.setOutputForma
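A fuller sketch along those lines (the AverageReducer name comes from the snippet above; the table name "my_table", column family "stats", and the map output types are assumptions). It uses TableMapReduceUtil.initTableReducerJob() as a shortcut that sets both TableOutputFormat and the target table name in one call.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class AverageToHBaseJob {

  // The reducer emits Put objects; TableOutputFormat applies each Put to the target table.
  static class AverageReducer extends TableReducer<Text, DoubleWritable, ImmutableBytesWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
        throws IOException, InterruptedException {
      double sum = 0;
      int count = 0;
      for (DoubleWritable v : values) {
        sum += v.get();
        count++;
      }
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("stats"), Bytes.toBytes("avg"), Bytes.toBytes(sum / count));
      context.write(new ImmutableBytesWritable(put.getRow()), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "average-to-hbase");
    job.setJarByClass(AverageToHBaseJob.class);
    // ... input format and mapper setup omitted ...
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(DoubleWritable.class);

    // Sets TableOutputFormat and the output table name in one call.
    TableMapReduceUtil.initTableReducerJob("my_table", AverageReducer.class, job);
    job.waitForCompletion(true);
  }
}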

Re: Store data in HBase with a MapReduce.

2014-06-26 Thread Stack
2014-06-26 Thread Stack
Be sure to read http://hbase.apache.org/book.html#d3314e5975, Guillermo, if you have not already. Avoid the reduce phase if you can. St.Ack On Thu, Jun 26, 2014 at 8:24 AM, Guillermo Ortiz wrote: > I have a question. > I want to execute a MapReduce job and the output of my reduce is going to > be store

Re: how to do parallel scanning in map reduce using hbase as input?

2014-06-26 Thread Li Li
2014-06-26 Thread Li Li
I don't think splitting will help. Adding more mappers per TaskTracker will use more resources (heap memory). BTW, how do I view the average region size? In the web UI I found these columns: ServerName, Num. Stores, Num. Storefiles, Storefile Size, Uncompressed Storefile Size, Index Size, Bloom Size, followed by per-server rows starting with mphbase1,60
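The web UI's per-server storefile sizes answer this only indirectly. One rough way to get a per-region average programmatically, sketched under assumptions (the table name "my_table" and the pre-namespace ${hbase.rootdir}/<table> directory layout used by 0.94); "hadoop fs -du" on the same path gives the total from the shell.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class AverageRegionSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    String tableName = "my_table";

    // Region count from the client API.
    HTable table = new HTable(conf, tableName);
    int regions = table.getRegionLocations().size();
    table.close();

    // Total on-disk size of the table directory under the HBase root.
    Path tableDir = new Path(conf.get("hbase.rootdir"), tableName);
    FileSystem fs = tableDir.getFileSystem(conf);
    long bytes = fs.getContentSummary(tableDir).getLength();

    System.out.printf("%d regions, average %.1f MB per region%n",
        regions, bytes / (double) regions / (1024.0 * 1024.0));
  }
}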

too many regions

2014-06-26 Thread sunweiwei
2014-06-26 Thread sunweiwei
Hi, I'm using an HBase 0.94.2 cluster which has 3 ZooKeepers, 17 regionservers, and about 40,000 regions in total; each regionserver has about 2,300 regions. Maybe the cluster has too many regions. 1. If a regionserver dies, how long does the HMaster take to reassign that regionserver's 2,300 regions? If I restart hbas

Re: too many regions

2014-06-26 Thread Ted Yu
2014-06-26 Thread Ted Yu
bq. Can I set hbase.hregion.max.filesize to 10G in hbase0.94.2 Yes. BTW please consider upgrading to a newer 0.94 release; the latest is 0.94.20. Cheers On Thu, Jun 26, 2014 at 7:41 PM, sunweiwei wrote: > Hi > > > > I'm using an HBase 0.94.2 cluster which has 3 ZooKeepers, 17 > regionservers, abo
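Cluster-wide, the threshold is hbase.hregion.max.filesize in hbase-site.xml (followed by a rolling restart of the regionservers). A per-table sketch with the admin API is below; the table name is a placeholder, and the disable/enable pair assumes online schema change is left at its 0.94 default of off.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseMaxFileSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] tableName = Bytes.toBytes("my_table");
    try {
      HTableDescriptor desc = admin.getTableDescriptor(tableName);
      desc.setMaxFileSize(10L * 1024 * 1024 * 1024);   // split this table's regions only past 10 GB
      admin.disableTable(tableName);
      admin.modifyTable(tableName, desc);
      admin.enableTable(tableName);
    } finally {
      admin.close();
    }
  }
}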

Re: Bulk load to multiple tables

2014-06-26 Thread Suraj Varma
2014-06-26 Thread Suraj Varma
See this: https://issues.apache.org/jira/browse/HBASE-3727 And see this thread: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/21724 You may need to rebase the code onto your specific version of HBase, though. --Suraj On Thu, Jun 26, 2014 at 10:28 AM, Kevin wrote: > I am reading da

Re: Custom TableInputFormat and TableOutputFormat classes

2014-06-26 Thread Suraj Varma
2014-06-26 Thread Suraj Varma
See this thread, which seems similar to your use case: http://apache-hbase.679495.n3.nabble.com/Hbase-sequential-row-merging-in-MapReduce-job-td4033194.html --Suraj On Wed, Jun 25, 2014 at 2:58 AM, Kuldeep Bora wrote: > Hello, > > I have keys in HBase of the form `abc:xyz` and I would like to write/e
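For the `abc:xyz` keys, one approach in the spirit of that thread, sketched with assumed column names ("cf" and "q" are placeholders): a TableMapper that emits only the part of the rowkey before the ':' as its output key, so the reducer receives every row sharing a prefix together and can merge them before writing out.

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class PrefixGroupingMapper extends TableMapper<Text, Text> {
  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    String rowkey = Bytes.toString(row.get(), row.getOffset(), row.getLength());
    int sep = rowkey.indexOf(':');
    String prefix = sep >= 0 ? rowkey.substring(0, sep) : rowkey;

    byte[] cell = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
    // All "abc:*" rows arrive at the same reducer, which can merge and emit them.
    context.write(new Text(prefix), new Text(cell == null ? "" : Bytes.toString(cell)));
  }
}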