What version of HBase / HDFS are you running?

Cheers
On Sat, Jan 4, 2014 at 12:17 PM, Akhtar Muhammad Din <akhtar.m...@gmail.com> wrote:

> Hi,
> I have been running a MapReduce job that joins 2 datasets of 1.3 and 4 GB
> in size. Joining is done on the reduce side. Output is written to either HBase
> or HDFS, depending on configuration. The problem I am having is that HBase
> takes about 60-80 minutes to write the processed data, whereas
> HDFS takes only 3-5 minutes to write the same data. I really want to improve
> the HBase speed and bring it down to 1-2 minutes.
>
> I am using Amazon EC2 instances, launched a cluster of size 3 and later 10,
> and have tried both c3.4xlarge and c3.8xlarge instances.
>
> I can see a significant increase in performance when writing to HDFS as I
> use a cluster with more nodes and higher specifications, but in the case of
> HBase there was no significant change in performance.
>
> I have been going through different posts and articles and have read the HBase
> book to solve the HBase performance issue, but have not been able to succeed
> so far.
> Here are a few things I have tried so far:
>
> *Client Side*
> - Turned off writing to WAL
> - Experimented with write buffer size
> - Turned off auto flush on table
> - Used cache, experimented with different sizes
>
> *HBase Server Side*
> - Increased region servers heap size to 8 GB
> - Experimented with handlers count
> - Increased Memstore flush size to 512 MB
> - Experimented with hbase.hregion.max.filesize, tried different sizes
>
> There are many other parameters I have tried, following the suggestions
> from different sources, but nothing has worked so far.
>
> Your help will be really appreciated.
>
> --
> Regards
> Akhtar Muhammad Din
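
For reference, the settings listed in the quoted message would usually live in hbase-site.xml, roughly as sketched below. This is only an illustration and not the poster's actual configuration: the 512 MB memstore flush size comes from the message, while the handler count, max file size, and write buffer values are placeholders, since the message does not say which values were tried. The 8 GB region server heap is normally set separately (for example via HBASE_HEAPSIZE in hbase-env.sh) and is not shown.

```xml
<configuration>
  <!-- Memstore flush size: 512 MB, as stated in the message (value is in bytes). -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>536870912</value>
  </property>

  <!-- RPC handler count: placeholder value, the message only says it was varied. -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>30</value>
  </property>

  <!-- Maximum region size before a split: placeholder value (10 GB, in bytes). -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>10737418240</value>
  </property>

  <!-- Client-side write buffer: placeholder value (8 MB, in bytes). -->
  <property>
    <name>hbase.client.write.buffer</name>
    <value>8388608</value>
  </property>
</configuration>
```

Which of these settings actually apply, and what their defaults are, depends on the HBase version in use, so it is worth checking them against the reference guide for that version.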