Hi Guys,
> Wonder if anybody could shed some light on how to reduce the load on the
> HBase cluster when running a full scan.
> The need is to dump everything I have in HBase into a Hive table. The
> HBase data size is around 500 GB.
> The job creates 9,000 mappers; after about 1,000 maps, things go south
> every time.
> If I run the insert below, it runs for about 30 minutes and then starts
> bringing down the HBase cluster, after which the region servers need to
> be restarted.
> Is there a way to throttle it somehow, or is there any other method of
> getting the structured data out?
> Any help is appreciated,
> Thanks,
> -Vitaly
>
> create external table hbase_linked_table (
>   mykey string,
>   info map<string, string>
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
> TBLPROPERTIES ("hbase.table.name" = "hbase_table2");
>
> set hive.exec.compress.output=true;
> set io.seqfile.compression.type=BLOCK;
> set mapred.output.compression.type=BLOCK;
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> set mapred.reduce.tasks=40;
> set mapred.map.tasks=25;
>
> INSERT overwrite table tmp_hive_destination
> select * from hbase_linked_table;
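On the throttling question: one low-risk knob is scanner caching. In a full
scan, every map task asks a region server for rows in batches, so shrinking
the batch size (and keeping a one-off scan out of the block cache) lowers the
pressure each RPC puts on the region servers. Below is a minimal sketch,
assuming a Hive release whose HBase storage handler honors the
hbase.scan.cache / hbase.scan.cacheblocks properties (check your release
before relying on them); the values are illustrative, not tuned:

    -- Fetch fewer rows per region-server RPC.
    set hbase.scan.cache=100;

    -- Don't let a one-off full scan evict the region servers' block cache.
    set hbase.scan.cacheblocks=false;

    -- The generic HBase client knob, read from the job configuration.
    set hbase.client.scanner.caching=100;

    INSERT overwrite table tmp_hive_destination
    select * from hbase_linked_table;

If throttling alone isn't enough, another way to get the data out is HBase's
own export job (org.apache.hadoop.hbase.mapreduce.Export) into HDFS, which
scans region by region without going through the Hive storage handler; the
dumped files can then be loaded into Hive in a separate step.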