Hey, need some more info.
Can you paste logs from the MR tasks that fail? What's going on in the cluster while the MR job is running (CPU, I/O wait, memory, etc.)? And what is the setup of your cluster: how many nodes, the specs of each node (cores, memory, RegionServer heap), and how many concurrent map tasks you have per node?

JG

> -----Original Message-----
> From: vlisovsky [mailto:vlisov...@gmail.com]
> Sent: Thursday, December 09, 2010 10:49 PM
> To: user@hbase.apache.org
> Subject: Hive HBase integration scan failing
>
> Hi Guys,
>
> Wonder if anybody could shed some light on how to reduce the load on
> the HBase cluster when running a full scan.
> The need is to dump everything I have in HBase into a Hive table.
> The HBase data size is around 500 GB.
> The job creates 9000 mappers; after about 1000 maps, things go south
> every time.
> If I run the insert below, it runs for about 30 minutes and then starts
> bringing down the HBase cluster, after which the region servers need to
> be restarted.
> Wonder if there is a way to throttle it somehow, or otherwise whether
> there is any other method of getting structured data out?
> Any help is appreciated,
> Thanks,
> -Vitaly
>
> create external table hbase_linked_table (
>   mykey string,
>   info map<string, string>
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
> TBLPROPERTIES ("hbase.table.name" = "hbase_table2");
>
> set hive.exec.compress.output=true;
> set io.seqfile.compression.type=BLOCK;
> set mapred.output.compression.type=BLOCK;
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> set mapred.reduce.tasks=40;
> set mapred.map.tasks=25;
>
> INSERT overwrite table tmp_hive_destination select * from
> hbase_linked_table;
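[Editorial note, not part of the original thread: one commonly suggested way to throttle a full scan like this is to tune the HBase client scanner settings for the job. The sketch below assumes Hadoop 0.20-era MR and an HBase 0.20/0.90 client, where the scanner caching property is read from the job configuration; the exact property names and whether the Hive HBase handler honors them should be verified against the versions in use.]

```sql
-- Hedged sketch: session-level settings to reduce scan pressure on the
-- region servers. Values are illustrative, not tuned.

-- Rows fetched per scanner RPC; smaller batches mean lighter, shorter
-- calls to each region server at the cost of more round trips.
set hbase.client.scanner.caching=100;

-- Fewer simultaneous scanners against the cluster. Note that in classic
-- MR the per-node concurrency cap (mapred.tasktracker.map.tasks.maximum)
-- is a TaskTracker setting, not a per-job one, so it must be changed in
-- mapred-site.xml rather than here.
set mapred.map.tasks=25;
```

Raising the region server scanner lease (`hbase.regionserver.lease.period` in hbase-site.xml) is sometimes paired with this so that slow mappers do not lose their scanners mid-scan.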