Hey,

Need some more info.

Can you paste logs from the MR tasks that fail?  What's going on in the cluster
while the MR job is running (CPU, io-wait, memory, etc.)?

And what is the setup of your cluster... how many nodes, the specs of each node
(cores, memory, RS heap), and how many concurrent map tasks you run per node?
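For the cluster numbers above, something like the following on a region
server node gives a quick snapshot (a sketch only; the log path is an
example -- adjust to wherever your distro writes HBase logs):

```shell
# Quick node snapshot to paste into a reply while the MR job runs.
head -3 /proc/meminfo            # memory totals on the node
cat /proc/loadavg                # 1/5/15-min load averages
# (for io-wait over time, 'vmstat 5' or 'iostat -x 5' are the usual tools)

# Tail a region server log from a node where tasks failed (example path):
RS_LOG=/var/log/hbase/hbase-regionserver.log
[ -f "$RS_LOG" ] && tail -n 200 "$RS_LOG" || echo "no log at $RS_LOG"
```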

JG

> -----Original Message-----
> From: vlisovsky [mailto:vlisov...@gmail.com]
> Sent: Thursday, December 09, 2010 10:49 PM
> To: user@hbase.apache.org
> Subject: Hive HBase integration scan failing
> 
> Hi Guys,
> 
> > Wonder if anybody could shed some light on how to reduce the load on
> > the HBase cluster when running a full scan.
> > The need is to dump everything I have in HBase into a Hive table.
> > The HBase data size is around 500g.
> > The job creates 9000 mappers; after about 1000 maps, things go south
> > every time.
> > If I run the insert below, it runs for about 30 minutes, then starts
> > bringing down the HBase cluster, after which the region servers need
> > to be restarted.
> > Wonder if there is a way to throttle it somehow, or otherwise if there
> > is any other method of getting structured data out?
> > Any help is appreciated,
> > Thanks,
> > -Vitaly
> >
> > create external table hbase_linked_table (
> >   mykey       string,
> >   info        map<string, string>
> > )
> > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
> > TBLPROPERTIES ("hbase.table.name" = "hbase_table2");
> >
> > set hive.exec.compress.output=true;
> > set io.seqfile.compression.type=BLOCK;
> > set mapred.output.compression.type=BLOCK;
> > set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
> >
> > set mapred.reduce.tasks=40;
> > set mapred.map.tasks=25;
> >
> > INSERT overwrite table tmp_hive_destination select * from
> > hbase_linked_table;
> >
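
A couple of knobs commonly tried for exactly this symptom (a sketch only,
assuming the Hadoop 0.20 / HBase-era property names this thread appears to
use -- verify them against your versions):

```sql
-- Rows fetched per scanner RPC; the old HBase client default of 1 makes
-- each mapper extremely chatty. 100-1000 is a common range:
set hbase.client.scanner.caching=100;

-- Note: mapred.map.tasks is only a hint -- with the HBase storage handler
-- the mapper count is driven by the region count. The effective throttle
-- is the number of concurrent map slots per node, which is a tasktracker
-- setting (mapred.tasktracker.map.tasks.maximum in mapred-site.xml),
-- not a Hive session setting.
```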
