Hi Guys,

> Wonder if anybody could shed some light on how to reduce the load on the
> HBase cluster when running a full scan.
> The need is to dump everything I have in HBase into a Hive table. The
> HBase data size is around 500 GB.
> The job creates 9000 mappers; after about 1000 maps, things go south every
> time.
> If I run the insert below, it runs for about 30 minutes and then starts
> bringing down the HBase cluster, after which the region servers need to be
> restarted.
> Wonder if there is a way to throttle it somehow, or if there is any other
> method of getting the structured data out?
> Any help is appreciated,
> Thanks,
> -Vitaly
>
> create external table hbase_linked_table (
>   mykey  string,
>   info   map<string, string>
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:")
> TBLPROPERTIES ("hbase.table.name" = "hbase_table2");
>
> set hive.exec.compress.output=true;
> set io.seqfile.compression.type=BLOCK;
> set mapred.output.compression.type=BLOCK;
> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>
> set mapred.reduce.tasks=40;
> set mapred.map.tasks=25;
>
> INSERT overwrite table tmp_hive_destination
> select * from hbase_linked_table;
>
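
A couple of notes that may help. With the HBase storage handler, the mapper
count comes from the input splits (typically one per region), so 9000 mappers
likely means roughly 9000 regions, and mapred.map.tasks is only a hint that
gets ignored here. Depending on the Hive version, the handler also honors a
few scan-tuning properties; the sketch below assumes a build that recognizes
hbase.scan.cache and hbase.scan.cacheblock, so verify against your release:

-- Sketch, assuming a Hive build whose HBase handler honors these
-- scan properties (verify against your release):
set hbase.scan.cache=500;         -- rows fetched per scanner RPC
set hbase.scan.cacheblock=false;  -- keep the full scan out of the block cache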

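If throttling alone is not enough, another option is to dump the table in
row-key slices rather than one giant scan, so each run only touches a band of
regions. Newer Hive versions push simple comparisons on the :key column down
to the HBase scan; if yours does not, the WHERE clause still filters correctly
but will not shrink the scan. The split points below are illustrative; pick
boundaries that match your key distribution:

-- Illustrative key-range slices; run them one at a time, using
-- INSERT INTO (Hive 0.8+) after the first overwrite.
INSERT overwrite table tmp_hive_destination
select * from hbase_linked_table
where mykey >= 'a' and mykey < 'g';

INSERT into table tmp_hive_destination
select * from hbase_linked_table
where mykey >= 'g';

If all you need is a raw dump rather than a Hive-queryable copy, HBase's
bundled Export job (org.apache.hadoop.hbase.mapreduce.Export) is another way
to get the data out, though it writes SequenceFiles of serialized Results
rather than structured rows.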