Dear Hive Users,

I am in process of running over a poc to one of my customer demonstrating
the huge performance benefits of Hadoop BigData using Hive.

Following is the problem statement i am stuck with.

I have generate a large table with 28 columns( all are double). Table size
on disk is 70GB (i ultimately created compressed table using ORC format to
save disk space bringing down the table size to < 1GB) with more than
450Million records.

In order to demonstrate a complex use case i joined this table with itself.
Following are the queries i have used to create table and  join query i am
using.

*Create Table and Loading Data, Hive parameters settigs:*
set hive.vectorized.execution.enabled = true;
set hive.vectorized.execution.reduce.enabled = true;
set mapred.max.split.size=100000000;
set mapred.min.split.size=1000000;
set hive.auto.convert.join=false;
set hive.enforce.sorting=true;
set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set mapreduce.reduce.input.limit=-1;
set hive.exec.parallel = true;

CREATE TABLE huge_numeric_table_orc2(col1 double,col2 double,col3
double,col4 double,col5 double,col6 double,col7 double,col8 double,col9
double,col10 double,col11 double,col12 double,col13 double,col14
double,col15 double,col16 double,col17 double,col18 double,col19
double,col20 double,col21 double,col22 double,col23 double,col24
double,col25 double,col26 double,col27 double,col28 double)
clustered by (col1) sorted by (col1) into 240 buckets
STORED AS ORC tblproperties ("orc.compress"="SNAPPY");

from huge_numeric_table insert overwrite table huge_numeric_table_orc2
select * sort by col1;


*JOIN QUERY:*

select (avg(t1.col1)*avg(t1.col6))/(avg(t1.col11)*avg(t1.col16)) as AVG5
from huge_numeric_table_orc2 t1 left outer join huge_numeric_table_orc2 t2
on t1.col1=t2.col1 where (t1.col1) > 34.11 and (t2.col1) >10.12


*The problem is that this query gets stuck at reducers :80-85%. and goes in
a loop and never finishes. *

Version of Hive is 1.2.

Please help.


Thanks and Regards
Nishant Aggarwal, PMP
Cell No:- +91 99588 94305

Reply via email to