Hi,

I am using the Pig CUBE operator to generate OLAP-like cube aggregations
over 8 dimensions.
However, performance is very poor: it takes more than 2-3 hours to
aggregate over 50K rows. The reduce phase of the job that executes CUBE is
what takes so long (see job_1440641036158_61560 in the stats below, where
the single reducer ran for 6464 seconds).

Can someone please suggest where I should start in order to improve the performance?
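
For a sense of scale, here is a rough back-of-the-envelope sketch (in Python, not Pig) of how many grouping combinations a full cube over 8 dimensions produces. The dimension names d1..d8 are placeholders, not the real field names, and the 50K input row count is an approximation taken from the description above.

```python
from itertools import combinations


def cube_groupings(dimensions):
    """Return every subset of the dimensions that a full cube aggregates on."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


# Placeholder dimension names (assumption: the real names are not shown here).
dims = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]

groupings = cube_groupings(dims)
records_per_row = len(groupings)  # 2**8 grouping combinations per input row

input_rows = 50_000  # approximate input size (assumption)
total_records = input_rows * records_per_row

print(records_per_row, total_records)
```

So each input row fans out into 256 grouping combinations, which is about 12.8 million intermediate records for 50K rows. With only one reducer on the CUBE job, all of them would pass through a single task.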


Job Stats (time in seconds):
JobId	Maps	Reduces	MaxMapTime	MinMapTIme	AvgMapTime	MedianMapTime	MaxReduceTime	MinReduceTime	AvgReduceTime	MedianReducetime	Alias	Feature	Outputs
job_1440641036158_61555	1	1	23	23	23	23	10	10	10	10	tmp,tmp1,tmp2,userAccessData,userAccessData1,userAccessData2,userAccessData3	GROUP_BY
job_1440641036158_61556	1	1	9	9	9	9	7	7	7	7	activities_1,activityData,activityData_1,activityData_2,activityData_3,activityData_4,activityData_5	GROUP_BY
job_1440641036158_61558	2	1	11	7	9	9	9	9	9	9	joinedData,joinedData_1,joinedData_2,rawData,rawData1,userData	HASH_JOIN
job_1440641036158_61560	2	1	8	7	8	8	6464	6464	6464	6464	cube,data,data2	HASH_JOIN
job_1440641036158_61615	1516	204	249	24	76	43	373	276	308	301	sessions,sessions2,sessions_new,sessions_new_distinct,sessions_return,sessions_return_distinct,summary_Day,summary_Day_1,summary_Day_2,summary_Day_3,summary_Day_4,summary_Day_5,users,users2,users_new,users_new_distinct,users_return,users_return_distinct,users_tmp	GROUP_BY,DISTINCT	mobile_diag_dev_tbls.appAnalytic_users_cumulative3,

Regards
Reddy
