All, I am trying to determine visits for customer from omniture weblog file using Hive. Table: omniture_web_data Columns: visid_high,visid_low,evar23,visit_page_num Sample Data: visid_high,visid_low,evar23,visit_page_num 999,888,1003,10 999,888,1003,14 999,888,1003,6 999,777,1003,12 999,777,1003,20 I want to calculate for each Customer Number ( evar23 is Customer Number ) , total visits. visid_high and visid_low determines a unique visit. For each distinct visitor, calculate sum of maximum visit_page_num. In above example 14 + 20 = 34 should be the total visits for the customer 1003. I am trying to run the following queries - Method 1 is almost the same as Method 2. Except in Method 1 I only choose a particualr customer number 1003. In method 2 , i generalized to all. In Method 1 , I am getting the accurate result. In metnhod 2 , I am not getting the same result as Method 1. Any suggestions on how to trouble shoot. ALso, any alternative approaches. // Method 1 select a.evar23,sum(b.max_visit_page_num) from (select distinct visid_high,visid_low,evar23 from web.omniture_web_data where evar23='1003') a JOIN (select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from omniture_web_data where evar23='1003' group by visid_high,visid_low) b where a.visid_high=b.visid_high and a.visid_low=b.visid_low group by a.evar23; ///// Result of Method 1 1003 34 // Method 2
create table temp123 as select a.evar23,sum(b.max_visit_page_num) from (select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a JOIN (select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from omniture_web_data group by visid_high,visid_low) b where a.visid_high=b.visid_high and a.visid_low=b.visid_low group by a.evar23; select * from temp123 where evar23='1003'; // The Result of Method 2 is not the same as Method 1. It is showing a different number. Thanks, Raj