All,
 
I am trying to determine visits for customer from omniture weblog file using 
Hive.
 
Table: omniture_web_data
Columns: visid_high,visid_low,evar23,visit_page_num
 
Sample Data:
visid_high,visid_low,evar23,visit_page_num
999,888,1003,10
999,888,1003,14
999,888,1003,6
999,777,1003,12
999,777,1003,20
 
I want to calculate for each Customer Number ( evar23 is  Customer Number ) , 
total visits. visid_high and visid_low determines a unique visit.
For each distinct visitor, calculate sum of maximum visit_page_num. In above 
example
 
14 + 20 = 34 should be the total visits for the customer 1003.
 
I am trying to run the following queries - Method 1 is almost the same as 
Method 2. Except in Method 1 I only choose a particualr customer number 1003. 
In method 2 , i generalized to all.
 
In Method 1 , I am getting the accurate result. In metnhod 2 , I am not getting 
the same result as Method 1. 
 
Any suggestions on how to trouble shoot. ALso, any alternative approaches.
 
// Method 1
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data where 
evar23='1003') a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data where evar23='1003' group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;
 
///// Result of Method 1
 
1003    34
 
// Method 2

create table temp123 as
select a.evar23,sum(b.max_visit_page_num) from
(select distinct visid_high,visid_low,evar23 from web.omniture_web_data) a
JOIN
(select visid_high,visid_low,max(visit_page_num) as max_visit_page_num from 
omniture_web_data group by visid_high,visid_low) b
where a.visid_high=b.visid_high and a.visid_low=b.visid_low
group by a.evar23;
 
select * from temp123 where evar23='1003';
 
// The Result of Method 2 is not the same as Method 1. It is showing a 
different number.
 
 
 
Thanks,
Raj

Reply via email to