I have a few questions as follows:

1) what's the schema of all 3 tables ? Do these tables only have  (sel_sid,
date_day) as the columns along with the facts that they represent or do they
have more columns besides those 2 columns ?

2) why do you do a left outer join without an ON clause, not too sure if
MySql syntax is like that but I normally don't prefer to do a JOIN without
specifying anything in the ON clause. I just don't trust the result set and
the query is not really readable.

How about trying this query:

SELECT
   h.sel_sid, h.hits, u.uniques, if(c.clicks is not null, c.clicks, 0)
FROM
   selection_daily_hits h
   left outer join selection_daily_uniques u ON (h.sel_sid = u.sel_sid AND
h.date_day = u.date_day AND u.date_day = '20110211')
   left outer join selection_daily_clicks c on ON (h.sel_sid = c.sel_sid AND
h.date_day = c.date_day AND c.date_day = '20110211')
WHERE h.date_day = '20110211';

some notes about the query:
- I restrict the uniques and clicks to 20110211 in the JOIN clause because
in hive 0.5.0 if you put them in the WHERE clause the partitions don't get
trimmed and it scans the entire table before limiting the data. If your data
is not partitioned please go ahead and remove that restriction.
- I join on the date_day columns to make sure the data is correct if the
tables are not partitioned or the query plan causes table scans because
there are chances you can see the same sel_sid on different days (this is an
assumption)

-Viral

On Wed, Feb 23, 2011 at 3:16 AM, Cam Bazz <[email protected]> wrote:

> Hello,
>
> I have three tables, one that counts hits, the other unique visits,
> and the other clicks on that page:
>
> The query below will fail to produce correct results: (number of
> uniques is wrong, always set to 8, same number for all)
>
> select h.sel_sid, h.hits, u.uniques, if(c.clicks is not null,
> c.clicks, 0) from selection_daily_hits h left outer join
> selection_daily_uniques u left outer join selection_daily_clicks c on
> (h.sel_sid = u.sel_sid and h.sel_sid = c.sel_sid and h.date_day =
> '20110211' and u.date_day = '20110211' and c.date_day = '20110211');
>
> where the query below will work and provide correct results
>
> select h.sel_sid, h.hits, u.uniques, if(c.clicks is not null,
> c.clicks, 0) from selection_daily_hits h left outer join
> selection_daily_uniques u left outer join selection_daily_clicks c on
> (h.sel_sid = u.sel_sid and c.sel_sid = h.sel_sid and h.date_day =
> '20110211' and u.date_day = '20110211' and c.date_day = '20110211');
>
> the only difference is, on the non working query I have h.sel_sid =
> c.sel_sid and in the working query I have c.sel_sid = h.sel_sid
>
> notice that while the first and second table will always have the same
> number keys, the third table might not have some keys, hence those
> lines are converted to 0.
>
> Best Regards,
> -C.B.
>

Reply via email to