Re: Hive JOINs not working as expected (returns 0 rows)

Dean Wampler Tue, 20 Nov 2012 08:23:17 -0800

Did you install v0.9.0 on your MapR cluster? I suspect there's a bug in the
interaction between the two. I'm not sure which version of Hive is the
latest shipped with MapR.


I just tried your example in an older MapR M5 virtual machine instance that
came with Hive v0.7.1. It worked correctly. I then repeated the experiment
in Hive v0.9.0 on a generic Apache Hadoop setup I have in another VM. It
also worked correctly.  (I don't have Hive v0.9.0 available in the MapR VM.)

Hope this helps.

dean

On Tue, Nov 20, 2012 at 9:43 AM, John Morrison <
[email protected]> wrote:

>  Hi,****
>
> ** **
>
> New to hive in last few weeks and could use some help with JOINs.****
>
> ** **
>
> Using MapR (version 0.9.0)    /usr/bin/hive ->
> /opt/mapr/hive/hive-0.9.0/bin/hive****
>
> ** **
>
> I have 2 tables I am wanting to join by date (order_t and date_t).  DDL at
> bottom.****
>
> ** **
>
> I have reduced this to 1 column and 1 row and still can’t get things to
> work.****
>
> ** **
>
> Any help will be appreciated.****
>
> ** **
>
> -John****
>
> ** **
>
> Details:****
>
> ** **
>
> # data in each table****
>
> hive> select * from ext_order_1 ;****
>
> OK****
>
> 20081203****
>
> Time taken: 0.076 seconds****
>
> hive> select * from ext_date_1 ;****
>
> OK****
>
> 20081203****
>
> Time taken: 0.068 seconds****
>
> ** **
>
> # Trying to join results in 0 rows (see yellow below)****
>
> #****
>
> hive> select count(*) from ext_order_1 a join ext_date_1 b on (a.cal_dt =
> b.cal_dt) ;****
>
> Total MapReduce jobs = 2****
>
> Launching Job 1 out of 2****
>
> Number of reduce tasks not specified. Estimated from input data size: 1***
> *
>
> In order to change the average load for a reducer (in bytes):****
>
>   set hive.exec.reducers.bytes.per.reducer=<number>****
>
> In order to limit the maximum number of reducers:****
>
>   set hive.exec.reducers.max=<number>****
>
> In order to set a constant number of reducers:****
>
>   set mapred.reduce.tasks=<number>****
>
> Starting Job = job_201211050921_0232, Tracking URL =  xxxx****
>
> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job
> -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0232****
>
> Hadoop job information for Stage-1: number of mappers: 1; number of
> reducers: 1****
>
> 2012-11-20 10:08:48,797 Stage-1 map = 0%,  reduce = 0%****
>
> 2012-11-20 10:08:53,823 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 1.25 sec****
>
> 2012-11-20 10:08:54,829 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 1.25 sec****
>
> 2012-11-20 10:08:55,835 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 1.25 sec****
>
> 2012-11-20 10:08:56,842 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 1.25 sec****
>
> 2012-11-20 10:08:57,848 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 1.25 sec****
>
> 2012-11-20 10:08:58,854 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 1.25 sec****
>
> 2012-11-20 10:08:59,860 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
> 1.25 sec****
>
> 2012-11-20 10:09:00,866 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU
> 3.23 sec****
>
> MapReduce Total cumulative CPU time: 3 seconds 230 msec****
>
> Ended Job = job_201211050921_0232****
>
> Launching Job 2 out of 2****
>
> Number of reduce tasks determined at compile time: 1****
>
> In order to change the average load for a reducer (in bytes):****
>
>   set hive.exec.reducers.bytes.per.reducer=<number>****
>
> In order to limit the maximum number of reducers:****
>
>   set hive.exec.reducers.max=<number>****
>
> In order to set a constant number of reducers:****
>
>   set mapred.reduce.tasks=<number>****
>
> Starting Job = job_201211050921_0233, Tracking URL =  xxxx****
>
> Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job
> -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0233****
>
> Hadoop job information for Stage-2: number of mappers: 1; number of
> reducers: 1****
>
> 2012-11-20 10:09:02,058 Stage-2 map = 0%,  reduce = 0%****
>
> 2012-11-20 10:09:07,084 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU
> 0.89 sec****
>
> 2012-11-20 10:09:08,091 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU
> 0.89 sec****
>
> 2012-11-20 10:09:09,101 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU
> 0.89 sec****
>
> 2012-11-20 10:09:10,106 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU
> 0.89 sec****
>
> 2012-11-20 10:09:11,112 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU
> 0.89 sec****
>
> 2012-11-20 10:09:12,119 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU
> 0.89 sec****
>
> 2012-11-20 10:09:13,125 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU
> 1.5 sec****
>
> MapReduce Total cumulative CPU time: 1 seconds 500 msec****
>
> Ended Job = job_201211050921_0233****
>
> MapReduce Jobs Launched:****
>
> Job 0: Map: 1  Reduce: 1   Cumulative CPU: 3.23 sec   MAPRFS Read: 396
> MAPRFS Write: 222 SUCCESS****
>
> Job 1: Map: 1  Reduce: 1   Cumulative CPU: 1.5 sec   MAPRFS Read: 628
> MAPRFS Write: 72 SUCCESS****
>
> Total MapReduce CPU Time Spent: 4 seconds 730 msec****
>
> OK****
>
> 0****
>
> Time taken: 25.727 seconds****
>
> ** **
>
> ** **
>
> ** **
>
> # Hive DDL****
>
> #****
>
> $> cat date_t_1_row.ddl****
>
> ** **
>
> CREATE external TABLE  ext_date_1****
>
> (****
>
>      cal_dt                      INT****
>
> )****
>
>   ROW FORMAT****
>
>   DELIMITED FIELDS TERMINATED BY '~'****
>
>   LINES TERMINATED BY '\n' STORED AS TEXTFILE****
>
>   LOCATION '/Work/Hive/tables/date_1'****
>
> ;****
>
> ** **
>
> ** **
>
> $>  cat order_1.ddl****
>
> ** **
>
> CREATE external TABLE ext_order_1****
>
> (****
>
> cal_dt             int****
>
> )****
>
>   ROW FORMAT****
>
>   DELIMITED FIELDS TERMINATED BY '~'****
>
>   LINES TERMINATED BY '\n' STORED AS TEXTFILE****
>
>   LOCATION '/Work/Hive/tables/order_1'****
>
> ;****
>
> ** **
>
> ** **
>
> ** **
>



-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330

Re: Hive JOINs not working as expected (returns 0 rows)

Reply via email to