Hi,
New to Hive in the last few weeks and could use some help with JOINs.
Using MapR (Hive 0.9.0): /usr/bin/hive ->
/opt/mapr/hive/hive-0.9.0/bin/hive
I have two tables I want to join by date (order_t and date_t); the DDL is at
the bottom.
I have reduced this to one column and one row and still can't get the join to
work. Any help would be appreciated.
-John
Details:
# data in each table
hive> select * from ext_order_1 ;
OK
20081203
Time taken: 0.076 seconds
hive> select * from ext_date_1 ;
OK
20081203
Time taken: 0.068 seconds
# Trying to join results in 0 rows (full output below)
#
hive> select count(*) from ext_order_1 a join ext_date_1 b on (a.cal_dt = b.cal_dt) ;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201211050921_0232, Tracking URL = xxxx
Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0232
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-11-20 10:08:48,797 Stage-1 map = 0%, reduce = 0%
2012-11-20 10:08:53,823 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec
2012-11-20 10:08:54,829 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec
2012-11-20 10:08:55,835 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec
2012-11-20 10:08:56,842 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec
2012-11-20 10:08:57,848 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec
2012-11-20 10:08:58,854 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec
2012-11-20 10:08:59,860 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.25 sec
2012-11-20 10:09:00,866 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.23 sec
MapReduce Total cumulative CPU time: 3 seconds 230 msec
Ended Job = job_201211050921_0232
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201211050921_0233, Tracking URL = xxxx
Kill Command = /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201211050921_0233
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2012-11-20 10:09:02,058 Stage-2 map = 0%, reduce = 0%
2012-11-20 10:09:07,084 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec
2012-11-20 10:09:08,091 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec
2012-11-20 10:09:09,101 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec
2012-11-20 10:09:10,106 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec
2012-11-20 10:09:11,112 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec
2012-11-20 10:09:12,119 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 0.89 sec
2012-11-20 10:09:13,125 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 1.5 sec
MapReduce Total cumulative CPU time: 1 seconds 500 msec
Ended Job = job_201211050921_0233
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.23 sec MAPRFS Read: 396 MAPRFS Write: 222 SUCCESS
Job 1: Map: 1 Reduce: 1 Cumulative CPU: 1.5 sec MAPRFS Read: 628 MAPRFS Write: 72 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 730 msec
OK
0
Time taken: 25.727 seconds
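# One thing I can check is whether the raw data files contain hidden bytes
# (carriage returns from Windows editing, stray '~' delimiters, trailing
# whitespace) that "select *" output wouldn't show. A sketch with a made-up
# sample file, not my real data:

```shell
# Hypothetical sample row: looks like 20081203 but carries a trailing
# carriage return, as a Windows-edited file would.
printf '20081203\r\n' > /tmp/date_1_sample

# od -c prints every byte, so a hidden \r (or a stray '~') becomes visible.
od -c /tmp/date_1_sample
```

# If either table's file shows a \r before the \n, the two cal_dt values
# could parse differently even though they print identically.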
# Hive DDL
#
$> cat date_t_1_row.ddl
CREATE EXTERNAL TABLE ext_date_1
(
  cal_dt INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '~'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/Work/Hive/tables/date_1'
;
$> cat order_1.ddl
CREATE EXTERNAL TABLE ext_order_1
(
  cal_dt INT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '~'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/Work/Hive/tables/order_1'
;
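
# For completeness, my mental model of how the '~' delimiter splits these
# rows, illustrated with awk standing in for Hive's text SerDe (the sample
# rows below are made up). As I understand it, a value with trailing
# whitespace fails Hive's INT parse and becomes NULL, and NULL never matches
# in a join, which could explain the 0-row result:

```shell
# Two hypothetical rows: one clean, one with a trailing space.
printf '20081203\n20081203 \n' > /tmp/order_1_sample

# Split on '~' like FIELDS TERMINATED BY '~' would; the brackets expose
# trailing whitespace that would otherwise be invisible.
awk -F'~' '{print "[" $1 "]"}' /tmp/order_1_sample
# [20081203]
# [20081203 ]
```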