RE: Wrong column is picked in HIVE 2.0.0 + TEZ 0.8.2 left join

GAO Chi Tue, 01 Mar 2016 08:58:47 -0800

I added “set hive.execution.engine=mr;” at top of the script, seems the result 
is correct…


 

NFO  : Completed compiling 
command(queryId=hive_20160302005157_6ca43017-d52c-48c8-b4e4-e11c724c3d39); Time 
taken: 0.032 seconds

INFO  : Concurrency mode is disabled, not creating a lock manager

INFO  : Executing 
command(queryId=hive_20160302005157_6ca43017-d52c-48c8-b4e4-e11c724c3d39): 
SELECT

  a.key,

  a.a_one,

  b.b_one,

  a.a_zero,

  b.b_zero

FROM

(

    SELECT

      11 key,

      0 confuse_you,

      1 a_one,

      0 a_zero

) a

LEFT JOIN

(

    SELECT

      11 key,

      0 confuse_you,

      1 b_one,

      0 b_zero

) b

ON a.key = b.key

WARN  : Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.

INFO  : WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in 
the future versions. Consider using a different execution engine (i.e. spark, 
tez) or using Hive 1.X releases.

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
future versions. Consider using a different execution engine (i.e. spark, tez) 
or using Hive 1.X releases.

INFO  : Query ID = hive_20160302005157_6ca43017-d52c-48c8-b4e4-e11c724c3d39

INFO  : Total jobs = 1

INFO  : Starting task [Stage-4:MAPREDLOCAL] in serial mode

INFO  : Execution completed successfully

INFO  : MapredLocal task succeeded

INFO  : Launching Job 1 out of 1

INFO  : Starting task [Stage-3:MAPRED] in serial mode

INFO  : Number of reduce tasks is set to 0 since there's no reduce operator

INFO  : number of splits:1

INFO  : Submitting tokens for job: job_1456723490535_5586

INFO  : The url to track the job: 
http://apache-master:8088/proxy/application_1456723490535_5586/

INFO  : Starting Job = job_1456723490535_5586, Tracking URL = 
http://apache-master:8088/proxy/application_1456723490535_5586/

INFO  : Kill Command = /opt/hadoop/hadoop-2.7.1/bin/hadoop job  -kill 
job_1456723490535_5586

INFO  : Hadoop job information for Stage-3: number of mappers: 0; number of 
reducers: 0

INFO  : 2016-03-02 00:52:13,316 Stage-3 map = 0%,  reduce = 0%

INFO  : 2016-03-02 00:52:17,416 Stage-3 map = 100%,  reduce = 0%

INFO  : Ended Job = job_1456723490535_5586

INFO  : MapReduce Jobs Launched:

INFO  : Stage-Stage-3:  HDFS Read: 0 HDFS Write: 0 SUCCESS

INFO  : Total MapReduce CPU Time Spent: 0 msec

INFO  : Completed executing 
command(queryId=hive_20160302005157_6ca43017-d52c-48c8-b4e4-e11c724c3d39); Time 
taken: 19.56 seconds

INFO  : OK

+--------+----------+----------+-----------+-----------+--+

| a.key  | a.a_one  | b.b_one  | a.a_zero  | b.b_zero  |

+--------+----------+----------+-----------+-----------+--+

| 11     | 1        | 1        | 0         | 0         |

+--------+----------+----------+-----------+-----------+--+

1 row selected (19.643 seconds)

 

 

From: Stephen Sprague [mailto:[email protected]] 
Sent: Tuesday, March 1, 2016 11:45 PM
To: [email protected]
Subject: Re: Wrong column is picked in HIVE 2.0.0 + TEZ 0.8.2 left join

 

good to know.  then per Jeff Zhang's thinking if you were to set the exec 
engine to 'mr' would it still fail?   if so, then its not Tez . :)

 

On Mon, Feb 29, 2016 at 9:31 PM, GAO Chi <[email protected] 
<mailto:[email protected]> > wrote:

 

Yes. We have not changed our script, and this only appears after we upgraded to 
new version at 24th.

 

 

Previously we’re using HIVE 1.2.0 + TEZ 0.7.0

 

 

Thanks!

 

Chi

 

From: Stephen Sprague [mailto:[email protected] <mailto:[email protected]> ] 
Sent: Tuesday, March 1, 2016 12:31 PM
To: [email protected] <mailto:[email protected]> 
Subject: Re: Wrong column is picked in HIVE 2.0.0 + TEZ 0.8.2 left join

 

very interesting.  so this did work correctly on your previous distribution of 
these two products?  May i ask what they were?

 

On Mon, Feb 29, 2016 at 8:24 PM, GAO Chi <[email protected] 
<mailto:[email protected]> > wrote:

Hi all,

 

We encountered a strange behavior after upgrading to HIVE 2.0.0 + TEZ 0.8.2.  

 

I simplified our query to this:

 

SELECT

  a.key,

  a.a_one,

  b.b_one,

  a.a_zero,

  b.b_zero

FROM

(

    SELECT

      11 key,

      0 confuse_you,

      1 a_one,

      0 a_zero

) a

LEFT JOIN 

(

    SELECT

      11 key,

      0 confuse_you,

      1 b_one,

      0 b_zero

) b 

ON a.key = b.key

;

 

 

Above query generates this unexpected result:

 

INFO  : Status: Running (Executing on YARN cluster with App id 
application_1456723490535_3653)

 

INFO  : Map 1: 0/1      Map 2: 0/1

INFO  : Map 1: 0/1      Map 2: 0(+1)/1

INFO  : Map 1: 0(+1)/1  Map 2: 0(+1)/1

INFO  : Map 1: 0(+1)/1  Map 2: 1/1

INFO  : Map 1: 1/1      Map 2: 1/1

INFO  : Completed executing 
command(queryId=hive_20160301115630_0a0dbee5-ba4b-45e7-b027-085f655640fd); Time 
taken: 10.225 seconds

INFO  : OK

+--------+----------+----------+-----------+-----------+--+

| a.key  | a.a_one  | b.b_one  | a.a_zero  | b.b_zero  |

+--------+----------+----------+-----------+-----------+--+

| 11     | 1        | 0        | 0         | 1         |

+--------+----------+----------+-----------+-----------+--+

 

If you change the constant value of subquery-b’s confuse_you column from 0 to 
2, the problem disappears. The plan returned from EXPLAIN shows the incorrect 
one is picking _col1 and _col2, while the correct one is picking _col2 and 
_col3 form sub query b.

 

Seems it cannot distinguish 2 columns with same constant value?

 

 

Anyone encountered similar problem?

 

 

Thanks!

 

Chi

RE: Wrong column is picked in HIVE 2.0.0 + TEZ 0.8.2 left join

Reply via email to