Hi, Amit,

See the description of HIVE-5256 for a more detailed explanation.

Both table aliases and table names (when no alias is given) can run into this issue.

This issue happened to be masked by the XML serialization/deserialization of
the MapredWork containing the join operator (HashMap
serialization/deserialization reverses the order of key-value pairs in the
same bucket). It was exposed by HIVE-4078, because the copy of the MapredWork
in the noconditionaltask optimization case was optimized away.
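
To illustrate the bucket-order effect described above, here is a minimal sketch. It is a toy model, not Hive or JDK code: it assumes a chained hash table that puts each new entry at the head of its bucket (as older JDK HashMap implementations did), and the class and method names (BucketOrderSketch, insertAll, roundTrip) are made up for this example. It also simply assumes that the two aliases from the report share one bucket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class BucketOrderSketch {
    // Toy model of a single hash bucket: every new key goes to the head of the chain.
    static LinkedList<String> insertAll(List<String> keys) {
        LinkedList<String> bucket = new LinkedList<>();
        for (String key : keys) {
            bucket.addFirst(key);
        }
        return bucket;
    }

    // Models a serialize/deserialize round trip: walk the bucket in its
    // current order and re-insert every key into a fresh bucket.
    static List<String> roundTrip(List<String> bucket) {
        return new ArrayList<>(insertAll(bucket));
    }

    public static void main(String[] args) {
        // Assumption for the sketch: both aliases fall into the same bucket.
        List<String> original =
            new ArrayList<>(insertAll(Arrays.asList("dim_pay_date", "deal")));
        List<String> copy = roundTrip(original);

        System.out.println(original); // order seen when iterating the original
        System.out.println(copy);     // order seen when iterating the copy
    }
}
```

Under these assumptions, main prints [deal, dim_pay_date] for the original and [dim_pay_date, deal] for the copy. Code that depends on the iteration order of such a map therefore sees a different alias order depending on whether it works on the original or on the copy.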


From: Amit Sharma [mailto:amsha...@netflix.com]
Sent: Friday, September 13, 2013 6:05 AM
To: user@hive.apache.org
Subject: Re: Re: hive 0.11 auto convert join bug report

Hi Navis,

I was trying to look at this email thread as well as the jira to understand the 
scope of this issue. Does this get triggered only in cases of using aliases 
which end up mapping to the same value upon hashing? Or can this be triggered 
under other conditions as well? What if aliases are not used and the table
names somehow map to similar hashcode values?

Also, is changing the alias the only workaround for this problem, or is there
any other workaround possible?

Thanks,
Amit

On Sun, Aug 11, 2013 at 9:22 PM, Navis류승우 <navis....@nexr.com> wrote:
Hi,

Hive is notorious for producing different results with different aliases.
Changing the alias is a last-resort way to avoid the bug in a desperate
situation.

I think the patch in the issue is ready; I hope it's helpful.

Thanks.

2013/8/11 <wzc1...@gmail.com>:
> Hi Navis,
>
> My colleague chenchun found that the hashcodes of 'deal' and 'dim_pay_date'
> are the same and that the code in MapJoinProcessor.java ignores the order of
> the row schema.
> I looked at your patch, and it touches exactly the place we were working on.
> Thanks for your patch.
>
> On Sunday, August 11, 2013, at 9:38 PM, Navis류승우 wrote:
>
> Hi,
>
> I've filed this as https://issues.apache.org/jira/browse/HIVE-5056
> and attached a patch for it.
>
> It needs a full test run for confirmation, but you can try it.
>
> Thanks.
>
> 2013/8/11 <wzc1...@gmail.com>:
>
> Hi all:
> When I change the table alias dim_pay_date to A, the query passes in Hive
> 0.11 (https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):
>
> use test;
> create table if not exists src ( `key` int,`val` string);
> load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite
> into table src;
> drop table if exists orderpayment_small;
> create table orderpayment_small (`dealid` int,`date` string,`time` string,
> `cityid` int, `userid` int);
> insert overwrite table orderpayment_small select 748, '2011-03-24',
> '2011-03-24', 55 ,5372613 from src limit 1;
> drop table if exists user_small;
> create table user_small( userid int);
> insert overwrite table user_small select key from src limit 100;
> set hive.auto.convert.join.noconditionaltask.size = 200;
> SELECT
> `A`.`date`
> , `deal`.`dealid`
> FROM `orderpayment_small` `orderpayment`
> JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
> JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> `orderpayment`.`dealid`
> JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> `orderpayment`.`cityid`
> JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> limit 5;
>
>
> It's quite strange and interesting now. I will keep searching for the answer
> to this issue.
>
>
>
> On Friday, August 9, 2013, at 3:32 AM, wzc1...@gmail.com wrote:
>
> Hi all:
> I'm currently testing Hive 0.11 and have encountered a bug with
> hive.auto.convert.join. I constructed a test case so everyone can reproduce
> it (you can also find the test case
> here: https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
>
> use test;
> create table src ( `key` int,`val` string);
> load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite
> into table src;
> drop table if exists orderpayment_small;
> create table orderpayment_small (`dealid` int,`date` string,`time` string,
> `cityid` int, `userid` int);
> insert overwrite table orderpayment_small select 748, '2011-03-24',
> '2011-03-24', 55 ,5372613 from src limit 1;
> drop table if exists user_small;
> create table user_small( userid int);
> insert overwrite table user_small select key from src limit 100;
> set hive.auto.convert.join.noconditionaltask.size = 200;
> SELECT
> `dim_pay_date`.`date`
> , `deal`.`dealid`
> FROM `orderpayment_small` `orderpayment`
> JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` =
> `orderpayment`.`date`
> JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> `orderpayment`.`dealid`
> JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> `orderpayment`.`cityid`
> JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> limit 5;
>
>
> Replace the path to kv1.txt with your own. If you run the above query in
> Hive 0.11, it fails with ArrayIndexOutOfBoundsException. You can see the
> explain result and the console output of the query here:
> https://gist.github.com/code6/6187569
>
> I compiled the trunk code, but it doesn't work for this query either. I can
> run this query in Hive 0.9 with hive.auto.convert.join turned on.
>
> I tried to dig into this problem, and I think it may be caused by the map join
> optimization. Some adjacent operators don't match in their input/output
> tableinfo (the column positions differ).
>
> I'm not able to fix this bug, and I would appreciate it if someone would
> look into this problem.
>
> Thanks.
>
>
