Re: Reply: hive 0.11 auto convert join bug report

Amit Sharma Thu, 12 Sep 2013 21:39:28 -0700

Hi Navis,

I was reading this email thread and the JIRA issue to understand the scope
of this problem. Is it triggered only when the aliases end up mapping to the
same value upon hashing, or can it be triggered under other conditions as
well? What if no aliases are used and the table names themselves somehow map
to similar hashcode values?
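
For what it's worth, here is a minimal sketch for comparing the hash codes of
candidate aliases before running a query. It assumes the collision in
question involves Java's String.hashCode(), which the thread does not
confirm. The class name, the helper methods, and the bucket count of 16 are
my own placeholders and not anything from Hive.

```java
// Illustrative only: class and method names are hypothetical, not part of Hive.
class AliasHashCheck {

    // Bucket index under a plain modulo. java.util.HashMap mixes the hash
    // bits before indexing, so real bucket positions can differ from this.
    static int bucketIndex(String name, int buckets) {
        return Math.floorMod(name.hashCode(), buckets);
    }

    // True when both names fall into the same bucket under the plain modulo.
    static boolean sameBucket(String a, String b, int buckets) {
        return bucketIndex(a, buckets) == bucketIndex(b, buckets);
    }

    public static void main(String[] args) {
        String[] aliases = {"deal", "dim_pay_date", "A"};
        int buckets = 16; // assumed table size, chosen only for illustration
        for (String alias : aliases) {
            System.out.printf("%-14s hashCode=%d bucket=%d%n",
                    alias, alias.hashCode(), bucketIndex(alias, buckets));
        }
    }
}
```

Comparing both the full hash code and the bucket index seems useful here,
since two names can share a bucket without having equal hash codes.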


Also, is changing the alias the only workaround for this problem, or is
there any other workaround?

Thanks,
Amit


On Sun, Aug 11, 2013 at 9:22 PM, Navis류승우 <navis....@nexr.com> wrote:

> Hi,
>
> Hive is notorious for producing different results with different aliases.
> Changing the alias is a last resort for avoiding the bug in a desperate
> situation.
>
> I think the patch in the issue is ready; I hope it helps.
>
> Thanks.
>
> 2013/8/11  <wzc1...@gmail.com>:
> > Hi Navis,
> >
> > My colleague chenchun found that the hashcodes of 'deal' and
> > 'dim_pay_date' are the same and that the code in MapJoinProcessor.java
> > ignores the order of the rowschema.
> > I looked at your patch and it touches exactly the same place we are
> > working on.
> > Thanks for your patch.
> >
> > On Sunday, August 11, 2013, at 9:38 PM, Navis류승우 wrote:
> >
> > Hi,
> >
> > I've filed this as https://issues.apache.org/jira/browse/HIVE-5056
> > and attached a patch for it.
> >
> > It needs a full test run for confirmation, but you can try it.
> >
> > Thanks.
> >
> > 2013/8/11 <wzc1...@gmail.com>:
> >
> > Hi all:
> > When I change the table alias dim_pay_date to A, the query passes in hive
> > 0.11
> > (https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):
> >
> > use test;
> > create table if not exists src ( `key` int,`val` string);
> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt'
> > overwrite into table src;
> > drop table if exists orderpayment_small;
> > create table orderpayment_small (`dealid` int, `date` string,
> > `time` string, `cityid` int, `userid` int);
> > insert overwrite table orderpayment_small select 748, '2011-03-24',
> > '2011-03-24', 55, 5372613 from src limit 1;
> > drop table if exists user_small;
> > create table user_small( userid int);
> > insert overwrite table user_small select key from src limit 100;
> > set hive.auto.convert.join.noconditionaltask.size = 200;
> > SELECT
> > `A`.`date`
> > , `deal`.`dealid`
> > FROM `orderpayment_small` `orderpayment`
> > JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> > `orderpayment`.`dealid`
> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> > `orderpayment`.`cityid`
> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> > limit 5;
> >
> >
> > This is quite strange and interesting. I will keep searching for the
> > answer to this issue.
> >
> >
> >
> > On Friday, August 9, 2013, at 3:32 AM, wzc1...@gmail.com wrote:
> >
> > Hi all:
> > I'm currently testing hive 0.11 and have run into a bug with
> > hive.auto.convert.join. I constructed a test case so that everyone can
> > reproduce it (it is also available here:
> > https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
> >
> > use test;
> > create table src ( `key` int,`val` string);
> > load data local inpath '/Users/code6/git/hive/data/files/kv1.txt'
> > overwrite into table src;
> > drop table if exists orderpayment_small;
> > create table orderpayment_small (`dealid` int, `date` string,
> > `time` string, `cityid` int, `userid` int);
> > insert overwrite table orderpayment_small select 748, '2011-03-24',
> > '2011-03-24', 55, 5372613 from src limit 1;
> > drop table if exists user_small;
> > create table user_small( userid int);
> > insert overwrite table user_small select key from src limit 100;
> > set hive.auto.convert.join.noconditionaltask.size = 200;
> > SELECT
> > `dim_pay_date`.`date`
> > , `deal`.`dealid`
> > FROM `orderpayment_small` `orderpayment`
> > JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` =
> > `orderpayment`.`date`
> > JOIN `orderpayment_small` `deal` ON `deal`.`dealid` =
> > `orderpayment`.`dealid`
> > JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` =
> > `orderpayment`.`cityid`
> > JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> > limit 5;
> >
> >
> > You need to replace the path to kv1.txt with your own. Run the above
> > query in hive 0.11 and it will fail with an ArrayIndexOutOfBoundsException.
> > You can see the explain result and the console output of the query here:
> > https://gist.github.com/code6/6187569
> >
> > I compiled the trunk code, but this query does not work there either. The
> > query does run in hive 0.9 with hive.auto.convert.join turned on.
> >
> > I tried to dig into this problem, and I think it may be caused by the map
> > join optimization: some adjacent operators do not match on the
> > input/output tableinfo (the column positions differ).
> >
> > I'm not able to fix this bug myself, and I would appreciate it if someone
> > could look into this problem.
> >
> > Thanks.
> >
> >
>
