[ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai closed PIG-834.
--------------------------

> incorrect plan when algebraic functions are nested
> --------------------------------------------------
>
>                 Key: PIG-834
>                 URL: https://issues.apache.org/jira/browse/PIG-834
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Thejas M Nair
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.7.0
>
>         Attachments: pig-834.patch, pig-834_2.patch, pig-834_3.patch
>
>
> a = load 'students.txt' as (c1,c2,c3,c4);
> c = group a by c2;
> f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));
> Notice that Distinct udf is missing in Combiner and reduce stage. As a result
> distinct does not function, and incorrect results are produced.
> Distinct should have been evaluated in the 3 stages and output of Distinct
> should be given to COUNT in reduce stage.
> {code}
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node 1-122
> Map Plan
> Local Rearrange[tuple]{bytearray}(false) - 1-139
> |   |
> |   Project[bytearray][1] - 1-140
> |
> |---New For Each(false,false)[bag] - 1-127
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125
>     |   |
>     |   |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126
>     |       |
>     |       |---Project[bag][2] - 1-123
>     |           |
>     |           |---Project[bag][1] - 1-124
>     |   |
>     |   Project[bytearray][0] - 1-133
>     |
>     |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141
>         |
>         |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111--------
> Combine Plan
> Local Rearrange[tuple]{bytearray}(false) - 1-143
> |   |
> |   Project[bytearray][1] - 1-144
> |
> |---New For Each(false,false)[bag] - 1-132
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130
>     |   |
>     |   |---Project[bag][0] - 1-135
>     |   |
>     |   Project[bytearray][1] - 1-134
>     |
>     |---POCombinerPackage[tuple]{bytearray} - 1-137--------
> Reduce Plan
> Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121
> |
> |---New For Each(false)[bag] - 1-120
>     |   |
>     |   POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119
>     |   |
>     |   |---Project[bag][0] - 1-136
>     |
>     |---POCombinerPackage[tuple]{bytearray} - 1-145--------
> Global sort: false
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
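
To see why dropping Distinct from the combine and reduce stages changes the result, the following is a rough Python simulation of the two plans, not Pig's actual implementation. The sample rows, the split size, and the function names `buggy_plan` and `expected_plan` are all made up for illustration. It also assumes that the map-side step sees one tuple at a time, so Distinct over a single-tuple bag followed by COUNT$Initial always contributes 1.

```python
from collections import defaultdict

# Hypothetical rows shaped like (c1, c2, c3, c4); not taken from the issue.
ROWS = [
    ("r1", "g1", "x", 1),
    ("r2", "g1", "x", 2),
    ("r3", "g1", "y", 3),
    ("r4", "g2", "z", 4),
    ("r5", "g2", "z", 5),
]


def split_rows(rows, size):
    """Cut the input into map splits of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def buggy_plan(rows, split_size=2):
    """Mimic the reported plan: Distinct runs only in the map stage."""
    reduce_input = defaultdict(list)
    for split in split_rows(rows, split_size):
        partial = defaultdict(list)
        for _c1, c2, c3, _c4 in split:
            # Map: Distinct over a one-value bag, then count it (always 1).
            partial[c2].append(len({c3}))
        for key, counts in partial.items():
            # Combine: partial counts are summed, with no Distinct step.
            reduce_input[key].append(sum(counts))
    # Reduce: partial sums are summed again, still with no Distinct step.
    return {key: sum(counts) for key, counts in reduce_input.items()}


def expected_plan(rows, split_size=2):
    """Keep distinct values through every stage and count only at the end."""
    reduce_input = defaultdict(list)
    for split in split_rows(rows, split_size):
        partial = defaultdict(set)
        for _c1, c2, c3, _c4 in split:
            # Map and combine: carry the set of distinct c3 values per key.
            partial[c2].add(c3)
        for key, values in partial.items():
            reduce_input[key].append(values)
    # Reduce: merge the per-split sets, then count the merged set.
    return {key: len(set().union(*sets)) for key, sets in reduce_input.items()}


if __name__ == "__main__":
    print("buggy   :", buggy_plan(ROWS))
    print("expected:", expected_plan(ROWS))
```

With this sample data the simulated buggy plan returns the plain row count per group, while the simulated expected plan returns the number of distinct c3 values per group.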