[ 
https://issues.apache.org/jira/browse/PIG-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814456#comment-17814456
 ] 

Koji Noguchi commented on PIG-5445:
-----------------------------------

{quote}Basically 3rd argument to MergeJoinIndexer differed.
{quote}
That third argument is the serializedPhyPlan passed to the MergeJoinIndexer constructor.
{code:java}
    /** @param funcSpec : Loader specification.
     *  @param innerPlan : This is serialized version of LR plan. We
     *  want to keep only keys in our index file and not the whole tuple. So, we need LR and thus its plan
     *  to get keys out of the sampled tuple.
     * @param serializedPhyPlan Serialized physical plan on right side.
     * @throws ExecException
     */
    @SuppressWarnings("unchecked")
    public MergeJoinIndexer(String funcSpec, String innerPlan, String serializedPhyPlan,
            String udfCntxtSignature, String scope, String ignoreNulls) throws ExecException{
{code}
After deserializing both strings and printing the physical plans, both showed 
exactly the same physical plan:
{noformat}
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
a: New For Each(false,false)[bag] - scope-30
|   |
|   Cast[int] - scope-27
|   |
|   |---Project[bytearray][0] - scope-26
|   |
|   Cast[int] - scope-29
|   |
|   |---Project[bytearray][1] - scope-28
{noformat}
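For reference, strings like these can be unpacked roughly as follows. This is a minimal JDK-only sketch. It assumes the argument is a Java-serialized object that was zlib-compressed and then Base64-encoded; the "eNq9" prefix is consistent with Base64 of a zlib header. Inside Pig, ObjectSerializer.deserialize is the natural tool. The class SerializedPlanCodecSketch and its methods are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of Pig: illustrates the assumed encoding only.
final class SerializedPlanCodecSketch {

    /** Java-serialize an object, zlib-compress it, then Base64-encode the result. */
    static String encode(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new DeflaterOutputStream(bytes, deflater))) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // We supplied the Deflater ourselves, so we release it ourselves.
            deflater.end();
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Reverse of encode: Base64-decode, inflate, then Java-deserialize. */
    static Object decode(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(
                 new InflaterInputStream(new ByteArrayInputStream(compressed)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A plain String stands in for the physical plan object.
        String encoded = encode("a: New For Each(false,false)[bag] - scope-30");
        System.out.println(encoded);
        System.out.println(decode(encoded));
    }
}
```

Printing the decoded object only shows what its toString (or a plan printer) chooses to show, which is why two strings can decode to identical-looking plans and still differ byte for byte.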
Comparing the serialized strings and checking a memory dump showed where the 
difference came from: the POForEach behind "a: New For Each" has an "inputs" 
param pointing to a POLoad, which holds "PigContext pc". The POLoad and 
PigContext were therefore serialized as part of the MergeJoinIndexer arguments, 
which changed the golden-file output whenever anything changed in the config 
(which is stored in the PigContext).
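The mechanism can be reproduced outside Pig. Below is a minimal sketch, not Pig code: Context, Load and ForEach are hypothetical stand-ins for PigContext, POLoad and POForEach, and the single "setting" field stands in for the whole config. Java serialization follows every non-transient reference, so the bytes change when the config changes even though the plan itself is identical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical classes for illustration only; none of these exist in Pig.
final class TransitiveSerializationSketch {

    /** Stand-in for PigContext: carries a config value. */
    static class Context implements Serializable {
        private static final long serialVersionUID = 1L;
        final String setting;
        Context(String setting) { this.setting = setting; }
    }

    /** Stand-in for POLoad: holds a reference to the context. */
    static class Load implements Serializable {
        private static final long serialVersionUID = 1L;
        final Context pc;
        Load(Context pc) { this.pc = pc; }
    }

    /** Stand-in for POForEach: keeps its upstream operator in "inputs". */
    static class ForEach implements Serializable {
        private static final long serialVersionUID = 1L;
        final String plan = "Cast[int] <- Project[bytearray][0]";
        final Load inputs;
        ForEach(Load inputs) { this.inputs = inputs; }
    }

    /** Plain Java serialization of the whole reachable object graph. */
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Builds the same plan under a given config, with or without "inputs" attached. */
    static byte[] serializedForEach(String setting, boolean keepInputs) {
        Load load = new Load(new Context(setting));
        return serialize(new ForEach(keepInputs ? load : null));
    }

    public static void main(String[] args) {
        boolean sameWithInputs = Arrays.equals(
            serializedForEach("old-config", true), serializedForEach("new-config", true));
        boolean sameWithoutInputs = Arrays.equals(
            serializedForEach("old-config", false), serializedForEach("new-config", false));
        System.out.println("same bytes with inputs attached: " + sameWithInputs);
        System.out.println("same bytes with inputs detached: " + sameWithoutInputs);
    }
}
```

Detaching the reference before serializing is only one way to keep the config out of the bytes in this toy model; it is not meant to describe how the issue is resolved in Pig.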

> TestTezCompiler.testMergeCogroup fails whenever config is updated
> -----------------------------------------------------------------
>
>                 Key: PIG-5445
>                 URL: https://issues.apache.org/jira/browse/PIG-5445
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.19.0
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>
> TestTezCompiler.testMergeCogroup started failing after upgrading Tez (and 
> the config that comes with it).
> {noformat}
> testMergeCogroupFailure
> expected:
> <|---a: Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.apache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[fWtsHFeWXvEhWm9Ls...XOuwcT+fzW1+yM]=','a_1-0','scope','...>
> but was:
> <|---a: Load(file:///tmp/input1:org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MergeJoinIndexer('org.apache.pig.test.TestMapSideCogroup$DummyCollectableLoader','.../doPMfwFKyneZ','eNq9[V01sG0UUnmycxHWSN...DyC6P4Drk9M9w=]=','a_1-0','scope','...>
> at org.apache.pig.tez.TestTezCompiler.run(TestTezCompiler.java:1472)
> at org.apache.pig.tez.TestTezCompiler.testMergeCogroup(TestTezCompiler.java:292)
> {noformat}
> (edited the diff above a bit to make it easier to identify where the 
> difference was)
> Basically 3rd argument to MergeJoinIndexer differed. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
