[ https://issues.apache.org/jira/browse/PIG-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam Szita updated PIG-5240: ---------------------------- Summary: Fix TestPigRunner in spark mode for wrong inputStats (was: Fix TestPigRunner#simpleMultiQueryTest3 in spark mode for wrong inputStats) > Fix TestPigRunner in spark mode for wrong inputStats > ---------------------------------------------------- > > Key: PIG-5240 > URL: https://issues.apache.org/jira/browse/PIG-5240 > Project: Pig > Issue Type: Sub-task > Components: spark > Reporter: liyunzhang_intel > Fix For: spark-branch > > > in TestPigRunner#simpleMultiQueryTest3 , > the explain plan > {code} > #-------------------------------------------------- > # Spark Plan > #-------------------------------------------------- > Spark node scope-53 > Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) > - scope-54 > | > |---A: New For Each(false,false,false)[bag] - scope-10 > | | > | Cast[int] - scope-2 > | | > | |---Project[bytearray][0] - scope-1 > | | > | Cast[int] - scope-5 > | | > | |---Project[bytearray][1] - scope-4 > | | > | Cast[int] - scope-8 > | | > | |---Project[bytearray][2] - scope-7 > | > |---A: > Load(hdfs://localhost:58892/user/root/input:org.apache.pig.builtin.PigStorage) > - scope-0-------- > Spark node scope-55 > Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) > - scope-56 > | > |---C: Filter[bag] - scope-14 > | | > | Less Than or Equal[boolean] - scope-17 > | | > | |---Project[int][1] - scope-15 > | | > | |---Constant(5) - scope-16 > | > > |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) > - scope-10-------- > Spark node scope-57 > C: > Store(hdfs://localhost:58892/user/root/output:org.apache.pig.builtin.PigStorage) > - scope-21 > | > |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) > - scope-14-------- > Spark node scope-65 > D: > Store(hdfs://localhost:58892/user/root/output2:org.apache.pig.builtin.PigStorage) > - scope-52 > | > |---D: FRJoinSpark[tuple] - scope-44 > | | > | Project[int][0] - scope-41 > | | > | Project[int][0] - scope-42 > | | > | Project[int][0] - scope-43 > | > > |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) > - scope-58 > | > |---BroadcastSpark - scope-63 > | | > | |---B: Filter[bag] - scope-26 > | | | > | | Equal To[boolean] - scope-29 > | | | > | | |---Project[int][0] - scope-27 > | | | > | | |---Constant(3) - scope-28 > | | > | > |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) > - scope-60 > | > |---BroadcastSpark - scope-64 > | > |---A1: New For Each(false,false,false)[bag] - scope-40 > | | > | Cast[int] - scope-32 > | | > | |---Project[bytearray][0] - scope-31 > | | > | Cast[int] - scope-35 > | | > | |---Project[bytearray][1] - scope-34 > | | > | Cast[int] - scope-38 > | | > | |---Project[bytearray][2] - scope-37 > | > |---A1: > Load(hdfs://localhost:58892/user/root/input2:org.apache.pig.builtin.PigStorage) > - scope-30-------- > {code} > assertEquals(30, inputStats.get(0).getBytes()) is correct in spark mode, > assertEquals(18, inputStats.get(1).getBytes()) is wrong in spark mode as the > there are 3 loads in {{Spark node scope-65}}. > [{{stats.get("BytesRead")}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L93] > returns 49( guess this is the sum of > three loads({{input2}},{{tmp1818797386}},{{tmp-546700946}}). But current > [{{bytesRead}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L91] > is -1 because > [{{singleInput}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L92] > is false. -- This message was sent by Atlassian JIRA (v6.3.15#6346)