[ 
https://issues.apache.org/jira/browse/PIG-5240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated PIG-5240:
----------------------------
    Summary: Fix TestPigRunner in spark mode for wrong inputStats (was: Fix TestPigRunner#simpleMultiQueryTest3 in spark mode for wrong inputStats)

> Fix TestPigRunner in spark mode for wrong inputStats
> ----------------------------------------------------
>
>                 Key: PIG-5240
>                 URL: https://issues.apache.org/jira/browse/PIG-5240
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>             Fix For: spark-branch
>
>
> In TestPigRunner#simpleMultiQueryTest3, the explain plan is:
> {code}
> #--------------------------------------------------
> # Spark Plan                                  
> #--------------------------------------------------
> Spark node scope-53
> Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-54
> |
> |---A: New For Each(false,false,false)[bag] - scope-10
>     |   |
>     |   Cast[int] - scope-2
>     |   |
>     |   |---Project[bytearray][0] - scope-1
>     |   |
>     |   Cast[int] - scope-5
>     |   |
>     |   |---Project[bytearray][1] - scope-4
>     |   |
>     |   Cast[int] - scope-8
>     |   |
>     |   |---Project[bytearray][2] - scope-7
>     |
>     |---A: Load(hdfs://localhost:58892/user/root/input:org.apache.pig.builtin.PigStorage) - scope-0--------
> Spark node scope-55
> Store(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-56
> |
> |---C: Filter[bag] - scope-14
>     |   |
>     |   Less Than or Equal[boolean] - scope-17
>     |   |
>     |   |---Project[int][1] - scope-15
>     |   |
>     |   |---Constant(5) - scope-16
>     |
>     |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-10--------
> Spark node scope-57
> C: Store(hdfs://localhost:58892/user/root/output:org.apache.pig.builtin.PigStorage) - scope-21
> |
> |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-14--------
> Spark node scope-65
> D: Store(hdfs://localhost:58892/user/root/output2:org.apache.pig.builtin.PigStorage) - scope-52
> |
> |---D: FRJoinSpark[tuple] - scope-44
>     |   |
>     |   Project[int][0] - scope-41
>     |   |
>     |   Project[int][0] - scope-42
>     |   |
>     |   Project[int][0] - scope-43
>     |
>     |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp-546700946:org.apache.pig.impl.io.InterStorage) - scope-58
>     |
>     |---BroadcastSpark - scope-63
>     |   |
>     |   |---B: Filter[bag] - scope-26
>     |       |   |
>     |       |   Equal To[boolean] - scope-29
>     |       |   |
>     |       |   |---Project[int][0] - scope-27
>     |       |   |
>     |       |   |---Constant(3) - scope-28
>     |       |
>     |       |---Load(hdfs://localhost:58892/tmp/temp-1660154197/tmp1818797386:org.apache.pig.impl.io.InterStorage) - scope-60
>     |
>     |---BroadcastSpark - scope-64
>         |
>         |---A1: New For Each(false,false,false)[bag] - scope-40
>             |   |
>             |   Cast[int] - scope-32
>             |   |
>             |   |---Project[bytearray][0] - scope-31
>             |   |
>             |   Cast[int] - scope-35
>             |   |
>             |   |---Project[bytearray][1] - scope-34
>             |   |
>             |   Cast[int] - scope-38
>             |   |
>             |   |---Project[bytearray][2] - scope-37
>             |
>             |---A1: Load(hdfs://localhost:58892/user/root/input2:org.apache.pig.builtin.PigStorage) - scope-30--------
> {code}
> assertEquals(30, inputStats.get(0).getBytes()) is correct in spark mode, but
> assertEquals(18, inputStats.get(1).getBytes()) is wrong in spark mode because
> there are 3 loads in {{Spark node scope-65}}.
> [{{stats.get("BytesRead")}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L93]
> returns 49 (I guess this is the sum of the three loads: {{input2}},
> {{tmp1818797386}} and {{tmp-546700946}}). But the current
> [{{bytesRead}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L91]
> is -1 because
> [{{singleInput}}|https://github.com/apache/pig/blob/spark/src/org/apache/pig/tools/pigstats/spark/SparkJobStats.java#L92]
> is false.
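
The behaviour described above can be illustrated with a minimal sketch. It assumes, based only on this report, that the job-wide "BytesRead" counter is attributed to an input only when the Spark node has exactly one load, and that -1 is reported otherwise. The class InputBytesSketch and the method bytesForInput are hypothetical names for illustration; this is not the actual SparkJobStats code.

```java
// Hypothetical illustration of the reported behaviour; not Pig source code.
public class InputBytesSketch {

    /**
     * Returns the byte count to report for one input of a Spark job.
     *
     * @param jobBytesRead job-wide "BytesRead" counter (covers all loads in the job)
     * @param loadCount    number of Load operators in the Spark node
     */
    public static long bytesForInput(long jobBytesRead, int loadCount) {
        boolean singleInput = (loadCount == 1);
        // With several loads the job-wide counter cannot be split per input,
        // so this sketch reports -1 (unknown), as the report observes.
        return singleInput ? jobBytesRead : -1L;
    }

    public static void main(String[] args) {
        // One load (as in Spark node scope-53): the counter is usable.
        System.out.println(bytesForInput(30L, 1));
        // Three loads (as in Spark node scope-65): the counter is discarded.
        System.out.println(bytesForInput(49L, 3));
    }
}
```

Under this assumption the test's expected value of 18 for the second input cannot be met in spark mode, since the only available figure (49) covers all three loads together.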



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
