[ 
https://issues.apache.org/jira/browse/HIVE-16341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959640#comment-15959640
 ] 

Jason Dere edited comment on HIVE-16341 at 4/6/17 8:52 PM:
-----------------------------------------------------------

Committed to master/branch-2/branch-2.3


was (Author: jdere):
Committed to master

> Tez Task Execution Summary has incorrect input record counts on some operators
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-16341
>                 URL: https://issues.apache.org/jira/browse/HIVE-16341
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>             Fix For: 2.3.0, 3.0.0
>
>         Attachments: HIVE-16341.1.patch, HIVE-16341.2.patch
>
>
> {noformat}
> Task Execution Summary
> --------------------------------------------------------------------------------------------------------------------------------
>   VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
> --------------------------------------------------------------------------------------------------------------------------------
>      Map 1          167                0             0       17640.00     2,109,200       23,068    150,000,004      11,995,136
>     Map 11            5                0             0       10559.00        71,960          633      4,023,690         799,900
>     Map 13            1                0             0        2244.00         6,090           29             25               3
>      Map 3            1                0             0        2849.00         7,080           99             25               3
>      Map 5          271                0             0       55834.00    12,934,890      358,376  1,500,000,001   1,500,000,161
>      Map 7          241                0             0       91243.00     5,020,860       71,182  1,827,250,341     652,413,443
> Reducer 10            1                0             0        1010.00         1,900            0              4               0
> Reducer 12            1                0             0        3854.00         1,320            0        799,900               1
> Reducer 14            1                0             0        1420.00         3,790           45              3               1
>  Reducer 2            1                0             0        9720.00         6,220          122     11,995,136               1
>  Reducer 4            1                0             0         810.00         2,100          105              3               1
>  Reducer 6            1                0             0       24863.00         3,260            5  1,500,000,161               1
>  Reducer 8          412                0             0       88215.00    17,106,440      184,524  2,165,208,640           1,864
>  Reducer 9            2                0             0       29752.00         3,980            0          1,864               4
> --------------------------------------------------------------------------------------------------------------------------------
> {noformat}
> Seeing this on queries that use runtime filtering. The INPUT_RECORDS values
> look incorrect for the reducers that are responsible for aggregating the
> min/max/bloomfilter (Reducers 12, 14, 2, 6). For example, Reducer 2 shows 12M
> input records. However, the task logs for Reducer 2 show only 167 input
> records.
> It looks like Map 1 has 2 different output vertices (Reducer 2 and Reducer
> 8), but the total output rows for Map 1 (rather than just the rows going to
> each specific vertex) are being counted in the input rows for both Reducer 2
> and Reducer 8.
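
The double counting described above can be illustrated with a small, self-contained Java sketch. This is not Hive or Tez code: the class, the method names, and the map-of-maps edge model are all hypothetical and exist only for illustration. The 167 figure and Map 1's total of 11,995,136 come from the report; attributing the remaining 11,994,969 rows to the Reducer 8 edge is an assumption that follows only if Map 1 has exactly the two output vertices mentioned.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: source vertex -> (destination vertex -> rows sent on that edge).
public class InputRecordAttribution {

    // Pattern described in the report: every destination of a source vertex is
    // charged with that source's TOTAL output across all of its edges.
    public static long vertexTotalInputRecords(Map<String, Map<String, Long>> edges, String dest) {
        long total = 0;
        for (Map<String, Long> outputs : edges.values()) {
            if (outputs.containsKey(dest)) {
                for (long rows : outputs.values()) {
                    total += rows;
                }
            }
        }
        return total;
    }

    // Expected behavior: count only the rows sent on the edge into `dest`.
    public static long perEdgeInputRecords(Map<String, Map<String, Long>> edges, String dest) {
        long total = 0;
        for (Map<String, Long> outputs : edges.values()) {
            total += outputs.getOrDefault(dest, 0L);
        }
        return total;
    }

    // Sample data: 167 is from the task logs; the Reducer 8 split is assumed.
    public static Map<String, Map<String, Long>> sampleEdges() {
        Map<String, Long> map1 = new LinkedHashMap<>();
        map1.put("Reducer 2", 167L);
        map1.put("Reducer 8", 11_994_969L); // assumed: 11,995,136 - 167
        Map<String, Map<String, Long>> edges = new LinkedHashMap<>();
        edges.put("Map 1", map1);
        return edges;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> edges = sampleEdges();
        System.out.println("vertex-total: " + vertexTotalInputRecords(edges, "Reducer 2"));
        System.out.println("per-edge:     " + perEdgeInputRecords(edges, "Reducer 2"));
    }
}
```

Under these assumptions, the vertex-total method reproduces the 11,995,136 shown for Reducer 2 in the summary, while the per-edge method yields the 167 seen in the task logs.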



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
