What version are you running with? 

thanks
— Hitesh 

> On May 5, 2016, at 10:31 AM, Kurt Muehlner <kmuehl...@connexity.com> wrote:
> 
> Hello,
> 
> We have a Pig/Tez application which is exhibiting a strange problem.  This 
> application was recently migrated from Pig/MR to Pig/Tez.  We carefully 
> vetted during QA that both MR and Tez versions produced identical results.  
> However, after deploying to production, we noticed that occasionally, results 
> are not the same (either as compared to MR results, or results of Tez 
> processing the same data on a QA cluster).
> 
> We’re still looking into the root cause, but I’d like to reach out to the 
> user group in case anyone has seen anything similar, or has suggestions on 
> what might be wrong/what to investigate.
> 
> *** What we know so far ***
> Results discrepancy occurs ONLY when the number of containers given to the 
> application by YARN is less than the number requested (we have disabled 
> auto-parallelism, and are using SET default_parallel 50 in all Pig scripts).  
> When this occurs, we also see a corresponding discrepancy in the file 
> system counters HDFS_READ_OPS and HDFS_BYTES_READ (lower when number of 
> containers is low), despite the fact that in all cases number of records 
> processed is identical.
> 
> Thus, when the production cluster is very busy, we get invalid results.  We 
> have kept a separate instance of the Pig/Tez application running on another 
> cluster where it never competes for resources, so we have been able to 
> compare results for each run of the application, which has allowed us to 
> diagnose the problem this far.  By comparing results on these two clusters, 
> we also know that the ratio (actual HDFS_READ_OPS)/(expected HDFS_READ_OPS) 
> correlates with the ratio (actual containers)/(requested containers).  
> Likewise, we see the same correlation between hdfs ops ratio and container 
> ratio.
> 
> Below are some relevant counters.  For each counter, the first line is the 
> value from the production cluster showing the problem, and the second line is 
> the value from the QA cluster running on the same data.
> 
> Any hints/suggestions/questions are most welcome.
> 
> Thanks,
> Kurt
> 
> org.apache.tez.common.counters.DAGCounter
> 
>  NUM_SUCCEEDED_TASKS=950
>  NUM_SUCCEEDED_TASKS=950
> 
>  TOTAL_LAUNCHED_TASKS=950
>  TOTAL_LAUNCHED_TASKS=950
> 
> File System Counters
> 
>  FILE_BYTES_READ=7745801982
>  FILE_BYTES_READ=8003771938
> 
>  FILE_BYTES_WRITTEN=9725468612
>  FILE_BYTES_WRITTEN=9675253887
> 
>  *HDFS_BYTES_READ=9487600888  (when number of containers equals the number 
> requested, this counter is the same between the two clusters)
>  *HDFS_BYTES_READ=17996466110
> 
>  *HDFS_READ_OPS=3080  (when number of containers equals the number requested, 
> this counter is the same between the two clusters)
>  *HDFS_READ_OPS=3600
> 
>  HDFS_WRITE_OPS=900
>  HDFS_WRITE_OPS=900
> 
> org.apache.tez.common.counters.TaskCounter
>  INPUT_RECORDS_PROCESSED=28729671
>  INPUT_RECORDS_PROCESSED=28729671
> 
> 
>  OUTPUT_RECORDS=33655895
>  OUTPUT_RECORDS=33655895
> 
>  OUTPUT_BYTES=28290888628
>  OUTPUT_BYTES=28294000270
> 
> Input(s):
> Successfully read 2254733 records (1632743360 bytes) from: "input1"
> Successfully read 2254733 records (1632743360 bytes) from: "input1"
> 
> 
> Output(s):
> Successfully stored 0 records in: "output1"
> Successfully stored 0 records in: "output1"
> 
> Successfully stored 56019 records (10437069 bytes) in: "output2"
> Successfully stored 56019 records (10437069 bytes) in: "output2"
> 
> Successfully stored 2254733 records (1651936175 bytes) in: "output3"
> Successfully stored 2254733 records (1651936175 bytes) in: "output3"
> 
> Successfully stored 1160599 records (823479742 bytes) in: "output4"
> Successfully stored 1160599 records (823480450 bytes) in: "output4"
> 
> Successfully stored 28605 records (21176320 bytes) in: "output5"
> Successfully stored 28605 records (21177552 bytes) in: "output5"
> 
> Successfully stored 6574 records (4442933 bytes) in: "output6"
> Successfully stored 6574 records (4442933 bytes) in: "output6"
> 
> Successfully stored 111416 records (164375858 bytes) in: "output7"
> Successfully stored 111416 records (164379800 bytes) in: "output7"
> 
> Successfully stored 542 records (387761 bytes) in: "output8"
> Successfully stored 542 records (387762 bytes) in: "output8"
> 
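
For anyone wanting to repeat the comparison described in the quoted message, here is a minimal Python sketch that computes the production/QA ratio for a subset of the counters posted there and lists the ones that diverge. The counter values are copied from the message; the function names (counter_ratios, mismatched) and the 1% tolerance are illustrative assumptions, not part of Tez or Pig. The actual and requested container counts were not posted, so the container ratio is not computed.

```python
# Counter values copied from the quoted message: (production, QA).
# Only a subset of the posted counters is included here.
COUNTERS = {
    "HDFS_BYTES_READ": (9487600888, 17996466110),
    "HDFS_READ_OPS": (3080, 3600),
    "HDFS_WRITE_OPS": (900, 900),
    "INPUT_RECORDS_PROCESSED": (28729671, 28729671),
    "OUTPUT_RECORDS": (33655895, 33655895),
}


def counter_ratios(counters):
    """Return {name: production / QA} for each counter pair."""
    return {name: prod / qa for name, (prod, qa) in counters.items()}


def mismatched(counters, tolerance=0.01):
    """Return sorted names whose ratio differs from 1.0 by more than tolerance.

    The tolerance is a hypothetical threshold chosen for illustration.
    """
    ratios = counter_ratios(counters)
    return sorted(n for n, r in ratios.items() if abs(r - 1.0) > tolerance)


if __name__ == "__main__":
    for name, ratio in counter_ratios(COUNTERS).items():
        print(f"{name}: {ratio:.3f}")
    print("diverging:", mismatched(COUNTERS))
```

With the numbers above, only the two HDFS read counters diverge, while the record counts and write ops match exactly, which is consistent with the observation in the original message.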
