@Rohini, I tried the newer version of Pig, i.e. 0.15.0, but unfortunately the performance of my script degraded:

2015-09-03 15:15:24,698 [main] INFO org.apache.pig.Main - Pig script completed in 4 minutes, 1 second and 22 milliseconds (241022 ms)

whereas earlier it took barely 3 minutes and 27 seconds. PFA the task counters.

The following software versions are in use:

HadoopVersion: 2.6.0-cdh5.4.4
PigVersion: 0.15.1-SNAPSHOT
TezVersion: 0.7.0

Regards,
Sandeep

On Thu, Sep 3, 2015 at 2:46 PM, Sandeep Kumar <[email protected]> wrote:

> @Rajesh, PFA the required statistics. Its difficult to share application
> log because they are huge in size(i.e. 167MB). In case you want anything
> specific from those logs then please let me know.
>
> @Rohini,
> Thanks for suggesting regarding new version of Pig. I'll give it a try for
> sure.
>
> Regards,
> Sandeep
>
> On Thu, Sep 3, 2015 at 2:31 PM, Rohini Palaniswamy <
> [email protected]> wrote:
>
>> Sandeep,
>> Can you try with Pig 0.15 first? There is ton of fixes that has gone
>> in for Pig on Tez into that release and many of them are performance fixes.
>>
>> Regards,
>> Rohini
>>
>> On Thu, Sep 3, 2015 at 1:05 AM, Rajesh Balamohan <[email protected]>
>> wrote:
>>
>>> Can you post the application logs? It would be helpful if you could run
>>> with "tez.task.generate.counters.per.io=true". This would generate the
>>> per IO statistics which can be useful for debugging.
>>>
>>>
>>> ~Rajesh.B
>>>
>>> On Thu, Sep 3, 2015 at 1:20 PM, Sandeep Kumar <[email protected]>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I'm using Pig-0.14.0 over Tez-0.7.0 for running some basic pig scripts.
>>>> I'm not able to see any performance gain using Tez. My pig scripts are
>>>> taking same amount of time on mapred executionType as well.
>>>>
>>>> Following are the parameters which are in mapred-site.xml and being
>>>> read by Tez and I'm not able to override them even if i mention them in my
>>>> tez-site.xml:
>>>>
>>>> tez.runtime.shuffle.merge.percent=0.66
>>>> tez.runtime.shuffle.fetch.buffer.percent=0.70
>>>> tez.runtime.io.sort.mb=256
>>>> tez.runtime.shuffle.memory.limit.percent=0.25
>>>> tez.runtime.io.sort.factor=64
>>>> tez.runtime.shuffle.connect.timeout=180000
>>>> tez.runtime.internal.sorter.class=org.apache.hadoop.util.QuickSort
>>>> tez.runtime.merge.progress.records=10000
>>>> tez.runtime.compress=true
>>>> tez.runtime.sort.spill.percent=0.8
>>>> tez.runtime.shuffle.ssl.enable=false
>>>> tez.runtime.ifile.readahead=true
>>>> tez.runtime.shuffle.parallel.copies=10
>>>> tez.runtime.ifile.readahead.bytes=4194304
>>>> tez.runtime.task.input.post-merge.buffer.percent=0.0
>>>> tez.runtime.shuffle.read.timeout=180000
>>>> tez.runtime.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
>>>>
>>>> PFA the list of task counter. I can see a lot of data is being spilled
>>>> but if i try to increase tez.runtime.io.sort.mb through
>>>> mapred-site.xml then my script terminates with OOM exception.
>>>>
>>>> Can you please suggest what parameters i should change to improve the
>>>> performance of pig using Tez?
>>>>
>>>> Regards,
>>>> Sandeep
Attachment: Task-Counter (binary data)
