Can you post the application logs? It would also help if you could run with "tez.task.generate.counters.per.io=true". This generates per-IO counters, which can be useful for debugging.
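
One way to pass that setting from Pig, as a minimal sketch: Pig's "set" statement forwards a property to the job configuration, so a line like the one below at the top of the script should enable the counters for that run. The placement is an assumption; nothing else in the script needs to change.

```pig
-- Assumption: this goes at the top of the Pig script that is run with -x tez.
-- It asks Tez to generate counters per input/output, as suggested above.
set tez.task.generate.counters.per.io true;
```

The counters then show up alongside the usual task counters in the application logs.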
~Rajesh.B

On Thu, Sep 3, 2015 at 1:20 PM, Sandeep Kumar <[email protected]> wrote:

> Hi All,
>
> I'm using Pig-0.14.0 over Tez-0.7.0 for running some basic pig scripts.
> I'm not able to see any performance gain using Tez. My pig scripts are
> taking same amount of time on mapred executionType as well.
>
> Following are the parameters which are in mapred-site.xml and being read
> by Tez and I'm not able to override them even if i mention them in my
> tez-site.xml:
>
> tez.runtime.shuffle.merge.percent=0.66
> tez.runtime.shuffle.fetch.buffer.percent=0.70
> tez.runtime.io.sort.mb=256
> tez.runtime.shuffle.memory.limit.percent=0.25
> tez.runtime.io.sort.factor=64
> tez.runtime.shuffle.connect.timeout=180000
> tez.runtime.internal.sorter.class=org.apache.hadoop.util.QuickSort
> tez.runtime.merge.progress.records=10000
> tez.runtime.compress=true
> tez.runtime.sort.spill.percent=0.8
> tez.runtime.shuffle.ssl.enable=false
> tez.runtime.ifile.readahead=true
> tez.runtime.shuffle.parallel.copies=10
> tez.runtime.ifile.readahead.bytes=4194304
> tez.runtime.task.input.post-merge.buffer.percent=0.0
> tez.runtime.shuffle.read.timeout=180000
> tez.runtime.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
>
> PFA the list of task counter. I can see a lot of data is being spilled but
> if i try to increase tez.runtime.io.sort.mb through mapred-site.xml then
> my script terminates with OOM exception.
>
> Can you please suggest what parameters i should change to improve the
> performance of pig using Tez?
>
> Regards,
> Sandeep
