Hi All,

I'm using Pig 0.14.0 on Tez 0.7.0 to run some basic Pig scripts, but I'm not
seeing any performance gain from Tez: my Pig scripts take the same amount of
time as they do with the MapReduce execution type.

The following parameters are set in mapred-site.xml and are being picked up by
Tez, and I'm not able to override them even when I set them in my
tez-site.xml:

 tez.runtime.shuffle.merge.percent=0.66
 tez.runtime.shuffle.fetch.buffer.percent=0.70
 tez.runtime.io.sort.mb=256
 tez.runtime.shuffle.memory.limit.percent=0.25
 tez.runtime.io.sort.factor=64
 tez.runtime.shuffle.connect.timeout=180000
 tez.runtime.internal.sorter.class=org.apache.hadoop.util.QuickSort
 tez.runtime.merge.progress.records=10000
 tez.runtime.compress=true
 tez.runtime.sort.spill.percent=0.8
 tez.runtime.shuffle.ssl.enable=false
 tez.runtime.ifile.readahead=true
 tez.runtime.shuffle.parallel.copies=10
 tez.runtime.ifile.readahead.bytes=4194304
 tez.runtime.task.input.post-merge.buffer.percent=0.0
 tez.runtime.shuffle.read.timeout=180000
 tez.runtime.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
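In tez-site.xml each of these entries takes the standard Hadoop configuration
form, a property element with a name and a value inside a configuration root.
The small Python helper below is a hypothetical sketch (to_hadoop_xml is not
part of Pig or Tez) that turns key=value lines like the ones above into that
XML. It only illustrates the file format; it says nothing about which file
wins when the same key is set in both mapred-site.xml and tez-site.xml.

```python
import xml.etree.ElementTree as ET


def to_hadoop_xml(lines):
    """Convert 'key=value' lines into a Hadoop-style <configuration> string.

    Hypothetical helper for illustration only.
    """
    root = ET.Element("configuration")
    for line in lines:
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = key.strip()
        ET.SubElement(prop, "value").text = value.strip()
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml([
        "tez.runtime.io.sort.mb=256",
        "tez.runtime.compress=true",
    ]))
```

The output is compact, single-line XML; pretty-printing is left out to keep the
sketch short.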

Please find attached the list of task counters. I can see that a lot of data
is being spilled, but if I try to increase tez.runtime.io.sort.mb through
mapred-site.xml, my script terminates with an OOM exception.
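As a rough sanity check on the OOM: the sort buffer (tez.runtime.io.sort.mb)
and the shuffle buffers are both carved out of the same task JVM heap, so
raising io.sort.mb without raising the heap leaves less room for everything
else. The sketch below is a simplified model, not how Tez actually allocates
memory. heap_mb is a hypothetical task heap size, headroom is an assumed
fraction kept free for Pig and other objects, and the shuffle sizes follow the
usual meaning of fetch.buffer.percent (fraction of heap for in-memory shuffle),
memory.limit.percent (fraction of that buffer a single fetch may use) and
merge.percent (fill level that triggers an in-memory merge). Any scaling Tez
itself applies to these requests is ignored.

```python
def memory_budget(heap_mb, io_sort_mb, fetch_buffer_percent,
                  memory_limit_percent, merge_percent, headroom=0.3):
    """Rough, simplified memory estimate for one task (sizes in MB).

    headroom is an assumed fraction of the heap left for everything
    other than the sort buffer; it is not a Tez setting.
    """
    usable_mb = heap_mb * (1 - headroom)
    # In-memory shuffle buffer: a fraction of the task heap.
    shuffle_mb = heap_mb * fetch_buffer_percent
    # Largest single fetched segment that is kept in memory.
    single_fetch_mb = shuffle_mb * memory_limit_percent
    # Fill level of the shuffle buffer at which an in-memory merge starts.
    merge_trigger_mb = shuffle_mb * merge_percent
    return {
        "shuffle_mb": shuffle_mb,
        "single_fetch_mb": single_fetch_mb,
        "merge_trigger_mb": merge_trigger_mb,
        # Does the sort buffer alone fit in the non-headroom part of the heap?
        "sort_fits": io_sort_mb <= usable_mb,
    }


if __name__ == "__main__":
    # Values from the list above, with a hypothetical 1024 MB task heap.
    print(memory_budget(1024, 256, 0.70, 0.25, 0.66))
```

With a hypothetical 1024 MB heap and the values listed above, this gives a
shuffle buffer of about 716.8 MB, a per-fetch limit of about 179.2 MB and a
merge trigger of about 473.1 MB, which shows how quickly a larger io.sort.mb
competes with the shuffle for the same heap.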

Can you please suggest which parameters I should change to improve the
performance of Pig on Tez?

Regards,
Sandeep

Attachment: Task-Counter
Description: Binary data