@Rohini, I tried the newer version of Pig (0.15.0), but unfortunately the
performance of my script degraded:
2015-09-03 15:15:24,698 [main] INFO  org.apache.pig.Main - Pig script
completed in 4 minutes, 1 second and 22 milliseconds (241022 ms)

whereas earlier it took barely 3 minutes and 27 seconds.
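
For scale, here is a rough sketch of the arithmetic behind those two timings. The variable and function names are only illustrative; the numbers are the ones quoted above.

```python
# Run times quoted in this message, converted to milliseconds.
new_ms = 241022                    # Pig 0.15 run, from the log line above
old_ms = (3 * 60 + 27) * 1000      # earlier run: 3 minutes 27 seconds


def slowdown(old_ms, new_ms):
    """Return (extra milliseconds, percent increase) of new over old."""
    extra = new_ms - old_ms
    return extra, 100.0 * extra / old_ms


extra_ms, pct = slowdown(old_ms, new_ms)
print(f"{extra_ms / 1000:.1f} s slower ({pct:.1f}%)")
```

That works out to roughly 34 seconds, or about 16% longer than before.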

PFA the task counters. The following software versions are being used:

HadoopVersion:
2.6.0-cdh5.4.4

PigVersion:
0.15.1-SNAPSHOT

TezVersion:
0.7.0


Regards,
Sandeep

On Thu, Sep 3, 2015 at 2:46 PM, Sandeep Kumar <[email protected]>
wrote:

> @Rajesh, PFA the required statistics. It's difficult to share the application
> logs because they are huge (167 MB). If you want anything specific from
> those logs, please let me know.
>
> @Rohini,
> Thanks for suggesting the new version of Pig. I'll give it a try for
> sure.
>
> Regards,
> Sandeep
>
> On Thu, Sep 3, 2015 at 2:31 PM, Rohini Palaniswamy <
> [email protected]> wrote:
>
>> Sandeep,
>>    Can you try with Pig 0.15 first? A ton of fixes for Pig on Tez have
>> gone into that release, and many of them are performance fixes.
>>
>> Regards,
>> Rohini
>>
>> On Thu, Sep 3, 2015 at 1:05 AM, Rajesh Balamohan <[email protected]>
>> wrote:
>>
>>> Can you post the application logs?  It would be helpful if you could run
>>> with "tez.task.generate.counters.per.io=true". This would generate the
>>> per IO statistics which can be useful for debugging.
>>>
>>>
>>> ~Rajesh.B
>>>
>>> On Thu, Sep 3, 2015 at 1:20 PM, Sandeep Kumar <[email protected]>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I'm using Pig-0.14.0 over Tez-0.7.0 to run some basic Pig scripts.
>>>> I'm not seeing any performance gain from Tez: my Pig scripts take the
>>>> same amount of time as they do with the mapred executionType.
>>>>
>>>> The following parameters are set in mapred-site.xml and are being read
>>>> by Tez. I'm not able to override them even if I specify them in my
>>>> tez-site.xml:
>>>>
>>>>  tez.runtime.shuffle.merge.percent=0.66
>>>>  tez.runtime.shuffle.fetch.buffer.percent=0.70
>>>>  tez.runtime.io.sort.mb=256
>>>>  tez.runtime.shuffle.memory.limit.percent=0.25
>>>>  tez.runtime.io.sort.factor=64
>>>>  tez.runtime.shuffle.connect.timeout=180000
>>>>  tez.runtime.internal.sorter.class=org.apache.hadoop.util.QuickSort
>>>>  tez.runtime.merge.progress.records=10000
>>>>  tez.runtime.compress=true
>>>>  tez.runtime.sort.spill.percent=0.8
>>>>  tez.runtime.shuffle.ssl.enable=false
>>>>  tez.runtime.ifile.readahead=true
>>>>  tez.runtime.shuffle.parallel.copies=10
>>>>  tez.runtime.ifile.readahead.bytes=4194304
>>>>  tez.runtime.task.input.post-merge.buffer.percent=0.0
>>>>  tez.runtime.shuffle.read.timeout=180000
>>>>  tez.runtime.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
>>>>
>>>>
>>>>
>>>> PFA the list of task counters. I can see that a lot of data is being
>>>> spilled, but if I try to increase tez.runtime.io.sort.mb through
>>>> mapred-site.xml, my script terminates with an OOM exception.
>>>>
>>>> Can you please suggest which parameters I should change to improve the
>>>> performance of Pig on Tez?
>>>>
>>>> Regards,
>>>> Sandeep
>>>>
>>>
>>>
>>
>

Attachment: Task-Counter
Description: Binary data
