Hi Jonathan,

Here is a more self-contained example than what I had before:
http://ews.illinois.edu/~yang43/shared/students.tar.gz

I wrote a trivial GFV class, but the slowdown still exists.
students-a.pig starts up noticeably slower than students-b.pig.

Thanks,
Chun

On 8/8/12 12:22 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:

> Thanks for this info. Can you go ahead and paste the whole GFV class?
> 
> Thanks
> 
> 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
> 
>> Thanks Jonathan,
>> 
>> I've tried to produce an example script which exhibits the slowdown and
>> posted it on Pastebin: http://pastebin.com/kTSsDUr3
>> 
>> The slowdown seems to occur when we are using a lot of UDFs to parse our
>> input data. Variant A in the script is noticeably slower than variant B in
>> Pig 0.10, while performance is similar in Pig 0.9.1.
>> 
>> I've pasted the exec() function of the GFV function on Pastebin as well:
>> http://pastebin.com/FVnkQCJ5
>> 
>> Please let us know if you need more details.
>> 
>> Thanks,
>> Chun
>> 
>> On 8/7/12 10:07 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
>> 
>>> Can you guys give a script that has the issue? My tactic would be to use
>>> some sort of profiler (we have access to YourKit for open source Pig
>>> contribution work) and try to isolate what is triggering GC.
>>> 
>>> 2012/8/7 Prashant Kommireddi <prash1...@gmail.com>
>>> 
>>>> Hi All,
>>>> 
>>>> Just wanted to follow up on Chun's question. Several of our Pig users have
>>>> been experiencing slow start-ups with Pig 0.10.0, while the same script runs
>>>> fine with 0.9.1. Is anyone else facing similar issues?
>>>> 
>>>> Thanks,
>>>> Prashant
>>>> 
>>>> Hi all,
>>>> 
>>>> I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I run the same
>>>> script using the two Pig versions, 0.9.1 starts off fast and almost
>>>> immediately submits the job to the cluster. On the other hand, Pig 0.10.0
>>>> takes forever to submit the job. When I use the Java option
>>>> -XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times
>>>> before and after the job is submitted to the cluster.
>>>> 
>>>> Does anyone know what is causing this and/or how I might be able to
>>>> troubleshoot it?
>>>> 
>>>> I've uploaded truncated output showing when GC happens to
>>>> Pastebin: http://pastebin.com/B8WTHW9r
>>>> 
>>>> Thanks,
>>>> Chun
>>>> 
>> 
>> 
