Hi Jonathan,

Here is a more self-contained example than what I had before:
http://ews.illinois.edu/~yang43/shared/students.tar.gz

I wrote a trivial GFV class, but the slowdown still exists. students-a.pig
starts up noticeably slower than students-b.pig.

Thanks,
Chun

On 8/8/12 12:22 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:

> Thanks for this info. Can you go ahead and paste the whole GFV class?
>
> Thanks
>
> 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
>
>> Thanks Jonathan,
>>
>> I've tried to produce an example script which exhibits the slowdown and
>> posted it on Pastebin: http://pastebin.com/kTSsDUr3
>>
>> The slowdown seems to occur when we are using a lot of UDFs to parse our
>> input data. Variant A in the script is noticeably slower than variant B in
>> Pig 0.10, while performance is similar in Pig 0.9.1.
>>
>> I've pasted the exec() function of the GFV function on Pastebin as well:
>> http://pastebin.com/FVnkQCJ5
>>
>> Please let us know if you need more details.
>>
>> Thanks,
>> Chun
>>
>> On 8/7/12 10:07 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
>>
>>> Can you guys give a script that has the issue? My tactic would be to use
>>> some sort of profiler (we have access to YourKit for open source Pig
>>> contribution work) and try and isolate what is triggering GC.
>>>
>>> 2012/8/7 Prashant Kommireddi <prash1...@gmail.com>
>>>
>>>> Hi All,
>>>>
>>>> Just wanted to follow up on Chun's question. Several of our Pig users
>>>> have been experiencing slow start-ups with Pig 0.10.0, when the same
>>>> script runs fine with 0.9.1. Anyone else facing similar issues?
>>>>
>>>> Thanks,
>>>> Prashant
>>>>
>>>> Hi all,
>>>>
>>>> I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I try to run the
>>>> same script using the two Pig versions, 0.9.1 starts off fast and almost
>>>> immediately submits the job to the cluster. On the other hand, Pig 0.10.0
>>>> takes forever to submit the job. When I use the java option
>>>> -XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times
>>>> before and after the job is submitted to the cluster.
>>>>
>>>> Does anyone know what is causing this and/or how I might be able to
>>>> troubleshoot it?
>>>>
>>>> I've uploaded truncated output showing when GC happens to
>>>> Pastebin: http://pastebin.com/B8WTHW9r
>>>>
>>>> Thanks,
>>>> Chun