Thanks Chun. Jon, any idea what in 0.11 might have fixed it?
On Thu, Aug 9, 2012 at 3:32 PM, Chun Yang <cy...@contractor.salesforce.com> wrote:

> I tried with pig11 (from git); timings for the two variants are more
> comparable.
>
> stats for `pig11 -b -e 'explain -script students-a.pig'`
> 6.33s user 0.74s system 153% cpu 4.611 total
> 6.55s user 0.68s system 155% cpu 4.664 total
> 6.40s user 0.79s system 157% cpu 4.560 total
> 6.47s user 0.62s system 155% cpu 4.560 total
>
> stats for `pig11 -b -e 'explain -script students-b.pig'`
> 5.66s user 0.62s system 169% cpu 3.707 total
> 5.69s user 0.53s system 165% cpu 3.758 total
> 5.44s user 0.70s system 165% cpu 3.706 total
> 5.68s user 0.51s system 166% cpu 3.708 total
>
> So it looks like it was fixed somewhere for 0.11?
> ________________________________________
> From: Jonathan Coveney [jcove...@gmail.com]
> Sent: Thursday, August 09, 2012 11:00 AM
> To: user@pig.apache.org
> Subject: Re: Pig 0.10.0 slow startup
>
> Can you do me a favor and run the exact same stuff with pig11? Just to
> isolate if this is an issue that has been removed. I will also try and run
> this on pig10, to see if I can see the same issue.
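To put Chun's pig11 numbers side by side, here is a small Java sketch that averages the four wall-clock ("total") times reported for each variant. The figures are copied from his message; the class and method names are only illustrative. Worked out by hand, the means are roughly 4.60s for students-a.pig and 3.72s for students-b.pig, a gap of under a second, compared with about 35s versus 4.5s in the pig10 runs quoted further down the thread.

```java
import java.util.Arrays;

/** Averages the pig11 wall-clock times Chun reported (values copied from the thread). */
public class Pig11Timings {
    // "total" seconds for: pig11 -b -e 'explain -script students-a.pig'
    static final double[] VARIANT_A = {4.611, 4.664, 4.560, 4.560};
    // "total" seconds for: pig11 -b -e 'explain -script students-b.pig'
    static final double[] VARIANT_B = {3.707, 3.758, 3.706, 3.708};

    // Arithmetic mean; NaN for an empty array.
    static double mean(double[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double a = mean(VARIANT_A);
        double b = mean(VARIANT_B);
        System.out.printf("students-a mean: %.3fs%n", a);
        System.out.printf("students-b mean: %.3fs%n", b);
        System.out.printf("difference: %.3fs, ratio a/b: %.2f%n", a - b, a / b);
    }
}
```

Four runs per variant is a small sample, so the averages only indicate that the pig11 gap is on the order of a second rather than tens of seconds.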
>
> 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
>
> > Thanks Jonathan,
> >
> > Here are some numbers that I'm getting from Pig 0.10 and Pig 0.9.1:
> >
> > pig10 -b -e 'explain -script students-a.pig'  35.35s user 8.52s system 63% cpu 1:08.77 total
> >
> > pig10 -b -e 'explain -script students-b.pig'  5.32s user 0.48s system 130% cpu 4.460 total
> >
> > pig9 -b -e 'explain -script students-a.pig'  4.93s user 0.51s system 131% cpu 4.153 total
> >
> > pig9 -b -e 'explain -script students-b.pig'  3.86s user 0.41s system 131% cpu 3.254 total
> >
> > Seems like the first run is always slower, but subsequent runs are about
> > the same:
> >
> > pig10 -b -e 'explain -script students-a.pig'  35.17s user 8.20s system 123% cpu 35.017 total
> >
> > pig10 -b -e 'explain -script students-a.pig'  35.41s user 8.55s system 122% cpu 35.803 total
> >
> > A little more than 1.5s slowdown :)
> >
> > Thanks,
> > Chun
> >
> > On 8/8/12 5:38 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
> >
> > > Thanks for putting that together, Chun.
> > >
> > > So, it looks like there are ~400 instantiations of the class, and the
> > > time from the first instantiation to the last one is about 1.5s. Is that
> > > on the order of the slowdown you're experiencing?
> > >
> > > (Note: I'm testing with Pig 11... if your slowdown is much higher than
> > > that, I'll test on Pig 10.)
> > >
> > > Either way, it seems like the slowdown is directly attributable to UDF
> > > invocations. Have you seen slowdowns much larger than this?
> > >
> > > 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
> > >
> > >> Hi Jonathan,
> > >>
> > >> Here is a more self-contained example than what I had before:
> > >> http://ews.illinois.edu/~yang43/shared/students.tar.gz
> > >>
> > >> I wrote a trivial GFV class, but the slowdown still exists.
> > >> students-a.pig starts up noticeably slower than students-b.pig.
> > >>
> > >> Thanks,
> > >> Chun
> > >>
> > >> On 8/8/12 12:22 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
> > >>
> > >>> Thanks for this info. Can you go ahead and paste the whole GFV class?
> > >>>
> > >>> Thanks
> > >>>
> > >>> 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
> > >>>
> > >>>> Thanks Jonathan,
> > >>>>
> > >>>> I've tried to produce an example script which exhibits the slowdown and
> > >>>> posted it on Pastebin: http://pastebin.com/kTSsDUr3
> > >>>>
> > >>>> The slowdown seems to occur when we are using a lot of UDFs to parse our
> > >>>> input data. Variant A in the script is noticeably slower than variant B
> > >>>> in Pig 0.10, while performance is similar in Pig 0.9.1.
> > >>>>
> > >>>> I've pasted the exec() function of the GFV function on Pastebin as well:
> > >>>> http://pastebin.com/FVnkQCJ5
> > >>>>
> > >>>> Please let us know if you need more details.
> > >>>>
> > >>>> Thanks,
> > >>>> Chun
> > >>>>
> > >>>> On 8/7/12 10:07 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
> > >>>>
> > >>>>> Can you guys give a script that has the issue? My tactic would be to
> > >>>>> use some sort of profiler (we have access to YourKit for open source
> > >>>>> Pig contribution work) and try and isolate what is triggering GC.
> > >>>>>
> > >>>>> 2012/8/7 Prashant Kommireddi <prash1...@gmail.com>
> > >>>>>
> > >>>>>> Hi All,
> > >>>>>>
> > >>>>>> Just wanted to follow up on Chun's question. Several of our Pig users
> > >>>>>> have been experiencing slow start-ups with Pig 0.10.0, when the same
> > >>>>>> script runs fine with 0.9.1. Anyone else facing similar issues?
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Prashant
> > >>>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> I'm trying to move from Pig 0.9.1 to Pig 0.10.0.
> > >>>>>> When I try to run the same script using the two Pig versions, 0.9.1
> > >>>>>> starts off fast and almost immediately submits the job to the cluster.
> > >>>>>> On the other hand, Pig 0.10.0 takes forever to submit the job. When I
> > >>>>>> use the Java option -XX:+PrintGCDetails, I see that for 0.10.0 the GC
> > >>>>>> is being run many times before and after the job is submitted to the
> > >>>>>> cluster.
> > >>>>>>
> > >>>>>> Does anyone know what is causing this and/or how I might be able to
> > >>>>>> troubleshoot it?
> > >>>>>>
> > >>>>>> I've uploaded truncated output showing when GC happens to Pastebin:
> > >>>>>> http://pastebin.com/B8WTHW9r
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Chun
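Jonathan's measurement (~400 instantiations of the UDF class spread over about 1.5s) and the GC activity Chun saw with -XX:+PrintGCDetails can both be approximated from inside the JVM without a profiler. The sketch below is one way to do that, under stated assumptions: `InstrumentedUdf` is a hypothetical stand-in for a UDF such as GFV (a real one would extend Pig's `EvalFunc` and carry the same constructor bookkeeping), and the loop only simulates repeated construction; it does not reproduce what Pig itself does. GC activity is read through the standard `java.lang.management` `GarbageCollectorMXBean` counters.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class StartupProbe {

    /**
     * Hypothetical stand-in for a UDF such as GFV. A real UDF would extend
     * org.apache.pig.EvalFunc; only the constructor bookkeeping is shown here.
     */
    static final class InstrumentedUdf {
        private static int instances = 0;
        private static long firstNanos;
        private static long lastNanos;

        InstrumentedUdf() {
            record(System.nanoTime());
        }

        // Remember when the first and the most recent instance were created.
        private static synchronized void record(long now) {
            if (instances == 0) {
                firstNanos = now;
            }
            lastNanos = now;
            instances++;
        }

        static synchronized int instances() {
            return instances;
        }

        // Seconds between the first and the latest construction (0 if none yet).
        static synchronized double spanSeconds() {
            return instances == 0 ? 0.0 : (lastNanos - firstNanos) / 1e9;
        }
    }

    // Total collections across all collectors; -1 ("undefined") values are skipped.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds, same convention.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 400;
        long gcCountBefore = totalGcCount();
        long gcMillisBefore = totalGcMillis();

        // Keep references so the instances stay reachable during the measurement.
        List<InstrumentedUdf> udfs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            udfs.add(new InstrumentedUdf());
        }

        System.out.printf("%d instances over %.3fs; %d GCs (%d ms) meanwhile%n",
                InstrumentedUdf.instances(),
                InstrumentedUdf.spanSeconds(),
                totalGcCount() - gcCountBefore,
                totalGcMillis() - gcMillisBefore);
        System.out.println("kept " + udfs.size() + " references");
    }
}
```

Run it as `java StartupProbe 400`. In a real investigation the counters would be read before and after the code path under suspicion rather than around a synthetic loop, and a trivial constructor like this one should not by itself trigger any collections.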