I tried with pig11 (from git); timings for the two variants are more comparable.

stats for `pig11 -b -e 'explain -script students-a.pig'`
6.33s user 0.74s system 153% cpu 4.611 total
6.55s user 0.68s system 155% cpu 4.664 total
6.40s user 0.79s system 157% cpu 4.560 total
6.47s user 0.62s system 155% cpu 4.560 total

stats for `pig11 -b -e 'explain -script students-b.pig'`
5.66s user 0.62s system 169% cpu 3.707 total
5.69s user 0.53s system 165% cpu 3.758 total
5.44s user 0.70s system 165% cpu 3.706 total
5.68s user 0.51s system 166% cpu 3.708 total

So looks like it was fixed somewhere for 0.11?

________________________________________
From: Jonathan Coveney [jcove...@gmail.com]
Sent: Thursday, August 09, 2012 11:00 AM
To: user@pig.apache.org
Subject: Re: Pig 0.10.0 slow startup

Can you do me a favor and run the exact same stuff with pig11? Just to
isolate if this is an issue that has been removed. I will also try and run
this on pig10, to see if I can see the same issue.

2012/8/8 Chun Yang <cy...@contractor.salesforce.com>

> Thanks Jonathan,
>
> Here are some numbers that I'm getting from Pig 0.10 and Pig 0.9.1:
>
> pig10 -b -e 'explain -script students-a.pig'  35.35s user 8.52s system 63% cpu 1:08.77 total
>
> pig10 -b -e 'explain -script students-b.pig'  5.32s user 0.48s system 130% cpu 4.460 total
>
> pig9 -b -e 'explain -script students-a.pig'  4.93s user 0.51s system 131% cpu 4.153 total
>
> pig9 -b -e 'explain -script students-b.pig'  3.86s user 0.41s system 131% cpu 3.254 total
>
> Seems like the first run is always slower, but subsequent runs are about
> the same:
>
> pig10 -b -e 'explain -script students-a.pig'  35.17s user 8.20s system 123% cpu 35.017 total
>
> pig10 -b -e 'explain -script students-a.pig'  35.41s user 8.55s system 122% cpu 35.803 total
>
> A little more than 1.5s slowdown :)
>
> Thanks,
> Chun
>
> On 8/8/12 5:38 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
>
> > Thanks for putting that together, Chun.
> >
> > So, it looks like there are ~400 instantiations of the class, and the
> > time from the first instantiation to the last one is about ~1.5s. Is
> > that on the order of the slowdown you're experiencing?
> >
> > (note: I'm testing with Pig 11...if your slowdown is much higher than
> > that, I'll test on Pig 10)
> >
> > Either way, it seems like the slowdown is directly attributable to UDF
> > invocations. Have you seen slowdowns much larger than this?
> >
> > 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
> >
> >> Hi Jonathan,
> >>
> >> Here is a more self-contained example than what I had before:
> >> http://ews.illinois.edu/~yang43/shared/students.tar.gz
> >>
> >> I wrote a trivial GFV class, but the slowdown still exists.
> >> students-a.pig starts up noticeably slower than students-b.pig .
> >>
> >> Thanks,
> >> Chun
> >>
> >> On 8/8/12 12:22 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
> >>
> >>> Thanks for this info. Can you go ahead and paste the whole GFV class?
> >>>
> >>> Thanks
> >>>
> >>> 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
> >>>
> >>>> Thanks Jonathan,
> >>>>
> >>>> I've tried to produce an example script which exhibits the slowdown
> >>>> and posted it on Pastebin: http://pastebin.com/kTSsDUr3
> >>>>
> >>>> The slowdown seems to occur when we are using a lot of UDFs to parse
> >>>> our input data. Variant A in the script is noticeably slower than
> >>>> variant B in Pig 0.10 while performance is similar in Pig 0.9.1
> >>>>
> >>>> I've pasted the exec() function of the GFV function on Pastebin as
> >>>> well: http://pastebin.com/FVnkQCJ5
> >>>>
> >>>> Please let us know if you need more details.
> >>>>
> >>>> Thanks,
> >>>> Chun
> >>>>
> >>>> On 8/7/12 10:07 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
> >>>>
> >>>>> Can you guys give a script that has the issue?
> >>>>> My tactic would be to use some sort of profiler (we have access to
> >>>>> YourKit for open source Pig contribution work) and try and isolate
> >>>>> what is triggering GC.
> >>>>>
> >>>>> 2012/8/7 Prashant Kommireddi <prash1...@gmail.com>
> >>>>>
> >>>>>> Hi All,
> >>>>>>
> >>>>>> Just wanted to follow up on Chun's question. Several of our Pig
> >>>>>> users have been experiencing slow start-ups with Pig 0.10.0, when
> >>>>>> the same script runs fine with 0.9.1. Anyone else facing similar
> >>>>>> issues?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Prashant
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I'm trying to move from Pig 0.9.1 to Pig 0.10.0 . When I try to run
> >>>>>> the same script using the two Pig versions, 0.9.1 starts off fast
> >>>>>> and almost immediately submits the job to the cluster. On the other
> >>>>>> hand, Pig 0.10.0 takes forever to submit the job. When I use the
> >>>>>> java option -XX:+PrintGCDetails, I see that for 0.10.0 the GC is
> >>>>>> being run many times before and after the job is submitted to the
> >>>>>> cluster.
> >>>>>>
> >>>>>> Does anyone know what is causing this and/or how I might be able to
> >>>>>> troubleshoot it?
> >>>>>>
> >>>>>> I've uploaded truncated output showing when GC happens to
> >>>>>> Pastebin: http://pastebin.com/B8WTHW9r
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Chun