Julien removed a dozen or so loader/storer instantiations. That can do it if you do work in constructors.
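To illustrate the constructor point, here is a minimal Java sketch. It is not Pig code: EagerFunc, LazyFunc, SetupWork and ConstructorCostSketch are hypothetical names standing in for a UDF or loader, and the count of 400 only echoes the ~400 instantiations mentioned further down the thread. The eager version pays its setup cost every time it is instantiated, while the lazy version pays it only when exec() is first called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a UDF or loader (not a real Pig class).
// It does its expensive setup in the constructor.
class EagerFunc {
    static int setups = 0;            // counts how often the setup ran
    private final List<String> table;

    EagerFunc() {
        // Runs on every instantiation, even if exec() is never called.
        table = SetupWork.buildTable();
        setups++;
    }

    String exec(int i) {
        return table.get(i);
    }
}

// Same hypothetical function, but the setup is deferred to first use.
class LazyFunc {
    static int setups = 0;            // counts how often the setup ran
    private List<String> table;       // stays null until exec() needs it

    String exec(int i) {
        if (table == null) {
            table = SetupWork.buildTable();
            setups++;
        }
        return table.get(i);
    }
}

// Placeholder for whatever expensive work a real constructor might do.
class SetupWork {
    static List<String> buildTable() {
        List<String> t = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            t.add("v" + i);
        }
        return t;
    }
}

class ConstructorCostSketch {
    public static void main(String[] args) {
        // Instantiate each class 400 times without ever calling exec(),
        // roughly what happens when a planner creates instances it
        // does not end up using.
        for (int i = 0; i < 400; i++) {
            new EagerFunc();
            new LazyFunc();
        }
        System.out.println("eager setups: " + EagerFunc.setups);
        System.out.println("lazy setups:  " + LazyFunc.setups);
    }
}
```

Run once, the eager counter ends at 400 and the lazy one at 0, so fewer instantiations only matter when the constructor itself is expensive.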
D

On Fri, Aug 10, 2012 at 1:15 PM, Prashant Kommireddi <prash1...@gmail.com> wrote:

> Thanks Chun.
>
> Jon, any idea what on 0.11 might have fixed it?
>
> On Thu, Aug 9, 2012 at 3:32 PM, Chun Yang <cy...@contractor.salesforce.com> wrote:
>
>> I tried with pig11 (from git), timings for the two variants are more
>> comparable.
>>
>> stats for `pig11 -b -e 'explain -script students-a.pig'`
>> 6.33s user 0.74s system 153% cpu 4.611 total
>> 6.55s user 0.68s system 155% cpu 4.664 total
>> 6.40s user 0.79s system 157% cpu 4.560 total
>> 6.47s user 0.62s system 155% cpu 4.560 total
>>
>> stats for `pig11 -b -e 'explain -script students-b.pig'`
>> 5.66s user 0.62s system 169% cpu 3.707 total
>> 5.69s user 0.53s system 165% cpu 3.758 total
>> 5.44s user 0.70s system 165% cpu 3.706 total
>> 5.68s user 0.51s system 166% cpu 3.708 total
>>
>> So looks like it was fixed somewhere for 0.11?
>> ________________________________________
>> From: Jonathan Coveney [jcove...@gmail.com]
>> Sent: Thursday, August 09, 2012 11:00 AM
>> To: user@pig.apache.org
>> Subject: Re: Pig 0.10.0 slow startup
>>
>> Can you do me a favor and run the exact same stuff with pig11? Just to
>> isolate if this is an issue that has been removed. I will also try and run
>> this on pig10, to see if I can see the same issue.
>>
>> 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
>>
>> > Thanks Jonathan,
>> >
>> > Here are some numbers that I'm getting from Pig 0.10 and Pig 0.9.1:
>> >
>> > pig10 -b -e 'explain -script students-a.pig' 35.35s user 8.52s system 63% cpu 1:08.77 total
>> >
>> > pig10 -b -e 'explain -script students-b.pig' 5.32s user 0.48s system 130% cpu 4.460 total
>> >
>> > pig9 -b -e 'explain -script students-a.pig' 4.93s user 0.51s system 131% cpu 4.153 total
>> >
>> > pig9 -b -e 'explain -script students-b.pig' 3.86s user 0.41s system 131% cpu 3.254 total
>> >
>> > Seems like the first run is always slower, but subsequent runs are about
>> > the same:
>> >
>> > pig10 -b -e 'explain -script students-a.pig' 35.17s user 8.20s system 123% cpu 35.017 total
>> >
>> > pig10 -b -e 'explain -script students-a.pig' 35.41s user 8.55s system 122% cpu 35.803 total
>> >
>> > A little more than 1.5s slowdown :)
>> >
>> > Thanks,
>> > Chun
>> >
>> > On 8/8/12 5:38 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
>> >
>> > > Thanks for putting that together, Chun.
>> > >
>> > > So, it looks like there are ~400 instantiations of the class, and the time
>> > > from the first instantiation to the last one is about 1.5s. Is that on the
>> > > order of the slowdown you're experiencing?
>> > >
>> > > (note: I'm testing with Pig 11...if your slowdown is much higher than that,
>> > > I'll test on Pig 10)
>> > >
>> > > Either way, it seems like the slowdown is directly attributable to UDF
>> > > invocations. Have you seen slowdowns much larger than this?
>> > >
>> > > 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
>> > >
>> > >> Hi Jonathan,
>> > >>
>> > >> Here is a more self-contained example than what I had before:
>> > >> http://ews.illinois.edu/~yang43/shared/students.tar.gz
>> > >>
>> > >> I wrote a trivial GFV class, but the slowdown still exists.
>> > >> students-a.pig starts up noticeably slower than students-b.pig.
>> > >>
>> > >> Thanks,
>> > >> Chun
>> > >>
>> > >> On 8/8/12 12:22 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
>> > >>
>> > >>> Thanks for this info. Can you go ahead and paste the whole GFV class?
>> > >>>
>> > >>> Thanks
>> > >>>
>> > >>> 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
>> > >>>
>> > >>>> Thanks Jonathan,
>> > >>>>
>> > >>>> I've tried to produce an example script which exhibits the slowdown and
>> > >>>> posted it on Pastebin: http://pastebin.com/kTSsDUr3
>> > >>>>
>> > >>>> The slowdown seems to occur when we are using a lot of UDFs to parse our
>> > >>>> input data. Variant A in the script is noticeably slower than variant B in
>> > >>>> Pig 0.10 while performance is similar in Pig 0.9.1
>> > >>>>
>> > >>>> I've pasted the exec() function of the GFV function on Pastebin as well:
>> > >>>> http://pastebin.com/FVnkQCJ5
>> > >>>>
>> > >>>> Please let us know if you need more details.
>> > >>>>
>> > >>>> Thanks,
>> > >>>> Chun
>> > >>>>
>> > >>>> On 8/7/12 10:07 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
>> > >>>>
>> > >>>>> Can you guys give a script that has the issue? My tactic would be to use
>> > >>>>> some sort of profiler (we have access to YourKit for open source Pig
>> > >>>>> contribution work) and try and isolate what is triggering GC.
>> > >>>>>
>> > >>>>> 2012/8/7 Prashant Kommireddi <prash1...@gmail.com>
>> > >>>>>
>> > >>>>>> Hi All,
>> > >>>>>>
>> > >>>>>> Just wanted to follow-up on Chun's question. Several of our Pig users have
>> > >>>>>> been experiencing slow start-ups with Pig 0.10.0, when the same script runs
>> > >>>>>> fine with 0.9.1. Anyone else facing similar issues?
>> > >>>>>>
>> > >>>>>> Thanks,
>> > >>>>>> Prashant
>> > >>>>>>
>> > >>>>>> Hi all,
>> > >>>>>>
>> > >>>>>> I'm trying to move from Pig 0.9.1 to Pig 0.10.0.
>> > >>>>>> When I try to run the same
>> > >>>>>> script using the two Pig versions, 0.9.1 starts off fast and almost
>> > >>>>>> immediately submits the job to the cluster. On the other hand, Pig 0.10.0
>> > >>>>>> takes forever to submit the job. When I use the java option
>> > >>>>>> -XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times
>> > >>>>>> before and after the job is submitted to the cluster.
>> > >>>>>>
>> > >>>>>> Does anyone know what is causing this and/or how I might be able to
>> > >>>>>> troubleshoot it?
>> > >>>>>>
>> > >>>>>> I've uploaded truncated output showing when GC happens to
>> > >>>>>> Pastebin: http://pastebin.com/B8WTHW9r
>> > >>>>>>
>> > >>>>>> Thanks,
>> > >>>>>> Chun