I tried with pig11 (from git); timings for the two variants are now much more comparable.

stats for `pig11 -b -e 'explain -script students-a.pig'`
6.33s user 0.74s system 153% cpu 4.611 total
6.55s user 0.68s system 155% cpu 4.664 total
6.40s user 0.79s system 157% cpu 4.560 total
6.47s user 0.62s system 155% cpu 4.560 total

stats for `pig11 -b -e 'explain -script students-b.pig'`
5.66s user 0.62s system 169% cpu 3.707 total
5.69s user 0.53s system 165% cpu 3.758 total
5.44s user 0.70s system 165% cpu 3.706 total
5.68s user 0.51s system 166% cpu 3.708 total

So it looks like this was fixed somewhere on the way to 0.11?
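
For anyone who wants to compare the runs without eyeballing them, here is a rough sketch that averages the wall-clock ("total") column. It assumes the zsh `time` output format shown above; the helper names (`total_seconds`, `parse`, `mean_total`) are just illustrative, and the numbers are the ones pasted in this message.

```python
import re
from statistics import mean

# Timing lines copied from the pig11 runs above (zsh `time` format).
RUNS = {
    "students-a.pig": [
        "6.33s user 0.74s system 153% cpu 4.611 total",
        "6.55s user 0.68s system 155% cpu 4.664 total",
        "6.40s user 0.79s system 157% cpu 4.560 total",
        "6.47s user 0.62s system 155% cpu 4.560 total",
    ],
    "students-b.pig": [
        "5.66s user 0.62s system 169% cpu 3.707 total",
        "5.69s user 0.53s system 165% cpu 3.758 total",
        "5.44s user 0.70s system 165% cpu 3.706 total",
        "5.68s user 0.51s system 166% cpu 3.708 total",
    ],
}

PATTERN = re.compile(
    r"([\d.]+)s user ([\d.]+)s system (\d+)% cpu ([\d:.]+) total"
)


def total_seconds(text):
    """Convert a wall-clock field such as '4.611' or '1:08.77' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse(line):
    """Split one timing line into user, system, CPU percent and wall time."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized timing line: {line!r}")
    user, system, cpu, total = match.groups()
    return {
        "user": float(user),
        "system": float(system),
        "cpu_pct": int(cpu),
        "total": total_seconds(total),
    }


def mean_total(lines):
    """Average wall-clock time, in seconds, over a list of timing lines."""
    return mean(parse(line)["total"] for line in lines)


if __name__ == "__main__":
    for script, lines in RUNS.items():
        print(f"{script}: mean wall time {mean_total(lines):.3f}s")
```

The parser also accepts the minute:second form (for example `1:08.77`) that appears in the pig10 numbers quoted below, so the same helper could be pointed at those lines too.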
________________________________________
From: Jonathan Coveney [jcove...@gmail.com]
Sent: Thursday, August 09, 2012 11:00 AM
To: user@pig.apache.org
Subject: Re: Pig 0.10.0 slow startup

Can you do me a favor and run the exact same stuff with pig11? Just to
isolate if this is an issue that has been removed. I will also try and run
this on pig10, to see if I can see the same issue.

2012/8/8 Chun Yang <cy...@contractor.salesforce.com>

> Thanks Jonathan,
>
> Here are some numbers that I'm getting from Pig 0.10 and Pig 0.9.1:
>
> pig10 -b -e 'explain -script students-a.pig'  35.35s user 8.52s system 63%
> cpu 1:08.77 total
>
> pig10 -b -e 'explain -script students-b.pig'  5.32s user 0.48s system 130%
> cpu 4.460 total
>
> pig9 -b -e 'explain -script students-a.pig'  4.93s user 0.51s system 131%
> cpu 4.153 total
>
> pig9 -b -e 'explain -script students-b.pig'  3.86s user 0.41s system 131%
> cpu 3.254 total
>
> Seems like the first run is always slower, but subsequent runs are about
> the same:
>
> pig10 -b -e 'explain -script students-a.pig'  35.17s user 8.20s system 123%
> cpu 35.017 total
>
> pig10 -b -e 'explain -script students-a.pig'  35.41s user 8.55s system 122%
> cpu 35.803 total
>
> A little more than 1.5s slowdown :)
>
> Thanks,
> Chun
>
> On 8/8/12 5:38 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
>
> > Thanks for putting that together, Chun.
> >
> > So, it looks like there are ~400 instantiations of the class, and the time
> > from the first instantiation to the last one is about 1.5s. Is that on the
> > order of the slowdown you're experiencing?
> >
> > (note: I'm testing with Pig 11...if your slowdown is much higher than that,
> > I'll test on Pig 10)
> >
> > Either way, it seems like the slowdown is directly attributable to UDF
> > invocations. Have you seen slowdowns much larger than this?
> >
> > 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
> >
> >> Hi Jonathan,
> >>
> >> Here is a more self-contained example than what I had before:
> >> http://ews.illinois.edu/~yang43/shared/students.tar.gz
> >>
> >> I wrote a trivial GFV class, but the slowdown still exists.
> >> students-a.pig starts up noticeably slower than students-b.pig .
> >>
> >> Thanks,
> >> Chun
> >>
> >> On 8/8/12 12:22 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
> >>
> >>> Thanks for this info. Can you go ahead and paste the whole GFV class?
> >>>
> >>> Thanks
> >>>
> >>> 2012/8/8 Chun Yang <cy...@contractor.salesforce.com>
> >>>
> >>>> Thanks Jonathan,
> >>>>
> >>>> I've tried to produce an example script which exhibits the slowdown and
> >>>> posted it on Pastebin: http://pastebin.com/kTSsDUr3
> >>>>
> >>>> The slowdown seems to occur when we are using a lot of UDFs to parse our
> >>>> input data. Variant A in the script is noticeably slower than variant B in
> >>>> Pig 0.10, while performance is similar in Pig 0.9.1.
> >>>>
> >>>> I've pasted the exec() method of the GFV class on Pastebin as well:
> >>>> http://pastebin.com/FVnkQCJ5
> >>>>
> >>>> Please let us know if you need more details.
> >>>>
> >>>> Thanks,
> >>>> Chun
> >>>>
> >>>> On 8/7/12 10:07 PM, "Jonathan Coveney" <jcove...@gmail.com> wrote:
> >>>>
> >>>>> Can you guys give a script that has the issue? My tactic would be to use
> >>>>> some sort of profiler (we have access to YourKit for open source Pig
> >>>>> contribution work) and try to isolate what is triggering GC.
> >>>>>
> >>>>> 2012/8/7 Prashant Kommireddi <prash1...@gmail.com>
> >>>>>
> >>>>>> Hi All,
> >>>>>>
> >>>>>> Just wanted to follow up on Chun's question. Several of our Pig users have
> >>>>>> been experiencing slow start-ups with Pig 0.10.0, when the same script runs
> >>>>>> fine with 0.9.1. Anyone else facing similar issues?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Prashant
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I'm trying to move from Pig 0.9.1 to Pig 0.10.0. When I try to run the same
> >>>>>> script using the two Pig versions, 0.9.1 starts off fast and almost
> >>>>>> immediately submits the job to the cluster. On the other hand, Pig 0.10.0
> >>>>>> takes forever to submit the job. When I use the java option
> >>>>>> -XX:+PrintGCDetails, I see that for 0.10.0 the GC is being run many times
> >>>>>> before and after the job is submitted to the cluster.
> >>>>>>
> >>>>>> Does anyone know what is causing this and/or how I might be able to
> >>>>>> troubleshoot it?
> >>>>>>
> >>>>>> I've uploaded truncated output showing when GC happens to
> >>>>>> Pastebin: http://pastebin.com/B8WTHW9r
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Chun
> >>>>>>
