Re: Pig performance profiling and reusing an optimized plan

Dmitriy Ryaboy Tue, 16 Aug 2011 13:53:55 -0700

For local mode, it is not necessary to generate the jars. This is fixed in
trunk, and the trivial patch can be applied to 0.8 or 0.9 if you like
(https://issues.apache.org/jira/browse/PIG-2128).
There's a lot more optimization we can do there for MR mode, such as
putting the jars into the distributed cache instead of unjarring them and
re-packaging everything every time. There are tickets for this (with
patches, even).
Storing the optimized plan and reusing it is a good idea; we should consider
caching plans. Could you open a JIRA?
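
To make the plan-caching idea concrete, here is a rough sketch of what such
a cache could look like. Everything in it is an assumption for illustration:
PlanCache, planFor and fingerprint are hypothetical names, not Pig APIs, and
the "compiler" function stands in for whatever would parse and optimize the
script. It only shows the general shape: key on a hash of the script text,
and build the plan only the first time that text is seen.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache from script text to an already-built plan. Not a Pig API. */
final class PlanCache<P> {
    private final Map<String, P> plans = new ConcurrentHashMap<>();
    private final Function<String, P> compiler;

    /** The compiler function is a stand-in for parsing plus optimization. */
    PlanCache(Function<String, P> compiler) {
        this.compiler = compiler;
    }

    /** Returns the cached plan for this script, building it only on first use. */
    P planFor(String scriptText) {
        return plans.computeIfAbsent(
                fingerprint(scriptText), key -> compiler.apply(scriptText));
    }

    /** Hex-encoded SHA-256 of the script text, used as the cache key. */
    static String fingerprint(String scriptText) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(scriptText.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandated for every Java platform implementation.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

A cache like this only helps if the script text really is identical between
runs, so the input and output locations would have to be kept out of the
cache key (or bound after the plan is fetched). How that would fit with the
real optimizer is exactly what the JIRA would need to work out.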


D

On Tue, Aug 16, 2011 at 10:08 AM, John Amos <[email protected]> wrote:

> I profiled Pig running in single-JVM local mode (using "-x local") and
> 75% of processing time is spent building JARs and optimizing the
> execution plan:
>
>   50% in org.apache.pig.impl.util.JarManager.createJar
>   25% in org.apache.pig.newplan.optimizer.PlanOptimizer.optimize
>
> The remaining 25% is spent in Hadoop running map/reduce.  To improve
> performance I want to avoid generating Hadoop code every time I run a
> Pig script.  I want to be able to run different data through the same
> Pig script, where the data changes frequently but the Pig script never
> changes.  Is there a way to reuse the generated JAR files instead of
> regenerating them every time?
>
> Regards,
>
> John
>
>
