I profiled Pig running in single-JVM local mode (using "-x local") and 75% of processing time is spent building JARs and optimizing the execution plan:
- 50% in org.apache.pig.impl.util.JarManager.createJar
- 25% in org.apache.pig.newplan.optimizer.PlanOptimizer.optimize

The remaining 25% is spent in Hadoop running map/reduce.

To improve performance, I want to avoid regenerating the Hadoop code every time I run a Pig script. My use case is running different data through the same script: the data changes frequently, but the Pig script never changes. Is there a way to reuse the generated JAR files instead of rebuilding them on every run?

Regards,
John
