I profiled Pig running in single-JVM local mode (using "-x local") and 75% of processing time is spent building JARs and optimizing the execution plan:
- 50% in org.apache.pig.impl.util.JarManager.createJar
- 25% in org.apache.pig.newplan.optimizer.PlanOptimizer.optimize

The remaining 25% is spent in Hadoop running map/reduce.

To improve performance, I want to avoid regenerating the Hadoop code every time I run a Pig script. My use case is running different data through the same script: the data changes frequently, but the Pig script never changes. Is there a way to reuse the generated JAR files instead of rebuilding them on every run?

Regards,
John
