For local mode, it is not necessary to generate the jars at all. This is fixed in trunk, and the trivial patch can be applied to 0.8 or 0.9 if you like (https://issues.apache.org/jira/browse/PIG-2128). There's a lot more optimization we can do for MR mode, such as sticking the jars into the distributed cache instead of unjarring them and re-packaging everything every time. There are tickets for this (with patches, even). Storing the optimized plan and reusing it is a good idea; we should consider caching plans. Want to open a JIRA?
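To make the plan-caching idea concrete, here is a rough sketch of the general technique: key a cache on a hash of the script text and only run the expensive compile/optimize step on a miss. Everything here is hypothetical. `PlanCache` and its compiler function are made-up names, not part of Pig's API, and the sketch glosses over how changing input/output locations would be re-bound to a cached plan.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of plan caching; NOT part of Pig's API.
// P stands in for whatever "compiled plan" type a real implementation would store.
public class PlanCache<P> {
    private final Map<String, P> cache = new HashMap<>();
    private final Function<String, P> compiler; // stand-in for the expensive compile + optimize step
    private int compileCount = 0;

    public PlanCache(Function<String, P> compiler) {
        this.compiler = compiler;
    }

    // Returns the cached plan for this exact script text, compiling only on a cache miss.
    public P getPlan(String scriptText) {
        String key = sha256Hex(scriptText);
        return cache.computeIfAbsent(key, k -> {
            compileCount++;
            return compiler.apply(scriptText);
        });
    }

    // Number of times the compiler has actually been invoked.
    public int compileCount() {
        return compileCount;
    }

    // Hex-encoded SHA-256 of the script text, used as the cache key.
    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        PlanCache<String> cache = new PlanCache<>(script -> "plan for: " + script);
        String script = "A = LOAD '$input'; B = FILTER A BY $0 > 0; STORE B INTO '$output';";
        cache.getPlan(script); // miss: compiles
        cache.getPlan(script); // hit: reuses the stored plan
        System.out.println(cache.compileCount()); // prints 1
    }
}
```

Hashing the text rather than using it directly as the key keeps keys small, and the same key could double as a file name if plans (or jars) were persisted between runs.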
D

On Tue, Aug 16, 2011 at 10:08 AM, John Amos <[email protected]> wrote:
> I profiled Pig running in single-JVM local mode (using "-x local") and
> 75% of processing time is spent building JARs and optimizing the
> execution plan:
>
> 50% in org.apache.pig.impl.util.JarManager.createJar
> 25% in org.apache.pig.newplan.optimizer.PlanOptimizer.optimize
>
> The remaining 25% is spent in Hadoop running map/reduce. To improve
> performance I want to avoid generating Hadoop code every time I run a
> Pig script. I want to be able to run different data through the same
> Pig script, where the data changes frequently but the Pig script never
> changes. Is there a way to reuse the generated JAR files instead of
> regenerating them every time?
>
> Regards,
>
> John
