So in the interest of being a little less i/o bound, and saving a whole mess of disk, I've started using com.twitter.elephantbird.pig.store.LzoTokenizedStorage for storage... or more accurately, will be using it as soon as I stop getting the following error (with stack trace):
ERROR 2998: Unhandled internal error. Implementing class java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632) at java.lang.ClassLoader.defineClass(ClassLoader.java:616) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632) at java.lang.ClassLoader.defineClass(ClassLoader.java:616) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:307) at java.lang.ClassLoader.loadClass(ClassLoader.java:248) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:422) at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:452) at org.apache.pig.impl.logicalLayer.parser.QueryParser.NonEvalFuncSpec(QueryParser.java:5087) at org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3568) at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1369) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911) at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:724) at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1336) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1286) at org.apache.pig.PigServer.registerQuery(PigServer.java:460) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:738) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:414) Anyhow, the host machine is running CentOS, with the Cloudera distribution of hadoop and pig (pig version: Apache Pig version 0.7.0+16), and elephant-bird from the hadoop-gpl-packing RPM. The tweaks described at http://code.google.com/p/hadoop-gpl-packing/#Using_in_Pig have been applied, and the class still seems to be failing to load. Anyone have any idea what the problem might be (or how to solve it)? Thanks, Kris -- Kris Coward http://unripe.melon.org/ GPG Fingerprint: 2BF3 957D 310A FEEC 4733 830E 21A4 05C7 1FEB 12B3