Hello, I posted this issue to the Pig mailing list, but I now think the problem may be more related to Cassandra.
When I run Pig scripts against Hadoop, they work as advertised; however, when I try to have Pig read data from Cassandra, it fails every time.

Cassandra: [cqlsh 4.1.0 | Cassandra 2.0.4 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Hadoop (Cloudera): 2.0.0+1518
Map Reduce: v2 (Yarn)
Pig: Apache Pig version 0.11.0-cdh4.5.0

The test schema is very simple:

cqlsh:main> create table a (id int, name varchar, primary key (id));
cqlsh:main> insert into a (id, name) values (1, 'blah');
cqlsh:main> select * from a;

 id | name
----+------
  1 | blah

(1 rows)

bash-4.2$ ./apache-cassandra-2.0.4-src/examples/pig/bin/pig_cassandra -x local
Using /home/hdfs/pig-0.12.0-src/pig-withouthadoop.jar.
2014-02-07 17:09:18,948 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 20 2012, 00:33:25
2014-02-07 17:09:18,949 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hdfs/pig_1391810958945.log
2014-02-07 17:09:19,373 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2014-02-07 17:09:19,377 [main] WARN org.apache.hadoop.conf.Configuration - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-02-07 17:09:19,394 [main] WARN org.apache.hadoop.conf.Configuration -fs.default.name is deprecated. Instead, use fs.defaultFS
2014-02-07 17:09:19,395 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hdfs/apache-cassandra-2.0.4-src/lib/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2014-02-07 17:09:20,026 [main] WARN org.apache.hadoop.conf.Configuration - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-02-07 17:09:20,030 [main] WARN org.apache.hadoop.conf.Configuration -fs.default.name is deprecated. Instead, use fs.defaultFS
2014-02-07 17:09:20,030 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
grunt> rows = LOAD 'cql://main/a' USING CqlStorage();
grunt> describe rows;
rows: {id: int,name: chararray}

If I'm reading this correctly, Pig can get the columns from the table in question, but that's where things go awry.

grunt> data = foreach rows generate $1;
grunt> dump data;
2014-02-07 17:09:47,347 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-02-07 17:09:47,416 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for rows: $0
2014-02-07 17:09:47,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-02-07 17:09:47,589 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-02-07 17:09:47,589 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-02-07 17:09:47,960 [main] WARN org.apache.hadoop.conf.Configuration -session.id is deprecated. Instead, use dfs.metrics.session-id
2014-02-07 17:09:47,968 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2014-02-07 17:09:48,055 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-02-07 17:09:48,075 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2014-02-07 17:09:48,075 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-02-07 17:09:48,075 [main] WARN org.apache.hadoop.conf.Configuration - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2014-02-07 17:09:48,206 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-02-07 17:09:48,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
Details at logfile: /home/hdfs/pig_1391810958945.log

The log says:

Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;

java.lang.NoSuchMethodError: org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:261)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:180)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1270)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1255)
    at org.apache.pig.PigServer.storeEx(PigServer.java:952)
    at org.apache.pig.PigServer.store(PigServer.java:919)
    at org.apache.pig.PigServer.openIterator(PigServer.java:832)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:490)
    at org.apache.pig.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Any help on this is much appreciated. Thanks.