problems loading cassandra data from pig

Irooniam Mon, 10 Feb 2014 11:16:34 -0800

Hello,

I posted this issue to the Pig mailing list, but I now think it may be
more related to Cassandra.


When I run Pig scripts against Hadoop they work as advertised; however, when
I try to have Pig read data from Cassandra, it fails every time.

Cassandra: [cqlsh 4.1.0 | Cassandra 2.0.4 | CQL spec 3.1.1 | Thrift protocol 19.39.0]

Hadoop (Cloudera): 2.0.0+1518

Map Reduce: v2 (Yarn)

Pig: Apache Pig version 0.11.0-cdh4.5.0


The test schema is very simple:

cqlsh:main> create table a (id int, name varchar, primary key (id));
cqlsh:main> insert into a (id, name) values (1, 'blah');
cqlsh:main> select * from a;

 id | name
----+------
  1 | blah

(1 rows)


bash-4.2$ ./apache-cassandra-2.0.4-src/examples/pig/bin/pig_cassandra -x local

Using /home/hdfs/pig-0.12.0-src/pig-withouthadoop.jar.

2014-02-07 17:09:18,948 [main] INFO org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 20 2012, 00:33:25
2014-02-07 17:09:18,949 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hdfs/pig_1391810958945.log
2014-02-07 17:09:19,373 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2014-02-07 17:09:19,377 [main] WARN org.apache.hadoop.conf.Configuration - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-02-07 17:09:19,394 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-02-07 17:09:19,395 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hdfs/apache-cassandra-2.0.4-src/lib/slf4j-log4j12-1.7.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2014-02-07 17:09:20,026 [main] WARN org.apache.hadoop.conf.Configuration - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-02-07 17:09:20,030 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-02-07 17:09:20,030 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
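
The Pig version strings in this message do not agree with each other, which may or may not be relevant to the failure: the environment list says 0.11.0-cdh4.5.0, pig_cassandra picked a jar under pig-0.12.0-src, and the startup banner reports 0.10.0. A minimal Python sketch of that comparison follows. It only parses strings copied from this message; the helper name `pig_version` and the labels are made up for illustration, and nothing here queries a real installation.

```python
import re

# Strings copied verbatim from this message (not read from a live system).
sources = {
    "environment list": "Apache Pig version 0.11.0-cdh4.5.0",
    "jar used by pig_cassandra": "/home/hdfs/pig-0.12.0-src/pig-withouthadoop.jar",
    "startup banner": "Apache Pig version 0.10.0 (r1328203)",
}


def pig_version(text):
    """Return the first x.y.z version number found in text, or None."""
    match = re.search(r"\d+\.\d+\.\d+", text)
    return match.group(0) if match else None


versions = {label: pig_version(text) for label, text in sources.items()}

for label, version in versions.items():
    print(f"{label}: {version}")

# More than one distinct value means the strings disagree.
distinct = set(versions.values())
print("versions agree" if len(distinct) == 1 else "versions differ")
```

Running `pig -version` with the same environment that pig_cassandra sets up would show which of these is actually on the classpath.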

grunt> rows = LOAD 'cql://main/a' USING CqlStorage();

grunt> describe rows;

rows: {id: int,name: chararray}


If I'm reading this correctly, Pig can read the column definitions from the
table in question, but that's where things go awry.

grunt> data = foreach rows generate $1;

grunt> dump data;

2014-02-07 17:09:47,347 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2014-02-07 17:09:47,416 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for rows: $0
2014-02-07 17:09:47,548 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-02-07 17:09:47,589 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-02-07 17:09:47,589 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-02-07 17:09:47,960 [main] WARN org.apache.hadoop.conf.Configuration - session.id is deprecated. Instead, use dfs.metrics.session-id
2014-02-07 17:09:47,968 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2014-02-07 17:09:48,055 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-02-07 17:09:48,075 [main] WARN org.apache.hadoop.conf.Configuration - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2014-02-07 17:09:48,075 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2014-02-07 17:09:48,075 [main] WARN org.apache.hadoop.conf.Configuration - mapred.output.compress is deprecated. Instead, use mapreduce.output.fileoutputformat.compress
2014-02-07 17:09:48,206 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2014-02-07 17:09:48,330 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
Details at logfile: /home/hdfs/pig_1391810958945.log


The log says:

Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;

java.lang.NoSuchMethodError: org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:261)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:180)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1270)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1255)
    at org.apache.pig.PigServer.storeEx(PigServer.java:952)
    at org.apache.pig.PigServer.store(PigServer.java:919)
    at org.apache.pig.PigServer.openIterator(PigServer.java:832)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:490)
    at org.apache.pig.Main.main(Main.java:111)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
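
For anyone reading the trace: a NoSuchMethodError is raised by the JVM when the calling class was compiled against a method signature that the class loaded at run time does not provide. The text in parentheses and after it is a JVM method descriptor, so the missing method here is an addJob that takes a Job and returns a String. A minimal Python sketch that decodes such a descriptor is below; `decode_descriptor` is a hypothetical helper written for this message, not part of Pig or Hadoop, and which jar on the classpath is actually at fault has not been confirmed.

```python
# Base type codes from the JVM class-file format.
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _read_type(desc, i):
    """Decode one type starting at desc[i]; return (java_name, next_index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":  # object type, written as L<binary/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:  # single-letter primitive or void
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def decode_descriptor(text):
    """Turn 'pkg.Cls.method(<params>)<ret>' into a Java-style signature."""
    owner_and_method, rest = text.split("(", 1)
    params_desc, return_desc = rest.split(")", 1)
    params, i = [], 0
    while i < len(params_desc):
        name, i = _read_type(params_desc, i)
        params.append(name)
    return_type, _ = _read_type(return_desc, 0)
    return f"{return_type} {owner_and_method}({', '.join(params)})"


# The string reported in the stack trace above.
missing = ("org.apache.hadoop.mapred.jobcontrol.JobControl.addJob"
           "(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;")
print(decode_descriptor(missing))
```

With the JDK installed, `javap -classpath <hadoop jar> org.apache.hadoop.mapred.jobcontrol.JobControl` lists the addJob signatures a given jar actually contains, which can be compared against the decoded one.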


Any help on this is much appreciated.

Thanks.
