Thanks, that worked! I downloaded the version pre-built against Hadoop 1 and the examples ran successfully.
- David

On Tue, Sep 30, 2014 at 5:08 PM, Kan Zhang <kzh...@apache.org> wrote:

> > java.lang.IncompatibleClassChangeError: Found interface
> > org.apache.hadoop.mapreduce.JobContext, but class was expected
>
> Most likely it is the Hadoop 1 vs. Hadoop 2 issue. The example was given
> for Hadoop 1 (the default Hadoop version for Spark). You may try setting
> the output format class in the conf for Hadoop 2, or recompiling your
> Spark against Hadoop 1.
>
> On Tue, Sep 30, 2014 at 11:37 AM, David Vincelli <
> david.vince...@vantageanalytics.com> wrote:
>
>> I've been trying to get the cassandra_inputformat.py and
>> cassandra_outputformat.py examples running for the past half day. I am
>> running the cassandra21 community package from DataStax on a single node
>> (in my dev environment) with spark-1.1.0-bin-hadoop2.4.
>>
>> I can connect to and use Cassandra via cqlsh, and I can run the pyspark
>> pi-computation job.
>>
>> Unfortunately, I cannot run the cassandra_inputformat and
>> cassandra_outputformat examples successfully.
>>
>> This is the output I am getting now:
>>
>> 14/09/30 18:15:41 INFO AkkaUtils: Connecting to HeartbeatReceiver:
>> akka.tcp://sparkDriver@dev:40208/user/HeartbeatReceiver
>> 14/09/30 18:15:42 INFO deprecation: mapreduce.outputformat.class is
>> deprecated.
>> Instead, use mapreduce.job.outputformat.class
>> 14/09/30 18:15:43 INFO Converter: Loaded converter:
>> org.apache.spark.examples.pythonconverters.ToCassandraCQLKeyConverter
>> 14/09/30 18:15:43 INFO Converter: Loaded converter:
>> org.apache.spark.examples.pythonconverters.ToCassandraCQLValueConverter
>> Traceback (most recent call last):
>>   File "/opt/spark-1.1.0-bin-hadoop2.4/examples/src/main/python/cassandra_outputformat.py", line 83, in <module>
>>     valueConverter="org.apache.spark.examples.pythonconverters.ToCassandraCQLValueConverter")
>>   File "/opt/spark-1.1.0-bin-hadoop2.4/python/pyspark/rdd.py", line 1184, in saveAsNewAPIHadoopDataset
>>     keyConverter, valueConverter, True)
>>   File "/opt/spark-1.1.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>>   File "/opt/spark-1.1.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>> py4j.protocol.Py4JJavaError: An error occurred while calling
>> z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
>> : java.lang.IncompatibleClassChangeError: Found interface
>> org.apache.hadoop.mapreduce.JobContext, but class was expected
>>   at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.checkOutputSpecs(AbstractColumnFamilyOutputFormat.java:75)
>>   at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:900)
>>   at org.apache.spark.api.python.PythonRDD$.saveAsHadoopDataset(PythonRDD.scala:687)
>>   at org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset(PythonRDD.scala)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>   at java.lang.reflect.Method.invoke(Method.java:606)
>>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>   at py4j.Gateway.invoke(Gateway.java:259)
>>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>   at java.lang.Thread.run(Thread.java:745)
>>
>> Should I have built a custom Spark assembly? Am I missing a Cassandra
>> driver? I have browsed through the documentation and found nothing
>> specifically relevant to Cassandra; is there such a piece of documentation?
>>
>> Thank you,
>>
>> - David
>>
>
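For anyone hitting the same error: Kan's other suggestion (setting the output format class in the conf for Hadoop 2) can be sketched as below. The key names mirror those used by cassandra_outputformat.py plus the replacement named by the deprecation warning in the log; the host and keyspace values are hypothetical placeholders, and this is a sketch of the idea, not a verified fix.

```python
# Job conf in the style cassandra_outputformat.py passes to
# saveAsNewAPIHadoopDataset, with the Hadoop 2 style key added
# alongside the Hadoop 1 key the shipped example uses.
host = "localhost"      # hypothetical Cassandra node
keyspace = "test_ks"    # hypothetical keyspace

output_format = "org.apache.cassandra.hadoop.cql3.CqlOutputFormat"

conf = {
    "cassandra.output.thrift.address": host,
    "cassandra.output.thrift.port": "9160",
    "cassandra.output.keyspace": keyspace,
    # Hadoop 1 key, deprecated under Hadoop 2 per the log above:
    "mapreduce.outputformat.class": output_format,
    # Hadoop 2 replacement named by the deprecation warning:
    "mapreduce.job.outputformat.class": output_format,
}

print(conf["mapreduce.job.outputformat.class"])
```

Note that this only addresses the configuration key naming; if the Cassandra output format class itself was compiled against the Hadoop 1 API, the IncompatibleClassChangeError can persist regardless, which is why the pre-built Hadoop 1 distribution resolved it here.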