>  java.lang.IncompatibleClassChangeError: Found interface
org.apache.hadoop.mapreduce.JobContext, but class was expected

Most likely this is the Hadoop 1 vs. Hadoop 2 issue: org.apache.hadoop.mapreduce.JobContext was a class in Hadoop 1 but became an interface in Hadoop 2, so code compiled against one fails at runtime against the other with exactly this IncompatibleClassChangeError. The example was written for Hadoop 1 (the default Hadoop version for Spark). You may try to set the output format class in the conf for Hadoop 2, or recompile your Spark with Hadoop 1.

On Tue, Sep 30, 2014 at 11:37 AM, David Vincelli <
david.vince...@vantageanalytics.com> wrote:

> I've been trying to get the cassandra_inputformat.py and
> cassandra_outputformat.py examples running for the past half day. I am
> running cassandra21 community from datastax on a single node (in my dev
> environment) with spark-1.1.0-bin-hadoop2.4.
>
> I can connect and use cassandra via cqlsh and I can run the pyspark
> computation of pi job.
>
> Unfortunately, I cannot run the cassandra_inputformat and
> cassandra_outputformat examples successfully.
>
> This is the output I am getting now:
>
> 14/09/30 18:15:41 INFO AkkaUtils: Connecting to HeartbeatReceiver:
> akka.tcp://sparkDriver@dev:40208/user/HeartbeatReceiver
> 14/09/30 18:15:42 INFO deprecation: mapreduce.outputformat.class is
> deprecated. Instead, use mapreduce.job.outputformat.class
> 14/09/30 18:15:43 INFO Converter: Loaded converter:
> org.apache.spark.examples.pythonconverters.ToCassandraCQLKeyConverter
> 14/09/30 18:15:43 INFO Converter: Loaded converter:
> org.apache.spark.examples.pythonconverters.ToCassandraCQLValueConverter
> Traceback (most recent call last):
>   File
> "/opt/spark-1.1.0-bin-hadoop2.4/examples/src/main/python/cassandra_outputformat.py",
> line 83, in <module>
>
> valueConverter="org.apache.spark.examples.pythonconverters.ToCassandraCQLValueConverter")
>   File "/opt/spark-1.1.0-bin-hadoop2.4/python/pyspark/rdd.py", line 1184,
> in saveAsNewAPIHadoopDataset
>     keyConverter, valueConverter, True)
>   File
> "/opt/spark-1.1.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
> line 538, in __call__
>   File
> "/opt/spark-1.1.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
> line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
> : java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected
> at
> org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.checkOutputSpecs(AbstractColumnFamilyOutputFormat.java:75)
> at
> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:900)
> at
> org.apache.spark.api.python.PythonRDD$.saveAsHadoopDataset(PythonRDD.scala:687)
> at
> org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset(PythonRDD.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:207)
> at java.lang.Thread.run(Thread.java:745)
>
> Should I have built a custom spark assembly? Am I missing a cassandra
> driver? I have browsed through the documentation and found nothing
> specifically relevant to cassandra, is there such a piece of documentation?
>
> Thank you,
>
> - David
>
