Thanks, that worked! I downloaded the version pre-built against hadoop1 and
the examples worked.

- David

On Tue, Sep 30, 2014 at 5:08 PM, Kan Zhang <kzh...@apache.org> wrote:

> >  java.lang.IncompatibleClassChangeError: Found interface
> org.apache.hadoop.mapreduce.JobContext, but class was expected
>
> Most likely this is a Hadoop 1 vs. Hadoop 2 issue. The example was written
> for Hadoop 1 (the default Hadoop version for Spark). You could try setting
> the output format class in the conf for Hadoop 2, or recompiling your Spark
> against Hadoop 1.
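>
> For the first option, something along these lines might work. This is a
> rough, untested sketch: the key name is the one the deprecation notice in
> your log points to, while the CqlOutputFormat class name, the placeholder
> settings and the toy record are assumptions you would replace with whatever
> the example script already builds.
>
>     from pyspark import SparkContext
>
>     sc = SparkContext(appName="CassandraCQLOutput")
>
>     # Same conf as the example builds, but register the output format under
>     # the Hadoop 2 (new-API) key rather than the deprecated
>     # "mapreduce.outputformat.class" one.
>     conf = {
>         "mapreduce.job.outputformat.class":
>             "org.apache.cassandra.hadoop.cql3.CqlOutputFormat",
>         # ...plus the cassandra.output.* settings (keyspace, host, cql,
>         # partitioner) exactly as the example script sets them
>     }
>
>     # Toy (key, value) pair shaped like the example's output records.
>     rdd = sc.parallelize([({"user_id": 1}, ["john", "smith"])])
>     rdd.saveAsNewAPIHadoopDataset(
>         conf=conf,
>         keyConverter="org.apache.spark.examples.pythonconverters.ToCassandraCQLKeyConverter",
>         valueConverter="org.apache.spark.examples.pythonconverters.ToCassandraCQLValueConverter")
>
> If the conf change alone does not get past the IncompatibleClassChangeError,
> the rebuild against Hadoop 1 (or the package pre-built against hadoop1) is
> the other route.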
>
> On Tue, Sep 30, 2014 at 11:37 AM, David Vincelli <
> david.vince...@vantageanalytics.com> wrote:
>
>> I've been trying to get the cassandra_inputformat.py and
>> cassandra_outputformat.py examples running for the past half day. I am
>> running Cassandra 2.1 (DataStax Community) on a single node (in my dev
>> environment) with spark-1.1.0-bin-hadoop2.4.
>>
>> I can connect and use cassandra via cqlsh and I can run the pyspark
>> computation of pi job.
>>
>> Unfortunately, I cannot run the cassandra_inputformat and
>> cassandra_outputformat examples successfully.
>>
>> This is the output I am getting now:
>>
>> 14/09/30 18:15:41 INFO AkkaUtils: Connecting to HeartbeatReceiver:
>> akka.tcp://sparkDriver@dev:40208/user/HeartbeatReceiver
>> 14/09/30 18:15:42 INFO deprecation: mapreduce.outputformat.class is
>> deprecated. Instead, use mapreduce.job.outputformat.class
>> 14/09/30 18:15:43 INFO Converter: Loaded converter:
>> org.apache.spark.examples.pythonconverters.ToCassandraCQLKeyConverter
>> 14/09/30 18:15:43 INFO Converter: Loaded converter:
>> org.apache.spark.examples.pythonconverters.ToCassandraCQLValueConverter
>> Traceback (most recent call last):
>>   File
>> "/opt/spark-1.1.0-bin-hadoop2.4/examples/src/main/python/cassandra_outputformat.py",
>> line 83, in <module>
>>
>> valueConverter="org.apache.spark.examples.pythonconverters.ToCassandraCQLValueConverter")
>>   File "/opt/spark-1.1.0-bin-hadoop2.4/python/pyspark/rdd.py", line 1184,
>> in saveAsNewAPIHadoopDataset
>>     keyConverter, valueConverter, True)
>>   File
>> "/opt/spark-1.1.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>> line 538, in __call__
>>   File
>> "/opt/spark-1.1.0-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>> line 300, in get_return_value
>> py4j.protocol.Py4JJavaError: An error occurred while calling
>> z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
>> : java.lang.IncompatibleClassChangeError: Found interface
>> org.apache.hadoop.mapreduce.JobContext, but class was expected
>> at
>> org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.checkOutputSpecs(AbstractColumnFamilyOutputFormat.java:75)
>> at
>> org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:900)
>> at
>> org.apache.spark.api.python.PythonRDD$.saveAsHadoopDataset(PythonRDD.scala:687)
>> at
>> org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset(PythonRDD.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>> at py4j.Gateway.invoke(Gateway.java:259)
>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> Should I have built a custom Spark assembly? Am I missing a Cassandra
>> driver? I have browsed through the documentation and found nothing
>> specifically relevant to Cassandra. Is there such a piece of documentation?
>>
>> Thank you,
>>
>> - David
>>
>
>
