Can you try the Cassandra connector 1.5? According to the version-compatibility table it also supports Spark 1.6: https://github.com/datastax/spark-cassandra-connector#version-compatibility (the Scala 2.10 coordinates should be com.datastax.spark:spark-cassandra-connector_2.10:1.5.0, if I remember them right). The java.lang.AbstractMethodError in your executor logs points at the root cause: connector 1.6.0 calls RowReader.read(Row, CassandraRowMetadata), a signature the TargetHolding pyspark-cassandra 0.3.5 jar does not implement, so the two jars you are submitting are binary-incompatible with each other. Note that pyspark-cassandra itself may also need a release built against whichever connector version you settle on. You can also crosspost it over here: https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user
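If that select/groupBy/count is all you need, another option is to drop the pyspark-cassandra jar entirely and go through the connector's DataFrame source, which ships inside the same --packages artifact, so no third-party RowReader is involved at all. A minimal, untested sketch, reusing the master, host, keyspace, table, and column names from your spark_v2.py:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Same master and Cassandra contact point as in the original script.
conf = (SparkConf()
        .setAppName("test")
        .setMaster("spark://192.168.23.31:7077")
        .set("spark.cassandra.connection.host", "192.168.23.31"))
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# The connector registers the "org.apache.spark.sql.cassandra" data source.
table = (sqlContext.read
         .format("org.apache.spark.sql.cassandra")
         .options(keyspace="lebara_diameter_codes",
                  table="nl_lebara_diameter_codes")
         .load())

# DataFrame equivalent of the select/groupBy/count in spark_v2.py.
food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
print(food_count.collect())

You would then submit with only the connector package and no --jars:

bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 spark_v2.py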
On Fri, Jul 1, 2016 at 5:45 PM, Joaquin Alzola <joaquin.alz...@lebara.com> wrote:

> Hi Akhil
>
> I am using:
> Cassandra: 3.0.5
> Spark: 1.6.1
> Scala: 2.10
> Spark-Cassandra connector: 1.6.0
>
> *From:* Akhil Das [mailto:ak...@hacked.work]
> *Sent:* 01 July 2016 11:38
> *To:* Joaquin Alzola <joaquin.alz...@lebara.com>
> *Cc:* user@spark.apache.org
> *Subject:* Re: Remote RPC client disassociated
>
> This looks like a version conflict. Which version of Spark are you using? The Cassandra connector you are using is built for Scala 2.10 and Spark 1.6.
>
> On Thu, Jun 30, 2016 at 6:34 PM, Joaquin Alzola <joaquin.alz...@lebara.com> wrote:
>
> Hi List,
>
> I am launching this spark-submit job:
>
> hadoop@testbedocg:/mnt/spark> bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 --jars /mnt/spark/lib/TargetHolding_pyspark-cassandra-0.3.5.jar spark_v2.py
>
> spark_v2.py is:
>
> from pyspark_cassandra import CassandraSparkContext, Row
> from pyspark import SparkContext, SparkConf
> from pyspark.sql import SQLContext
>
> conf = SparkConf().setAppName("test") \
>     .setMaster("spark://192.168.23.31:7077") \
>     .set("spark.cassandra.connection.host", "192.168.23.31")
> sc = CassandraSparkContext(conf=conf)
> table = sc.cassandraTable("lebara_diameter_codes", "nl_lebara_diameter_codes")
> food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
> food_count.collect()
>
> Error I get when running the above command:
>
> [Stage 0:> (0 + 3) / 7]16/06/30 10:40:36 ERROR TaskSchedulerImpl: Lost executor 0 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
> [Stage 0:> (0 + 7) / 7]16/06/30 10:40:40 ERROR TaskSchedulerImpl: Lost executor 1 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
> [Stage 0:> (0 + 5) / 7]16/06/30 10:40:42 ERROR TaskSchedulerImpl: Lost executor 3 on as5: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
> [Stage 0:> (0 + 4) / 7]16/06/30 10:40:46 ERROR TaskSchedulerImpl: Lost executor 4 on as4: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
> 16/06/30 10:40:46 ERROR TaskSetManager: Task 5 in stage 0.0 failed 4 times; aborting job
>
> Traceback (most recent call last):
>   File "/mnt/spark-1.6.1-bin-hadoop2.6/spark_v2.py", line 11, in <module>
>     food_count = table.select("errorcode2001").groupBy("errorcode2001").count()
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1004, in count
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 995, in sum
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 869, in fold
>   File "/mnt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 771, in collect
>   File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
>   File "/mnt/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 0.0 failed 4 times, most recent failure: Lost task 5.3 in stage 0.0 (TID 14, as4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
> Driver stacktrace:
>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
>   at scala.Option.foreach(Option.scala:236)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
>   at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
>   at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
>   at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:405)
>   at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
>   at py4j.Gateway.invoke(Gateway.java:259)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:209)
>   at java.lang.Thread.run(Thread.java:745)
>
> From the job's web UI (stderr log page for app-20160630104030-0086/4):
>
> 16/06/30 10:44:34 ERROR util.Utils: Uncaught exception in thread stdout writer for python
> java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
>   at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
>   at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:315)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>   at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:914)
>   at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
>   at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:968)
>   at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
>   at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
>   at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
> 16/06/30 10:44:34 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for python,5,main]
> java.lang.AbstractMethodError: pyspark_cassandra.DeferringRowReader.read(Lcom/datastax/driver/core/Row;Lcom/datastax/spark/connector/CassandraRowMetadata;)Ljava/lang/Object;
> (stack trace identical to the one above)
>
> BR
>
> Joaquin

--
Cheers!