Adding back user@spark. Since the top of the stack trace is in DataStax classes, I suggest asking on their mailing list.
On Mon, May 2, 2016 at 11:29 AM, Piyush Verma <piy...@piyushverma.net> wrote:

> Hmm, weird. They show up on the web interface.
>
> Wait, got it. It's wrapped up inside the <raw>..</raw>, so text-only mail
> clients prune what's inside.
> Anyway, here's the text again (inline).
>
> > On 02-May-2016, at 23:56, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > Maybe you were trying to embed pictures for the error and your code -
> > but they didn't go through.
> >
> > On Mon, May 2, 2016 at 10:32 AM, meson10 <sp...@piyushverma.net> wrote:
> > Hi,
> >
> > I am trying to save an RDD to Cassandra but I am running into the
> > following error:
>
> [{'key': 3, 'value': 'foobar'}]
> [Stage 9:>                              (0 + 2) / 2]
> [Stage 9:=============================> (1 + 1) / 2]
> WARN 2016-05-02 17:23:55,240 org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in stage 9.0 (TID 11, 10.0.6.200): java.lang.NullPointerException
>     at com.datastax.bdp.spark.python.RDDPythonFunctions.com$datastax$bdp$spark$python$RDDPythonFunctions$$toCassandraRow(RDDPythonFunctions.scala:57)
>     at com.datastax.bdp.spark.python.RDDPythonFunctions$$anonfun$toCassandraRows$1.apply(RDDPythonFunctions.scala:73)
>     at com.datastax.bdp.spark.python.RDDPythonFunctions$$anonfun$toCassandraRows$1.apply(RDDPythonFunctions.scala:73)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
>     at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:106)
>     at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:31)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>     at com.datastax.spark.connector.writer.GroupingBatchBuilder.foreach(GroupingBatchBuilder.scala:31)
>     at com.datastax.spark.connector.writer.TableWriter$$anonfun$write$1.apply(TableWriter.scala:155)
>     at com.datastax.spark.connector.writer.TableWriter$$anonfun$write$1.apply(TableWriter.scala:139)
>     at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:110)
>     at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:109)
>     at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:139)
>     at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
>     at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:139)
>     at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:37)
>     at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:37)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>     at org.apache.spark.scheduler.Task.run(Task.scala:70)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> ERROR 2016-05-02 17:23:55,406 org.apache.spark.scheduler.TaskSetManager: Task 1 in stage 9.0 failed 4 times; aborting job
> Traceback (most recent call last):
>   File "/home/ubuntu/test-spark.py", line 50, in <module>
>     main()
>   File "/home/ubuntu/test-spark.py", line 47, in main
>     runner.run()
>   File "/home/ubuntu/spark_common.py", line 62, in run
>     self.save_logs_to_cassandra()
>   File "/home/ubuntu/spark_common.py", line 142, in save_logs_to_cassandra
>     rdd.saveToCassandra(keyspace, tablename)
>   File "/usr/share/dse/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2313, in saveToCassandra
>   File "/usr/share/dse/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/usr/share/dse/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o149.saveToCassandra.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 9.0 failed 4 times, most recent failure: Lost task 1.3 in stage 9.0 (TID 14, 10.0.6.200): java.lang.NullPointerException
>     at com.datastax.bdp.spark.python.RDDPythonFunctions.com$datastax$bdp$spark$python$RDDPythonFunctions$$toCassandraRow(RDDPythonFunctions.scala:57)
>     at com.datastax.bdp.spark.python.RDDPythonFunctions$$anonfun$toCassandraRows$1.apply(RDDPythonFunctions.scala:73)
>     at com.datastax.bdp.spark.python.RDDPythonFunctions$$anonfun$toCassandraRows$1.apply(RDDPythonFunctions.scala:73)
>     at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>     at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
>     at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:106)
>     at com.datastax.spark.connector.writer.GroupingBatchBuilder.next(GroupingBatchBuilder.scala:31)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>     at com.datastax.spark.connector.writer.GroupingBatchBuilder.foreach(GroupingBatchBuilder.scala:31)
>     at com.datastax.spark.connector.writer.TableWriter$$anonfun$write$1.apply(TableWriter.scala:155)
>     at com.datastax.spark.connector.writer.TableWriter$$anonfun$write$1.apply(TableWriter.scala:139)
>     at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:110)
>     at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:109)
>     at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:139)
>     at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:109)
>     at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:139)
>     at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:37)
>     at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:37)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>     at org.apache.spark.scheduler.Task.run(Task.scala:70)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> Driver stacktrace:
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1276)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1267)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1266)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1266)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1460)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1421)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> The Python code looks like this:
>
>     rdd = self.context.parallelize([{'key': 3, 'value': 'foobar'}])
>     print rdd.collect()
>     rdd.saveToCassandra("test", "dummy")
>
> I am using DSE 4.8.6, which runs Spark 1.4.2.
>
> > I ran through a bunch of existing posts on this mailing list and have
> > already performed the following steps:
> >
> > * Ensured that there is no redundant Cassandra .jar lying around,
> >   interfering with the process.
> > * Wiped clean and reinstalled DSE to ensure that.
> > * Tried loading data from Cassandra to ensure that Spark <-> Cassandra
> >   communication is working. I used
> >   print self.context.cassandraTable(keyspace='test', table='dummy').collect()
> >   to validate that.
> > * Ensured there are no null values in the dataset being written.
> > * The keyspace and the table exist in Cassandra, verified with
> >   cassandra@cqlsh> SELECT * from test.dummy;
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-NullPointerException-while-performing-rdd-SaveToCassandra-tp26862.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
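For anyone trying to reproduce this outside the poster's scripts, below is a minimal standalone sketch that consolidates the failing write and the read-back check quoted above. It is a sketch only: it assumes the job is launched through DSE's Spark integration (e.g. dse spark-submit), which is what provides cassandraTable() on the SparkContext and saveToCassandra() on RDDs as used in the post (they are not part of stock PySpark), and it assumes test.dummy already exists with columns key and value. The file name and app name are illustrative.

    # repro_save_to_cassandra.py -- hypothetical file name; run with DSE's
    # Spark launcher so the Cassandra methods used below are available.
    from pyspark import SparkConf, SparkContext


    def main():
        conf = SparkConf().setAppName("save-to-cassandra-repro")
        sc = SparkContext(conf=conf)

        # Same keyspace/table as in the thread; assumed to be created as
        # something like: CREATE TABLE test.dummy (key int PRIMARY KEY, value text);
        keyspace, table = "test", "dummy"

        # Same data shape as the post: dict keys matching the column names.
        rdd = sc.parallelize([{'key': 3, 'value': 'foobar'}])
        print rdd.collect()

        # The write that raises the NullPointerException in the logs above.
        rdd.saveToCassandra(keyspace, table)

        # The read-back check the poster used to confirm Spark <-> Cassandra
        # connectivity works.
        print sc.cassandraTable(keyspace=keyspace, table=table).collect()

        sc.stop()


    if __name__ == "__main__":
        main()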