Hi All, I am working with Spark 1.6.0 and pySpark shell specifically. I have an JavaRDD[org.apache.avro.GenericRecord] which I have converted to pythonRDD in the following way.
javaRDD = sc._jvm.java.package.loadJson("path to data", sc._jsc) javaPython = sc._jvm.SerDe.javaToPython(javaRDD) from pyspark.rdd import RDD pythonRDD=RDD(javaPython,sc) pythonRDD.first() However everytime I am trying to call collect() or first() method on pythonRDD I am getting the following error : 16/02/11 06:19:19 ERROR python.PythonRunner: Python worker exited unexpectedly (crashed) org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/disk2/spark6/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main command = pickleSer._read_with_length(infile) File "/disk2/spark6/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/serializers.py", line 156, in _read_with_length length = read_int(stream) File "/disk2/spark6/spark-1.6.0-bin-hadoop2.4/python/lib/pyspark.zip/pyspark/serializers.py", line 545, in read_int raise EOFError EOFError at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166) at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207) at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125) at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: net.razorvine.pickle.PickleException: couldn't pickle object of type class org.apache.avro.generic.GenericData$Record at net.razorvine.pickle.Pickler.save(Pickler.java:142) at net.razorvine.pickle.Pickler.put_arrayOfObjects(Pickler.java:493) at net.razorvine.pickle.Pickler.dispatch(Pickler.java:205) at net.razorvine.pickle.Pickler.save(Pickler.java:137) at net.razorvine.pickle.Pickler.dump(Pickler.java:107) at net.razorvine.pickle.Pickler.dumps(Pickler.java:92) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:121) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.next(SerDeUtil.scala:110) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:110) at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452) at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1741) at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239) Thanks for your time, AnoopShiralige -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/PySpark-couldn-t-pickle-object-of-type-class-T-tp26204.html Sent from the Apache Spark User List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org