I'm trying to perform operations on a large RDD that ends up being about 1.3 GB in memory once loaded. It gets cached in memory during the first operation, but when another task that uses the RDD begins, I get this error saying the RDD was lost:
14/06/30 09:48:17 INFO TaskSetManager: Serialized task 1.0:4 as 8245 bytes in 0 ms
14/06/30 09:48:17 WARN TaskSetManager: Lost TID 15611 (task 1.0:3)
14/06/30 09:48:17 WARN TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/Users/me/Desktop/spark-1.0.0/python/pyspark/worker.py", line 73, in main
    command = pickleSer._read_with_length(infile)
  File "/Users/me/Desktop/spark-1.0.0/python/pyspark/serializers.py", line 142, in _read_with_length
    length = read_int(stream)
  File "/Users/me/Desktop/spark-1.0.0/python/pyspark/serializers.py", line 337, in read_int
    raise EOFError
EOFError
        at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
        at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
14/06/30 09:48:18 INFO AppClient$ClientActor: Executor updated: app-20140630090515-0000/0 is now FAILED (Command exited with code 52)
14/06/30 09:48:18 INFO SparkDeploySchedulerBackend: Executor app-20140630090515-0000/0 removed: Command exited with code 52
14/06/30 09:48:18 INFO SparkDeploySchedulerBackend: Executor 0 disconnected, so removing it
14/06/30 09:48:18 ERROR TaskSchedulerImpl: Lost executor 0 on localhost: OutOfMemoryError
14/06/30 09:48:18 INFO TaskSetManager: Re-queueing tasks for 0 from TaskSet 1.0
14/06/30 09:48:18 WARN TaskSetManager: Lost TID 15610 (task 1.0:2)
14/06/30 09:48:18 WARN TaskSetManager: Lost TID 15609 (task 1.0:1)
14/06/30 09:48:18 WARN TaskSetManager: Lost TID 15612 (task 1.0:4)
14/06/30 09:48:18 WARN TaskSetManager: Lost TID 15608 (task 1.0:0)

The operation it fails on is a reduceByKey(), and the RDD going into that operation is split into several thousand partitions (the term weighting I'm doing initially needs a separate partition for each document). The executor has 6 GB of memory, so I'm not sure this is really a memory problem, even though the log reports "Lost executor 0 on localhost: OutOfMemoryError" near the end. The serializer EOFError is the part that really confuses me, and I can't find any references to this particular error with Spark. Does anyone have an idea what the actual error might be here, and what a possible solution would be?
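For reference, here is a simplified sketch of roughly what the job does (the master URL, input path, and compute_term_weights function below are placeholders for illustration, not my actual code):

    from pyspark import SparkContext

    # Placeholder master URL and app name
    sc = SparkContext("spark://localhost:7077", "TermWeighting")

    def compute_term_weights(line):
        # Stand-in for my real per-document weighting logic:
        # yields (term, weight) pairs for one document.
        return [(term, 1.0) for term in line.split()]

    # One input file per document, so the initial RDD ends up with
    # several thousand partitions (placeholder path).
    docs = sc.textFile("/path/to/documents/*")

    weights = docs.flatMap(compute_term_weights)
    weights.cache()    # ends up around 1.3 GB once materialized
    weights.count()    # first operation: caching completes fine here

    # Second operation over the cached RDD -- this is the stage that fails.
    totals = weights.reduceByKey(lambda a, b: a + b)
    totals.count()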