I don't know if this is related, but a little earlier in stderr I also see the
following stack trace. This one, however, seems to occur while the code is
fetching RDD data from a remote node, which is different from the above.


14/06/09 21:33:26 ERROR executor.ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-16,5,main]
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:329)
    at org.apache.spark.storage.BlockMessage.set(BlockMessage.scala:94)
    at org.apache.spark.storage.BlockMessage$.fromByteBuffer(BlockMessage.scala:176)
    at org.apache.spark.storage.BlockMessageArray.set(BlockMessageArray.scala:63)
    at org.apache.spark.storage.BlockMessageArray$.fromBufferMessage(BlockMessageArray.scala:109)
    at org.apache.spark.storage.BlockManagerWorker$.syncGetBlock(BlockManagerWorker.scala:128)
    at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:489)
    at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:487)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:487)
    at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:473)
    at org.apache.spark.storage.BlockManager.get(BlockManager.scala:513)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:39)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
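
For context, the allocation that fails is in BlockMessage.set, which copies the fetched block into a heap ByteBuffer, so each remote block has to fit on the executor heap in one piece. As a minimal sketch, assuming Spark 1.0-era APIs, this is the kind of tuning that targets both traces; the app name, memory size, local dir, input path, and partition count are all illustrative, not taken from the actual job:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical settings: more heap per executor, and scratch space on a
    // volume larger (and longer-lived) than /tmp, which is where the
    // FileNotFoundException below points.
    val conf = new SparkConf()
      .setAppName("disk-persist-job")             // illustrative name
      .set("spark.executor.memory", "8g")         // tune to the cluster
      .set("spark.local.dir", "/data/spark-tmp")  // placeholder path
    val sc = new SparkContext(conf)

    // More partitions mean smaller cached blocks, so a single remote fetch
    // allocates a smaller buffer.
    val data = sc.textFile("hdfs:///path/to/input").repartition(400)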



On Mon, Jun 9, 2014 at 10:05 PM, Surendranauth Hiraman <suren.hira...@velos.io> wrote:

> I have a dataset of about 10 GB. I am using persist(DISK_ONLY) to avoid
> out-of-memory issues when running my job.
>
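> A minimal sketch of that pattern, assuming the Scala API (the input path and
> parsing step are illustrative):
>
>     import org.apache.spark.storage.StorageLevel
>
>     // sc is the SparkContext; cache partitions on disk rather than on-heap.
>     val records = sc.textFile("hdfs:///path/to/10gb-input")  // placeholder path
>       .map(_.split('\t'))                                    // illustrative parse
>       .persist(StorageLevel.DISK_ONLY)
>
>     records.count()  // forces partitions to be computed and spilled to disk
>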
> When I run with a dataset of about 1 GB, the job is able to complete.
>
> But when I run with the larger 10 GB dataset, I get the following
> error/stack trace, which seems to happen while the RDD is being written out
> to disk.
>
> Does anyone have ideas as to what is going on, or whether there is a setting
> I can tune?
>
>
> 14/06/09 21:33:55 ERROR executor.Executor: Exception in task ID 560
> java.io.FileNotFoundException: /tmp/spark-local-20140609210741-0bb8/14/rdd_331_175 (No such file or directory)
>     at java.io.FileOutputStream.open(Native Method)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:160)
>     at org.apache.spark.storage.DiskStore.putValues(DiskStore.scala:79)
>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:698)
>     at org.apache.spark.storage.BlockManager.put(BlockManager.scala:546)
>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:95)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>     at org.apache.spark.scheduler.Task.run(Task.scala:51)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:679)
>
> --
>
> SUREN HIRAMAN, VP TECHNOLOGY
> Velos
> Accelerating Machine Learning
>
> 440 NINTH AVENUE, 11TH FLOOR
> NEW YORK, NY 10001
> O: (917) 525-2466 ext. 105
> F: 646.349.4063
> E: suren.hiraman@velos.io
> W: www.velos.io
>
>


-- 

SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hiraman@velos.io
W: www.velos.io
