Hi,
  I have been experimenting with PySpark lately. Every now and then I see the error below streamed to the PySpark shell. Most of the time the computation/operation still completes, but sometimes it just gets stuck.
My setup is an 8-node cluster with plenty of RAM (256 GB) and disk (TBs) per node. The environment is shared with general Hadoop workloads; Spark is a recent addition.

java.lang.OutOfMemoryError: Java heap space
    at com.ning.compress.BufferRecycler.allocEncodingBuffer(BufferRecycler.java:59)
    at com.ning.compress.lzf.ChunkEncoder.<init>(ChunkEncoder.java:93)
    at com.ning.compress.lzf.impl.UnsafeChunkEncoder.<init>(UnsafeChunkEncoder.java:40)
    at com.ning.compress.lzf.impl.UnsafeChunkEncoderLE.<init>(UnsafeChunkEncoderLE.java:13)
    at com.ning.compress.lzf.impl.UnsafeChunkEncoders.createEncoder(UnsafeChunkEncoders.java:31)
    at com.ning.compress.lzf.util.ChunkEncoderFactory.optimalInstance(ChunkEncoderFactory.java:44)
    at com.ning.compress.lzf.LZFOutputStream.<init>(LZFOutputStream.java:61)
    at org.apache.spark.io.LZFCompressionCodec.compressedOutputStream(CompressionCodec.scala:60)
    at org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:803)
    at org.apache.spark.storage.BlockManager$$anonfun$5.apply(BlockManager.scala:471)
    at org.apache.spark.storage.BlockManager$$anonfun$5.apply(BlockManager.scala:471)
    at org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:117)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:174)
    at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:164)
    at org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:161)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)



Most of the Spark settings are at their defaults, so I was wondering whether some configuration needs to happen. One config I have added to the spark-env file is:
SPARK_WORKER_MEMORY=20g
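
For context, this is roughly how I would raise the executor heap from PySpark, if that is the right knob (a sketch, assuming the Spark 0.9 SparkConf API; "8g" is an arbitrary guess). My understanding is that SPARK_WORKER_MEMORY only caps the total memory a worker may hand out, while each executor JVM keeps its 512m default heap unless spark.executor.memory is raised:

from pyspark import SparkConf, SparkContext

# Sketch, assuming the Spark 0.9 SparkConf API. SPARK_WORKER_MEMORY only
# limits what the worker may allocate overall; the executor JVM heap stays
# at its 512m default unless spark.executor.memory is set (8g is a guess).
conf = (SparkConf()
        .setAppName("subs_count")
        .set("spark.executor.memory", "8g"))  # per-executor JVM heap
sc = SparkContext(conf=conf)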

I also see tons of these errors:
14/02/26 14:33:17 INFO TaskSetManager: Loss was due to java.lang.OutOfMemoryError: Java heap space [duplicate 1]
14/02/26 14:33:17 INFO TaskSetManager: Starting task 996.0:278 as TID 1792 on executor 9: node02 (PROCESS_LOCAL)
14/02/26 14:33:17 INFO TaskSetManager: Serialized task 996.0:278 as 4070 bytes in 0 ms
14/02/26 14:33:17 WARN TaskSetManager: Lost TID 1488 (task 996.0:184)
14/02/26 14:33:17 INFO TaskSetManager: Loss was due to java.lang.OutOfMemoryError: Java heap space [duplicate 2]
14/02/26 14:33:17 INFO TaskSetManager: Starting task 996.0:247 as TID 1793 on executor 9: node02 (PROCESS_LOCAL)
14/02/26 14:33:17 INFO TaskSetManager: Serialized task 996.0:247 as 4070 bytes in 0 ms
14/02/26 14:33:17 WARN TaskSetManager: Lost TID 1484 (task 996.0:82)
14/02/26 14:33:17 INFO TaskSetManager: Loss was due to java.lang.OutOfMemoryError: Java heap space [duplicate 3]
14/02/26 14:33:17 INFO TaskSetManager: Starting task 996.0:116 as TID 1794 on executor 9: node02 (PROCESS_LOCAL)
14/02/26 14:33:17 INFO TaskSetManager: Serialized task 996.0:116 as 4070 bytes in 1 ms
14/02/26 14:33:17 WARN TaskSetManager: Lost TID 1475 (task 996.0:157)
14/02/26 14:33:17 INFO TaskSetManager: Loss was due to java.lang.OutOfMemoryError: Java heap space [duplicate 4]
14/02/26 14:33:17 INFO TaskSetManager: Starting task 996.0:98 as TID 1795 on executor 9: node02 (PROCESS_LOCAL)
14/02/26 14:33:17 INFO TaskSetManager: Serialized task 996.0:98 as 4070 bytes in 1 ms
14/02/26 14:33:17 WARN TaskSetManager: Lost TID 1492 (task 996.0:17)


and then...

14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1649 (task 996.0:115)
14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1666 (task 996.0:32)
14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1675 (task 996.0:160)
14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1657 (task 996.0:349)
14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1660 (task 996.0:141)
14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1651 (task 996.0:55)
14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1669 (task 996.0:126)
14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1678 (task 996.0:173)
14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1663 (task 996.0:128)
14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1672 (task 996.0:28)
14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1654 (task 996.0:96)
14/02/26 14:33:20 WARN TaskSetManager: Lost TID 1699 (task 996.0:294)
14/02/26 14:33:20 INFO DAGScheduler: Executor lost: 12 (epoch 16)
14/02/26 14:33:20 INFO BlockManagerMasterActor: Trying to remove executor 12 from BlockManagerMaster.
14/02/26 14:33:20 INFO BlockManagerMaster: Removed 12 successfully in removeExecutor
14/02/26 14:33:20 INFO Stage: Stage 996 is now unavailable on executor 12 (0/379, false)


which look like warnings.


The code I tried to run was:
subs_count = complex_key.map(lambda x: (x[0], int(x[1]))) \
                        .reduceByKey(lambda a, b: a + b)
subs_count.take(20)
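
Would more shuffle partitions help here? A variant I am considering (a sketch; PySpark's reduceByKey takes an optional numPartitions argument, and 400 is an arbitrary guess) is:

# Sketch: spread the shuffle across more, smaller partitions so each
# task buffers less data at once; 400 partitions is an arbitrary guess.
subs_count = complex_key.map(lambda x: (x[0], int(x[1]))) \
                        .reduceByKey(lambda a, b: a + b, 400)
subs_count.take(20)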

Thanks

-- 
Mohit

"When you want success as badly as you want the air, then you will get it.
There is no other secret of success."
-Socrates
