Does anybody have insight on this? Thanks.

On Fri, Jan 9, 2015 at 6:30 PM, Pala M Muthaia <mchett...@rocketfuelinc.com> wrote:
> Hi,
>
> I am using Spark 1.0.1. I am trying to debug an OOM exception I saw during
> a join step.
>
> Basically, I have an RDD of rows that I am joining with another RDD of
> tuples.
>
> Some of the tasks succeed, but a fair number failed with the OOM exception
> whose stack is below. The stack belongs to the 'reducer' that is reading
> shuffle output from the 'mapper'.
>
> My question is: what is the object being deserialized here, just a portion
> of an RDD, or the whole RDD partition assigned to the current reducer? The
> rows in the RDD could be large, but definitely not something that would run
> to 100s of MBs in size and thus run out of memory.
>
> Also, is there a way to determine the size of the object being deserialized
> that results in the error (either by looking at some staging HDFS dir or
> logs)?
>
> java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit
> exceeded)
> java.util.Arrays.copyOf(Arrays.java:2367)
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
> java.lang.StringBuilder.append(StringBuilder.java:204)
> java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3142)
> java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3050)
> java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2863)
> java.io.ObjectInputStream.readString(ObjectInputStream.java:1636)
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1339)
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> java.util.ArrayList.readObject(ArrayList.java:771)
> sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1031)
>
> Thanks,
> pala
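For context, the shape of the join being described can be sketched with plain Java collections (a hypothetical illustration, not the poster's actual code): a pair join emits one output record for every (leftValue, rightValue) combination sharing a key, so a "hot" key with many or large values can make the reducer's side of the join much bigger than any single input record would suggest.

```java
import java.util.*;

// Hypothetical sketch of join semantics, Spark-free: like rows.join(tuples),
// it emits every (leftValue, rightValue) pair for each shared key.
public class JoinSketch {
    static <K, A, B> List<Map.Entry<K, Map.Entry<A, B>>> join(
            List<Map.Entry<K, A>> left, List<Map.Entry<K, B>> right) {
        // Index the right side by key, as a shuffle would co-locate it.
        Map<K, List<B>> rightByKey = new HashMap<>();
        for (Map.Entry<K, B> e : right) {
            rightByKey.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                      .add(e.getValue());
        }
        // For each left record, emit one output per matching right value.
        List<Map.Entry<K, Map.Entry<A, B>>> out = new ArrayList<>();
        for (Map.Entry<K, A> e : left) {
            for (B b : rightByKey.getOrDefault(e.getKey(),
                                               Collections.emptyList())) {
                out.add(Map.entry(e.getKey(), Map.entry(e.getValue(), b)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> rows =
                List.of(Map.entry(1, "row-a"), Map.entry(1, "row-b"),
                        Map.entry(2, "row-c"));
        List<Map.Entry<Integer, String>> tuples =
                List.of(Map.entry(1, "t-x"), Map.entry(2, "t-y"));
        // Key 1 appears twice on the left, so the join emits two records for it.
        System.out.println(join(rows, tuples));
    }
}
```

The multiplicative effect shown here is one reason a join's shuffle-read side can exhaust memory even when every individual input record is modest.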