Thank you, Guillaume. My dataset is not that large; it's about 2 GB in total.

2014-10-20 16:58 GMT+08:00 Guillaume Pitel <guillaume.pi...@exensa.com>:
> Hi,
>
> It happened to me with blocks that take more than 1 or 2 GB once
> serialized.
>
> I think the problem is that during serialization a byte array is
> created, and arrays in Java are indexed by ints. When the serializer
> needs to increase the buffer size, it does so somehow, but then writing
> into the array leads to an error.
>
> I don't know if your problem is the same, but maybe.
>
> In general, Java and Java libraries do not check for oversized arrays,
> which is really bad when you work with big data.
>
> Guillaume
>
> The exception drives me crazy, because it occurs randomly.
> I don't know which line of my code causes this exception.
> I don't even understand what "KryoException:
> java.lang.NegativeArraySizeException" means, or what it implies.
>
> 14/10/20 15:59:01 WARN scheduler.TaskSetManager: Lost task 32.2 in stage 0.0 (TID 181, gs-server-1000): com.esotericsoftware.kryo.KryoException: java.lang.NegativeArraySizeException
> Serialization trace:
> value (org.apache.spark.sql.catalyst.expressions.MutableAny)
> values (org.apache.spark.sql.catalyst.expressions.SpecificMutableRow)
> otherElements (org.apache.spark.util.collection.CompactBuffer)
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:585)
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
> com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:318)
> com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
> com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:501)
> com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.write(FieldSerializer.java:564)
> com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:213)
> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:38)
> com.twitter.chill.Tuple2Serializer.write(TupleSerializers.scala:34)
> com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
> org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:119)
> org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:195)
> org.apache.spark.util.collection.ExternalAppendOnlyMap.spill(ExternalAppendOnlyMap.scala:203)
> org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:150)
> org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:90)
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$2.apply(PairRDDFunctions.scala:89)
> org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:625)
> org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:625)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
>
> --
> *Guillaume PITEL, Président*
> +33(0)626 222 431
>
> eXenSa S.A.S. <http://www.exensa.com/>
> 41, rue Périer - 92120 Montrouge - FRANCE
> Tel +33(0)184 163 677 / Fax +33(0)972 283 705
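For readers puzzled by the same error: Guillaume's explanation can be illustrated with a minimal Java sketch. This is not Kryo's actual code, and the names and buffer size here are made up for illustration; it only shows the underlying arithmetic, namely that Java array sizes are `int`s, so a grow-by-doubling step on a buffer larger than half of `Integer.MAX_VALUE` wraps to a negative value, and `new byte[negative]` is exactly what throws `NegativeArraySizeException`.

```java
// Hypothetical sketch (not Kryo's internals) of why doubling an
// int-indexed buffer past Integer.MAX_VALUE fails.
public class BufferGrowthSketch {
    public static void main(String[] args) {
        int capacity = 1_500_000_000;  // pretend this is the current buffer size (~1.4 GB)
        int doubled = capacity * 2;    // int overflow: wraps around to a negative value
        System.out.println(doubled);   // prints -1294967296

        try {
            // A naive grow-by-doubling step then asks for a negative-sized array:
            byte[] grown = new byte[doubled];
        } catch (NegativeArraySizeException e) {
            System.out.println("cannot allocate array of size " + doubled);
        }
    }
}
```

This also suggests why the failure looks random: it only triggers when a single serialized block happens to push the buffer past the wrap-around point.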