Hi all, I am getting the exception below while converting my RDD to a Map. The data is only about a 200 MB Snappy-compressed file, and the code looks like this:
    @SuppressWarnings("unchecked")
    public Tuple2<Map<String, byte[]>, String> getMatchData(String location, String key) {
        ParquetInputFormat.setReadSupportClass(this.getJob(),
            (Class<AvroReadSupport<GenericRecord>>) (Class<?>) AvroReadSupport.class);
        JavaPairRDD<Void, GenericRecord> avroRDD = this.getSparkContext().newAPIHadoopFile(
            location,
            (Class<ParquetInputFormat<GenericRecord>>) (Class<?>) ParquetInputFormat.class,
            Void.class,
            GenericRecord.class,
            this.getJob().getConfiguration());
        JavaPairRDD<String, GenericRecord> kv =
            avroRDD.mapToPair(new MapAvroToKV(key, new SpelExpressionParser()));
        Schema schema = kv.first()._2().getSchema();
        JavaPairRDD<String, byte[]> bytesKV = kv.mapToPair(new AvroToBytesFunction());
        Map<String, byte[]> map = bytesKV.collectAsMap();
        Map<String, byte[]> hashmap = new HashMap<String, byte[]>(map);
        return new Tuple2<>(hashmap, schema.toString());
    }

Please help me out if you have any thoughts, and thanks in advance.

04/27 03:13:18 INFO YarnClusterScheduler: Cancelling stage 11
04/27 03:13:18 INFO DAGScheduler: ResultStage 11 (collectAsMap at DoubleClickSORJob.java:281) failed in 0.734 s due to Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 15, ip-172-31-50-58.ec2.internal, executor 1): java.io.IOException: java.lang.NegativeArraySizeException
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276)
    at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:206)
    at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
    at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:81)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NegativeArraySizeException
    at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:325)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:60)
    at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:43)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
    at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:244)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$10.apply(TorrentBroadcast.scala:286)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1303)
    at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:287)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:221)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1269)
    ...
    11 more
Driver stacktrace:
04/27 03:13:18 INFO DAGScheduler: Job 11 failed: collectAsMap at DoubleClickSORJob.java:281, took 3.651447 s
04/27 03:13:18 ERROR ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 11.0 failed 4 times, most recent failure: Lost task 0.3 in stage 11.0 (TID 15, ip-172-31-50-58.ec2.internal, executor 1): java.io.IOException: java.lang.NegativeArraySizeException
Caused by: java.lang.NegativeArraySizeException
    at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:325)
    ... 11 more

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/javaRDD-to-collectasMap-throuwa-ava-lang-NegativeArraySizeException-tp28631.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
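P.S. One observation, in case it helps anyone diagnose this: the NegativeArraySizeException is thrown inside Kryo's Input.readBytes, i.e. while allocating a byte[] whose length field came out negative during deserialization of the broadcast block. My understanding (an assumption on my part, not verified against the Kryo sources) is that a corrupted or integer-overflowed length prefix produces exactly this failure, because Java throws the moment you allocate an array with a negative size. A minimal plain-Java sketch of that mechanism (the variable names are hypothetical, just for illustration):

    public class NegativeLengthDemo {
        public static void main(String[] args) {
            // A length prefix that is corrupted, or that has overflowed past
            // Integer.MAX_VALUE, deserializes to a negative int.
            int corruptedLength = -1;
            try {
                // The allocation itself throws, before any bytes are read.
                byte[] buffer = new byte[corruptedLength];
            } catch (NegativeArraySizeException e) {
                System.out.println("caught: " + e);
            }
        }
    }

If that reading is right, the total serialized size of the collected byte[] values may matter more than the 200 MB on-disk size of the Snappy file, since the records expand when decompressed and deserialized.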