Hi All,

When running a Spark job, I have 100MB+ of input but get more than 700GB of shuffle write, which is really weird. The job eventually fails with an OOM error. Does anybody know why this happens?

[image: Screen Shot 2018-11-05 at 15.20.35.png]

My code is like:
> JavaPairRDD<Text, Text> inputRDD =
>     sc.sequenceFile(inputPath, Text.class, Text.class);
> inputRDD.repartition(partitionNum)
>     .mapToPair(...)
>     .saveAsNewAPIHadoopDataset(job.getConfiguration());

Environment:
*CPU 32 core; Memory 256G; Storage 7.5G; CentOS 7.5*
*java version "1.8.0_162"*
*Spark 2.1.2*

Any help is greatly appreciated.

Regards,
Yichen
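
P.S. For context, below is a minimal, self-contained sketch of roughly what the job does. The class name, output path handling, and the identity mapToPair body are my simplifications/assumptions, since the real transformation and output setup are not shown above; only sequenceFile -> repartition -> mapToPair -> saveAsNewAPIHadoopDataset comes from the actual job.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ShuffleWriteRepro {  // hypothetical class name
    public static void main(String[] args) throws Exception {
        String inputPath = args[0];                  // assumed CLI arguments
        String outputPath = args[1];
        int partitionNum = Integer.parseInt(args[2]);

        SparkConf conf = new SparkConf().setAppName("ShuffleWriteRepro");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the input SequenceFile as (Text, Text) pairs.
        JavaPairRDD<Text, Text> inputRDD =
            sc.sequenceFile(inputPath, Text.class, Text.class);

        // repartition() forces a full shuffle: every record is written to
        // shuffle files and redistributed across partitionNum partitions.
        // The mapToPair body here is a placeholder identity transform that
        // copies the Text objects, because Hadoop reuses Writable instances.
        JavaPairRDD<Text, Text> outputRDD = inputRDD
            .repartition(partitionNum)
            .mapToPair(pair -> new Tuple2<>(new Text(pair._1()), new Text(pair._2())));

        // Configure the Hadoop output job and write the result.
        Job job = Job.getInstance(sc.hadoopConfiguration());
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        SequenceFileOutputFormat.setOutputPath(job, new Path(outputPath));

        outputRDD.saveAsNewAPIHadoopDataset(job.getConfiguration());

        sc.stop();
    }
}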