Hi All,

When running a Spark job, I have about 100 MB of input but get more than 700 GB
of shuffle write, which is really weird. The job finally ends up with an
OOM error. Does anybody know why this happens?
[image: Screen Shot 2018-11-05 at 15.20.35.png]
My code looks like this:

> JavaPairRDD<Text, Text> inputRDD =
>     sc.sequenceFile(inputPath, Text.class, Text.class);
>
> inputRDD.repartition(partitionNum)
>     .mapToPair(...)
>     .saveAsNewAPIHadoopDataset(job.getConfiguration());
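
For context, the surrounding job setup looks roughly like the sketch below. The
output format, paths, class name, and the identity mapToPair are simplified
placeholders rather than my exact code (the real mapToPair does more work), but
the read / repartition / save structure is the same:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class ShuffleWriteRepro {
        public static void main(String[] args) throws Exception {
            String inputPath = args[0];    // sequence-file input (placeholder)
            String outputPath = args[1];   // output directory (placeholder)
            int partitionNum = Integer.parseInt(args[2]);

            JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("ShuffleWriteRepro"));

            // Output job configuration consumed by saveAsNewAPIHadoopDataset
            Job job = Job.getInstance(sc.hadoopConfiguration());
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            FileOutputFormat.setOutputPath(job, new Path(outputPath));

            JavaPairRDD<Text, Text> inputRDD =
                sc.sequenceFile(inputPath, Text.class, Text.class);

            // repartition() forces a full shuffle of the input before the map runs;
            // the mapToPair here just copies each record as a stand-in for my real logic
            inputRDD.repartition(partitionNum)
                .mapToPair(kv -> new Tuple2<>(new Text(kv._1()), new Text(kv._2())))
                .saveAsNewAPIHadoopDataset(job.getConfiguration());

            sc.stop();
        }
    }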


Environment:

CPU: 32 cores; Memory: 256 GB; Storage: 7.5 G
OS: CentOS 7.5
Java: 1.8.0_162
Spark: 2.1.2

Any help is greatly appreciated.

Regards,
Yichen
