Can you post your code and some sample input? That should help us work out whether the bug is in your code or in the platform.
Regards,
Kiran

From: "Barona, Ricardo" <[email protected]>
Date: Friday, June 9, 2017 at 10:47 PM
To: "[email protected]" <[email protected]>
Subject: RDD saveAsText and DataFrame write.mode(SaveMode).text(Path) duplicating rows

In Spark 1.6.0 I’m having an issue with saveAsTextFile and write.mode.text. I have a data frame with 1M+ rows, and then I do:

    dataFrame.limit(500).map(_.mkString("\t")).toDF("row").write.mode(SaveMode.Overwrite).text("myHDFSFolder/results")

When I then check the results file, I see 900+ rows instead of 500. On further analysis I found that some of the rows are being duplicated. Does anyone know if this has been reported before? The only unusual characteristic of my data is that one column exceeds 2000 characters.

Appreciate your help, thanks.

Cheers,
Ricardo Barona
