Hi Lian,

I have been following the thread because one of my students had the same issue. In his case, the problem occurred when saving a larger XML dataset to HDFS: because of a connection timeout between Spark and HDFS, the output could not be displayed. I also suggested that he do what @Apostolos said in the previous mail, i.e. use saveAsTextFile instead (I haven't heard back from him since my suggestion).
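Below is a minimal sketch of the saveAsTextFile route applied to your CSV case. It assumes this is roughly what @Apostolos meant, since his mail is not quoted here. The helper names (`row_to_csv_line`, `prepend_header`, `save_df_as_text_csv`) are hypothetical. The only PySpark calls used are `DataFrame.rdd`, `DataFrame.columns`, `RDD.map`, `RDD.coalesce`, `RDD.mapPartitionsWithIndex` and `RDD.saveAsTextFile`.

```python
import csv
import io


def row_to_csv_line(values):
    """Serialize one record as a single CSV line, without the trailing newline.

    The stdlib csv module escapes commas and quotes inside fields.
    Note: a field that itself contains a newline still yields a multi-line record.
    """
    buf = io.StringIO()
    csv.writer(buf).writerow(values)
    return buf.getvalue().rstrip("\r\n")


def prepend_header(header_line):
    """Return a function for RDD.mapPartitionsWithIndex that emits the
    header line before the rows of partition 0 only."""
    def add_header(index, lines):
        if index == 0:
            yield header_line
        yield from lines
    return add_header


def save_df_as_text_csv(df, path):
    """Hypothetical helper: write a (small) PySpark DataFrame as a single CSV
    part file via RDD.saveAsTextFile instead of the spark-csv data source.

    Assumes an active SparkSession. `path` must not exist yet, because
    saveAsTextFile has no overwrite mode.
    """
    header = row_to_csv_line(df.columns)
    (df.rdd
       .map(lambda row: row_to_csv_line(list(row)))
       # coalesce(1) avoids a full shuffle, but it can also collapse the
       # upstream computation of this stage into a single task.
       .coalesce(1)
       .mapPartitionsWithIndex(prepend_header(header))
       .saveAsTextFile(path))
```

Kathleen's coalesce suggestion also applies to the DataFrame writer. On Spark 2.x the built-in CSV writer, e.g. `df.coalesce(1).write.csv(path, mode="overwrite", header=True)`, should be the direct replacement for the spark-csv package.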
Given that the last commit to the databricks/spark-csv [1] project is dated *Jan 10, 2017*, I am not sure how well it keeps in line with Spark 2.x, even though there is a *note* about it in the README file. Would it be possible for you to share your solution with us (in case the project is already open-sourced), so that we can have a look at it?

Many thanks in advance.

Best regards,

[1] https://github.com/databricks/spark-csv

On Tue, Mar 26, 2019 at 1:09 AM Lian Jiang <jiangok2...@gmail.com> wrote:
> Thanks guys for the replies.
>
> The execution plan shows a giant query. After divide and conquer, saving
> is quick.
>
> On Fri, Mar 22, 2019 at 4:01 PM kathy Harayama <kathleenli...@gmail.com>
> wrote:
>
>> Hi Lian,
>> Since you are using repartition(1), do you want to decrease the number of
>> partitions? If so, have you tried to use coalesce instead?
>>
>> Kathleen
>>
>> On Fri, Mar 22, 2019 at 2:43 PM Lian Jiang <jiangok2...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Writing a csv to HDFS takes about 1 hour:
>>>
>>> df.repartition(1).write.format('com.databricks.spark.csv').mode('overwrite').options(header='true').save(csv)
>>>
>>> The generated csv file is only about 150kb. The job uses 3 containers
>>> (13 cores, 23g mem).
>>>
>>> Other people have similar issues but I don't see a good explanation and
>>> solution.
>>>
>>> Any clue is highly appreciated! Thanks.

-- 
_____________
*Gëzim Sejdiu*
*PhD Student & Research Associate*
*SDA, University of Bonn*
*Endenicher Allee 19a, 53115 Bonn, Germany*
*https://gezimsejdiu.github.io/*
GitHub <https://github.com/GezimSejdiu> | Twitter <https://twitter.com/Gezim_Sejdiu> | LinkedIn <https://www.linkedin.com/in/g%C3%ABzim-sejdiu-08b1761b> | Google Scholar <https://scholar.google.de/citations?user=Lpbwr9oAAAAJ>