I did write such a function for Scala. Please take a look at
https://github.com/EDS-APHP/spark-etl/blob/master/spark-csv/src/main/scala/CSVTool.scala
-- see writeCsvToLocal. It first writes the CSV to HDFS, then fetches
every CSV part into one local CSV with headers.

Kshitij <kshtjkm...@gmail.com> writes:

> Is there any way to save it as a raw CSV file, as we do in pandas? I
> have a script that uses the CSV file for further processing.
>
> On Sat, 22 Feb 2020 at 14:31, rahul c <rchannal1...@gmail.com> wrote:
>
>> Hi Kshitij,
>>
>> There are options to suppress the metadata files from being created.
>> Set the properties below and try:
>>
>> 1) To disable the transaction logs of Spark, set
>> "spark.sql.sources.commitProtocolClass =
>> org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol".
>> This disables the "committed<TID>" and "started<TID>" files, but the
>> _SUCCESS, _common_metadata and _metadata files will still be generated.
>>
>> 2) We can disable the _common_metadata and _metadata files with
>> "parquet.enable.summary-metadata=false".
>>
>> 3) We can also disable the _SUCCESS file with
>> "mapreduce.fileoutputcommitter.marksuccessfuljobs=false".
>>
>> On Sat, 22 Feb, 2020, 10:51 AM Kshitij, <kshtjkm...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> There is no Spark DataFrame API that writes/creates a single file
>>> instead of a directory as the result of a write operation.
>>>
>>> Both options below create a directory containing a randomly named
>>> file:
>>>
>>> df.coalesce(1).write.csv(<path>)
>>>
>>> df.write.csv(<path>)
>>>
>>> Instead of a directory with the standard files (_SUCCESS, _committed,
>>> _started), I want a single file with the file name specified.
>>>
>>> Thanks

--
nicolas paris

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
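[The merge step that writeCsvToLocal performs -- stitching the CSV parts Spark wrote into one local file with a single header -- can be sketched in plain Python with only the standard library. merge_csv_parts is a hypothetical helper for illustration, not the actual CSVTool.scala code; it assumes every part file starts with the same header row.]

```python
import glob
import os
import shutil

def merge_csv_parts(parts_dir: str, out_path: str) -> None:
    """Concatenate Spark CSV part files into a single local CSV.

    Assumes each part file begins with the same header row: the
    header from the first part is kept, and skipped in the rest.
    """
    part_files = sorted(glob.glob(os.path.join(parts_dir, "part-*.csv")))
    if not part_files:
        raise FileNotFoundError(f"no part files under {parts_dir}")
    with open(out_path, "w", newline="") as out:
        for i, part in enumerate(part_files):
            with open(part, newline="") as src:
                header = src.readline()      # consume the header line
                if i == 0:
                    out.write(header)        # write it only once
                shutil.copyfileobj(src, out) # copy the remaining rows
```

[You would call this after something like df.write.csv(parts_dir) has finished, passing the output directory and the single-file name you want.]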