Is there any way to save it as a raw CSV file, as we do in pandas? I have a script that uses the CSV file for further processing.
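One workaround (a sketch, not a Spark API — the function name is mine) is to let Spark write its usual directory via df.coalesce(1).write.csv(<path>) and then, in plain Python, move the single part file out to the exact file name your downstream script expects:

```python
import glob
import os
import shutil

def collect_single_csv(output_dir: str, target_path: str) -> None:
    """Move the single part file Spark wrote in `output_dir` to `target_path`
    and remove the directory (along with its _SUCCESS and other marker files)."""
    # After df.coalesce(1).write.csv(output_dir), Spark leaves exactly one
    # part-*.csv file in the directory, next to the marker files.
    part_files = glob.glob(os.path.join(output_dir, "part-*.csv"))
    if len(part_files) != 1:
        raise RuntimeError(f"expected exactly one part file, found: {part_files}")
    shutil.move(part_files[0], target_path)
    shutil.rmtree(output_dir)
```

This only works after coalesce(1) (or repartition(1)), since otherwise Spark writes multiple part files; and it assumes the output lands on a local filesystem — for HDFS or object stores you would need the corresponding filesystem API instead of shutil.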
On Sat, 22 Feb 2020 at 14:31, rahul c <rchannal1...@gmail.com> wrote:

> Hi Kshitij,
>
> There are options to suppress the metadata files from getting created.
> Set the properties below and try.
>
> 1) To disable the transaction logs of Spark, set
> "spark.sql.sources.commitProtocolClass =
> org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol".
> This disables the "committed<TID>" and "started<TID>" files, but the
> _SUCCESS, _common_metadata and _metadata files are still generated.
>
> 2) We can disable the _common_metadata and _metadata files using
> "parquet.enable.summary-metadata=false".
>
> 3) We can also disable the _SUCCESS file using
> "mapreduce.fileoutputcommitter.marksuccessfuljobs=false".
>
> On Sat, 22 Feb, 2020, 10:51 AM Kshitij, <kshtjkm...@gmail.com> wrote:
>
>> Hi,
>>
>> There is no Spark DataFrame API that writes/creates a single file
>> instead of a directory as the result of a write operation.
>>
>> Both of the options below will create a directory containing a file
>> with a random name:
>>
>> df.coalesce(1).write.csv(<path>)
>>
>> df.write.csv(<path>)
>>
>> Instead of a directory with standard files (_SUCCESS, _committed,
>> _started), I want a single file with the file name specified.
>>
>> Thanks
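Collected in one place, the three properties from the reply above could go into spark-defaults.conf (or be passed via --conf on spark-submit) — a sketch based only on the settings quoted in this thread:

```
spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
parquet.enable.summary-metadata=false
mapreduce.fileoutputcommitter.marksuccessfuljobs=false
```

Note that, per the reply above, these only suppress the marker/metadata files; the output is still written as a directory of part files, not a single named file.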