That's the alternative, of course. But that is costly when we are dealing with a bunch of files.
Thanks.

On Sat, Feb 22, 2020, 4:15 PM Sebastian Piu <sebastian....@gmail.com> wrote:

> I'm not aware of a way to specify the file name on the writer.
> Since you'd need to bring all the data into a single node and write from
> there to get a single file out, you could simply move/rename the file that
> Spark creates, or write the CSV yourself with your library of preference.
>
> On Sat, 22 Feb 2020 at 10:39, Kshitij <kshtjkm...@gmail.com> wrote:
>
>> Is there any way to save it as a raw CSV file, as we do in pandas? I have a
>> script that uses the CSV file for further processing.
>>
>> On Sat, 22 Feb 2020 at 14:31, rahul c <rchannal1...@gmail.com> wrote:
>>
>>> Hi Kshitij,
>>>
>>> There are options to suppress the metadata files from being created.
>>> Set the properties below and try.
>>>
>>> 1) To disable the transaction logs of Spark, set
>>> "spark.sql.sources.commitProtocolClass =
>>> org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol".
>>> This disables the "committed<TID>" and "started<TID>" files, but the
>>> _SUCCESS, _common_metadata and _metadata files will still be generated.
>>>
>>> 2) We can disable the _common_metadata and _metadata files using
>>> "parquet.enable.summary-metadata=false".
>>>
>>> 3) We can also disable the _SUCCESS file using
>>> "mapreduce.fileoutputcommitter.marksuccessfuljobs=false".
>>>
>>> On Sat, 22 Feb, 2020, 10:51 AM Kshitij, <kshtjkm...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> There is no Spark DataFrame API that writes/creates a single file
>>>> instead of a directory as the result of a write operation.
>>>>
>>>> Both options below will create a directory containing a randomly
>>>> named file:
>>>>
>>>> df.coalesce(1).write.csv(<path>)
>>>>
>>>> df.write.csv(<path>)
>>>>
>>>> Instead of creating a directory with standard files (_SUCCESS,
>>>> _committed, _started), I want a single file with the file name specified.
>>>>
>>>> Thanks
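The three properties rahul lists could be collected into a spark-defaults.conf fragment (an untested sketch; whether each applies depends on the Spark version, and the parquet.* setting only affects Parquet output, not CSV):

```
spark.sql.sources.commitProtocolClass org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
parquet.enable.summary-metadata false
mapreduce.fileoutputcommitter.marksuccessfuljobs false
```

Even with all three set, the output is still a directory of part files; these settings only suppress the extra marker/metadata files.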
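For what it's worth, Sebastian's move/rename suggestion can be sketched in plain Python for a local filesystem. This is only an illustration: the function name `rename_spark_csv` and the simulated output directory are made up here, and on HDFS or S3 you would use the Hadoop FileSystem API rather than `shutil`. The sketch below fakes the directory that `df.coalesce(1).write.csv(path)` leaves behind, then moves the single part file to the name you actually want:

```python
import glob
import os
import shutil
import tempfile

def rename_spark_csv(output_dir: str, target_path: str) -> None:
    """Move the single part file Spark wrote into output_dir to
    target_path, then delete the directory and its marker files."""
    part_files = glob.glob(os.path.join(output_dir, "part-*"))
    if len(part_files) != 1:
        raise ValueError(f"expected one part file, found {len(part_files)}")
    shutil.move(part_files[0], target_path)
    shutil.rmtree(output_dir)  # drops _SUCCESS and friends

# Simulate the directory df.coalesce(1).write.csv(<path>) would create.
work = tempfile.mkdtemp()
out_dir = os.path.join(work, "output.csv.dir")
os.makedirs(out_dir)
with open(os.path.join(out_dir, "part-00000-abc123.csv"), "w") as f:
    f.write("a,b\n1,2\n")
open(os.path.join(out_dir, "_SUCCESS"), "w").close()

final = os.path.join(work, "output.csv")
rename_spark_csv(out_dir, final)
print(open(final).read())  # the CSV data, now under the chosen name
```

Note that `coalesce(1)` still pulls all the data through one executor, so this only makes sense for data that fits on a single node, as Sebastian points out.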