Re: Does dataframe spark API write/create a single file instead of directory as a result of write operation.

Kshitij Sat, 22 Feb 2020 02:48:50 -0800

That's the alternative, of course. But that is costly when we are dealing
with a bunch of files.


Thanks.

On Sat, Feb 22, 2020, 4:15 PM Sebastian Piu <sebastian....@gmail.com> wrote:

> I'm not aware of a way to specify the file name on the writer.
> Since you'd need to bring all the data onto a single node and write from
> there to get a single file out, you could simply move/rename the file that
> Spark creates, or write the CSV yourself with your library of preference.
>
> On Sat, 22 Feb 2020 at 10:39, Kshitij <kshtjkm...@gmail.com> wrote:
>
>> Is there any way to save it as a raw CSV file, as we do in pandas? I have a
>> script that uses the CSV file for further processing.
>>
>> On Sat, 22 Feb 2020 at 14:31, rahul c <rchannal1...@gmail.com> wrote:
>>
>>> Hi Kshitij,
>>>
>>> There are options to suppress creation of the metadata files.
>>> Set the properties below and try.
>>>
>>> 1) To disable Spark's transaction logs, set
>>> "spark.sql.sources.commitProtocolClass =
>>> org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol".
>>> This disables the "committed<TID>" and "started<TID>" files, but the
>>> _SUCCESS, _common_metadata and _metadata files will still be generated.
>>>
>>> 2) We can disable the _common_metadata and _metadata files using
>>> "parquet.enable.summary-metadata=false".
>>>
>>> 3) We can also disable the _SUCCESS file using
>>> "mapreduce.fileoutputcommitter.marksuccessfuljobs=false".
>>>
>>> On Sat, 22 Feb, 2020, 10:51 AM Kshitij, <kshtjkm...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> There is no Spark DataFrame API which writes/creates a single file
>>>> instead of a directory as the result of a write operation.
>>>>
>>>> Both options below create a directory with a randomly named file.
>>>>
>>>> df.coalesce(1).write.csv(<path>)
>>>>
>>>>
>>>>
>>>> df.write.csv(<path>)
>>>>
>>>>
>>>> Instead of a directory with the standard files (_SUCCESS,
>>>> _committed, _started), I want a single file with a specified file name.
>>>>
>>>>
>>>> Thanks
>>>>
>>>
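
The move/rename approach suggested in the thread could look roughly like the
sketch below. It assumes the output directory is on a local filesystem and
that the job wrote exactly one data file (for example via coalesce(1)). The
helper name promote_single_part_file and the paths in the usage comment are
hypothetical; for HDFS or object storage the same idea would need the
corresponding filesystem client instead of shutil.

```python
import shutil
from pathlib import Path


def promote_single_part_file(spark_output_dir, target_file):
    """Move the single part-* file out of a Spark output directory.

    Hypothetical helper, local filesystem only. target_file must live
    outside spark_output_dir, because that directory is deleted at the end.
    """
    out_dir = Path(spark_output_dir)
    # Data files are named part-...; _SUCCESS and .crc files are skipped.
    parts = sorted(
        p for p in out_dir.iterdir()
        if p.is_file() and p.name.startswith("part-")
    )
    if len(parts) != 1:
        raise ValueError(
            f"expected exactly one part file in {out_dir}, found {len(parts)}"
        )
    target = Path(target_file)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(parts[0]), str(target))
    # Remove the now-empty output directory and any leftover marker files.
    shutil.rmtree(out_dir)
    return target


# Possible usage after a Spark write (paths are placeholders):
#   df.coalesce(1).write.csv("/tmp/report_dir", header=True)
#   promote_single_part_file("/tmp/report_dir", "/tmp/report.csv")
```

The move itself is cheap on a local disk; the cost raised in the thread comes
from funnelling all data through one task and repeating this per output.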

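The three properties listed in the thread could be applied when the session
is built, along the lines of the sketch below. The property names and values
are taken from the thread as written; passing the two Hadoop-level keys with
a "spark.hadoop." prefix is an assumption about how they reach the Hadoop
configuration, and apply_settings is a hypothetical helper. Whether each
setting has the described effect depends on the Spark distribution in use.

```python
# Property names and values as given in the thread. The "spark.hadoop."
# prefix on the last two is an assumption (it forwards the key to the
# Hadoop configuration).
METADATA_SETTINGS = {
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.sql.execution.datasources."
        "SQLHadoopMapReduceCommitProtocol",
    "spark.hadoop.parquet.enable.summary-metadata": "false",
    "spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs": "false",
}


def apply_settings(builder, settings=METADATA_SETTINGS):
    """Chain .config(key, value) calls onto a SparkSession builder."""
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder


# Possible usage (requires pyspark; the app name is a placeholder):
#   from pyspark.sql import SparkSession
#   spark = apply_settings(SparkSession.builder.appName("demo")).getOrCreate()
```

Even with all three settings applied, the writer still produces a directory
containing a part file, so the rename step remains necessary.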