Is there any way to save it as a raw CSV file, as we do in pandas? I have a script that uses the CSV file for further processing.
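One workaround (a sketch, not a Spark API — the function name is mine) is to let Spark write its usual directory via df.coalesce(1).write.csv(<path>) and then, in plain Python, move the single part file out to the exact file name your downstream script expects:

```python
import glob
import os
import shutil

def collect_single_csv(output_dir: str, target_path: str) -> None:
    """Move the single part file Spark wrote in `output_dir` to `target_path`
    and remove the directory (along with its _SUCCESS and other marker files)."""
    # After df.coalesce(1).write.csv(output_dir), Spark leaves exactly one
    # part-*.csv file in the directory, next to the marker files.
    part_files = glob.glob(os.path.join(output_dir, "part-*.csv"))
    if len(part_files) != 1:
        raise RuntimeError(f"expected exactly one part file, found: {part_files}")
    shutil.move(part_files[0], target_path)
    shutil.rmtree(output_dir)
```

This only works after coalesce(1) (or repartition(1)), since otherwise Spark writes multiple part files; and it assumes the output lands on a local filesystem — for HDFS or object stores you would need the corresponding filesystem API instead of shutil.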
On Sat, 22 Feb 2020 at 14:31, rahul c <rchannal1...@gmail.com> wrote:

> Hi Kshitij,
>
> There are options to suppress the metadata files from getting created.
> Set the properties below and try.
>
> 1) To disable the transaction logs of Spark, set
> "spark.sql.sources.commitProtocolClass =
> org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol".
> This disables the "committed<TID>" and "started<TID>" files, but the
> _SUCCESS, _common_metadata and _metadata files are still generated.
>
> 2) We can disable the _common_metadata and _metadata files using
> "parquet.enable.summary-metadata=false".
>
> 3) We can also disable the _SUCCESS file using
> "mapreduce.fileoutputcommitter.marksuccessfuljobs=false".
>
> On Sat, 22 Feb, 2020, 10:51 AM Kshitij, <kshtjkm...@gmail.com> wrote:
>
>> Hi,
>>
>> There is no Spark DataFrame API that writes/creates a single file
>> instead of a directory as the result of a write operation.
>>
>> Both of the options below will create a directory containing a file
>> with a random name:
>>
>> df.coalesce(1).write.csv(<path>)
>>
>> df.write.csv(<path>)
>>
>> Instead of a directory with standard files (_SUCCESS, _committed,
>> _started), I want a single file with the file name specified.
>>
>> Thanks
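Collected in one place, the three properties from the reply above could go into spark-defaults.conf (or be passed via --conf on spark-submit) — a sketch based only on the settings quoted in this thread:

```
spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol
parquet.enable.summary-metadata=false
mapreduce.fileoutputcommitter.marksuccessfuljobs=false
```

Note that, per the reply above, these only suppress the marker/metadata files; the output is still written as a directory of part files, not a single named file.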