Spark saveAsTextFile Disk Recommendation

2021-03-21 Thread ranju goel
Hi All, I have a large RDD dataset of around 60-70 GB which I cannot send to the driver using *collect*, so I first write it to disk using *saveAsTextFile*, and then this data gets saved in the form of multiple part files on

RE: Spark saveAsTextFile Disk Recommendation

2021-03-21 Thread Ranju Jain
process this data further within Spark then please consider something much better: a columnar storage format, namely ORC or Parquet. Best Regards, Attila From: Ranju Jain Sent: Sunday, March 21, 2021 8:10 AM To: user@spark.apache.org Subject: Spark saveAsTextFile Disk Recommendation Hi All, I have a

Re: Spark saveAsTextFile Disk Recommendation

2021-03-20 Thread Attila Zsolt Piros
Hi! I would like to respond only to the first part of your mail:
> I have a large RDD dataset of around 60-70 GB which I cannot send to driver using *collect* so first writing that to disk using *saveAsTextFile* and then this data gets saved in the form of multiple part files on each node of

Spark saveAsTextFile Disk Recommendation

2021-03-20 Thread Ranju Jain
Hi All, I have a large RDD dataset of around 60-70 GB which I cannot send to the driver using collect, so I first write it to disk using saveAsTextFile; this data then gets saved in the form of multiple part files on each node of the cluster, and after that the driver reads the data from that stora
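
The read-back step described in the message above (the driver reading the part files that saveAsTextFile produced) can be sketched in plain Python. This is only an illustration under assumptions not stated in the thread: the output directory is on a path the driver can read directly (for example a shared mount), and the helper name `iter_part_lines` is hypothetical, not a Spark API. It streams one line at a time, so the full 60-70 GB never has to fit in driver memory.

```python
import glob
import os
import tempfile
from typing import Iterator


def iter_part_lines(output_dir: str) -> Iterator[str]:
    """Lazily yield lines from every part-* file in output_dir.

    Hypothetical helper: assumes output_dir is readable from the driver.
    """
    # The "part-*" pattern skips the _SUCCESS marker and hidden .crc files.
    # Sorting keeps part-00000, part-00001, ... in a stable order.
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path, "r", encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


if __name__ == "__main__":
    # Simulate a small saveAsTextFile-style output directory for the demo.
    with tempfile.TemporaryDirectory() as demo_dir:
        for index, rows in enumerate((["a", "b"], ["c"])):
            part_path = os.path.join(demo_dir, f"part-{index:05d}")
            with open(part_path, "w", encoding="utf-8") as out:
                out.write("\n".join(rows) + "\n")
        for record in iter_part_lines(demo_dir):
            print(record)
```

If each node wrote its part files to its own local disk instead, the driver would only see the files on its own machine, so this sketch would not apply as written.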