To: user@spark.apache.org
*Subject:* Re: Spark saveAsTextFile Disk Recommendation

Hi!

I would like to reflect only to the first part of your mail:

> I have a large RDD dataset of around 60-70 GB which I cannot send to driver
> using *collect* so first writing that to disk using *saveAsTextFile* and
> then this data gets saved in the form of multiple part files on each node
> of the cluster [...]

If you want to process this data further within Spark then please consider
something way better: a columnar storage format, namely ORC or Parquet.
Best Regards,
Attila
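The suggestion above could look roughly like the following Scala sketch; the session setup, the two-column schema, and the paths are my own illustrative assumptions, not from the thread:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("columnar-sketch").getOrCreate()
import spark.implicits._

// Illustrative stand-in for the large RDD from the mail
val rdd = spark.sparkContext.parallelize(Seq(("a", 1), ("b", 2)))

// Instead of rdd.saveAsTextFile(...), convert the RDD to a DataFrame and
// write a columnar, compressed, schema-aware format:
val df = rdd.toDF("key", "value")
df.write.mode("overwrite").parquet("hdfs:///tmp/out-parquet")
// or: df.write.mode("overwrite").orc("hdfs:///tmp/out-orc")

// Later Spark reads benefit from column pruning and predicate pushdown:
spark.read.parquet("hdfs:///tmp/out-parquet")
  .select("key")
  .where($"value" > 0)
  .show()
```

Compared with plain text part files, Parquet/ORC files are typically much smaller on disk and let subsequent jobs read only the columns they need.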
From: Ranju Jain
Sent: Sunday, March 21, 2021 8:10 AM
To: user@spark.apache.org
Subject: Spark saveAsTextFile Disk Recommendation
Hi All,

I have a large RDD dataset of around 60-70 GB which I cannot send to the
driver using collect, so I am first writing it to disk using saveAsTextFile;
this data then gets saved in the form of multiple part files on each node of
the cluster, and after that the driver reads the data from that storage.
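The pattern described above can be sketched as follows, assuming an HDFS output path and an existing SparkSession named `spark` (both names are illustrative):

```scala
// Each partition is written as its own part file directly from the
// executors; nothing is funneled through the driver (unlike collect).
rdd.saveAsTextFile("hdfs:///tmp/large-output")
// Produces: hdfs:///tmp/large-output/part-00000, part-00001, ...

// The driver (or a later job) can then read all part files back as one RDD:
val lines = spark.sparkContext.textFile("hdfs:///tmp/large-output/part-*")
```

This avoids the driver-memory pressure of collect, since the 60-70 GB never has to fit in a single JVM.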