Hi, just so that we understand the problem first:
What is the source data (JSON, CSV, Parquet, etc.)? Where are you reading it from (JDBC, file, etc.)? What is the compression format (GZ, BZIP, etc.)? What is the Spark version that you are using?

Thanks and Regards,
Gourav Sengupta

On Fri, Feb 11, 2022 at 9:39 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Well, one experiment is worth many times more than asking a what-if scenario question.
>
>    1. Try running it first to see how Spark handles it.
>    2. Go to the Spark GUI (on port 4044), look at the Storage tab and see what it says.
>    3. Unless you explicitly persist the data, Spark will read the data using appropriate partitions given the memory size and cluster count. As long as there is sufficient disk space (not memory), Spark will handle files larger than the available memory. However, if you do persist, you will get an Out of Memory error.
>
> HTH
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On Fri, 11 Feb 2022 at 09:23, frakass <capitnfrak...@free.fr> wrote:
>
>> Hello
>>
>> I have three nodes with total memory 128G x 3 = 384GB,
>> but the input data is about 1TB.
>> How can Spark handle this case?
>>
>> Thanks.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
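
[Editor's note: a minimal Scala sketch of the behaviour discussed above. The input path, the CSV format and the header option are illustrative assumptions, not details from the thread; the point is only that a source larger than cluster memory can be processed partition by partition without caching, and that if the data must be reused, a disk-spilling storage level such as MEMORY_AND_DISK reduces the out-of-memory risk mentioned above.]

    // Minimal sketch, assuming a hypothetical ~1 TB CSV dataset at an
    // illustrative path; adjust the format and path to the actual source.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object LargeInputSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("larger-than-memory-input")
          .getOrCreate()

        // Spark reads the input in partitions and streams through them,
        // so a 1 TB source can be processed on a 384 GB cluster
        // without persisting anything.
        val df = spark.read
          .option("header", "true")          // assumption: CSV with a header row
          .csv("/data/input/huge_dataset")   // hypothetical path

        // If the data has to be reused across several actions, prefer a
        // storage level that can spill to disk (MEMORY_AND_DISK) rather
        // than a memory-only level such as MEMORY_ONLY.
        df.persist(StorageLevel.MEMORY_AND_DISK)

        // Trigger an action; afterwards the Storage tab in the Spark UI
        // shows how much of the persisted data is in memory vs. on disk.
        println(df.count())

        spark.stop()
      }
    }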