The RAM and disk consumption depends on what you do with the data after reading it.

Your particular action will read 20 rows from the first partition(s) and show them, so it needs barely any RAM or disk, no matter how large the CSV is.
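Roughly, in Scala (a sketch, assuming a SparkSession named spark and a placeholder filepath):

    // show(false) materializes only the first 20 rows, untruncated
    val df = spark.read.option("header", "true").csv(filepath)
    df.show(false)          // pulls ~20 rows from the first partition(s)
    // comparable amount of work to:
    df.limit(20).collect()  // only a tiny slice of the 150TB is read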

If you do a count instead of a show, it will iterate over each partition and return a count per partition, so hardly any RAM is needed there either.
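For example (same assumptions as above):

    // count() computes a count per partition and sums the partial
    // counts on the driver; the rows themselves are never collected
    val n = spark.read.option("header", "true").csv(filepath).count()
    println(s"row count: $n")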

If you do some real processing of the data, the required RAM and disk again depend on the shuffles involved and on the intermediate results that need to be stored in RAM or on disk.
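For instance, a wide operation like the groupBy below triggers a shuffle: the shuffle files go to local disk, and the aggregation buffers spill to disk once execution memory fills up. The column name "country" and the output path are made up for the sketch:

    spark.read.option("header", "true").csv(filepath)
      .groupBy("country")          // hypothetical column
      .count()                     // shuffles partial counts by key
      .write.csv("/tmp/counts")    // hypothetical output path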

Enrico


On 22.06.22 at 14:54, Deepak Sharma wrote:
It will spill to disk if everything can’t be loaded in memory.


On Wed, 22 Jun 2022 at 5:58 PM, Sid <flinkbyhe...@gmail.com> wrote:

    I have a 150TB CSV file.

    I have a total of 100 TB RAM and 100 TB disk. So if I do something
    like this

    spark.read.option("header","true").csv(filepath).show(false)

    Will it lead to an OOM error since it doesn't have enough memory,
    or will it spill the data onto the disk and process it?

    Thanks,
    Sid

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
