Well, one experiment is worth many times more than asking a what-if scenario
question.


   1. Try running it first to see how Spark handles it.
   2. Go to the Spark UI (default port 4040) and look at the Storage tab to
   see what it says.
   3. Unless you explicitly persist the data, Spark will read the data
   using appropriate partitions given the memory size and cluster count. As
   long as there is sufficient disk space (not memory), Spark will handle
   files larger than the available memory. However, if you do persist the
   whole dataset in memory, you will get an Out of Memory error (see the
   sketch after this list).
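
As a rough illustration of point 3, here is a minimal Scala sketch. The
Parquet path and app name are hypothetical; it reads the input without
persisting, prints how many partitions the scan is planned with, and marks
in comments where an in-memory persist would create the OOM risk.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object LargeInputSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("large-input-sketch")   // hypothetical app name
          .getOrCreate()

        // Reading ~1 TB: Spark plans the scan as many partitions, so only
        // a fraction of the data sits in executor memory at any one time.
        val df = spark.read.parquet("hdfs:///data/big_input")  // hypothetical path

        println(s"Scan partitions: ${df.rdd.getNumPartitions}")

        // A full pass streams partition by partition and spills shuffle
        // data to local disk when needed -- no persist involved.
        println(s"Row count: ${df.count()}")

        // Persisting with MEMORY_ONLY asks Spark to cache the whole dataset
        // in executor memory; with 384 GB of RAM for ~1 TB of input this is
        // where the Out of Memory risk comes from (watch the Storage tab).
        // df.persist(StorageLevel.MEMORY_ONLY)
        // df.count()   // would materialise the cache

        spark.stop()
      }
    }

Running it and refreshing the Storage tab before and after uncommenting the
persist lines makes the difference visible.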

HTH



   View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 11 Feb 2022 at 09:23, frakass <capitnfrak...@free.fr> wrote:

> Hello
>
> I have three nodes with total memory 128G x 3 = 384GB
> But the input data is about 1TB.
> How can spark handle this case?
>
> Thanks.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
