Alternatively, watch these Spark Summit talks on memory management to get insight from
a developer's perspective:
https://spark-summit.org/2016/events/deep-dive-apache-spark-memory-management/
https://spark-summit.org/2017/events/a-developers-view-into-sparks-memory-model/
Cheers
Jules
In general, an RDD, which is the central concept of Spark, is just a
definition of how to get data and process it. Each partition of an RDD
defines how to get/process one partition of the data. A series of
transformations will transform every partition of data from the previous RDD.
I'll give you a very easy
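To make the "an RDD is just a definition" idea concrete, here is a toy pure-Python sketch (this is NOT the real Spark API; `ToyRDD` and its methods are invented for illustration). Each "partition" is just a function that knows how to produce its records, and `map` builds a new recipe without processing anything until `collect` is called:

```python
# Toy illustration (NOT the real Spark API): an "RDD" here is just a
# recipe -- a list of partitions, where each partition knows how to
# produce its records lazily, and a transformation builds a new recipe
# without touching any data until we finally collect.

class ToyRDD:
    def __init__(self, partition_funcs):
        # Each entry is a zero-argument function yielding one partition's records.
        self.partition_funcs = partition_funcs

    def map(self, f):
        # Build a NEW recipe; nothing is computed yet (lazy, like Spark).
        return ToyRDD([
            (lambda pf=pf: (f(x) for x in pf()))
            for pf in self.partition_funcs
        ])

    def collect(self):
        # Only now is data actually produced, one partition at a time.
        return [x for pf in self.partition_funcs for x in pf()]

# Two partitions, each defined by *how to get* its data:
rdd = ToyRDD([lambda: iter([1, 2]), lambda: iter([3, 4])])
doubled = rdd.map(lambda x: x * 2)   # still no data processed
print(doubled.collect())             # -> [2, 4, 6, 8]
```

Because each partition is processed as a lazy stream, Spark never needs the whole dataset in memory at once — which is also why a 900GB dataset can flow through an 80GB cluster.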
Obviously, you can't store 900GB of data in 80GB of memory.
There is a concept in Spark called disk spill: when your data size
grows beyond what fits in memory, it spills out to disk.
Also, Spark doesn't use the whole of memory for storing data; some fraction of
memory is used for
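For reference, the split is governed by Spark's unified memory management settings (the values below are the documented defaults; `spark.executor.memory` is just an example value for this 16GB-per-machine cluster):

```
# spark-defaults.conf (fractions shown are Spark's defaults since 1.6)
spark.executor.memory        16g    # example: total heap per executor
spark.memory.fraction        0.6    # share of heap for execution + storage
spark.memory.storageFraction 0.5    # portion of that protected from eviction
```

If you cache data with `persist(StorageLevel.MEMORY_AND_DISK)`, partitions that don't fit in the storage region are written to disk instead of failing.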
Hi
I'm a newbie.
In my Spark cluster there are 5 machines, each with 16GB of memory, but my data
may be more than 900GB. The source may be HDFS or MongoDB. I want to know how
Spark can put this 900GB of data into cluster memory when I have only 80GB of
total memory space. How does Spark work?