I figured out the issue. I had not realized before that the data is
deserialized when it is loaded into memory. As a result, what looks like a
21 GB dataset on disk occupies 77 GB in memory.
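
One quick way to see this for yourself is to cache the RDD, force it to
materialize, and then check the reported in-memory size, either on the web
UI's Storage page or from the driver. A rough sketch along those lines, with
a placeholder path and app name (not my actual job):

import org.apache.spark.{SparkConf, SparkContext}

object MemoryFootprintCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("memory-footprint-check"))

    // Placeholder path; substitute the real dataset location.
    val lines = sc.textFile("hdfs:///path/to/dataset")

    // Cache and materialize so the storage layer reports the
    // deserialized in-memory size (the same numbers shown on the
    // web UI's Storage tab).
    lines.cache()
    lines.count()

    sc.getRDDStorageInfo.foreach { info =>
      println(s"RDD ${info.id}: ${info.memSize} bytes in memory, " +
        s"${info.numCachedPartitions}/${info.numPartitions} partitions cached")
    }

    sc.stop()
  }
}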

The details are clearly explained in the serialization and memory tuning
guide, in the section on determining memory consumption:
http://spark.apache.org/docs/latest/tuning.html#determining-memory-consumption
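
That guide also suggests caching the data in serialized form (ideally with
Kryo) when the deserialized objects blow up like this; the in-memory
footprint then stays much closer to the on-disk size, at the cost of
deserializing on each access. A minimal sketch, again with a placeholder
path:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object SerializedCachingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("serialized-caching-sketch")
      // Kryo is recommended by the tuning guide for serialized caching.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    // Placeholder path.
    val lines = sc.textFile("hdfs:///path/to/dataset")

    // MEMORY_ONLY_SER stores each partition as one serialized buffer
    // instead of a graph of deserialized Java objects, trading some
    // CPU (deserialization on access) for a much smaller footprint.
    lines.persist(StorageLevel.MEMORY_ONLY_SER)
    lines.count()  // materialize the cache

    sc.getRDDStorageInfo.foreach(info =>
      println(s"${info.memSize} bytes cached in memory"))

    sc.stop()
  }
}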




