Hi Michael, I had a similar question <http://apache-spark-user-list.1001560.n3.nabble.com/Caching-issue-with-msg-RDD-block-could-not-be-dropped-from-memory-as-it-does-not-exist-td10248.html#a10677> before. In my case, the data was too large to be cached in memory because of serialization.
I tried to reproduce your test, though, and did not see any memory problem. First, since count operates on the same RDD each time, it should not increase memory usage. Second, since you do not cache the RDD, each new action such as count will simply reload the data.

I am not sure how much memory your machine has, but by default Spark allocates 512 MB to each executor and spark.storage.memoryFraction is set to 0.6, which means only about 300 MB (0.6 x 512 MB) is actually available for cached data. If you are running your app on a local machine, you can monitor memory usage by opening the Spark web UI in your browser at localhost:4040.
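
As a rough sanity check on the memory figure, here is a minimal Python sketch that just multiplies the two default values quoted above (512 MB per executor, fraction 0.6). The function name storage_budget_mb is made up for illustration and is not part of Spark; the numbers are the quoted defaults, not values read from a running cluster.

```python
# Back-of-the-envelope estimate of the memory available for cached RDD blocks.
# Assumption: executor memory is 512 MB and the storage fraction is 0.6,
# as quoted in this message; neither value is read from a live Spark config.

def storage_budget_mb(executor_memory_mb: float, storage_fraction: float) -> float:
    """Return the approximate cache budget in MB (hypothetical helper)."""
    return executor_memory_mb * storage_fraction

budget = storage_budget_mb(512, 0.6)
print(f"approximate cache budget: {budget:.0f} MB")
```

With these inputs the product comes out at roughly 307 MB. The amount a real executor reports will probably differ somewhat, so the web UI at localhost:4040 is the better source for the actual number.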
