Hi Michael, 

I had a similar question
<http://apache-spark-user-list.1001560.n3.nabble.com/Caching-issue-with-msg-RDD-block-could-not-be-dropped-from-memory-as-it-does-not-exist-td10248.html#a10677>
before. My problem was that my data was too large to be cached in memory
because of serialization.

I tried to reproduce your test, but I did not run into any memory problem.
First, since count operates on the same RDD, it should not increase memory
usage. Second, since you do not cache the RDD, each new action such as count
will simply reload the data.
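
To make the second point concrete, here is a toy model in plain Python. It is
not Spark code: LazyDataset and read_source are made-up names, and the class
only mimics the idea that an uncached dataset is re-read on every action while
a cached one is read once.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not a Spark API)."""

    def __init__(self, loader):
        self._loader = loader          # function that re-reads the source
        self._cache_requested = False
        self._cached = None
        self.loads = 0                 # how many times the source was read

    def cache(self):
        # Only marks the dataset; data is kept on the first action.
        self._cache_requested = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        self.loads += 1
        data = list(self._loader())
        if self._cache_requested:
            self._cached = data
        return data

    def count(self):
        return len(self._materialize())


def read_source():
    # Stands in for reading a file; assumed data, for illustration only.
    return range(1000)


uncached = LazyDataset(read_source)
uncached.count()
uncached.count()

cached = LazyDataset(read_source).cache()
cached.count()
cached.count()

print(uncached.loads, cached.loads)  # 2 1
```

Without cache(), nothing is retained between the two counts, so repeated
actions cost time but should not add to memory usage.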

I am not sure how much memory your machine has, but by default Spark
allocates 512 MB for each executor and spark.storage.memoryFraction is set to
0.6, which means you effectively have only about 300 MB (0.6 x 512 MB) for
cached data. If you are running your app on a local machine, you can monitor
it by opening the web UI in your browser at localhost:4040
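
As a rough check of that arithmetic, here is a small Python sketch. The 512 MB
heap and the 0.6 fraction are the defaults quoted above; the extra 0.9 safety
margin is my assumption about what older Spark releases apply on top, and
storage_budget_mb is just an illustrative helper name.

```python
def storage_budget_mb(heap_mb, fraction, safety=1.0):
    """Approximate memory available for cached blocks, in MB."""
    return heap_mb * fraction * safety


executor_heap_mb = 512   # default executor memory quoted above
memory_fraction = 0.6    # default memory fraction quoted above
safety_margin = 0.9      # assumption: extra safety factor in older releases

nominal = storage_budget_mb(executor_heap_mb, memory_fraction)
with_margin = storage_budget_mb(executor_heap_mb, memory_fraction, safety_margin)

print(round(nominal), round(with_margin))  # 307 276
```

The real figure shown in the UI may be a bit lower still, since the JVM
reports slightly less usable heap than the configured maximum.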



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/memory-leak-query-tp8961p10679.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.