I have a DStream receiving data from a socket, running in local mode. I set "spark.streaming.unpersist" to "false" and leave "spark.cleaner.ttl" at its default (infinite). I can see files for input and shuffle blocks under the "spark.local.dir" folder, and the folder's size keeps increasing, although the JVM's memory usage appears stable.
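For reference, here is a minimal sketch of the setup described above, assuming a socket source on localhost:9999 and an illustrative local-dir path; the app name and port are my own placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setMaster("local[2]")                      // local mode, 2 threads
  .setAppName("SocketStreamRetention")        // placeholder name
  .set("spark.streaming.unpersist", "false")  // keep generated RDDs around
  .set("spark.local.dir", "/tmp/spark-local") // assumed path for block files
  // spark.cleaner.ttl left unset => infinite

val ssc = new StreamingContext(conf, Seconds(1))

// Socket receivers store blocks as MEMORY_AND_DISK_SER_2 by default,
// so received data can spill to spark.local.dir when memory is tight.
val lines = ssc.socketTextStream("localhost", 9999,
  StorageLevel.MEMORY_AND_DISK_SER_2)
lines.count().print()

ssc.start()
ssc.awaitTermination()
```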
[question] In this case, is it that the input RDDs are persisted but don't fit into memory, so they spill to disk? And where can I see the details of these RDDs? I don't see them in the web UI.

Then I set "spark.streaming.unpersist" to "true", and both the size of the "spark.local.dir" folder and the JVM's used heap are reduced regularly. [question] In this case, since I didn't change "spark.cleaner.ttl", which component is doing the cleanup? And what would be different if I also set "spark.cleaner.ttl" to some duration? Thank you!