Hi Rindra,

Depending on what you're doing with your groupBy, you may end up inflating your data quite a bit: grouped values are buffered as deserialized Java objects, and object overhead alone can multiply the on-disk size several times. Even though your machine has 16 GB, by default spark-shell uses only 512 MB of heap, and the fraction of that used for storing blocks is only 60% (spark.storage.memoryFraction), so the cache works out to roughly 300 MB. That is still many multiples of the size of your dataset, but not orders of magnitude more. If you are running Spark 1.0+, you can give spark-shell more memory by adding "--driver-memory 1g" as a command-line argument in local mode, or "--executor-memory 1g" in any other mode.
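For example, from the Spark directory (the path and the 1g figure are just illustrations; pick whatever your machine allows):

# local mode: the driver JVM hosts the block manager, so grow the driver heap
./bin/spark-shell --driver-memory 1g

# standalone/YARN/Mesos: grow the executors instead
./bin/spark-shell --executor-memory 1g

You can then sanity-check how much cache space you actually got: sc.getExecutorMemoryStatus maps each block manager to a (maximum memory for caching, remaining memory) pair, in bytes, so from the shell:

scala> sc.getExecutorMemoryStatus.foreach(println)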
(Also, it seems that you have set your log level to WARN. The most likely cause of the failed puts is that the cache is not big enough, but setting the log level back to INFO will show you the exact sizes being used by the storage and by each block; a minimal snippet for doing that from the shell is below the quoted message.)

Andrew

2014-07-19 13:01 GMT-07:00 rindra <[email protected]>:

> Hi,
>
> I am working with a small dataset, about 13 MB, on the spark-shell. After
> doing a groupBy on the RDD, I wanted to cache the RDD in memory, but I
> keep getting these warnings:
>
> scala> rdd.cache()
> res28: rdd.type = MappedRDD[63] at repartition at <console>:28
>
> scala> rdd.count()
> 14/07/19 12:45:18 WARN BlockManager: Block rdd_63_82 could not be dropped
> from memory as it does not exist
> 14/07/19 12:45:18 WARN BlockManager: Putting block rdd_63_82 failed
> 14/07/19 12:45:18 WARN BlockManager: Block rdd_63_40 could not be dropped
> from memory as it does not exist
> 14/07/19 12:45:18 WARN BlockManager: Putting block rdd_63_40 failed
> res29: Long = 5
>
> It seems that I cannot cache the data in memory, even though my local
> machine has 16 GB of RAM and the data is only 13 MB across 100 partitions.
>
> How can I prevent this caching issue from happening? Thanks.
>
> Rindra
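To flip the log level from inside a running shell rather than editing conf/log4j.properties, something like this should work; it is a sketch against the log4j 1.x API that Spark bundles, not a Spark-specific call:

scala> import org.apache.log4j.{Level, Logger}
scala> Logger.getRootLogger.setLevel(Level.INFO)  // Level.WARN quiets it again

At INFO, the block manager logs the estimated size of each block it tries to store and the space remaining in the memory store, which makes it obvious when a block simply doesn't fit.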
