I'm seeing the same behavior in Spark 2.0.1. Does anybody have an explanation?
Thanks!
Kaspar

bmiller1 wrote
> Hi All,
>
> I've recently noticed some caching behavior which I did not understand
> and which may or may not indicate a bug. In short, the web UI seemed
> to indicate that some blocks were being added to the cache despite
> already being in the cache.
>
> As documentation, I have attached two UI screenshots. The PNG
> captures enough of the screen to demonstrate the problem; the PDF is
> the printout of the full page. Notice that:
>
> - block rdd_21_1001 is in the cache twice, both times on
> letang.research.intel-research.net; many other blocks also occur twice
> on a variety of hosts. I've not confirmed that the duplicate block is
> *always* on the same host, but it appears that way.
>
> - the stated storage level is "Memory Deserialized 1x Replicated"
>
> - the top left states that the "cached partitions" and "total
> partitions" are 4000, but in the table where partitions are enumerated
> there are 4534.
>
> Although not reflected in this screenshot, I believe I have seen this
> behavior occur even when double caching of blocks causes eviction of
> blocks from other RDDs. I am running the Spark 1.0.0 release and
> using pyspark.
>
> best,
> -Brad
>
> pyspark_caching.pdf (2M)
> <http://apache-spark-user-list.1001560.n3.nabble.com/attachment/8546/0/pyspark_caching.pdf>
> Screen Shot 2014-06-30 at 10.03.16 AM.png (292K)
> <http://apache-spark-user-list.1001560.n3.nabble.com/attachment/8546/1/Screen%20Shot%202014-06-30%20at%2010.03.16%20AM.png>