I too have seen cached RDDs not reach 100%, even when persisted with DISK_ONLY. Just saw that yesterday, in fact. In some cases RDDs I expected didn't show up in the list at all. I have no idea whether this is an issue with Spark or something I'm not understanding about how persist works (probably the latter).
If I figure out the reason for this I'll let you know.

On Wed, Jun 11, 2014 at 8:54 PM, Shuo Xiang <shuoxiang...@gmail.com> wrote:
> Xiangrui, clicking into the RDD link, it gives the same message, saying
> only 96 of 100 partitions are cached. The disk/memory usage is the same,
> which is far below the limit. Is this what you want to check, or is it
> another issue?
>
> On Wed, Jun 11, 2014 at 4:38 PM, Xiangrui Meng <men...@gmail.com> wrote:
>
>> Could you try clicking on that RDD and checking the storage info per
>> partition? I tried continuously caching RDDs, so new ones kick old
>> ones out when there is not enough memory. I saw similar glitches, but
>> the storage info per partition was correct. If you find a way to
>> reproduce this error, please create a JIRA. Thanks! -Xiangrui

--
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io W: www.velos.io
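For anyone wanting to inspect this without clicking through the web UI: the same cached-partition counts shown on the Storage tab are available programmatically via `SparkContext.getRDDStorageInfo`. A minimal sketch in local mode (the object name, partition counts, and setup here are my own illustration, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheCheck {
  // Returns (cachedPartitions, totalPartitions) for every tracked RDD,
  // the same numbers the web UI's Storage tab reports.
  def cachedPartitionCounts(sc: SparkContext): Seq[(Int, Int)] =
    sc.getRDDStorageInfo.map(info => (info.numCachedPartitions, info.numPartitions)).toSeq

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("cache-check"))
    val rdd = sc.parallelize(1 to 1000, numSlices = 100)
      .persist(StorageLevel.DISK_ONLY)
    rdd.count() // persist is lazy; an action must run before anything is cached
    cachedPartitionCounts(sc).foreach { case (cached, total) =>
      println(s"$cached/$total partitions cached")
    }
    sc.stop()
  }
}
```

Comparing these numbers right after the triggering action, versus later once other RDDs have been cached, might help distinguish a UI glitch from partitions actually being evicted.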