Ah yes, Spark doesn't cache any of your RDDs by default. It turns out that
caching things too aggressively can lead to suboptimal performance because
of the churn it creates. If you don't call persist() or cache(), your
RDDs won't actually be cached. Note that even once they're cached, they can
still be evicted under the LRU policy.
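
To illustrate (a minimal sketch; `sc` is an assumed SparkContext and the
input path is purely illustrative):

```scala
import org.apache.spark.storage.StorageLevel

// Without persist/cache, this RDD is recomputed from its lineage
// every time an action runs on it.
val lines = sc.textFile("hdfs:///data/input.txt") // illustrative path

// Mark the RDD for caching; it is materialized on the first action
// and appears in the Storage tab of the web UI afterwards.
// cache() is shorthand for persist(StorageLevel.MEMORY_ONLY).
val lengths = lines.map(_.length).persist(StorageLevel.MEMORY_ONLY)

lengths.count() // first action: computes and caches the partitions
lengths.count() // later actions read the cached partitions

// Even cached partitions can be evicted under LRU when executors run
// low on storage memory; evicted partitions are recomputed from
// lineage on the next access.
```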

-Andrew


2014-08-05 0:13 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>:

> You need to call persist or cache on those RDDs for them to appear in the
> Storage tab. Unless you do, those RDDs will be recomputed every time.
>
> Thanks
> Best Regards
>
>
> On Tue, Aug 5, 2014 at 8:03 AM, binbinbin915 <binbinbin...@live.cn> wrote:
>
>>  Actually, if you don’t use a method like persist or cache, the RDD is
>> not even stored to disk. Every time you use that RDD, it is just
>> recomputed from the original one.
>>
>> In logistic regression from MLlib, the transformed input is not
>> persisted, so I can't see the RDD in the web GUI.
>>
>> I changed the code and got a 10x speedup.
>>
>> --
>> binbinbin915
>> Sent with Airmail
>>
>>
>> ------------------------------
>> View this message in context: Re: Can't see any thing one the storage
>> panel of application UI
>> <http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-see-any-thing-one-the-storage-panel-of-application-UI-tp10296p11403.html>
>> Sent from the Apache Spark User List mailing list archive
>> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>>
>
>
