Yup, cache is lazy, just like a transformation, so you need to run an action
(one that touches every partition, such as count()) to get the data into it.

http://apache-spark-user-list.1001560.n3.nabble.com/How-to-enforce-RDD-to-be-cached-td20230.html


On Fri, Mar 25, 2016 at 2:32 PM Jörn Franke <jornfra...@gmail.com> wrote:

> I am not 100% sure of the root cause, but if you need RDD caching, then
> look at Apache Ignite or similar.
>
> On 24 Mar 2016, at 16:22, Daniel Imberman <daniel.imber...@gmail.com>
> wrote:
>
> Hi Takeshi,
>
> Thank you for getting back to me. If this is not possible, then perhaps you
> can help me with the root problem that led me to ask this question.
>
> Basically, I have a job where I'm loading/persisting an RDD and running
> queries against it. The problem is that even though there is plenty of
> free memory, the RDD does not fully persist. After I run several queries
> against it, the RDD is fully persisted, but this means that the first 4 or 5
> queries I run are extremely slow.
>
> Is there any way I can make sure that the entire RDD ends up in memory the
> first time I load it?
>
> Thank you
> On Thu, Mar 24, 2016 at 1:21 AM Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Just re-sending:
>>
>>
>> ---------- Forwarded message ----------
>> From: Takeshi Yamamuro <linguin....@gmail.com>
>> Date: Thu, Mar 24, 2016 at 5:19 PM
>> Subject: Re: Forcing data from disk to memory
>> To: Daniel Imberman <daniel.imber...@gmail.com>
>>
>>
>> Hi,
>>
>> There is no direct approach; you need to unpersist the cached data, then
>> re-cache it as MEMORY_ONLY.
>>
>> // maropu
>>
>> On Thu, Mar 24, 2016 at 8:22 AM, Daniel Imberman <
>> daniel.imber...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> So I have a question about persistence. Let's say I have an RDD that's
>>> persisted as MEMORY_AND_DISK, and I know that enough memory has now been
>>> freed that the data on disk could be moved into memory. Is it possible to
>>> tell Spark to re-evaluate the memory available to the RDD and move that
>>> data into it?
>>>
>>> Thank you
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Forcing-data-from-disk-to-memory-tp26585.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
