@Song, I have called an action, but it did not cache, as you can see in the screenshot attached to my original email. It has cached to disk but not to memory.
@Chin Wei Low, I have 15 GB of memory allocated, which is more than the dataset size. Any other suggestions, please?

Kind regards,
Guru

On 11 October 2016 at 03:34, Chin Wei Low <lowchin...@gmail.com> wrote:
> Hi,
>
> Your RDD is 5 GB; perhaps it is too large to fit into the executor's storage
> memory. You can refer to the Executors tab in the Spark UI to check the
> available storage memory for each executor.
>
> Regards,
> Chin Wei
>
> On Tue, Oct 11, 2016 at 6:14 AM, diplomatic Guru <diplomaticg...@gmail.com> wrote:
>
>> Hello team,
>>
>> Spark version: 1.6.0
>>
>> I'm trying to persist some data in memory so I can reuse it. However,
>> when I call rdd.cache() or rdd.persist(StorageLevel.MEMORY_ONLY()), it
>> does not store the data: I cannot see any RDD information under the Web UI
>> (Storage tab).
>>
>> Therefore I tried rdd.persist(StorageLevel.MEMORY_AND_DISK()), which
>> stored the data to disk only, as shown in the screenshot below:
>>
>> [image: Inline images 2]
>>
>> Do you know why the memory is not being used?
>>
>> Is there a cluster-level configuration that stops jobs from storing data
>> in memory altogether?
>>
>> Please let me know.
>>
>> Thanks,
>> Guru
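On the question of a cluster-level setting: one thing worth checking in Spark 1.6 is whether legacy memory management is enabled, since it changes how the storage cap is computed. A sketch of the relevant `spark-defaults.conf` entries (the values shown are the 1.6 defaults; verify against the cluster's actual configuration):

```
# Spark 1.6 unified memory model (the default): storage and execution
# share a pool of (heap - 300 MB) * spark.memory.fraction
spark.memory.fraction          0.75
spark.memory.storageFraction   0.5

# If legacy mode is enabled cluster-wide, storage is instead capped at
# heap * spark.storage.memoryFraction, as in pre-1.6 releases
spark.memory.useLegacyMode     false
spark.storage.memoryFraction   0.6
```

If any of these have been lowered drastically in the cluster config, cached blocks that do not fit in the storage pool will spill to disk (or be dropped entirely under MEMORY_ONLY), which would match the behaviour in the screenshot.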
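To sanity-check Chin Wei's suggestion, here is a rough back-of-the-envelope estimate of the storage pool under Spark 1.6's unified memory model, assuming the default settings (reserved memory of 300 MB and spark.memory.fraction = 0.75 are the documented 1.6 defaults; the 15 GB heap figure comes from the thread above, and the real numbers should be read off the Executors tab):

```python
# Rough estimate of an executor's unified memory pool under Spark 1.6
# defaults. This is an illustrative sketch, not what Spark computes to
# the byte; check the Executors tab in the Spark UI for actual values.

RESERVED_MB = 300        # memory Spark 1.6 reserves off the top
MEMORY_FRACTION = 0.75   # default spark.memory.fraction in 1.6

def unified_pool_mb(executor_heap_mb):
    """Memory shared by storage and execution (unified model)."""
    return (executor_heap_mb - RESERVED_MB) * MEMORY_FRACTION

heap_mb = 15 * 1024      # the 15 GB executor heap mentioned above
print(round(unified_pool_mb(heap_mb)))  # -> 11295 (about 11 GB)
```

By this estimate a 5 GB RDD should fit comfortably, which suggests the problem is more likely a non-default configuration than a genuine shortage of storage memory.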