Cache in memory only the data on which you want to run the iterative
algorithm.
For one-pass map-reduce operations, it's better not to cache at all if you
are under a memory crunch.
Also, schedule your persist and unpersist calls so that you utilize the RAM
well; see the sketch below.
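As a rough sketch of that scheduling (the path, app name, and update step
here are made up for illustration, not taken from this thread):

import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext("local[*]", "cache-sketch")  // hypothetical name
val raw = sc.textFile("hdfs:///data/input")            // hypothetical path
val features = raw.map(_.split(',').map(_.toDouble))

// Persist only the RDD the iterations re-read; MEMORY_AND_DISK spills
// partitions that don't fit in RAM instead of recomputing them.
features.persist(StorageLevel.MEMORY_AND_DISK)

for (i <- 1 to 10) {
  // stand-in for the real per-iteration update step
  val loss = features.map(_.sum).sum()
  println(s"iteration $i, loss $loss")
}

// Release the cache as soon as the loop finishes so that later stages
// get the RAM back.
features.unpersist()
sc.stop()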
On Tue, Sep 30, 2014 at 4:34 PM, Liquan Pei wrote:
Hi,
By default, 60% of JVM memory is reserved for RDD caching, so in your case
only about 72 GB is available for RDDs, which means that your total data may
not fit in memory. You can check the RDD memory statistics via the Storage
tab in the web UI.
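For reference, that 60% comes from spark.storage.memoryFraction, which
defaults to 0.6 in Spark 1.x. A minimal sketch of the arithmetic and the
setting (the app name and the per-executor split are assumptions for
illustration):

import org.apache.spark.{SparkConf, SparkContext}

// 120 GB of total executor heap * 0.6 = 72 GB of cache space.
val conf = new SparkConf()
  .setAppName("memory-fraction-sketch")        // hypothetical name
  .set("spark.executor.memory", "15g")         // ~120 GB over 8 machines (assumed split)
  .set("spark.storage.memoryFraction", "0.6")  // the default, shown explicitly
val sc = new SparkContext(conf)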
Hope this helps!
Liquan
On Tue, Sep 30, 2014 at 4:11 PM,
Hi,
Is there any guidance on how much total memory is needed, for data of a
certain size, to achieve relatively good speed?
I have around 200 GB of data, and the current total memory across my 8
machines is around 120 GB. Is that too small to process data this big?
Even the read i