I didn't specify a storage level, so by default it should be MEMORY_AND_DISK, right? My doubt was: between two different experiments, do the RDDs cached in memory need to be unpersisted, or does it not matter?
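To make the pattern concrete, here is a minimal sketch of what each experiment roughly does (the paths, operations and local master are illustrative, not the actual benchmark code): an intermediate RDD is persisted with an explicit storage level and explicitly unpersisted before the next run.

import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

// Illustrative local context; the real runs use a cluster master.
val sc = new SparkContext("local[2]", "cache-benchmark")

val lines = sc.textFile("/data/input-1g.txt")

// Persist the intermediate result with an explicit storage level so its
// blocks may spill to disk instead of living only in memory.
val intermediate = lines.map(_.split("\t")).persist(StorageLevel.MEMORY_AND_DISK)

intermediate.count()                      // first action materialises and caches the RDD
intermediate.filter(_.nonEmpty).count()   // later actions reuse the cached blocks

// Release the cached blocks before the next experiment instead of relying on eviction.
intermediate.unpersist()
sc.stop()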
On Fri, Mar 28, 2014 at 1:43 AM, Syed A. Hashmi <shas...@cloudera.com> wrote:

> Which storage scheme are you using? I am guessing it is MEMORY_ONLY. For
> large datasets, MEMORY_AND_DISK or MEMORY_AND_DISK_SER work better.
>
> You can call unpersist on an RDD to remove it from the cache, though.
>
> On Thu, Mar 27, 2014 at 11:57 AM, Sai Prasanna <ansaiprasa...@gmail.com> wrote:
>
>> No, I am running on 0.8.1.
>> Yes, I am caching a lot. I am benchmarking a simple job in Spark in which
>> 512 MB, 1 GB and 2 GB text files are read, some basic intermediate
>> operations are done, and the intermediate results that will be used in
>> subsequent operations are cached.
>>
>> I thought we need not unpersist manually: if I need to cache something
>> and the cache is full, space is created automatically by evicting older
>> entries. Do I need to unpersist?
>>
>> Moreover, if I run several times, will the previously cached RDDs still
>> remain in the cache? If so, can I flush them out manually before the next
>> run? [Something like a complete cache flush.]
>>
>> On Thu, Mar 27, 2014 at 11:16 PM, Andrew Or <and...@databricks.com> wrote:
>>
>>> Are you caching a lot of RDDs? If so, maybe you should unpersist() the
>>> ones that you're not using. Also, if you're on 0.9, make sure
>>> spark.shuffle.spill is enabled (which it is by default). This allows your
>>> application to spill in-memory content to disk if necessary.
>>>
>>> How much memory are you giving to your executors? The default,
>>> spark.executor.memory, is 512m, which is quite low. Consider raising it.
>>> Checking the web UI is a good way to figure out your runtime memory usage.
>>>
>>> On Thu, Mar 27, 2014 at 9:22 AM, Ognen Duzlevski
>>> <og...@plainvanillagames.com> wrote:
>>>
>>>> Look at the tuning guide on Spark's webpage for strategies to cope with
>>>> this. I have run into quite a few memory issues like these; some are
>>>> resolved by changing the StorageLevel strategy and employing things like
>>>> Kryo, and some are solved by specifying the number of tasks to break a
>>>> given operation into, etc.
>>>>
>>>> Ognen
>>>>
>>>> On 3/27/14, 10:21 AM, Sai Prasanna wrote:
>>>>
>>>> "java.lang.OutOfMemoryError: GC overhead limit exceeded"
>>>>
>>>> What is the problem? I run the same code; one time it runs in 8 seconds,
>>>> the next time it takes a really long time, say 300-500 seconds. In the
>>>> logs I see "GC overhead limit exceeded" many times. What should be done?
>>>>
>>>> Can someone please throw some light on this?

--
Sai Prasanna. AN
II M.Tech (CS), SSSIHL

Entire water in the ocean can never sink a ship, Unless it gets inside.
All the pressures of life can never hurt you, Unless you let them in.
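P.S. For my own reference, a minimal sketch of how I understand the suggested settings would be applied on 0.8.x, where configuration is passed as Java system properties before the SparkContext is created (the values are only example numbers, not tuned ones; on 0.9 the same property names can also go into a SparkConf):

import org.apache.spark.SparkContext

object MemoryConfigDemo {
  def main(args: Array[String]): Unit = {
    // Raise executor memory above the 512m default mentioned above.
    System.setProperty("spark.executor.memory", "2g")
    // spark.shuffle.spill exists from 0.9 onwards and is on by default there.
    System.setProperty("spark.shuffle.spill", "true")

    // Local master for illustration only; the executor setting matters when
    // running against a real cluster.
    val sc = new SparkContext("local[2]", "memory-config-demo")
    try {
      // ... benchmark body goes here ...
    } finally {
      sc.stop()
    }
  }
}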