Hi, I have done all of that, but my question is: why should 62 MB of data give a memory error when we have over 2 GB of memory available?
Therefore all that is mentioned by Zhoukang is not pertinent at all.

Regards,
Gourav Sengupta

On Fri, Jul 28, 2017 at 4:43 AM, 周康 <[email protected]> wrote:

> testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER) maybe
> StorageLevel should change. And check your config
> "spark.memory.storageFraction", which has a default value of 0.5.
>
> 2017-07-28 3:04 GMT+08:00 Gourav Sengupta <[email protected]>:
>
>> Hi,
>>
>> I cached in a table in a large EMR cluster and it has a size of 62 MB.
>> Therefore I know the size of the table while cached.
>>
>> But when I am trying to cache in the table in smaller cluster which still
>> has a total of 3 GB Driver memory and two executors with close to 2.5 GB
>> memory the job still keeps on failing giving JVM out of memory errors.
>>
>> Is there something that I am missing?
>>
>> CODE:
>> =================================================================
>> sparkSession = spark.builder \
>>     .config("spark.rdd.compress", "true") \
>>     .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>>     .config("spark.executor.extraJavaOptions", "-XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps") \
>>     .appName("test").enableHiveSupport().getOrCreate()
>>
>> testdf = sparkSession.sql("select * from tablename")
>> testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER)
>> =================================================================
>>
>> This causes JVM out of memory error.
>>
>>
>> Regards,
>> Gourav Sengupta
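
For reference, the arithmetic behind the question can be sketched roughly as follows. This is only an estimate under stated assumptions: Spark 2.x unified memory management with its documented defaults (300 MB reserved, spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5) and an executor heap of about 2.5 GB as described in the thread. The helper name `estimate_memory_regions` is hypothetical and not part of any Spark API.

```python
# Rough estimate of Spark's unified memory regions for a single executor.
# Assumptions (not taken from the thread): Spark 2.x defaults, and the JVM
# heap visible to Spark being equal to the configured executor memory.

RESERVED_MB = 300  # memory Spark sets aside before splitting the heap


def estimate_memory_regions(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Return approximate unified, storage, and execution region sizes in MB."""
    usable_mb = heap_mb - RESERVED_MB
    unified_mb = usable_mb * memory_fraction    # spark.memory.fraction
    storage_mb = unified_mb * storage_fraction  # spark.memory.storageFraction
    return {
        "unified_mb": unified_mb,
        "storage_mb": storage_mb,
        "execution_mb": unified_mb - storage_mb,
    }


# Executor heap of roughly 2.5 GB, as in the smaller cluster described above.
regions = estimate_memory_regions(2.5 * 1024)
print(regions)
```

Under these assumptions the storage region alone comes out at several hundred MB per executor, well above the 62 MB cached size. The estimate says nothing about the peak memory needed while the table is being read and the cache is being built, so it does not by itself explain the failure.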
