Hi,

I have done all of that, but my question is: why should 62 MB of data give a
memory error when we have over 2 GB of memory available?

Therefore all that is mentioned by Zhoukang is not pertinent at all.
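
To put rough numbers on that, here is a minimal sketch of how the unified
memory pool would be sized. It assumes Spark 2.x defaults (300 MB reserved,
spark.memory.fraction = 0.6, spark.memory.storageFraction = 0.5) and a
hypothetical executor heap of exactly 2.5 GB; the helper name is only for
illustration, and the real JVM-reported heap is usually a little smaller.

```python
# Rough estimate of Spark's unified memory pool, assuming Spark 2.x defaults.
RESERVED_MB = 300  # fixed reservation assumed for the unified memory manager


def unified_memory_mb(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Return (unified_pool_mb, protected_storage_mb) for an executor heap.

    memory_fraction  -> spark.memory.fraction (assumed default 0.6)
    storage_fraction -> spark.memory.storageFraction (default 0.5)
    """
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction      # shared by execution and storage
    storage = unified * storage_fraction    # storage share immune to eviction
    return unified, storage


# Hypothetical heap: "close to 2.5 GB" taken as exactly 2.5 GB.
heap_mb = 2.5 * 1024
unified, storage = unified_memory_mb(heap_mb)
print("unified pool: %.0f MB, protected storage: %.0f MB" % (unified, storage))
```

Under these assumptions the pool is about 1356 MB, of which about 678 MB is
protected for storage, so the storage fraction alone does not seem to explain
a failure on a 62 MB cached table.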


Regards,
Gourav Sengupta

On Fri, Jul 28, 2017 at 4:43 AM, 周康 <[email protected]> wrote:

> testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER): maybe the
> StorageLevel should be changed. Also check your config
> "spark.memory.storageFraction", whose default value is 0.5.
>
> 2017-07-28 3:04 GMT+08:00 Gourav Sengupta <[email protected]>:
>
>> Hi,
>>
>> I cached a table on a large EMR cluster, where it has a size of 62 MB,
>> so I know the size of the table when cached.
>>
>> But when I try to cache the table on a smaller cluster, which still
>> has a total of 3 GB of driver memory and two executors with close to 2.5 GB
>> of memory, the job keeps failing with JVM out-of-memory errors.
>>
>> Is there something that I am missing?
>>
>> CODE:
>> =================================================================
>> sparkSession = spark.builder \
>>     .config("spark.rdd.compress", "true") \
>>     .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>>     .config("spark.executor.extraJavaOptions", "-XX:+UseCompressedOops -XX:+PrintGCDetails -XX:+PrintGCTimeStamps") \
>>     .appName("test").enableHiveSupport().getOrCreate()
>>
>> testdf = sparkSession.sql("select * from tablename")
>> testdf.persist(pyspark.storagelevel.StorageLevel.MEMORY_ONLY_SER)
>> =================================================================
>>
>> This causes a JVM out-of-memory error.
>>
>>
>> Regards,
>> Gourav Sengupta
>>
>
>
