I get java.lang.OutOfMemoryError: GC overhead limit exceeded when trying to
run a count action on a file.

The file is a CSV file, 217 GB in size.

I'm using 10 r3.8xlarge (Ubuntu) machines with CDH 5.3.6 and Spark 1.2.0.

Configuration:

spark.app.id:local-1443956477103

spark.app.name:Spark shell

spark.cores.max:100

spark.driver.cores:24

spark.driver.extraLibraryPath:/opt/cloudera/parcels/CDH-5.3.6-1.cdh5.3.6.p0.11/lib/hadoop/lib/native
spark.driver.host:ip-172-31-34-242.us-west-2.compute.internal

spark.driver.maxResultSize:300g

spark.driver.port:55123

spark.eventLog.dir:hdfs://ip-172-31-34-242.us-west-2.compute.internal:8020/user/spark/applicationHistory
spark.eventLog.enabled:true

spark.executor.extraLibraryPath:/opt/cloudera/parcels/CDH-5.3.6-1.cdh5.3.6.p0.11/lib/hadoop/lib/native

spark.executor.id:driver

spark.executor.memory:200g

spark.fileserver.uri:http://172.31.34.242:51424

spark.jars:

spark.master:local[*]

spark.repl.class.uri:http://172.31.34.242:58244

spark.scheduler.mode:FIFO

spark.serializer:org.apache.spark.serializer.KryoSerializer

spark.storage.memoryFraction:0.9

spark.tachyonStore.folderName:spark-88bd9c44-d626-4ad2-8df3-f89df4cb30de

spark.yarn.historyServer.address:http://ip-172-31-34-242.us-west-2.compute.internal:18088
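
As a quick sanity check (just a sketch, using the config keys listed above), I
can confirm from the shell that the caching-related settings are actually in
effect:

// Check inside spark-shell that the settings above were picked up.
// Keys are the ones listed in the configuration dump.
sc.getConf.get("spark.executor.memory")         // expect "200g"
sc.getConf.get("spark.storage.memoryFraction")  // expect "0.9"
sc.getConf.get("spark.serializer")              // expect the Kryo serializer
sc.getConf.get("spark.master")                  // expect "local[*]"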

Here is what I ran:

val testrdd =
sc.textFile("hdfs://ip-172-31-34-242.us-west-2.compute.internal:8020/user/jethro/tables/edw_fact_lsx_detail/edw_fact_lsx_detail.csv")

testrdd.persist(org.apache.spark.storage.StorageLevel.MEMORY_ONLY_SER)

testrdd.count()

If I don't force it into memory it works fine, but considering the cluster I'm
running on, it should fit in memory comfortably.
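
In case it helps, here is a variant I could try instead (just a sketch, same
file path as above), where partitions that don't fit in memory spill to disk
rather than forcing everything into MEMORY_ONLY_SER:

import org.apache.spark.storage.StorageLevel

// Same file as above; MEMORY_AND_DISK_SER keeps serialized blocks in memory
// and writes to disk whatever doesn't fit, instead of GC-thrashing.
val testrdd2 = sc.textFile("hdfs://ip-172-31-34-242.us-west-2.compute.internal:8020/user/jethro/tables/edw_fact_lsx_detail/edw_fact_lsx_detail.csv")
testrdd2.persist(StorageLevel.MEMORY_AND_DISK_SER)
testrdd2.count()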

Any ideas?


