Hello,

I'm trying to run a simple test program that loads a large file (~12.4GB) into memory on a single many-core machine. The machine has more than enough memory (1TB RAM) and 64 cores, of which I want to use 16 for worker threads.

I set the executor memory (spark.executor.memory) to 200GB in the SparkConf passed to SparkContext, and I set the JVM memory to 200GB (-Xmx200g) in spark-env.sh. Even so, I keep getting this error when trying to load the input: "java.lang.OutOfMemoryError: GC overhead limit exceeded".

I believe the memory configuration parameters I pass do not stick, because I get the following message when running: "14/03/01 22:09:31 INFO storage.MemoryStore: MemoryStore started with capacity 883.2 MB." Obviously I'm missing something when configuring Spark, but I can't figure out what, and I'd appreciate your help.
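One way to test that belief would be to print the maximum heap of the JVM that actually runs the app. The following is only a minimal sketch, not part of my benchmark: the names HeapCheck and toMb are hypothetical, and it assumes the check is launched the same way as the test program so that it sees the same JVM options. If the printed value is far below 200GB, the -Xmx setting did not reach that process.

```scala
object HeapCheck {
  // Convert a byte count to mebibytes (hypothetical helper).
  def toMb(bytes: Long): Double = bytes.toDouble / (1024L * 1024L)

  def main(args: Array[String]): Unit = {
    // Maximum amount of memory this JVM will attempt to use for its heap.
    val maxHeapBytes = Runtime.getRuntime().maxMemory()
    println(f"JVM max heap: ${toMb(maxHeapBytes)}%.1f MB")
  }
}
```

The same two lines inside main could also be placed at the top of the benchmark's own main method, before the SparkContext is created.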
The test program I'm running (not through shell, but as a standalone scala app):

import org.apache.spark._
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object LoadBenchmark {
  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[16]").setAppName("Load Benchmark").set("spark.executor.memory", "200g")
    val sc = new SparkContext(conf)
    println("LOADING INPUT FILE")
    val edges = sc.textFile("/lfs/madmax/0/yonathan/half_twitter_rv.txt").cache()
    val cnt = edges.count()
    println("edge count: "+ cnt)
  }
}

The contents of the spark-env.sh file:

# Examples of app-wide options : -Dspark.serializer
SPARK_JAVA_OPTS+="-Xms200g -Xmx200g -XX:-UseGCOverheadLimit"
export SPARK_JAVA_OPTS

# If using the standalone deploy mode, you can also set variables for it here:
# - SPARK_MASTER_IP, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
SPARK_WORKER_CORES=16
export SPARK_WORKER_CORES
# - SPARK_WORKER_MEMORY, to set how much memory to use (e.g. 1000m, 2g)
SPARK_WORKER_MEMORY=200g
export SPARK_WORKER_MEMORY
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT
# - SPARK_WORKER_INSTANCES, to set the number of worker processes per node
# - SPARK_WORKER_DIR, to set the working directory of worker processes

Thank you!