I am seeking help with a Spark configuration for running queries against a
cluster of 6 machines. All 6 machines run Spark 1.5.1 slaves, with one of
them also acting as master/Thrift server. From Beeline I query 2 tables of
300M and 31M rows respectively. The same queries return result sets of up
to 500M rows when run against Oracle, but Spark errors out on anything
larger than 5.5M rows.
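
For reference, I run the queries through Beeline against the Thrift server
roughly like this (the host, port, and query text below are placeholders,
not my exact statement):

    beeline -u jdbc:hive2://<master-host>:10000 \
            -e "SELECT ... FROM big_table b JOIN small_table s ON ..."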

I believe there is an optimal memory configuration that must be set for each
of the workers in our cluster, but I have not been able to determine it. Is
there something better than trial and error? Are there settings to avoid,
such as making sure not to set spark.driver.maxResultSize >
spark.driver.memory?

Is there a formula or set of guidelines for calculating reasonable Spark
configuration values given a machine's available cores and memory?
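
To make the question concrete, this is the kind of per-node arithmetic I
have in mind, using the commonly cited rule of thumb of ~5 cores per
executor and ~10% memory overhead (the numbers below are my own assumptions,
not something I have validated):

    cores per node       = 32, reserving 1 for OS/daemons -> 31 usable
    executors per node   = 31 / 5 cores each              ~= 6
    heap per executor    = (63 - 1) GB / 6                ~= 10 GB
    less ~10% overhead   -> spark.executor.memory         ~= 9g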

This is my current configuration:

BDA v3 server : SUN SERVER X4-2L
CPU           : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
CPU cores     : 32
Memory        : 63 GB
Disks         : 12

spark-defaults.conf:

spark.driver.memory                     20g
spark.executor.memory                   40g
spark.executor.extraJavaOptions         -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
spark.rpc.askTimeout                    6000s
spark.rpc.lookupTimeout                 3000s
spark.driver.maxResultSize              20g
spark.rdd.compress                      true
spark.storage.memoryFraction            1
spark.core.connection.ack.wait.timeout  600
spark.akka.frameSize                    500
spark.shuffle.compress                  true
spark.shuffle.file.buffer               128k
spark.shuffle.memoryFraction            0
spark.shuffle.spill.compress            true
spark.shuffle.spill                     true
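
For comparison, this is the direction I was considering, based on the general
guidance that the storage and shuffle fractions should leave the heap some
headroom and that maxResultSize should stay below driver memory; the specific
values below are guesses on my part:

    spark.driver.memory              20g
    # keep maxResultSize below spark.driver.memory
    spark.driver.maxResultSize       10g
    spark.executor.memory            40g
    # 1 leaves no heap for execution; the documented default is 0.6
    spark.storage.memoryFraction     0.5
    # 0 forces shuffle aggregation to spill to disk; the documented default is 0.2
    spark.shuffle.memoryFraction     0.2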

Thank you,

Chris


