We are running a batch job with the following specifications:
• Building a RandomForest with config: maxBins = 100, maxDepth = 19, numTrees = 20 (a rough sketch of the training call is included below the list)
• Multiple runs with different input data sizes; e.g. 2.8 GB, 10 million records
• Running the Spark application on YARN in cluster mode, with 3 NodeManagers (each with 16 virtual cores and 96 GB RAM)
• Spark config:
  o spark.driver.cores = 2
  o spark.driver.memory = 32G
  o spark.executor.instances = 5 and spark.executor.cores = 8, so 40 cores in total
  o spark.executor.memory = 32G, so total executor memory of around 160 GB
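
For reference, the model is built roughly like this (a simplified sketch assuming the DataFrame-based RandomForestClassifier; the "label"/"features" column names and the trainingData DataFrame are placeholders, and our actual featurization code is omitted):

    import org.apache.spark.ml.classification.RandomForestClassifier

    // Same hyperparameters as listed above
    val rf = new RandomForestClassifier()
      .setMaxBins(100)               // maxBins = 100
      .setMaxDepth(19)               // depth = 19
      .setNumTrees(20)               // number of trees = 20
      .setLabelCol("label")          // placeholder column name
      .setFeaturesCol("features")    // placeholder column name

    // trainingData: DataFrame with "label" and "features" columns (~10 million rows)
    val model = rf.fit(trainingData)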

We are collecting per-task execution times using a SparkListener, and the
total application execution time from the Spark Web UI. Across all the tests
we consistently saw that the sum of the execution times of all tasks accounts
for only about 60% of the total application run time.
We are wondering where the remaining 40% of the time is being spent.
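
For context, the listener is essentially doing the following (a minimal sketch; the class name is just for illustration, and we take executorRunTime from TaskMetrics as the per-task execution time):

    import java.util.concurrent.atomic.AtomicLong
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    class TaskTimeListener extends SparkListener {
      val totalTaskTimeMs = new AtomicLong(0L)

      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        // Sum the executor run time (in ms) of every finished task
        val metrics = taskEnd.taskMetrics
        if (metrics != null) {
          totalTaskTimeMs.addAndGet(metrics.executorRunTime)
        }
      }
    }

    // Registered with the SparkContext before the job runs:
    // sc.addSparkListener(new TaskTimeListener)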



