We are running a batch job with the following specifications:

• Building RandomForest with config: maxbins=100, depth=19, num of trees = 20
• Multiple runs with different input data size: 2.8 GB, 10 million records
• We are running the Spark application on YARN in cluster mode, with 3 Node Managers (each with 16 virtual cores and 96G RAM)
• Spark config:
  o spark.driver.cores = 2
  o spark.driver.memory = 32 G
  o spark.executor.instances = 5 and spark.executor.cores = 8, so 40 cores in total
  o spark.executor.memory = 32G, so total executor memory is around 160 G

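For reference, the resource arithmetic behind that configuration can be written out as a small Python sketch. The numbers come from the list above; the function names are made up for illustration, and the totals deliberately ignore any YARN memory overhead added on top of the configured heap sizes, so treat them as a lower bound on what the containers actually request.

```python
# Figures copied from the configuration described in the post.
NODES = 3
VCORES_PER_NODE = 16
RAM_GB_PER_NODE = 96

DRIVER_CORES = 2
DRIVER_MEMORY_GB = 32
EXECUTOR_INSTANCES = 5
EXECUTOR_CORES = 8
EXECUTOR_MEMORY_GB = 32


def requested_resources():
    """Return (cores, memory_gb) requested by all executors plus the driver.

    In cluster mode the driver also runs inside the YARN cluster, so it is
    counted here. Container memory overhead is NOT included (assumption).
    """
    executor_cores = EXECUTOR_INSTANCES * EXECUTOR_CORES          # 5 * 8
    executor_memory = EXECUTOR_INSTANCES * EXECUTOR_MEMORY_GB     # 5 * 32
    return executor_cores + DRIVER_CORES, executor_memory + DRIVER_MEMORY_GB


def cluster_capacity():
    """Return (vcores, memory_gb) across all Node Managers."""
    return NODES * VCORES_PER_NODE, NODES * RAM_GB_PER_NODE


if __name__ == "__main__":
    cores, memory_gb = requested_resources()
    cap_cores, cap_memory_gb = cluster_capacity()
    print(f"requested: {cores} cores, {memory_gb} GB")
    print(f"capacity:  {cap_cores} vcores, {cap_memory_gb} GB")
```

With the values above this gives 42 cores and 192 GB requested against 48 vcores and 288 GB of raw capacity, before any per-container overhead.
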
We are collecting execution times for the tasks using a SparkListener, and also the total execution time for the application from the Spark Web UI. Across all the tests we consistently saw that the sum of the execution times of all the tasks accounts for only about 60% of the total application run time. We are wondering where the remaining 40% of the time is being spent.
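One way to start accounting for such a gap is to split each task's wall-clock duration into the time the executor spent running it and everything else, and then to relate the summed task time to the application's wall-clock time. The sketch below is plain Python over hand-written records, not Spark code. The field names mirror values a SparkListener can read from `TaskInfo` (`launchTime`, `finishTime`) and `TaskMetrics` (`executorRunTime`), but the `TaskRecord` type, the `summarize` helper, and the sample numbers are hypothetical and are not measurements from the job described here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical per-task record; all times are in milliseconds."""
    launch_time: int        # when the task was launched
    finish_time: int        # when the task was reported finished
    executor_run_time: int  # time the executor spent running the task body


def summarize(tasks: Iterable[TaskRecord], app_start: int, app_end: int,
              total_cores: int = 1) -> Dict[str, float]:
    """Relate summed task time to application wall-clock time.

    With total_cores=1, busy_fraction is simply (sum of task durations) /
    (application duration). With the real core count, it is the share of
    available core-time that was occupied by tasks.
    """
    tasks = list(tasks)
    app_wall = app_end - app_start
    task_wall = sum(t.finish_time - t.launch_time for t in tasks)
    run_time = sum(t.executor_run_time for t in tasks)
    return {
        "app_wall_ms": app_wall,
        "task_wall_ms": task_wall,
        "executor_run_ms": run_time,
        # Time inside task durations that was not spent running the task
        # body (for example deserialization or waiting on the scheduler).
        "per_task_overhead_ms": task_wall - run_time,
        "busy_fraction": task_wall / (app_wall * total_cores),
    }


if __name__ == "__main__":
    # Made-up sample: a 10 s application on 2 cores with three tasks.
    sample = [
        TaskRecord(launch_time=1000, finish_time=5000, executor_run_time=3500),
        TaskRecord(launch_time=1000, finish_time=6000, executor_run_time=4800),
        TaskRecord(launch_time=6000, finish_time=9000, executor_run_time=2700),
    ]
    for key, value in summarize(sample, 0, 10000, total_cores=2).items():
        print(f"{key}: {value}")
```

In this made-up sample no task runs during the first and last second of the application, and one core sits idle for part of the run. Time outside any task (application startup, executor allocation, driver-side work between stages) is the kind of interval that would not show up in a sum of task times.
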