I would like to measure the latency (tasks/s) of a simple application on Apache Spark.
The idea: the workers generate random data to be placed in a list, and a final action (count) counts the total number of datapoints generated. In the code below, numberOfPartitions is equal to the number of datapoints that need to be generated (datapoints are integers).

Although the code works as expected, a total of 119 Spark executors were killed while running with 64 slaves. I suspect this happens because Spark assigns executors to each node, so the total set of partitions a node is asked to compute may need more memory than is available on that node. Those executors then get killed, which makes the latency measurement I want to analyze inaccurate.

Any assistance with cleaning up the code below, or with fixing the above issue to reduce the number of killed executors, would be much appreciated.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Latency-experiment-without-losing-executors-tp26981.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.