Hi, I am running Spark applications on GCE. I set up clusters with varying numbers of nodes, from 1 to 7; the machines are single-core. For each cluster, I set spark.default.parallelism to the number of nodes in that cluster. I then ran four of the applications available in Spark Examples: SparkTC, SparkALS, SparkLR, and SparkPi, on each configuration. What I notice is the following: for SparkTC and SparkALS, the time to complete the job increases as the number of nodes in the cluster increases, whereas for SparkLR and SparkPi, the time to complete the job remains the same across all configurations. Could anyone explain this to me?
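For reference, I launch each example roughly like the sketch below (the master host, jar path, and parallelism value are illustrative; the jar location depends on the Spark version and layout):

```shell
# Submit one of the bundled examples to a standalone cluster,
# setting spark.default.parallelism to the number of worker nodes
# (here 4, for a 4-node cluster). Repeat per example and per cluster size.
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://<master-host>:7077 \
  --conf spark.default.parallelism=4 \
  "$SPARK_HOME"/examples/jars/spark-examples_*.jar
```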
Thank you.

Regards,
Deep