Hi, yes, I'm running the executors with 8 cores each, and executor memory, driver memory, the number of executors, and so on are all properly configured in the spark-submit command. I'm a long-time Spark user, so let's skip the basic configuration checks and dive into the interesting part :)
Another strange thing I've noticed is that this behaviour shows up only when reading JSON:

- reading JSON from remote HDFS -> uneven executor performance
- reading Parquet from remote HDFS -> even executor performance

>> What do you mean by the difference between the nodes is huge?

When I look at the Input column in the Executors tab of the Spark Web UI, the difference between the nodes that do the work and all the others is huge. For example, in the image below the difference is x4, but in the same use case I've sometimes seen x10.

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26504/inputcol.png>