Hello there, I have a Spark job running on a 20-node cluster. The job is logically simple: just a mapPartitions followed by a sum. The return value of the mapPartitions is one integer per partition. Some tasks failed randomly (possibly caused by connections to a third-party key-value store; the cause is irrelevant to my question). In more detail:
Description:
1. Spark 1.1.1.
2. 4096 tasks total.
3. 66 failed tasks.

Issue: Spark seems to be rerunning all 4096 tasks instead of only the 66 failed ones. It is currently at 469/4096 (stage 2). Is this behavior normal? Thanks for your help!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-s-behavior-about-failed-tasks-tp24232.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
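For reference, the logical shape of the job described above (a mapPartitions that emits one integer per partition, then a sum) can be sketched like this. This is only an illustration in plain Python, with lists standing in for RDD partitions; the data and the count function are hypothetical stand-ins, not the actual job:

```python
# Plain-Python sketch of the job's shape: one integer per partition, then sum.
# Lists stand in for RDD partitions; the real job uses Spark's mapPartitions.

partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]  # hypothetical partitioned data

def count_partition(partition):
    # mapPartitions analogue: return a single integer for the partition,
    # e.g. the number of records successfully handled by the external store
    return len(partition)

per_partition = [count_partition(p) for p in partitions]  # [3, 2, 4]
total = sum(per_partition)                                # 9
print(total)
```

In the real job, only the integers per partition reach the driver for the final sum, so a retry of one failed partition should not require recomputing the others.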