1. When a task completes its operation, it notifies the driver. The driver may
not receive that message because of a network problem, and will then think the
task is still running. Does that mean the child stage will never be scheduled?
2. How does Spark guarantee that a downstream task receives the shuffle data
completely? In fact, I can't find any checksum for blocks in Spark. For
example, an upstream task may shuffle 100 MB of data, but the downstream task
may receive only 99 MB because of a network problem. Can Spark verify, based
on size, that the data was received completely?
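
To make the second question concrete, here is a minimal sketch of the kind of check I have in mind. It is not Spark code: the function names describe_block and verify_block are hypothetical, and I am only assuming the sender could ship a length and a CRC32 alongside each block.

```python
import zlib


def describe_block(data):
    """Sender side (hypothetical): compute the length and CRC32 of a block."""
    return len(data), zlib.crc32(data)


def verify_block(received, expected_len, expected_crc):
    """Receiver side (hypothetical): check size first, then the checksum."""
    if len(received) != expected_len:
        # Catches truncation, e.g. 99 MB arriving instead of 100 MB.
        return False
    # Catches corruption that leaves the size unchanged.
    return zlib.crc32(received) == expected_crc


# Stand-in for a shuffle block; real blocks would be much larger.
block = b"shuffle-data" * 1000
length, crc = describe_block(block)

truncated = block[:-100]                      # lost bytes in transit
corrupted = bytes([block[0] ^ 0xFF]) + block[1:]  # same size, one byte flipped

print(verify_block(block, length, crc))
print(verify_block(truncated, length, crc))
print(verify_block(corrupted, length, crc))
```

A size check alone would catch the truncated case but not the corrupted one, which is why I am asking about a checksum as well.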
