If one executor fails, Spark moves the processing over to another executor. However, if the data that executor produced is lost, Spark re-executes the processing that generated that data, and it might have to go all the way back to the source.

> Does this mean that only those tasks that the dead executor was executing at
> the time need to be rerun to regenerate the processing stages? If I am
> correct, it uses RDD lineage to figure out what needs to be re-executed.
> Remember, we are talking about executor failure here, not node failure.

I don't know the details of how it determines which tasks to rerun, but I am guessing that if it is a multi-stage job, it might have to rerun all the stages again. For example, if you have done a groupBy, you will have two stages. After the first stage, the data is shuffled by hashing the groupBy key, so that data for the same key value lands in the same partition. Now, if one of those partitions is lost during execution of the second stage, I am guessing Spark will have to go back and re-execute all the tasks in the first stage.
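To make the point concrete, here is a minimal, hypothetical sketch in plain Python (not actual Spark internals) of why losing one shuffle partition can force rerunning every map-side task: each stage-1 task hash-partitions its output across all reduce partitions, so every stage-1 task may have contributed to the lost partition. All names below are illustrative.

```python
# Illustrative sketch only: simulates the shuffle write of a two-stage
# groupBy job and the recomputation needed when one partition is lost.

def stage1_map_task(records, num_partitions):
    # "Shuffle write": hash-partition this task's output by key, so that
    # all records with the same key land in the same reduce partition.
    out = {p: [] for p in range(num_partitions)}
    for key, value in records:
        out[hash(key) % num_partitions].append((key, value))
    return out

def rebuild_partition(input_splits, num_partitions, lost_partition):
    # Recompute one lost shuffle partition from lineage. Because any
    # stage-1 task may have written into it, every stage-1 task reruns.
    reran = 0
    rebuilt = []
    for split in input_splits:
        reran += 1
        rebuilt.extend(stage1_map_task(split, num_partitions)[lost_partition])
    return rebuilt, reran

# Two input splits -> two stage-1 tasks; shuffle into two partitions.
splits = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
rebuilt, reran = rebuild_partition(splits, num_partitions=2, lost_partition=0)
print(reran)  # every stage-1 task had to rerun for a single lost partition
```

This is of course only the degenerate case without an external shuffle service; with one, the shuffle files could outlive the executor and no recomputation would be needed.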
HTH

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies Limited
London, United Kingdom

LinkedIn: https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/
https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed.

On Tue, 20 Jun 2023 at 20:07, Nikhil Goyal <nownik...@gmail.com> wrote:

> Hi folks,
> When running Spark on K8s, what would happen to shuffle data if an
> executor is terminated or lost? Since there is no shuffle service, does
> all the work done by that executor get recomputed?
>
> Thanks
> Nikhil