Hi Celeborn community,

I would like to better understand how Celeborn handles FetchFailed
exceptions. In Spark's `DAGScheduler` fetch-failure handling, the code tries
to unregister the map output for the mapIndex whose fetch failed:

} else if (mapIndex != -1) {
  // Mark the map whose fetch failed as broken in the map stage
  mapOutputTracker.unregisterMapOutput(shuffleId, mapIndex, bmAddress)
}

But in the Celeborn case, mapIndex will always be -1, so this branch is
never taken. How, then, does the shuffle output get cleared for the failed
stage? Ideally, mapOutputTracker.unregisterAllMapAndMergeOutput(shuffleId)
should be called for the stage whose fetch failed, but I am not able to find
that code.
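
To make the concern concrete, here is a small toy model of the branch quoted above. It is only a sketch: `ToyMapOutputTracker`, `unregisterAllMapOutput`, `registeredCount`, and `FetchFailureSketch` are hypothetical names made up for illustration, not Spark or Celeborn classes, and the real `DAGScheduler` does more than this single branch.

```scala
import scala.collection.mutable

// Toy stand-in for a map output tracker (hypothetical, NOT Spark's MapOutputTracker).
final class ToyMapOutputTracker {
  // shuffleId -> map indices that still have registered output
  private val outputs = mutable.Map.empty[Int, mutable.Set[Int]]

  def register(shuffleId: Int, numMaps: Int): Unit = {
    val indices = mutable.Set.empty[Int]
    indices ++= (0 until numMaps)
    outputs(shuffleId) = indices
  }

  // Drops the output of one map task.
  def unregisterMapOutput(shuffleId: Int, mapIndex: Int): Unit =
    outputs.get(shuffleId).foreach(_ -= mapIndex)

  // Drops every map output of the shuffle (hypothetical helper).
  def unregisterAllMapOutput(shuffleId: Int): Unit =
    outputs.get(shuffleId).foreach(_.clear())

  def registeredCount(shuffleId: Int): Int =
    outputs.get(shuffleId).map(_.size).getOrElse(0)
}

object FetchFailureSketch {
  // Mirrors only the quoted branch: unregister when mapIndex is known.
  def onFetchFailed(tracker: ToyMapOutputTracker, shuffleId: Int, mapIndex: Int): Unit =
    if (mapIndex != -1) tracker.unregisterMapOutput(shuffleId, mapIndex)

  def main(args: Array[String]): Unit = {
    val tracker = new ToyMapOutputTracker
    tracker.register(shuffleId = 0, numMaps = 4)

    onFetchFailed(tracker, shuffleId = 0, mapIndex = 2)
    println(tracker.registeredCount(0)) // 3: one map output removed

    onFetchFailed(tracker, shuffleId = 0, mapIndex = -1)
    println(tracker.registeredCount(0)) // 3: nothing removed when mapIndex is -1

    tracker.unregisterAllMapOutput(0)
    println(tracker.registeredCount(0)) // 0: what I would expect for the failed stage
  }
}
```

In this model a failure reported with mapIndex -1 leaves every output registered, which is why I would expect some whole-shuffle cleanup to happen elsewhere.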

Can someone help me understand this? I might be missing something basic
here.
