Are there scenarios where developers have to be aware of how Spark's
fault tolerance works in order to implement correct programs?

It seems that if we want to maintain any sort of mutable state in each
worker across iterations, it can have unintended effects once a
machine goes down.

E.g.,

while (true) {
  rdd.map((row: Array[Double]) => {
    // mutate the last column in place -- this side effect is the concern
    row(numCols - 1) = computeSomething(row)
    row
  }).reduce(...)
}

If a machine fails at some point, I'd imagine that the intermediate state
stored in row(numCols - 1) will be lost when Spark recomputes the lost
partitions from lineage. And unless Spark reruns the whole thing from the
very first iteration, things will get out of sync.

I'd imagine that as long as we avoid mutating state inside worker
tasks, we should be OK, but once we start doing that, things could get
ugly unless we account for how Spark handles fault tolerance?
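For comparison, here is a rough sketch of how I'd write the same loop
without in-place mutation, so each iteration is a pure function of its
input (computeSomething and numCols are the same hypothetical names as
above, and reduce(...) is still left unspecified):

while (true) {
  rdd.map((row: Array[Double]) => {
    // build a new row instead of mutating the input in place,
    // so recomputing a lost partition reproduces the same result
    val updated = row.clone()
    updated(numCols - 1) = computeSomething(row)
    updated
  }).reduce(...)
}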
