Hi, as far as I know, with the direct Kafka integration there is a 1:1
mapping between Spark partitions and Kafka partitions, and under Spark's
fault-tolerance mechanism, if the task computing a partition fails, that
partition is recomputed from its lineage, possibly on another worker node.
My questions are below:

When a partition (i.e., the worker node computing it) fails in Spark Streaming:
1. Is its computation handed off to another node, or does Spark just wait
for the failed node to restart?
2. How does the restarted task know the offset range it should consume
from Kafka? It should consume the same data as the task that failed, right?
(A sketch of the offset API I am asking about follows the questions.)
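
For reference, here is a minimal sketch of what I mean, assuming the
direct (receiver-less) approach from spark-streaming-kafka; the broker
address, topic name, batch interval, and app name are just placeholders.
Each RDD from the direct stream exposes one OffsetRange per Kafka
partition, which is the 1:1 mapping my questions refer to:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

object OffsetDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("OffsetDemo")
    val ssc = new StreamingContext(conf, Seconds(10))

    // "broker1:9092" and "mytopic" are placeholder values
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("mytopic")

    val stream = KafkaUtils.createDirectStream[String, String,
      StringDecoder, StringDecoder](ssc, kafkaParams, topics)

    stream.foreachRDD { rdd =>
      // one OffsetRange per Kafka partition -- the 1:1 mapping
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach { o =>
        println(s"topic=${o.topic} partition=${o.partition} " +
          s"from=${o.fromOffset} until=${o.untilOffset}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}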
