The per-partition offsets are part of the RDD as defined on the driver. Have you read
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
and/or watched https://www.youtube.com/watch?v=fXnNEq1v3VA

On Wed, Feb 24, 2016 at 9:05 PM, Yuhang Chen <yuhang.ch...@gmail.com> wrote:
> Hi, as far as I know, there is a 1:1 mapping between a Spark partition and
> a Kafka partition, and in Spark's fault-tolerance mechanism, if a partition
> fails, another partition is used to recompute its data. My questions are:
>
> When a partition (worker node) fails in Spark Streaming,
> 1. Is its computation passed to another partition, or does it just wait
> for the failed partition to restart?
> 2. How does the restarted partition know the offset range it should
> consume from Kafka? It should consume the same data as the failed one,
> right?
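
To make the point concrete: because each RDD partition is *defined* by a fixed (topic, partition, fromOffset, untilOffset) tuple on the driver, recomputing a failed task just means re-reading the same offset range, which yields the same records. The sketch below is a plain-Python simulation of that idea, not Spark's or Kafka's actual API; the class and function names (`OffsetRange`, `compute_partition`, the in-memory `kafka_log`) are illustrative stand-ins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OffsetRange:
    """Illustrative stand-in for the offset range that defines one RDD partition."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive

# Stand-in for a Kafka partition's log: offset i holds message "msg-i".
kafka_log = {0: [f"msg-{i}" for i in range(100)]}

def compute_partition(r: OffsetRange):
    # Re-reading the same immutable offset range always yields the same
    # records, so a recomputed task produces identical output.
    return kafka_log[r.partition][r.from_offset:r.until_offset]

# The driver fixes the ranges when the batch's RDD is defined.
rdd_partitions = [OffsetRange("events", 0, 40, 50)]

first = compute_partition(rdd_partitions[0])
# Simulate the task failing and being recomputed on another executor:
replayed = compute_partition(rdd_partitions[0])
assert first == replayed  # deterministic replay from the same offsets
```

This is why the direct stream approach does not need a separate system of record for "which data was in this batch": the offset ranges stored in the RDD definition are that record.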