The per-partition offsets are part of the RDD as defined on the driver. Have you read
https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
and/or watched https://www.youtube.com/watch?v=fXnNEq1v3VA

On Wed, Feb 24, 2016 at 9:05 PM, Yuhang Chen <yuhang.ch...@gmail.com> wrote:
> Hi, as far as I know, there is a 1:1 mapping between a Spark partition and
> a Kafka partition, and in Spark's fault-tolerance mechanism, if a partition
> fails, another partition is used to recompute its data. My questions are:
>
> When a partition (worker node) fails in Spark Streaming,
> 1. Is its computation passed to another partition, or does it just wait
> for the failed partition to restart?
> 2. How does the restarted partition know the offset range it should
> consume from Kafka? It should consume the same data as the failed one,
> right?
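
To make the point concrete: because each RDD partition is *defined* by a fixed (topic, partition, fromOffset, untilOffset) tuple on the driver, recomputing a failed task just means re-reading the same offset range, which yields the same records. The sketch below is a plain-Python simulation of that idea, not Spark's or Kafka's actual API; the class and function names (`OffsetRange`, `compute_partition`, the in-memory `kafka_log`) are illustrative stand-ins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OffsetRange:
    """Illustrative stand-in for the offset range that defines one RDD partition."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive

# Stand-in for a Kafka partition's log: offset i holds message "msg-i".
kafka_log = {0: [f"msg-{i}" for i in range(100)]}

def compute_partition(r: OffsetRange):
    # Re-reading the same immutable offset range always yields the same
    # records, so a recomputed task produces identical output.
    return kafka_log[r.partition][r.from_offset:r.until_offset]

# The driver fixes the ranges when the batch's RDD is defined.
rdd_partitions = [OffsetRange("events", 0, 40, 50)]

first = compute_partition(rdd_partitions[0])
# Simulate the task failing and being recomputed on another executor:
replayed = compute_partition(rdd_partitions[0])
assert first == replayed  # deterministic replay from the same offsets
```

This is why the direct stream approach does not need a separate system of record for "which data was in this batch": the offset ranges stored in the RDD definition are that record.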