Yes.

On Thu, Feb 25, 2016 at 9:45 PM, yuhang.chenn <yuhang.ch...@gmail.com> wrote:
> Thanks a lot.
> And I got another question: What would happen if I didn't set
> "spark.streaming.kafka.maxRatePerPartition"? Will Spark Streaming try to
> consume all the messages in Kafka?
>
> Sent from WPS Mail client
>
> On Feb 25, 2016, 11:58 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
> > The per-partition offsets are part of the RDD as defined on the driver.
> > Have you read
> >
> > https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
> >
> > and/or watched
> >
> > https://www.youtube.com/watch?v=fXnNEq1v3VA
> >
> > On Wed, Feb 24, 2016 at 9:05 PM, Yuhang Chen <yuhang.ch...@gmail.com> wrote:
> >
> >> Hi, as far as I know, there is a 1:1 mapping between Spark partitions and
> >> Kafka partitions, and in Spark's fault-tolerance mechanism, if a partition
> >> fails, another partition will be used to recompute its data. My questions
> >> are below:
> >>
> >> When a partition (worker node) fails in Spark Streaming:
> >> 1. Is its computation passed to another partition, or does it just wait
> >> for the failed partition to restart?
> >> 2. How does the restarted partition know the offset range it should
> >> consume from Kafka? It should consume the same data as the failed one,
> >> right?
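To expand on the "Yes" above: without spark.streaming.kafka.maxRatePerPartition, the direct stream's first batch will cover everything from the starting offset up to the latest offset in each partition; with the cap set, each batch is limited to roughly maxRate * batchInterval messages per partition. A toy Python sketch of that arithmetic (the function name and shape are illustrative, not Spark's internals):

```python
def next_until_offset(from_offset, latest_offset,
                      max_rate_per_partition, batch_interval_sec):
    """Compute the exclusive end offset of one batch for a single
    Kafka partition.

    With no rate cap (None), the batch spans the whole backlog up to
    the latest offset; with a cap, at most
    max_rate_per_partition * batch_interval_sec messages are taken.
    """
    if max_rate_per_partition is None:
        return latest_offset  # consume the entire backlog in one batch
    limit = from_offset + max_rate_per_partition * batch_interval_sec
    return min(latest_offset, limit)

# 1,000,000 messages backlogged, 2-second batches,
# cap of 10,000 messages/sec/partition:
print(next_until_offset(0, 1_000_000, None, 2))    # -> 1000000 (whole backlog)
print(next_until_offset(0, 1_000_000, 10_000, 2))  # -> 20000 (capped batch)
```

This is why an uncapped direct stream that starts far behind the latest offsets can produce an enormous first batch.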
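On the earlier fault-tolerance question, Cody's point is that each partition of a direct-stream KafkaRDD carries its exact offset range, fixed on the driver when the batch is defined, so a recomputed task re-reads precisely the same slice of the Kafka log. A toy Python sketch of that idea (illustrative only, not Spark's actual classes):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OffsetRange:
    """Mimics the (topic, partition, fromOffset, untilOffset) tuple
    a direct-stream partition carries."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int  # exclusive

def compute_partition(kafka_log, rng):
    """A 'task' that reads exactly the offsets named in its OffsetRange.

    kafka_log maps (topic, partition) -> list of messages indexed by
    offset. Because the range was fixed up front, rerunning this on any
    executor returns the same records."""
    messages = kafka_log[(rng.topic, rng.partition)]
    return messages[rng.from_offset:rng.until_offset]

log = {("events", 0): [f"m{i}" for i in range(10)]}
rng = OffsetRange("events", 0, 3, 7)
first = compute_partition(log, rng)  # original attempt
retry = compute_partition(log, rng)  # "recomputed" after a failure
print(first == retry)  # -> True: same offsets, same data
print(first)           # -> ['m3', 'm4', 'm5', 'm6']
```

Since Kafka retains the log, recomputation is deterministic: the retried task needs no state from the failed one, only the offset range baked into the RDD.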