Yes.

On Thu, Feb 25, 2016 at 9:45 PM, yuhang.chenn <yuhang.ch...@gmail.com> wrote:
> Thanks a lot.
> And I got another question: What would happen if I didn't set
> "spark.streaming.kafka.maxRatePerPartition"? Will Spark Streaming try to
> consume all the messages in Kafka?
>
> Sent from WPS Mail client
>
> On Feb 25, 2016, 11:58 AM, Cody Koeninger <c...@koeninger.org> wrote:
>
> > The per-partition offsets are part of the RDD as defined on the driver.
> > Have you read
> >
> > https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
> >
> > and/or watched
> >
> > https://www.youtube.com/watch?v=fXnNEq1v3VA
> >
> > On Wed, Feb 24, 2016 at 9:05 PM, Yuhang Chen <yuhang.ch...@gmail.com> wrote:
> >
> >> Hi, as far as I know, there is a 1:1 mapping between Spark partitions and
> >> Kafka partitions, and in Spark's fault-tolerance mechanism, if a partition
> >> fails, another partition will be used to recompute its data. My questions
> >> are below:
> >>
> >> When a partition (worker node) fails in Spark Streaming:
> >> 1. Is its computation passed to another partition, or does it just wait
> >> for the failed partition to restart?
> >> 2. How does the restarted partition know the offset range it should
> >> consume from Kafka? It should consume the same data as the failed one,
> >> right?
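To expand on the "Yes" above: without spark.streaming.kafka.maxRatePerPartition, the direct stream's first batch will cover everything from the starting offset up to the latest offset in each partition; with the cap set, each batch is limited to roughly maxRate * batchInterval messages per partition. A toy Python sketch of that arithmetic (the function name and shape are illustrative, not Spark's internals):

```python
def next_until_offset(from_offset, latest_offset,
                      max_rate_per_partition, batch_interval_sec):
    """Compute the exclusive end offset of one batch for a single
    Kafka partition.

    With no rate cap (None), the batch spans the whole backlog up to
    the latest offset; with a cap, at most
    max_rate_per_partition * batch_interval_sec messages are taken.
    """
    if max_rate_per_partition is None:
        return latest_offset  # consume the entire backlog in one batch
    limit = from_offset + max_rate_per_partition * batch_interval_sec
    return min(latest_offset, limit)

# 1,000,000 messages backlogged, 2-second batches,
# cap of 10,000 messages/sec/partition:
print(next_until_offset(0, 1_000_000, None, 2))    # -> 1000000 (whole backlog)
print(next_until_offset(0, 1_000_000, 10_000, 2))  # -> 20000 (capped batch)
```

This is why an uncapped direct stream that starts far behind the latest offsets can produce an enormous first batch.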
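On the earlier fault-tolerance question, Cody's point is that each partition of a direct-stream KafkaRDD carries its exact offset range, fixed on the driver when the batch is defined, so a recomputed task re-reads precisely the same slice of the Kafka log. A toy Python sketch of that idea (illustrative only, not Spark's actual classes):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OffsetRange:
    """Mimics the (topic, partition, fromOffset, untilOffset) tuple
    a direct-stream partition carries."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int  # exclusive

def compute_partition(kafka_log, rng):
    """A 'task' that reads exactly the offsets named in its OffsetRange.

    kafka_log maps (topic, partition) -> list of messages indexed by
    offset. Because the range was fixed up front, rerunning this on any
    executor returns the same records."""
    messages = kafka_log[(rng.topic, rng.partition)]
    return messages[rng.from_offset:rng.until_offset]

log = {("events", 0): [f"m{i}" for i in range(10)]}
rng = OffsetRange("events", 0, 3, 7)
first = compute_partition(log, rng)  # original attempt
retry = compute_partition(log, rng)  # "recomputed" after a failure
print(first == retry)  # -> True: same offsets, same data
print(first)           # -> ['m3', 'm4', 'm5', 'm6']
```

Since Kafka retains the log, recomputation is deterministic: the retried task needs no state from the failed one, only the offset range baked into the RDD.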