Hi Shimi,

0.10.2.1 contains a number of fixes that should improve the out-of-the-box 
experience, including resiliency under broker failures and better exception 
handling. If you ever get back to it and the problem happens again, please 
do send us the logs and we'll happily have a look.

Thanks
Eno
> On 1 May 2017, at 12:05, Shimi Kiviti <shim...@gmail.com> wrote:
> 
> Hi Eno,
> I am afraid I played with the configuration too much to make this a
> productive investigation :(
> 
> This is a QA environment which includes 2 Kafka instances and 3 ZooKeeper
> instances in AWS. There are only 3 partitions for this topic.
> The Kafka brokers and the kafka-streams app are version 0.10.1.1.
> Our kafka-streams app runs on Docker using Kubernetes.
> I played around with 1 to 3 kafka-streams processes, but I got the
> same results. It is too easy to scale with Kubernetes :)
> Since there are only 3 partitions, I didn't start more than 3 instances.
> 
> I was too quick to upgrade only the kafka-streams app to 0.10.2.1 in the
> hope that it would solve the problem. It didn't.
> The logs I sent before are from this version.
> 
> I did notice an "unknown" offset for the main topic with kafka-streams
> version 0.10.2.1:
> $ ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group sa
> GROUP  TOPIC      PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG      OWNER
> sa     sa-events  0          842199          842199          0        sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9
> sa     sa-events  1          1078428         1078428         0        sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9
> sa     sa-events  2          unknown         26093910        unknown  sa-4557bf2d-ba79-42a6-aa05-5b4c9013c022-StreamThread-1-consumer_/10.0.10.9
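> 
> For reference, a minimal sketch of how the same check can be done from code
> (the group id "sa", topic "sa-events" and the localhost bootstrap server are
> taken from the output above; everything else is illustrative). It asks the
> broker for the committed offset of each partition with the plain consumer
> API; a null result is what the CLI prints as "unknown":
> 
> import java.util.Properties;
> import org.apache.kafka.clients.consumer.KafkaConsumer;
> import org.apache.kafka.clients.consumer.OffsetAndMetadata;
> import org.apache.kafka.common.TopicPartition;
> 
> public class CheckCommittedOffsets {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
>         props.put("group.id", "sa"); // the Streams application.id doubles as the consumer group id
>         props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>         props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
> 
>         try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
>             for (int partition = 0; partition < 3; partition++) { // sa-events has 3 partitions
>                 TopicPartition tp = new TopicPartition("sa-events", partition);
>                 OffsetAndMetadata committed = consumer.committed(tp);
>                 // null means no committed offset, i.e. the "unknown" shown by the CLI
>                 System.out.println(tp + " -> " + (committed == null ? "unknown" : committed.offset()));
>             }
>         }
>     }
> }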
> 
> After that I downgraded the kafka-streams app back to version 0.10.1.1.
> After a LONG startup time (more than an hour) during which the status of the
> group was rebalancing, all 3 processes started processing messages again.
> 
> This whole thing started after we hit a bug in our code (an NPE) that crashed
> the stream processing thread.
> So now, after 4 days, everything is back to normal.
> This worries me since it can happen again.
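> 
> In case it helps, one way to at least notice a crashed stream thread sooner
> is to register an uncaught exception handler on the KafkaStreams instance.
> The sketch below is only an assumption of how such a hook could look (the
> application id "sa", the topic name and the localhost bootstrap server are
> placeholders, and the real topology is omitted); it just logs the failure and
> leaves room for a restart or health-check reaction:
> 
> import java.util.Properties;
> import org.apache.kafka.streams.KafkaStreams;
> import org.apache.kafka.streams.StreamsConfig;
> import org.apache.kafka.streams.kstream.KStreamBuilder;
> 
> public class StreamsWithCrashHandler {
>     public static void main(String[] args) {
>         Properties props = new Properties();
>         props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sa");
>         props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
> 
>         KStreamBuilder builder = new KStreamBuilder();
>         builder.stream("sa-events"); // real processing topology goes here
> 
>         KafkaStreams streams = new KafkaStreams(builder, props);
> 
>         // Without a handler, an exception such as an NPE kills the StreamThread
>         // silently and the instance simply stops consuming its partitions.
>         streams.setUncaughtExceptionHandler((thread, throwable) -> {
>             System.err.println("Stream thread " + thread.getName() + " died: " + throwable);
>             // e.g. flip a liveness-probe flag here so Kubernetes restarts the pod
>         });
> 
>         streams.start();
>     }
> }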
> 
> 
> On Mon, May 1, 2017 at 11:45 AM, Eno Thereska <eno.there...@gmail.com>
> wrote:
> 
>> Hi Shimi,
>> 
>> Could you provide more info on your setup? How many Kafka Streams
>> processes do you have, and how many partitions are they consuming from?
>> If you have more processes than partitions, some of the processes will be
>> idle and won't do anything.
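>> 
>> As a rough illustration (the 3 partitions and 3 instances come from this
>> thread; the application id "sa" and the default of one stream thread per
>> instance are assumptions), the arithmetic looks like this:
>> 
>> import java.util.Properties;
>> import org.apache.kafka.streams.StreamsConfig;
>> 
>> public class PartitionsVsThreads {
>>     public static void main(String[] args) {
>>         int inputPartitions = 3;    // partitions of the input topic
>>         int instances = 3;          // Streams processes (e.g. kubernetes pods)
>>         int threadsPerInstance = 1; // num.stream.threads, defaults to 1
>> 
>>         Properties props = new Properties();
>>         props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sa");
>>         props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, threadsPerInstance);
>> 
>>         // Streams creates one task per input partition, so any threads beyond
>>         // that number have no task assigned and sit idle.
>>         int totalThreads = instances * threadsPerInstance;
>>         int idleThreads = Math.max(0, totalThreads - inputPartitions);
>>         System.out.println(idleThreads + " of " + totalThreads + " stream threads would be idle");
>>     }
>> }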
>> 
>> Eno
>>> On Apr 30, 2017, at 5:58 PM, Shimi Kiviti <shim...@gmail.com> wrote:
>>> 
>>> Hi Everyone,
>>> 
>>> I have a problem and I hope one of you can help me figure it out.
>>> One of our kafka-streams processes stopped processing messages.
>>> 
>>> When I turn on debug logging I see lots of these messages:
>>> 
>>> 2017-04-30 15:42:20,228 [StreamThread-1] DEBUG o.a.k.c.c.i.Fetcher: Sending fetch for partitions [devlast-changelog-2] to broker ip-x-x-x-x.ec2.internal:9092 (id: 1 rack: null)
>>> 2017-04-30 15:42:20,696 [StreamThread-1] DEBUG o.a.k.c.c.i.Fetcher: Ignoring fetched records for devlast-changelog-2 at offset 2962649 since the current position is 2963379
>>> 
>>> After a LONG time, the only messages in the log are these:
>>> 
>>> 2017-04-30 16:46:33,324 [kafka-coordinator-heartbeat-thread | sa] DEBUG o.a.k.c.c.i.AbstractCoordinator: Sending Heartbeat request for group sa to coordinator ip-x-x-x-x.ec2.internal:9092 (id: 2147483646 rack: null)
>>> 2017-04-30 16:46:33,425 [kafka-coordinator-heartbeat-thread | sa] DEBUG o.a.k.c.c.i.AbstractCoordinator: Received successful Heartbeat response for group sa
>>> 
>>> Any idea?
>>> 
>>> Thanks,
>>> Shimi
>> 
>> 
