Hi, all! We run 3 Kafka brokers with a replication factor of 2. Today we were doing a partition reassignment and one of our brokers was rebooted due to a hardware problem. After the broker came back, we found that our consumer no longer works and fails with errors like:
ERROR java.lang.AssertionError: assumption failed: 765994 exceeds 6339
ERROR java.lang.AssertionError: assumption failed: 1501252 exceeds 416522
ERROR java.lang.AssertionError: assumption failed: 950819 exceeds 805377

Some logs from the brokers:

[2015-09-01 13:00:16,976] ERROR [Replica Manager on Broker 61]: Error when processing fetch request for partition [avro_match,27] offset 208064729 from consumer with correlation id 0. Possible cause: Request for offset 208064729 but we only have log segments in the range 209248794 to 250879159. (kafka.server.ReplicaManager)
[2015-09-01 13:01:17,943] ERROR [Replica Manager on Broker 45]: Error when processing fetch request for partition [logs.conv_expired,20] offset 454 from consumer with correlation id 0. Possible cause: Request for offset 454 but we only have log segments in the range 1349769 to 1476231. (kafka.server.ReplicaManager)
[2015-09-01 13:21:23,896] INFO Partition [logs.avro_event,29] on broker 61: Expanding ISR for partition [logs.avro_event,29] from 61,77 to 61,77,45 (kafka.cluster.Partition)
[2015-09-01 13:21:23,899] INFO Partition [logs.imp_tstvssamza,6] on broker 61: Expanding ISR for partition [logs.imp_tstvssamza,6] from 61,77 to 61,77,45 (kafka.cluster.Partition)
[2015-09-01 13:21:23,902] INFO Partition [__consumer_offsets,30] on broker 61: Expanding ISR for partition [__consumer_offsets,30] from 61,77 to 61,77,45 (kafka.cluster.Partition)
[2015-09-01 13:21:23,905] INFO Partition [logs.test_imp,44] on broker 61: Expanding ISR for partition [logs.test_imp,44] from 61 to 61,45 (kafka.cluster.Partition)

It looks like we lost part of our data. Kafka also started replicating seemingly random partitions (the bad broker was already up and running and log recovery had completed); the number of under-replicated partitions keeps changing:

root@kafka2d:~# date && /usr/lib/kafka/bin/kafka-topics.sh --zookeeper zk-pool.gce-eu.kafka/kafka --under-replicated-partitions --describe | wc -l
Tue Sep 1 13:02:24 UTC 2015
431
root@kafka2d:~# date && /usr/lib/kafka/bin/kafka-topics.sh --zookeeper zk-pool.gce-eu.kafka/kafka --under-replicated-partitions --describe | wc -l
Tue Sep 1 13:02:37 UTC 2015
386
root@kafka2d:~# date && /usr/lib/kafka/bin/kafka-topics.sh --zookeeper zk-pool.gce-eu.kafka/kafka --under-replicated-partitions --describe | wc -l
Tue Sep 1 13:02:48 UTC 2015
501
root@kafka2d:~# date && /usr/lib/kafka/bin/kafka-topics.sh --zookeeper zk-pool.gce-eu.kafka/kafka --under-replicated-partitions --describe | wc -l
Tue Sep 1 13:02:58 UTC 2015
288
root@kafka2d:~# date && /usr/lib/kafka/bin/kafka-topics.sh --zookeeper zk-pool.gce-eu.kafka/kafka --under-replicated-partitions --describe | wc -l
Tue Sep 1 13:03:08 UTC 2015
363

Could anyone shed some light on this situation? We use ext4 on our brokers with the following settings:

port=9092
num.network.threads=2
num.io.threads=8
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dirs=/mnt/kafka/kafka-data
num.partitions=1
default.replication.factor=2
message.max.bytes=10000000
replica.fetch.max.bytes=10000000
auto.create.topics.enable=false
log.roll.hours=24
num.replica.fetchers=4
auto.leader.rebalance.enable=true
log.retention.hours=168
log.segment.bytes=134217728
log.retention.check.interval.ms=60000
log.cleaner.enable=false
delete.topic.enable=true
zookeeper.connect=zk1d.gce-eu.kafka:2181,zk2d.gce-eu.kafka:2181,zk3d.gce-eu.kafka:2181/kafka
zookeeper.connection.timeout.ms=6000

Should I change anything here, or should a cluster of 3 brokers with replication factor 2 already prevent such issues?
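In case it helps with the diagnosis, I assume something like the following can be used to see which offsets are actually still available on the broker for one of the affected partitions (host, port and partition below are just taken from the output above; --time -2 asks for the earliest available offset, -1 for the latest):

# earliest offset still on disk for avro_match, partition 27
/usr/lib/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list kafka2d.gce-eu.kafka:9092 \
  --topic avro_match --partitions 27 --time -2

# latest offset for the same partition
/usr/lib/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list kafka2d.gce-eu.kafka:9092 \
  --topic avro_match --partitions 27 --time -1

If the earliest offset reported there is higher than the offset the consumer requests, that would match the "we only have log segments in the range ..." errors in the broker log above.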
I am also wondering whether I should set any of the flush parameters:

log.flush.interval.ms
log.flush.interval.messages
log.flush.scheduler.interval.ms

THX!

--
Best regards,
Gleb Zhukov
