Appreciate your efforts.

On 2 Feb 2018 2:37 p.m., "Zoran" <zoran.ljubi...@bulbtech.com> wrote:
I just want to say that I have solved the situation by deleting zookeeper's
and kafka's data directories and setting offsets.topic.replication.factor=3
in the kafka server.properties file.

After that, the __consumer_offsets topic is replicated and everything works
as expected.

I hope this helps someone.

Regards.

On 01/30/2018 03:02 PM, Zoran wrote:

Sorry, I had attached the wrong server.properties file. The right one is now
in the attachment.

Regards.

On 01/30/2018 02:59 PM, Zoran wrote:

Hi,

I have three servers:

blade1 (192.168.112.31),
blade2 (192.168.112.32) and
blade3 (192.168.112.33).

kafka_2.11-1.0.0 is installed on each of the servers.
zookeeper is installed on blade3 (192.168.112.33:2181) as well.

I have created a topic repl3part5 with the following line:

bin/kafka-topics.sh --zookeeper 192.168.112.33:2181 --create --replication-factor 3 --partitions 5 --topic repl3part5

When I describe the topic, it looks like this:

[root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 --zookeeper 192.168.112.33:2181

Topic:repl3part5  PartitionCount:5  ReplicationFactor:3  Configs:
    Topic: repl3part5  Partition: 0  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
    Topic: repl3part5  Partition: 1  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2
    Topic: repl3part5  Partition: 2  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
    Topic: repl3part5  Partition: 3  Leader: 2  Replicas: 2,1,3  Isr: 2,1,3
    Topic: repl3part5  Partition: 4  Leader: 3  Replicas: 3,2,1  Isr: 3,2,1

I have a producer for this topic:

bin/kafka-console-producer.sh --broker-list 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 --topic repl3part5

and a single consumer:

bin/kafka-console-consumer.sh --bootstrap-server 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 --topic repl3part5 --consumer-property group.id=zoran_1

Every message sent by the producer gets collected by the consumer. So far -
so good.

Now I would like to test failover of the kafka servers. If I put down the
blade3 kafka service, I get consumer warnings, but all produced messages are
still consumed:

[2018-01-30 14:30:01,203] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:30:01,299] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:30:01,475] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

Then I started the kafka service on blade3 back up and put down the kafka
service on blade2. The consumer now showed one warning, but all produced
messages are still consumed:

[2018-01-30 14:31:38,164] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

Then I started the kafka service on blade2 back up and put down the kafka
service on blade1.

The consumer now shows warnings about node 1/2147483646, but also
"Asynchronous auto-commit of offsets ... failed: Offset commit failed with a
retriable exception. You should retry committing offsets. The underlying
error was: null."

[2018-01-30 14:33:16,393] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available.
(org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,469] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,557] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,986] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,991] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:17,493] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:17,495] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,002] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,003] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:18,611] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,932] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,933] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:19,977] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:19,978] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:19,979] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

I tried to solve the problem by adding an offsets.topic.replication.factor=2
(or 3) setting to all three server.properties files (one of them is
attached), but with no success.
My idea was that the topic __consumer_offsets wasn't replicated throughout
the cluster, but it looks like that is not the case here.
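For context on why one particular broker ends up mattering for a consumer group: Kafka picks the group coordinator as the leader of the __consumer_offsets partition that the group id hashes to, roughly abs(groupId.hashCode()) % offsets.topic.num.partitions (50 by default). If __consumer_offsets has replication factor 1, that partition lives on exactly one broker, and killing that broker takes the coordinator away. A minimal Python sketch of the mapping (the helper names are mine, and Kafka's actual abs() handling differs only at Integer.MIN_VALUE):

```python
def java_string_hash_code(s: str) -> int:
    """Reimplementation of Java's String.hashCode() with 32-bit signed overflow."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # reinterpret the unsigned 32-bit value as signed, like Java would
    return h - 0x100000000 if h >= 0x80000000 else h

def coordinator_partition(group_id: str, num_offsets_partitions: int = 50) -> int:
    """__consumer_offsets partition that stores this group's offsets;
    the broker leading that partition acts as the group coordinator."""
    return abs(java_string_hash_code(group_id)) % num_offsets_partitions

print(coordinator_partition("zoran_1"))  # a partition index in 0..49
```

Whichever broker leads that partition is the one the consumer commits offsets to, which would explain why the group kept working only while that broker was up.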
While the blade1 kafka service was down, describing the topic showed the
following:

[root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 --zookeeper 192.168.112.33:2181

Topic:repl3part5  PartitionCount:5  ReplicationFactor:3  Configs:
    Topic: repl3part5  Partition: 0  Leader: 3  Replicas: 2,3,1  Isr: 3
    Topic: repl3part5  Partition: 1  Leader: 3  Replicas: 3,1,2  Isr: 3
    Topic: repl3part5  Partition: 2  Leader: 3  Replicas: 1,2,3  Isr: 3
    Topic: repl3part5  Partition: 3  Leader: 3  Replicas: 2,1,3  Isr: 3
    Topic: repl3part5  Partition: 4  Leader: 3  Replicas: 3,2,1  Isr: 3

The producer now shows the following warning; it still puts messages on the
topic, but the messages are just raising the lag count on the partitions:

[2018-01-30 14:37:21,816] WARN [Producer clientId=console-producer] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

I noticed that while the kafka service on blade1 is alive, I can put blades
2 and 3 down/up in any combination and the consumer will always be able to
consume messages.
If the kafka service on blade1 is down, then even if the kafka services on
blade2 and blade3 are up and running, the consumer cannot consume messages.

After bringing the kafka service on blade1 back up, all messages that the
producer sent while the blade1 kafka service was down are replayed, and then
the following is shown in the consumer terminal:

[2018-01-30 14:44:30,817] ERROR [Consumer clientId=consumer-1, groupId=zoran_1] Offset commit failed on partition repl3part5-4 at offset 20: This is not the correct coordinator.
(org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:30,817] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=22, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:31,202] ERROR [Consumer clientId=consumer-1, groupId=zoran_1] Offset commit failed on partition repl3part5-4 at offset 22: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:31,202] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=24, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=24, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=24, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

From now on, everything works with no problems or warnings and the system is
fully functional.
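One detail worth noting (my reading of the Kafka broker configuration docs, not something stated in this thread): offsets.topic.replication.factor is only honored when the broker auto-creates __consumer_offsets for the first time. If the topic already exists with replication factor 1, editing server.properties afterwards does not change the existing topic, which would explain why adding the setting had no effect until the data directories were wiped and the topic was recreated. The relevant server.properties fragment would look like:

```properties
# Replication of the internal offsets topic. Applied only when
# __consumer_offsets is first created; as of Kafka 0.11+ the cluster
# must have at least this many live brokers at creation time.
offsets.topic.replication.factor=3

# Number of partitions of __consumer_offsets (default 50); also fixed
# at creation time.
offsets.topic.num.partitions=50
```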
Can someone explain to me why the kafka server on blade1 is so important,
and what my options are so that I can stop any of the servers (including the
kafka server on blade1) and still be able to consume messages with no delay?
This thing drives me crazy. :)

Can you please help?

Regards.
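For what it's worth, deleting the zookeeper and kafka data directories (as in the final reply above) is not the only route: an existing __consumer_offsets topic can also be expanded to replication factor 3 in place with bin/kafka-reassign-partitions.sh. A sketch of the reassignment file, assuming this cluster's broker ids 1, 2, 3 (only partition 0 is shown; all 50 partitions would need an entry):

```json
{
  "version": 1,
  "partitions": [
    { "topic": "__consumer_offsets", "partition": 0, "replicas": [1, 2, 3] }
  ]
}
```

It would be applied with something like: bin/kafka-reassign-partitions.sh --zookeeper 192.168.112.33:2181 --reassignment-json-file increase-rf.json --execute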