Appreciate your efforts.

On 2 Feb 2018 2:37 p.m., "Zoran" <zoran.ljubi...@bulbtech.com> wrote:

> I just want to say that I solved the problem by deleting ZooKeeper's and
> Kafka's data directories and setting offsets.topic.replication.factor=3
> in Kafka's server.properties file.
>
> After that, the __consumer_offsets topic is replicated and everything works
> as expected.
>
> I hope this helps someone.
>
>
> Regards.
>
>
> On 01/30/2018 03:02 PM, Zoran wrote:
>
>> Sorry, I attached the wrong server.properties file. The right one is
>> attached now.
>>
>> Regards.
>>
>>
>> On 01/30/2018 02:59 PM, Zoran wrote:
>>
>>> Hi,
>>>
>>> I have three servers:
>>>
>>> blade1 (192.168.112.31),
>>> blade2 (192.168.112.32) and
>>> blade3 (192.168.112.33).
>>>
>>> kafka_2.11-1.0.0 is installed on each of the servers.
>>> ZooKeeper is installed on blade3 (192.168.112.33:2181) as well.
>>>
>>> I created the topic repl3part5 with the following command:
>>>
>>> bin/kafka-topics.sh --zookeeper 192.168.112.33:2181 --create \
>>>   --replication-factor 3 --partitions 5 --topic repl3part5
>>>
>>> When I describe the topic, it looks like this:
>>>
>>> [root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 \
>>>   --zookeeper 192.168.112.33:2181
>>>
>>> Topic:repl3part5    PartitionCount:5    ReplicationFactor:3    Configs:
>>>     Topic: repl3part5    Partition: 0    Leader: 2    Replicas: 2,3,1    Isr: 2,3,1
>>>     Topic: repl3part5    Partition: 1    Leader: 3    Replicas: 3,1,2    Isr: 3,1,2
>>>     Topic: repl3part5    Partition: 2    Leader: 1    Replicas: 1,2,3    Isr: 1,2,3
>>>     Topic: repl3part5    Partition: 3    Leader: 2    Replicas: 2,1,3    Isr: 2,1,3
>>>     Topic: repl3part5    Partition: 4    Leader: 3    Replicas: 3,2,1    Isr: 3,2,1
>>>
>>> I have a producer for this topic:
>>>
>>> bin/kafka-console-producer.sh \
>>>   --broker-list 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 \
>>>   --topic repl3part5
>>>
>>> and a single consumer:
>>>
>>> bin/kafka-console-consumer.sh \
>>>   --bootstrap-server 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 \
>>>   --topic repl3part5 --consumer-property group.id=zoran_1
>>>
>>> Every message sent by the producer is received by the consumer. So
>>> far, so good.
>>>
>>> Now I would like to test failover of the kafka servers. If I stop the
>>> kafka service on blade 3, I get consumer warnings, but all produced
>>> messages are still consumed.
>>>
>>> [2018-01-30 14:30:01,203] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 3 could not be established. Broker may
>>> not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:30:01,299] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 3 could not be established. Broker may
>>> not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:30:01,475] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 3 could not be established. Broker may
>>> not be available. (org.apache.kafka.clients.NetworkClient)
>>>
>>> Next I started the kafka service on blade 3 again and stopped the kafka
>>> service on blade 2.
>>> The consumer showed one warning, but all produced messages were still
>>> consumed.
>>>
>>> [2018-01-30 14:31:38,164] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 2 could not be established. Broker may
>>> not be available. (org.apache.kafka.clients.NetworkClient)
>>>
>>> Then I started the kafka service on blade 2 again and stopped the kafka
>>> service on blade 1.
>>>
>>> The consumer now shows warnings about node 1 and node 2147483646, and also
>>> "Asynchronous auto-commit of offsets ... failed: Offset commit failed with a
>>> retriable exception. You should retry committing offsets. The underlying
>>> error was: null."
>>>
>>> [2018-01-30 14:33:16,393] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 1 could not be established. Broker may
>>> not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:33:16,469] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 2147483646 could not be established.
>>> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:33:16,557] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 1 could not be established. Broker may
>>> not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:33:16,986] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 2147483646 could not be established.
>>> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:33:16,991] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 1 could not be established. Broker may
>>> not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:33:17,493] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 2147483646 could not be established.
>>> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:33:17,495] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 1 could not be established. Broker may
>>> not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:33:18,002] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 2147483646 could not be established.
>>> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:33:18,003] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Asynchronous auto-commit of offsets
>>> {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''},
>>> repl3part5-3=OffsetAndMetadata{offset=20, metadata=''},
>>> repl3part5-2=OffsetAndMetadata{offset=19, metadata=''},
>>> repl3part5-1=OffsetAndMetadata{offset=20, metadata=''},
>>> repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset
>>> commit failed with a retriable exception. You should retry committing
>>> offsets. The underlying error was: null
>>> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
>>> [2018-01-30 14:33:18,611] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 1 could not be established. Broker may
>>> not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:33:18,932] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 2147483646 could not be established.
>>> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:33:18,933] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Asynchronous auto-commit of offsets
>>> {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''},
>>> repl3part5-3=OffsetAndMetadata{offset=20, metadata=''},
>>> repl3part5-2=OffsetAndMetadata{offset=19, metadata=''},
>>> repl3part5-1=OffsetAndMetadata{offset=20, metadata=''},
>>> repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset
>>> commit failed with a retriable exception. You should retry committing
>>> offsets. The underlying error was: null
>>> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
>>> [2018-01-30 14:33:19,977] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 2147483646 could not be established.
>>> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
>>> [2018-01-30 14:33:19,978] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Asynchronous auto-commit of offsets
>>> {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''},
>>> repl3part5-3=OffsetAndMetadata{offset=20, metadata=''},
>>> repl3part5-2=OffsetAndMetadata{offset=19, metadata=''},
>>> repl3part5-1=OffsetAndMetadata{offset=20, metadata=''},
>>> repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset
>>> commit failed with a retriable exception. You should retry committing
>>> offsets. The underlying error was: null
>>> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
>>> [2018-01-30 14:33:19,979] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Connection to node 1 could not be established. Broker may
>>> not be available. (org.apache.kafka.clients.NetworkClient)
>>>
>>> I tried to solve the problem by adding offsets.topic.replication.factor=2
>>> (or 3) to all three server.properties files (one of them is attached), but
>>> with no success.
>>> My idea was that the __consumer_offsets topic wasn't replicated throughout
>>> the cluster, but it looks like that is not the case here.
>>>
>>> While the kafka service on blade 1 was down, describing the topic showed
>>> the following:
>>>
>>> [root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 \
>>>   --zookeeper 192.168.112.33:2181
>>>
>>> Topic:repl3part5    PartitionCount:5    ReplicationFactor:3    Configs:
>>>     Topic: repl3part5    Partition: 0    Leader: 3    Replicas: 2,3,1    Isr: 3
>>>     Topic: repl3part5    Partition: 1    Leader: 3    Replicas: 3,1,2    Isr: 3
>>>     Topic: repl3part5    Partition: 2    Leader: 3    Replicas: 1,2,3    Isr: 3
>>>     Topic: repl3part5    Partition: 3    Leader: 3    Replicas: 2,1,3    Isr: 3
>>>     Topic: repl3part5    Partition: 4    Leader: 3    Replicas: 3,2,1    Isr: 3
>>>
>>> The producer now shows the following warning. It still writes messages to
>>> the topic, but they only increase the lag count on the partitions:
>>>
>>> [2018-01-30 14:37:21,816] WARN [Producer clientId=console-producer]
>>> Connection to node 1 could not be established. Broker may not be available.
>>> (org.apache.kafka.clients.NetworkClient)
>>>
>>> I noticed that while the kafka service on blade 1 is alive, I can stop and
>>> start blade 2 and 3 in any combination and the consumer is always able to
>>> consume messages.
>>> If the kafka service on blade 1 is down, then even if the kafka services on
>>> blade 2 and blade 3 are up and running, the consumer cannot consume messages.
>>>
>>> After bringing the kafka service up on blade 1, all messages that the producer
>>> sent while the kafka service on blade 1 was down are replayed, and then the
>>> following is shown in the consumer terminal:
>>>
>>> [2018-01-30 14:44:30,817] ERROR [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Offset commit failed on partition repl3part5-4 at offset
>>> 20: This is not the correct coordinator.
>>> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
>>> [2018-01-30 14:44:30,817] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Asynchronous auto-commit of offsets
>>> {repl3part5-4=OffsetAndMetadata{offset=20, metadata=''},
>>> repl3part5-3=OffsetAndMetadata{offset=22, metadata=''},
>>> repl3part5-2=OffsetAndMetadata{offset=20, metadata=''},
>>> repl3part5-1=OffsetAndMetadata{offset=22, metadata=''},
>>> repl3part5-0=OffsetAndMetadata{offset=22, metadata=''}} failed: Offset
>>> commit failed with a retriable exception. You should retry committing
>>> offsets. The underlying error was: This is not the correct coordinator.
>>> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
>>> [2018-01-30 14:44:31,202] ERROR [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Offset commit failed on partition repl3part5-4 at offset
>>> 22: This is not the correct coordinator.
>>> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
>>> [2018-01-30 14:44:31,202] WARN [Consumer clientId=consumer-1,
>>> groupId=zoran_1] Asynchronous auto-commit of offsets
>>> {repl3part5-4=OffsetAndMetadata{offset=22, metadata=''},
>>> repl3part5-3=OffsetAndMetadata{offset=24, metadata=''},
>>> repl3part5-2=OffsetAndMetadata{offset=22, metadata=''},
>>> repl3part5-1=OffsetAndMetadata{offset=24, metadata=''},
>>> repl3part5-0=OffsetAndMetadata{offset=24, metadata=''}} failed: Offset
>>> commit failed with a retriable exception. You should retry committing
>>> offsets. The underlying error was: This is not the correct coordinator.
>>> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
>>>
>>> From now on everything works with no problems or warnings and the system
>>> is fully functional.
>>>
>>> Can someone explain to me why the kafka server on blade 1 is so important,
>>> and what my options are for being able to stop any of the servers
>>> (including the kafka server on blade 1) and still consume messages with no
>>> delay?
>>> This thing drives me crazy. :)
>>>
>>> Can you please help?
>>>
>>> Regards.
>>>
>>
>>
>
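
A note on the "node 2147483646" warnings quoted above. As far as I know, the
Java consumer opens a separate connection to its group coordinator and labels
it Integer.MAX_VALUE minus the coordinator's broker id. Under that assumption,
2147483646 points at broker 1, which would fit the observation that consumption
stalls only when blade 1 is down. The helper name coordinator_broker_id below is
made up for illustration; it is plain arithmetic, not a Kafka tool.

```shell
# Sketch: translate a "node" id from a consumer log line into a broker id.
# Assumption: very large ids are the client's coordinator connection,
# numbered as Integer.MAX_VALUE - broker.id (2147483647 - id).
coordinator_broker_id() {
  echo $(( 2147483647 - $1 ))
}

# Node id taken from the warnings in the original message.
coordinator_broker_id 2147483646
```

If the assumption holds, the group's offsets partition had its only replica on
broker 1, so no other broker could take over as coordinator.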

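For anyone landing here later, the fix reported at the top of the thread can be
sketched as the fragment below. My understanding is that this setting is only
read when __consumer_offsets is first created, so changing it afterwards does
not touch an existing topic; that would explain why the earlier attempt had no
effect until the data directories were wiped. The commented-out transaction
settings are not mentioned in the thread and are only an assumption about what
else a similar setup might want to review.

```properties
# server.properties fragment for each of the three brokers (sketch).
# Value taken from the fix reported in this thread.
offsets.topic.replication.factor=3

# Not from the thread; assumption: relevant only if transactions are used.
#transaction.state.log.replication.factor=3
#transaction.state.log.min.isr=2
```

To check the result, the describe command already used in the thread can be
pointed at the offsets topic (--topic __consumer_offsets instead of
--topic repl3part5); each partition should then list three replicas.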