I just want to report that I solved the problem by deleting the
ZooKeeper and Kafka data directories and setting
offsets.topic.replication.factor=3 in Kafka's server.properties file.
After that, the __consumer_offsets topic is replicated and everything
works as expected.
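A note for anyone who would rather not wipe the data directories:
offsets.topic.replication.factor is only applied when __consumer_offsets
is first created, which is presumably why changing it on a running
cluster had no effect. A less destructive route may be to raise the
replication factor of the existing topic with a partition reassignment.
The sketch below only prints a reassignment plan. The function name
offsets_reassignment_json is made up for illustration, and the broker
ids 1, 2, 3 and the default of 50 partitions (the default value of
offsets.topic.num.partitions) are assumptions to adjust to the actual
cluster.

```shell
#!/bin/sh
# Print a reassignment plan that places every __consumer_offsets partition
# on brokers 1, 2 and 3 (assumed ids), rotating the first replica, which is
# the preferred leader, so that leadership is spread across the brokers.
# $1 = number of partitions (assumed default: 50).
offsets_reassignment_json() {
    partitions=${1:-50}
    printf '{"version":1,"partitions":[\n'
    p=0
    while [ "$p" -lt "$partitions" ]; do
        case $(( p % 3 )) in
            0) replicas="1,2,3" ;;
            1) replicas="2,3,1" ;;
            2) replicas="3,1,2" ;;
        esac
        # No trailing comma after the last entry, to keep the JSON valid.
        sep=","
        [ "$p" -eq $(( partitions - 1 )) ] && sep=""
        printf ' {"topic":"__consumer_offsets","partition":%d,"replicas":[%s]}%s\n' \
            "$p" "$replicas" "$sep"
        p=$(( p + 1 ))
    done
    printf ']}\n'
}
```

The output could be saved to a file (say offsets-rf3.json, a
hypothetical name) and passed to
bin/kafka-reassign-partitions.sh --zookeeper 192.168.112.33:2181
--reassignment-json-file offsets-rf3.json --execute. Afterwards,
bin/kafka-topics.sh --describe --topic __consumer_offsets --zookeeper
192.168.112.33:2181 should list three replicas per partition.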
I hope this helps someone.
Regards.
On 01/30/2018 03:02 PM, Zoran wrote:
Sorry, I attached the wrong server.properties file. The correct one is
attached now.
Regards.
On 01/30/2018 02:59 PM, Zoran wrote:
Hi,
I have three servers:
blade1 (192.168.112.31),
blade2 (192.168.112.32) and
blade3 (192.168.112.33).
kafka_2.11-1.0.0 is installed on each server.
ZooKeeper is also installed on blade3 (192.168.112.33:2181).
I created the topic repl3part5 with the following command:
bin/kafka-topics.sh --zookeeper 192.168.112.33:2181 --create \
    --replication-factor 3 --partitions 5 --topic repl3part5
When I describe the topic, it looks like this:
[root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 --zookeeper 192.168.112.33:2181
Topic:repl3part5  PartitionCount:5  ReplicationFactor:3  Configs:
    Topic: repl3part5  Partition: 0  Leader: 2  Replicas: 2,3,1  Isr: 2,3,1
    Topic: repl3part5  Partition: 1  Leader: 3  Replicas: 3,1,2  Isr: 3,1,2
    Topic: repl3part5  Partition: 2  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
    Topic: repl3part5  Partition: 3  Leader: 2  Replicas: 2,1,3  Isr: 2,1,3
    Topic: repl3part5  Partition: 4  Leader: 3  Replicas: 3,2,1  Isr: 3,2,1
I have a producer for this topic:
bin/kafka-console-producer.sh \
    --broker-list 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 \
    --topic repl3part5
and a single consumer:
bin/kafka-console-consumer.sh \
    --bootstrap-server 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 \
    --topic repl3part5 --consumer-property group.id=zoran_1
Every message sent by the producer is received by the consumer. So
far, so good.
Now I would like to test failover of the Kafka servers. If I stop the
Kafka service on blade3, I get consumer warnings, but all produced
messages are still consumed.
[2018-01-30 14:30:01,203] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:30:01,299] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:30:01,475] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Next, I started the Kafka service on blade3 again and stopped the
Kafka service on blade2.
The consumer showed one warning, but all produced messages were still
consumed.
[2018-01-30 14:31:38,164] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Then I started the Kafka service on blade2 again and stopped the
Kafka service on blade1.
The consumer now shows warnings about nodes 1 and 2147483646, and also
"Asynchronous auto-commit of offsets ... failed: Offset commit failed
with a retriable exception. You should retry committing offsets. The
underlying error was: null".
[2018-01-30 14:33:16,393] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,469] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,557] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,986] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,991] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:17,493] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:17,495] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,002] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,003] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:18,611] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,932] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,933] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:19,977] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:19,978] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:19,979] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
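The odd node id 2147483646 in these warnings does not appear to be a
fourth broker. The Java consumer opens a separate connection to its
group coordinator and labels that connection with Integer.MAX_VALUE
minus the coordinator's broker id. A minimal sketch of the arithmetic
(the function name coordinator_broker_id is made up for illustration):

```shell
#!/bin/sh
# Recover the broker id behind a coordinator connection id.
# The consumer uses Integer.MAX_VALUE (2147483647) minus the broker id
# as the node id of its dedicated coordinator connection.
coordinator_broker_id() {
    echo $(( 2147483647 - $1 ))
}

coordinator_broker_id 2147483646   # prints 1
```

For 2147483646 this gives 1, which suggests that broker 1 (blade1) was
the group coordinator for zoran_1 and that no other broker was able to
take over that role while it was down.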
I tried to solve the problem by adding
offsets.topic.replication.factor=2 (or 3) to all three
server.properties files (one of them is attached), but with no success.
My idea was that the __consumer_offsets topic wasn't replicated across
the cluster, but it looks like that is not the case here.
While the Kafka service on blade1 was down, describing the topic showed
the following:
[root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 --zookeeper 192.168.112.33:2181
Topic:repl3part5  PartitionCount:5  ReplicationFactor:3  Configs:
    Topic: repl3part5  Partition: 0  Leader: 3  Replicas: 2,3,1  Isr: 3
    Topic: repl3part5  Partition: 1  Leader: 3  Replicas: 3,1,2  Isr: 3
    Topic: repl3part5  Partition: 2  Leader: 3  Replicas: 1,2,3  Isr: 3
    Topic: repl3part5  Partition: 3  Leader: 3  Replicas: 2,1,3  Isr: 3
    Topic: repl3part5  Partition: 4  Leader: 3  Replicas: 3,2,1  Isr: 3
The producer now shows the following warning. It still writes messages
to the topic, but they only increase the lag on the partitions:
[2018-01-30 14:37:21,816] WARN [Producer clientId=console-producer] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
I noticed that while the Kafka service on blade1 is up, I can stop and
start blade2 and blade3 in any combination and the consumer is always
able to consume messages.
If the Kafka service on blade1 is down, then even if the Kafka services
on blade2 and blade3 are up and running, the consumer cannot consume
messages.
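One likely explanation, assuming the __consumer_offsets topic had been
created with replication factor 1: each consumer group is tied to one
partition of __consumer_offsets, and the broker leading that partition
acts as the group's coordinator. Kafka picks the partition as
abs(groupId.hashCode()) % offsets.topic.num.partitions. The sketch below
reproduces that calculation in shell for ASCII group ids. The function
names java_hash and offsets_partition are illustrative, and the
partition count of 50 is the Kafka default, assumed here.

```shell
#!/bin/sh
# Java's String.hashCode() for an ASCII string, as a signed 32-bit value.
java_hash() {
    h=0
    # od prints each byte of the string as an unsigned decimal number.
    for byte in $(printf '%s' "$1" | od -An -v -tu1); do
        # h = 31*h + c, wrapped to 32 bits like Java int arithmetic.
        h=$(( (h * 31 + byte) % 4294967296 ))
    done
    # Reinterpret the unsigned 32-bit result as a signed int.
    if [ "$h" -ge 2147483648 ]; then
        h=$(( h - 4294967296 ))
    fi
    echo "$h"
}

# Partition of __consumer_offsets that holds a group's offsets.
# $1 = group id, $2 = offsets.topic.num.partitions (assumed default: 50).
offsets_partition() {
    hash=$(java_hash "$1")
    count=${2:-50}
    # Kafka's abs() maps Integer.MIN_VALUE to 0 instead of overflowing.
    if [ "$hash" -eq -2147483648 ]; then
        hash=0
    elif [ "$hash" -lt 0 ]; then
        hash=$(( -hash ))
    fi
    echo $(( hash % count ))
}

offsets_partition zoran_1   # prints 28
```

If partition 28's only replica lived on broker 1, stopping blade1 would
leave group zoran_1 without a coordinator, which would match the
behaviour described above. Running bin/kafka-topics.sh --describe
--topic __consumer_offsets --zookeeper 192.168.112.33:2181 shows where
each partition actually lives.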
After bringing the Kafka service on blade1 back up, all messages that
the producer sent while it was down are replayed, and then the
following is shown in the consumer terminal:
[2018-01-30 14:44:30,817] ERROR [Consumer clientId=consumer-1, groupId=zoran_1] Offset commit failed on partition repl3part5-4 at offset 20: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:30,817] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=22, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:31,202] ERROR [Consumer clientId=consumer-1, groupId=zoran_1] Offset commit failed on partition repl3part5-4 at offset 22: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:31,202] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=24, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=24, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=24, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
From then on, everything works without problems or warnings and the
system is fully functional.
Can someone explain to me why the Kafka server on blade1 is so
important, and what my options are for being able to stop any of
the two servers (including the Kafka server on blade1) and still
consume messages with no delay?
This is driving me crazy. :)
Can you please help?
Regards.