I didn't get a response to this, but I've been investigating further and can
now frame the problem slightly differently (and, hopefully, more accurately).
According to this document, which defines the broker data structures in
ZooKeeper:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
the following is the broker schema (from version 0.10 onwards; I am using
version 0.11):
{ "fields":
  [ {"name": "version", "type": "int", "doc": "version id"},
    {"name": "host", "type": "string", "doc": "ip address or host name of the broker"},
    {"name": "port", "type": "int", "doc": "port of the broker"},
    {"name": "jmx_port", "type": "int", "doc": "port for jmx"},
    {"name": "endpoints", "type": "array", "items": "string", "doc": "endpoints supported by the broker"},
    {"name": "rack", "type": "string", "doc": "Rack of the broker. Optional. This will be used in rack aware replication assignment for fault tolerance."}
  ]
}
When I check my broker data in ZooKeeper (the broker has a non-null
broker.rack setting in its properties file), I see the following:
{"endpoints":["PLAINTEXT://x.x.x.x.abcd:9092"],"jmx_port":-1,"host":"x.x.x.x.abc","timestamp":"1537527988341","port":9092,"version":2}
There is no 'rack' field.
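One quick way to check a broker's registration is to parse the znode payload (for example, as printed by `zookeeper-shell.sh $ZK get /brokers/ids/<id>`) and look for the `rack` key. Below is a minimal Python sketch. It assumes the payload has already been copied into a string, and the helper name `registered_rack` is hypothetical:

```python
import json
from typing import Optional

# The registration payload quoted above (hostnames anonymised as in the original).
PAYLOAD = (
    '{"endpoints":["PLAINTEXT://x.x.x.x.abcd:9092"],"jmx_port":-1,'
    '"host":"x.x.x.x.abc","timestamp":"1537527988341","port":9092,"version":2}'
)


def registered_rack(payload: str) -> Optional[str]:
    """Return the 'rack' value from a broker registration znode, or None if absent."""
    data = json.loads(payload)
    return data.get("rack")


if __name__ == "__main__":
    rack = registered_rack(PAYLOAD)
    print(f"registration version={json.loads(PAYLOAD)['version']}, rack={rack!r}")
```

On the payload above this reports `rack=None`, which matches what I see.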
In the server.log file on my Kafka broker I see:
----
[2018-09-21 13:00:40,227] INFO KafkaConfig values:
advertised.host.name = null
.
.
broker.id = 1234567
broker.rack = rack1
compression.type = producer
.
-----
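To confirm the value the broker actually loaded, the `KafkaConfig values:` dump can be scanned for `key = value` lines. Below is a minimal Python sketch. It assumes the log format shown in the excerpt above, and the function name `parse_logged_config` is hypothetical:

```python
import re
from typing import Dict

# An excerpt of the "KafkaConfig values:" block from server.log, as quoted above.
LOG_EXCERPT = """\
[2018-09-21 13:00:40,227] INFO KafkaConfig values:
advertised.host.name = null
broker.id = 1234567
broker.rack = rack1
compression.type = producer
"""

# Matches lines of the form "key = value" (leading whitespace allowed);
# the timestamped header line starts with "[" and is therefore skipped.
CONFIG_LINE = re.compile(r"^\s*([A-Za-z0-9_.]+)\s*=\s*(.*?)\s*$")


def parse_logged_config(text: str) -> Dict[str, str]:
    """Collect the key = value pairs printed in a KafkaConfig dump."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        match = CONFIG_LINE.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


if __name__ == "__main__":
    config = parse_logged_config(LOG_EXCERPT)
    print(config.get("broker.rack"))
```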
So it looks fine from the broker side. However, when I restart Kafka on
the host, it doesn't write any rack information to ZooKeeper.
Can someone please confirm: if I have rack awareness configured, should I
expect to see a value for 'rack' in ZooKeeper? If so, do I need to do
anything else on the broker side to get it included in the metadata it
writes (as far as I can see, it writes this metadata each time Kafka is
restarted)?
thanks
Bryan
On 20/09/2018 11:31, Bryan Duggan wrote:
Hi,
I have a Kafka cluster consisting of 3 brokers across 3 different AWS
availability zones (AZs). It hosts several topics, each with a
replication factor of 3. The cluster is currently not rack-aware.
I am trying to do the following:
- add 3 additional brokers (one in each of the 3 AZs)
- make the cluster rack-aware (i.e. create 3 racks on a per-AZ basis,
  each containing 2 brokers)
- reassign the topics with the intention of having 1 replica in each
  of the 3 racks.
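The goal of one replica per rack can be illustrated with a rack-alternated broker ordering: interleave brokers rack by rack, then give each partition a run of consecutive brokers. Below is a simplified Python sketch. It is not Kafka's actual assignment code, which also randomises the starting broker among other details. The broker-to-rack mapping and AZ names are hypothetical, because the real pairing isn't stated here:

```python
from typing import Dict, List

# Hypothetical broker-to-rack mapping: 6 brokers, 2 per AZ, rack name = AZ name.
BROKER_RACKS: Dict[int, str] = {
    1: "az-a", 2: "az-b", 3: "az-c",
    4: "az-a", 5: "az-b", 6: "az-c",
}


def rack_alternated(broker_racks: Dict[int, str]) -> List[int]:
    """Interleave brokers so that consecutive entries come from different racks."""
    by_rack: Dict[str, List[int]] = {}
    for broker in sorted(broker_racks):
        by_rack.setdefault(broker_racks[broker], []).append(broker)
    racks = sorted(by_rack)
    longest = max(len(brokers) for brokers in by_rack.values())
    ordered: List[int] = []
    for i in range(longest):
        for rack in racks:
            if i < len(by_rack[rack]):
                ordered.append(by_rack[rack][i])
    return ordered


def assign(partitions: int, replication: int,
           broker_racks: Dict[int, str]) -> List[List[int]]:
    """Give partition p the `replication` brokers starting at position p."""
    ordered = rack_alternated(broker_racks)
    n = len(ordered)
    return [[ordered[(p + r) % n] for r in range(replication)]
            for p in range(partitions)]


if __name__ == "__main__":
    for p, replicas in enumerate(assign(6, 3, BROKER_RACKS)):
        print(p, replicas, sorted({BROKER_RACKS[b] for b in replicas}))
```

With 3 equally sized racks and a replication factor of 3, every window of 3 consecutive brokers in the alternated list spans all 3 racks. That is the property I'd expect a rack-aware reassignment to preserve.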
To achieve this I've added 'broker.rack' to the properties file for each
broker; the rack name is the same as the name of the AZ the broker is in.
I've restarted Kafka on all brokers (in case that's required for rack
awareness to take effect).
Following the restart, I've attempted to reassign the topics across all 6
brokers by running the following:
- ./kafka-reassign-partitions.sh --zookeeper $ZK --topics-to-move-json-file topics-to-move.json --broker-list '1,2,3,4,5,6'
(where topics-to-move.json is a simple JSON file containing the topics to
reassign)
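For reference, the file passed to `--topics-to-move-json-file` uses the shape `{"topics": [{"topic": "..."}], "version": 1}`. Below is a minimal Python sketch that builds it. The helper name and topic names are placeholders:

```python
import json
from typing import Iterable


def topics_to_move(topics: Iterable[str]) -> str:
    """Build the JSON document passed to --topics-to-move-json-file."""
    return json.dumps({"topics": [{"topic": t} for t in topics], "version": 1},
                      indent=2)


if __name__ == "__main__":
    # Placeholder topic names; redirect stdout to topics-to-move.json.
    print(topics_to_move(["topic-a", "topic-b"]))
```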
The problem I am having is that, after running 'kafka-reassign-partitions.sh'
with all 6 brokers in the broker list, it doesn't honour rack awareness:
instead it assigns 2 of a partition's replicas to brokers in a single rack,
with the 3rd assigned elsewhere.
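A proposed reassignment can be checked mechanically before it is executed. The tool's plan uses the `{"version": 1, "partitions": [{"topic": ..., "partition": ..., "replicas": [...]}]}` shape. Below is a minimal Python sketch that flags partitions whose replicas share a rack. The broker-to-rack mapping and the sample plan are hypothetical and illustrative, not real tool output:

```python
import json
from typing import Dict, List, Tuple

# Hypothetical broker-to-rack mapping (rack name = AZ name); replace with the real one.
BROKER_RACKS: Dict[int, str] = {
    1: "az-a", 2: "az-b", 3: "az-c", 4: "az-a", 5: "az-b", 6: "az-c",
}

# Illustrative proposed assignment in the reassignment JSON format.
PROPOSED = """
{"version": 1, "partitions": [
  {"topic": "example", "partition": 0, "replicas": [1, 5, 6]},
  {"topic": "example", "partition": 1, "replicas": [1, 4, 2]}
]}
"""


def rack_violations(plan_json: str,
                    broker_racks: Dict[int, str]) -> List[Tuple[str, int, List[str]]]:
    """List (topic, partition, racks) for partitions whose replicas share a rack."""
    plan = json.loads(plan_json)
    bad: List[Tuple[str, int, List[str]]] = []
    for entry in plan["partitions"]:
        racks = [broker_racks[b] for b in entry["replicas"]]
        if len(set(racks)) < len(racks):
            bad.append((entry["topic"], entry["partition"], racks))
    return bad


if __name__ == "__main__":
    for topic, partition, racks in rack_violations(PROPOSED, BROKER_RACKS):
        print(f"{topic}-{partition}: replicas in racks {racks}")
```

Here partition 1 is flagged because brokers 1 and 4 are both in the hypothetical rack `az-a`. That is the pattern I'm seeing.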
The version of Kafka I am using is 2.11-1.1.1.
All the documentation I've read suggests the above should achieve what I
want. However, it is not working as expected.
Has anyone else made their Kafka cluster rack-aware? If so, did you
experience any issues doing so?
Or can anyone tell me if there's a step I'm missing to make this work?
TIA
Bryan