Hello,
After almost a year of running kafka on a single node, we are in the process of
migrating to a 3 node cluster. To test the migration we followed these
steps:
* Stopped our current kafka instance and copied its entire data directory and
the zookeeper data directories to one of the new nodes.
* Configured zookeeper as a three node cluster.
* Started the node where we placed the copied over data first, and then
the others.
* Used the zookeeper shell to check whether the kafka topics were listed
on each of the three nodes.
* The primary (assuming the first one with all data is primary as it was
started first) and the second node had the topics data, but the third one did
not.
* Waited some time, but still no data on the third node. The zookeeper shell
always exited with an exception if we tried to execute any command:
org.apache.zookeeper.KeeperException$ConnectionLossException:
* Shut down the second zookeeper instance, and almost instantaneously
the third node picked up the data. Restarted the second node, and all three
nodes seemed to be operating fine.
* Used the zookeeper shell to create a test node on the first/primary
node, and was able to see it on the other two nodes.
* Configured kafka as a three node cluster.
* Started the node where we placed the copied over data first, and then
the others.
* Created a test kafka topic with replication factor 3, and saw it
appear in the topics list on all three zookeeper nodes.
* Used kafka-reassign-partitions.sh to modify the replication factor
from 1 to 3 for one of our topics.
* Almost immediately saw the new topic directory being created under the
logs directory on second node. Nothing on the third node.
* kafka-reassign-partitions.sh with the verify option still lists the
partition reassignment as in progress. Left it overnight, and the status is
still the same. The topic has very little data, but still nothing on third node.
* Shut down kafka on second node, to see if the earlier behaviour with
zookeeper is replicated, but no such luck.
* Shut down both kafka and zookeeper on second node to see if any data
shows up on third node, again no go.
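
For reference, the reassignment step above takes a JSON file listing the
desired replicas for each partition. The following is only a minimal sketch of
how such a file can be generated, not the exact file used here: the topic name
"my-topic", the partition count of 2, and the broker ids 1, 2, 3 are all
assumptions, and the ids must match the broker.id values actually configured on
the three nodes.

```python
import json


def build_reassignment(topic, partition_count, broker_ids):
    """Build a reassignment plan that puts every partition on all given brokers."""
    partitions = []
    for p in range(partition_count):
        # Rotate the broker list so the first (preferred) replica differs
        # from partition to partition.
        shift = p % len(broker_ids)
        replicas = broker_ids[shift:] + broker_ids[:shift]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # Hypothetical values: replace with the real topic, partition count, and broker ids.
    plan = build_reassignment("my-topic", 2, [1, 2, 3])
    print(json.dumps(plan, indent=2))
```

The printed JSON can be saved to a file and passed to
kafka-reassign-partitions.sh with --reassignment-json-file together with
--execute, and later with --verify. The connection flag (--zookeeper or
--bootstrap-server) depends on the kafka version.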
Any ideas as to what may be going on? Should we try copying the zookeeper/kafka
data directories to all three nodes and then starting them up?
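
One detail relevant to copying the same zookeeper data directory to every node:
each server in an ensemble identifies itself through a myid file in its data
directory, and that number has to match its server.N line in zoo.cfg, so the
file would have to differ per node. A rough sketch of the expected layout,
assuming hypothetical hostnames zk1, zk2, zk3 and the commonly used ports 2888
and 3888:

```python
def ensemble_config(hosts, peer_port=2888, election_port=3888):
    """Return the zoo.cfg server lines and the myid file content for each host."""
    server_lines = []
    myids = {}
    for server_id, host in enumerate(hosts, start=1):
        # One server.N line per ensemble member; identical on every node.
        server_lines.append(f"server.{server_id}={host}:{peer_port}:{election_port}")
        # The myid file in each node's dataDir holds only that node's own id.
        myids[host] = str(server_id)
    return server_lines, myids


if __name__ == "__main__":
    # Hypothetical hostnames: replace with the real node names.
    lines, myids = ensemble_config(["zk1", "zk2", "zk3"])
    print("\n".join(lines))
    for host, content in myids.items():
        print(f"{host}: myid = {content}")
```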
Thanks
Rakesh