kaushik srinivas created KAFKA-16370:
----------------------------------------

             Summary: Offline rollback procedure from KRaft mode to ZooKeeper mode
                 Key: KAFKA-16370
                 URL: https://issues.apache.org/jira/browse/KAFKA-16370
             Project: Kafka
          Issue Type: Improvement
            Reporter: kaushik srinivas

From the KIP, [https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration]:

h2. Finalizing the Migration

Once the cluster has been fully upgraded to KRaft mode, the controller will still be running in migration mode and making dual writes to KRaft and ZK. Since the data in ZK is still consistent with that of the KRaft metadata log, it is still possible to revert back to ZK. *_The time that the cluster is running all KRaft brokers/controllers, but still running in migration mode, is effectively unbounded._*

Once the operator has decided to commit to KRaft mode, the final step is to restart the controller quorum and take it out of migration mode by setting _zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The active controller will only finalize the migration once it detects that all members of the quorum have signaled that they are finalizing the migration (again, using the tagged field in ApiVersionsResponse). Once the controller leaves migration mode, it will write a ZkMigrationStateRecord to the log and no longer perform writes to ZK. It will also disable its special handling of ZK RPCs.

*At this point, the cluster is fully migrated and is running in KRaft mode. A rollback to ZK is still possible after finalizing the migration, but it must be done offline and it will cause metadata loss (which can also cause partition data loss).*

We tried this on a Kafka cluster that had been migrated from ZooKeeper to KRaft mode. We observed that the rollback works if the "/controller" znode is deleted in ZooKeeper before rolling back from KRaft mode to ZooKeeper mode. The passage quoted above likewise states that rolling back from KRaft to ZooKeeper after the migration has been finalized is still possible as an offline procedure.
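For illustration, the "/controller" deletion step mentioned above could be scripted instead of done by hand. This is a minimal sketch under stated assumptions, not a documented Kafka procedure: it assumes the third-party Python kazoo client, whose `KazooClient.exists(path)` returns a stat object or `None` and whose `KazooClient.delete(path)` removes a znode; the helper name `delete_controller_znode` is hypothetical. If `zookeeper.connect` uses a chroot, the same chroot has to appear in the kazoo `hosts` string so that "/controller" resolves to the znode Kafka actually uses.

```python
from typing import Any, Optional, Protocol


class ZkClient(Protocol):
    """The subset of kazoo.client.KazooClient this sketch relies on (assumption)."""

    def exists(self, path: str) -> Optional[Any]: ...

    def delete(self, path: str) -> None: ...


def delete_controller_znode(zk: ZkClient, path: str = "/controller") -> bool:
    """Hypothetical helper: remove the controller znode so that, after the
    rollback, one of the ZooKeeper-mode brokers can win a new controller election.

    Returns True if a znode was deleted, False if none existed.
    """
    if zk.exists(path) is None:
        # Nothing to delete; no controller currently holds the znode.
        return False
    zk.delete(path)
    return True


# Possible usage with kazoo (assumption; not run here):
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="zk-1:2181/kafka")  # include the chroot from zookeeper.connect
#   zk.start()
#   delete_controller_znode(zk)
#   zk.stop()
```

The helper accepts any object with `exists` and `delete`, so it can be checked against an in-memory stand-in without a running ZooKeeper ensemble.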
Are there any already-known steps that must be performed as part of this offline rollback? From our experience, we currently know of one step: deleting the /controller znode in ZooKeeper so that one of the ZooKeeper-based brokers is elected as the new controller after the rollback is done. Are there any additional steps or actions apart from this one?

-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)