We run about 40 HBase clusters with over 5000 regionservers in total, all on cdh5.16.2. Thousands of clients (APIs, Kafka workers, Hadoop jobs, etc.) hit these clusters, and they are also on cdh5.16.2.
We are starting to plan an upgrade to HBase 2.x and Hadoop 3.x, and I've read through the docs at https://hbase.apache.org/book.html#_upgrade_paths. More than a few seconds of downtime is not an option, but a rolling upgrade also seems risky (if not impossible from our version).

One thought I had is whether replication is compatible between these two versions. If so, we would probably consider swapping onto upgraded clusters using backup/restore + replication, which we already do pretty regularly to move tables between clusters. If we went this route, we'd probably want bi-directional replication so that we can roll back to the old cluster if there's a regression.

Does anyone have experience with this approach? Is the replication protocol compatible across these versions? Any concerns, tips, or other considerations to keep in mind?

Thanks!
