Hi all,

I have created a proposal, https://github.com/apache/druid/issues/9816, for adding new functionality to the Druid segment replication infrastructure. I wanted to share it on the dev list to get more eyes on it and drive discussion. I won't repeat much of what is already described in the proposal, but the general idea is to add a new logical grouping config to the Historical servers, specified at runtime. If the operator chooses to use this functionality, the Coordinator will make a best effort to load a segment's replicants across two or more of these Historical groups. The main motivation is improving quality of life for cluster operators. Best-effort replication across groups creates opportunities for:

- Increased data availability. For example, replicating across physical racks in a data center would avoid unavailability caused by a single switch failure.
- Easier cluster operations. For example, an operator could restart an entire group of Historicals knowing that the cluster has made a best effort not to place all replicants of a segment within that group.
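A minimal sketch of what best-effort, group-aware placement could look like follows. It is written in Python for brevity. It is not the POC implementation and does not use Druid's Coordinator API. `Historical`, `group`, `used_bytes` and `place_replicas` are hypothetical names. Preferring the least-loaded server is an assumed tie-breaker, not part of the proposal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Historical:
    name: str
    group: str        # hypothetical runtime-configured group label, e.g. a rack
    used_bytes: int   # current load; assumed tie-breaker, emptier servers first


def place_replicas(servers, replicas):
    """Pick up to `replicas` distinct servers, spreading them over as many groups as possible.

    Best effort: when there are fewer groups than replicas, groups are reused
    instead of leaving the segment under-replicated.
    """
    by_group = defaultdict(list)
    for s in servers:
        by_group[s.group].append(s)
    # Within each group, try the least-loaded server first.
    for members in by_group.values():
        members.sort(key=lambda s: (s.used_bytes, s.name))

    chosen = []
    while len(chosen) < replicas:
        # One pass takes at most one server per group, so a group only receives
        # a second replica after every group with an eligible server has one.
        order = sorted(
            (g for g in by_group if by_group[g]),
            key=lambda g: (by_group[g][0].used_bytes, g),
        )
        if not order:  # no servers left anywhere
            break
        for group in order:
            if len(chosen) == replicas:
                break
            chosen.append(by_group[group].pop(0))
    return chosen
```

Because each pass takes one server per group, a group is reused only after every other group with an eligible server already holds a replicant. With a single group, the sketch falls back to plain placement. This is what makes the behaviour best effort rather than a hard constraint.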
My proposal is based on POC code that I have been working on. The POC is linked in the proposal for anyone who wants to look at a potential implementation. There was some discussion of folding this into Druid tiering, but after some analysis I concluded that this would not be a wise choice. I think the patterns and motivating factors behind tiers are too disconnected from those of my proposal. Trying to rig up tiering to meet all of the requirements that exist today plus the ones I propose would result in a cumbersome and confusing product. I appreciate any and all feedback!

Thanks,
Lucas