The ObserverMaster feature, contributed in ZOOKEEPER-3140
<https://issues.apache.org/jira/browse/ZOOKEEPER-3140>, can be used to scale
the number of observers and the amount of traffic a single ensemble can
support. It allows followers to serve observers as well, which relieves the
fan-out load on the leader.

But as Michael mentioned, there is a server id limit: the lowest 8 bits of
the id are used to guarantee session id uniqueness, so the maximum number of
servers is limited to 255. Internally, we only use local sessions on
observers, so we use a dynamic observer id (-1) for all observers, which is
not part of the dynamic config. This helps us scale to more observers, but it
may not be a good solution for the community, since there is a limitation
here.

Thanks,
Fangmin

On Fri, Apr 10, 2020 at 1:43 PM Michael Han <[email protected]> wrote:

> If you have 100s of 1000s of ZK clients then having observer in each pod
> will presumably reduce traffic as most of the fan out traffic, from server
> to clients is localized to each pod.
>
> Observer is not part of quorum, and a quorum can't scale pass a few servers
> (typical just 5 or 7). Observers can scale from 100s to 1000s (depends on
> whether only leader hosts them, or follower can host them) but actual
> number depends on workload and hardware capacity. Although it's recommended
> myid being [0,255] but I vaguely remember we can pass this limit, just need
> to make sure the lower 8 bits of the myid always to be unique as that's
> used to construct session id.
>
> On Fri, Apr 10, 2020 at 12:09 PM James Arbo <[email protected]> wrote:
>
> > That was my instinct as well. I *think* any ZK writes would require a
> > quorum before the transaction is committed. Getting a quorum over a
> > several hundred/thousand node ensemble seems like a lot of traffic.
> > Plus, from what I've read - though not 100% certain, it seems the number
> > ZK nodes is capped at 255.
> >
> > On Fri, Apr 10, 2020 at 2:52 PM Bram Van Dam <[email protected]> wrote:
> >
> > > On 10/04/2020 20:13, James Arbo wrote:
> > > > When we proposed this, there was great concern from the software
> > > > architects that network traffic between the kubernetes pods and the
> > > > ZK ensemble must be minimized.
> > > >
> > > > This means that, at a minimum, we would be running at least 1 ZK
> > > > ensemble member on every node of our K8S cluster.
> > >
> > > Sounds to me like this would *increase* network traffic, not decrease
> > > it. Instead of having communication between the pod and ZK whenever
> > > needed (which likely isn't very frequently?), you'll now be having
> > > constant communication between the ensemble and your hundreds of
> > > observers in order to keep the observers in sync.
> > >
> > > Maybe I'm missing something?
> > >
> > > - Bram
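
P.S. To make the server id limit discussed in this thread concrete, here is a rough sketch of why only the lowest 8 bits of myid matter for session id uniqueness. It assumes a 64-bit session id whose top byte carries the server id and whose remaining bits come from a timestamp plus a counter; that exact layout and the names initial_session_id and server_byte are illustrative assumptions, not ZooKeeper's actual API.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit long, since Python ints are unbounded


def initial_session_id(server_id: int, now_ms: int) -> int:
    """Build a starting session id (assumed layout, for illustration only).

    Top 8 bits: lowest 8 bits of the server id.
    Next 40 bits: low bits of the millisecond timestamp.
    Low 16 bits: left at zero as room for a per-session counter.
    """
    timestamp_part = ((now_ms << 24) & MASK64) >> 8
    # Shifting by 56 inside 64 bits silently drops every server id bit above bit 7.
    server_part = (server_id << 56) & MASK64
    return server_part | timestamp_part


def server_byte(session_id: int) -> int:
    """Return the server id byte embedded in a session id."""
    return (session_id >> 56) & 0xFF


now_ms = 1586550000000  # arbitrary timestamp, hypothetical
a = initial_session_id(1, now_ms)
b = initial_session_id(257, now_ms)  # 257 shares its lowest 8 bits with 1
c = initial_session_id(2, now_ms)

print(server_byte(a), server_byte(b), server_byte(c))
print(a == b, a == c)
```

Under these assumptions, servers 1 and 257 start from the same session id because their lowest 8 bits match, while server 2 gets a distinct one. That is the collision a unique low byte avoids.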
