The ObserverMaster feature, contributed in ZOOKEEPER-3140
<https://issues.apache.org/jira/browse/ZOOKEEPER-3140>, can be used to
scale the number of observers, and the amount of traffic, that a single
ensemble can support.

It allows followers to serve observers as well, which relieves the fan-out
load on the leader.

But as Michael mentioned, there is a server id limit: the lowest 8 bits of
the id are used to guarantee session id uniqueness, so the number of
servers is limited to 255.
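
To illustrate why only the low 8 bits of the server id matter, here is a
rough Python sketch. It assumes the session id is a 64-bit value whose top
byte is taken from the server id and whose remaining bits are seeded from
a timestamp. The function names and the exact bit layout are my
assumptions for illustration, not a copy of the ZooKeeper source.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit wraparound of a Java long


def initial_session_id(server_id: int, now_ms: int) -> int:
    """Hypothetical helper: compose a 64-bit session id seed.

    Assumed layout: top 8 bits come from the server id, the lower
    56 bits come from a millisecond timestamp.
    """
    # Shift the timestamp up, truncate to 64 bits, then shift back down
    # so that the top 8 bits are left clear for the server id.
    ts_bits = ((now_ms << 24) & MASK64) >> 8
    # Only the low 8 bits of server_id survive the 64-bit truncation.
    id_bits = (server_id << 56) & MASK64
    return ts_bits | id_bits


def server_byte(session_id: int) -> int:
    """Hypothetical helper: recover the byte that identifies the server."""
    return session_id >> 56


if __name__ == "__main__":
    now = 1_586_550_000_000  # arbitrary example timestamp in ms
    a = initial_session_id(1, now)
    b = initial_session_id(257, now)  # 257 has the same low 8 bits as 1
    print(server_byte(a), server_byte(b), a == b)
```

Under these assumptions, server ids 1 and 257 produce the same seed at the
same instant, which is why ids sharing their low 8 bits cannot both be
allowed to create global sessions.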

Internally, we use only local sessions on observers, so we assign the
dynamic observer id (-1) to all observers; that id is not part of the
dynamic config. This helps us scale to more observers, but it may not be a
good solution for the community since there is a limitation here.

Thanks,
Fangmin

On Fri, Apr 10, 2020 at 1:43 PM Michael Han <[email protected]> wrote:

> If you have 100s of 1000s of ZK clients then having an observer in each
> pod will presumably reduce traffic, as most of the fan-out traffic from
> server to clients is localized to each pod.
>
> An observer is not part of the quorum, and a quorum can't scale past a few
> servers (typically just 5 or 7). Observers can scale from 100s to 1000s
> (depending on whether only the leader hosts them, or followers can host
> them too), but the actual number depends on workload and hardware
> capacity. Although it's recommended that myid be in [0,255], I vaguely
> remember we can pass this limit; we just need to make sure the lower 8
> bits of the myid are always unique, as that's used to construct the
> session id.
>
> On Fri, Apr 10, 2020 at 12:09 PM James Arbo <[email protected]> wrote:
>
> > That was my instinct as well. I *think* any ZK writes would require a
> > quorum before the transaction is committed. Getting a quorum over a
> > several hundred/thousand node ensemble seems like a lot of traffic.
> > Plus, from what I've read - though not 100% certain, it seems the number
> > of ZK nodes is capped at 255.
> >
> > On Fri, Apr 10, 2020 at 2:52 PM Bram Van Dam <[email protected]> wrote:
> >
> > > On 10/04/2020 20:13, James Arbo wrote:
> > > > > When we proposed this, there was great concern from the software
> > > > > architects that network traffic between the kubernetes pods and
> > > > > the ZK ensemble must be minimized.
> > >
> > > > > This means that, at a minimum, we would be running at least 1 ZK
> > > > > ensemble member on every node of our K8S cluster.
> > >
> > > Sounds to me like this would *increase* network traffic, not decrease
> > > it. Instead of having communication between the pod and ZK whenever
> > > needed (which likely isn't very frequently?), you'll now be having
> > > constant communication between the ensemble and your hundreds of
> > > observers in order to keep the observers in sync.
> > >
> > > Maybe I'm missing something?
> > >
> > >  - Bram
> > >
> > >
> >
>
