One other thing comes to mind. "service instances restart at the same time" - have you considered that the issue may be on the ZK server and not on the client? For example, are you co-locating the ZK servers with other services and/or sharing spindles? You might be running into resource contention on the server: slow fsyncs due to disk IO contention, for instance, can be a real killer for latency (and that one is easy to check for in the server logs).
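
A minimal sketch of that log check, for illustration only. It assumes the server emits a warning containing "fsync-ing the write ahead log ... took <N>ms" when a sync is slow; the exact wording can differ between ZooKeeper versions, so adjust the pattern to what your logs actually contain. The class name FsyncLogScan and the method slowFsyncMillis are made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FsyncLogScan {
    // Assumed warning text; verify it against your own server logs.
    private static final Pattern SLOW_FSYNC =
            Pattern.compile("fsync-ing the write ahead log.*?took (\\d+)ms");

    /** Returns the fsync durations (in ms) reported in the given log lines. */
    static List<Long> slowFsyncMillis(List<String> logLines) {
        List<Long> durations = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = SLOW_FSYNC.matcher(line);
            if (m.find()) {
                durations.add(Long.parseLong(m.group(1)));
            }
        }
        return durations;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FsyncLogScan <server-log-file>");
            return;
        }
        List<Long> hits = slowFsyncMillis(Files.readAllLines(Paths.get(args[0])));
        long worst = 0;
        for (long ms : hits) {
            worst = Math.max(worst, ms);
        }
        System.out.println(hits.size() + " slow fsync warning(s), worst " + worst + " ms");
    }
}
```

If warnings cluster around the time the service instances restart, that points at disk contention on the server rather than at the client.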
Patrick

On Wed, Oct 24, 2018 at 7:40 AM Patrick Hunt <[email protected]> wrote:

> Nothing says that you have to handle client side notifications one-by-one,
> blocking the ZK client notification thread. You can do the aggregation
> yourself if you like, and this can be done very quickly. You can have
> separate threads to process the results (separate from the ZK client
> notification thread). That aspect of the architecture is up to you.
>
> This is some time back, but at the time I was processing 50 thousand
> watches in a similar situation - trigger/herd.
> https://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview
>
> Regards,
>
> Patrick
>
> On Tue, Oct 23, 2018 at 3:25 PM Michael Han <[email protected]> wrote:
>
>> Hi Jun,
>>
>> >> will it only notify the client of the 100th event or all events from
>> >> 2 - 100 will be notified?
>>
>> All events will be notified. Each watched event will be materialized as a
>> server side response and on client side, each watched event will be
>> processed individually.
>>
>> Depend on how your set watches and the scale of watch (e.g. a single
>> client watch a million znode, or a million clients watch a single znode,
>> or a mix of both), there could be thundering herd effects if multiple
>> watches are triggered.
>>
>>
>> On Tue, Oct 23, 2018 at 4:40 AM Jun Liu <[email protected]> wrote:
>>
>> > Hi,
>> >
>> > Our project, Dubbo[1], an RPC framework, has using Zookeeper as the
>> > service discovery and config centre for a long time. Recently, we
>> > received performance reports from users when a batch of service
>> > instances restart at the same time. One thing I can figure out is that
>> > the change of one instance status will trigger one change event to the
>> > Registry Centre - Zookeeper, so, 100 instances will trigger 100 change
>> > events at the same time.
>> >
>> > AFAIK, Zookeeper client uses a single thread to handle all these events
>> > one by one, if that is the case, will zookeeper merge the following
>> > events and only notify once? For example, if the Zookeeper client is
>> > handling the 1st event, the rest 2-100 events are created, when the 1st
>> > event is finished, will it only notify the client of the 100th event or
>> > all events from 2 - 100 will be notified?
>> >
>> > 1. https://github.com/apache/incubator-dubbo
>> >
>> > Best regards,
>> > Jun from Apache Dubbo (Incubating)
>> >
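
The client-side aggregation described in the thread (keep the ZK notification thread free and batch the work on separate threads) could look roughly like the sketch below. It is deliberately independent of the ZooKeeper API: EventCoalescer, onEvent and the batch handler are hypothetical names, and the assumption is that your Watcher callback only calls onEvent(path) and returns, while the handler re-reads the affected znodes (and re-sets watches if needed) on a worker thread.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Hypothetical helper: collapses a burst of change events into batches. */
final class EventCoalescer {
    private final Set<String> dirtyPaths = ConcurrentHashMap.newKeySet();
    private final AtomicBoolean drainScheduled = new AtomicBoolean(false);
    private final Executor worker;                 // assumed to run tasks one at a time
    private final Consumer<Set<String>> handler;   // receives the distinct changed paths

    EventCoalescer(Executor worker, Consumer<Set<String>> handler) {
        this.worker = worker;
        this.handler = handler;
    }

    /** Cheap and non-blocking; intended to be called from the notification thread. */
    void onEvent(String path) {
        dirtyPaths.add(path);
        // Schedule at most one pending drain, however many events arrive.
        if (drainScheduled.compareAndSet(false, true)) {
            worker.execute(this::drain);
        }
    }

    private void drain() {
        // Reset the flag first so that events arriving from now on schedule a new drain.
        drainScheduled.set(false);
        Set<String> batch = new HashSet<>();
        for (Iterator<String> it = dirtyPaths.iterator(); it.hasNext();) {
            batch.add(it.next());
            it.remove();
        }
        if (!batch.isEmpty()) {
            handler.accept(batch);
        }
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        EventCoalescer coalescer = new EventCoalescer(worker,
                batch -> System.out.println("refreshing " + batch.size() + " path(s)"));
        for (int i = 0; i < 100; i++) {
            coalescer.onEvent("/example/providers/" + i); // illustrative paths only
        }
        worker.shutdown(); // already queued drains still run
    }
}
```

The design choice here is that the handler works from current state rather than from individual events: 100 near-simultaneous notifications can then result in a handful of refreshes instead of 100, at the cost of not seeing each intermediate change.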
