I was going to open a Jira issue for this, but I figured I should discuss it here first, to make sure that's a reasonable course of action.
I was thinking about a problem that we encounter with SolrCloud, where our overseer queue (stored in ZooKeeper) will greatly exceed the default jute.maxbuffer size. I encountered this personally while researching something for a Solr issue:

https://issues.apache.org/jira/browse/SOLR-7191?focusedCommentId=14347834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14347834

It seems silly that a znode could get to 14 times the allowed size without notifying the code *inserting* the data. The structure of our queue is such that entries in the queue are children of the znode. This means that the data stored directly in the znode is not the problem (it's pretty much nonexistent in this case); the problem is the number of children.

It seems like it would be a good idea to reject the creation of new children if that would cause the znode size to exceed jute.maxbuffer. This moves the required error handling to the code that *updates* ZK, rather than the code that is watching and/or reading ZK, which seems more appropriate to me.

Alternately, the mechanisms involved could be changed so that the client can handle accessing a znode with millions of children, without complaining about the packet length.

Thoughts?

Thanks,
Shawn
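P.S. To make the arithmetic concrete: the getChildren response has to serialize every child name, so the packet grows with the queue even when the znode's own data is empty. Here's a rough back-of-envelope sketch of the server-side check I'm proposing. The exact per-name overhead is an assumption based on jute's length-prefixed string encoding, and the "qn-" entry names are just hypothetical examples of queue children:

```python
# ZooKeeper's default jute.maxbuffer is 0xfffff bytes.
JUTE_MAX_BUFFER = 0xFFFFF  # 1048575

def children_response_size(names):
    """Estimate the serialized size of a getChildren response.

    Assumption: a 4-byte count for the vector, then a 4-byte length
    prefix plus the UTF-8 bytes for each child name.
    """
    return 4 + sum(4 + len(n.encode("utf-8")) for n in names)

def can_add_child(existing, new_name):
    # The proposed check: refuse a create() that would push the
    # serialized child list past jute.maxbuffer, so the *writer*
    # gets the error instead of a later reader.
    return children_response_size(existing + [new_name]) <= JUTE_MAX_BUFFER

# 100k hypothetical queue entries with 13-character names.
names = ["qn-%010d" % i for i in range(100000)]
print(children_response_size(names))  # 1700004 -- already past the limit
print(can_add_child(names, "qn-0000100000"))  # False
```

Under those assumptions, even modest per-entry names blow through the 1 MB default somewhere around 60k children, which is why a queue backlog in the millions can't even be listed.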