Hi Jon,

Can you share the exceptions related to ZooKeeper? Are you doing some heavy network activity during prepare?

On my topology I see one connection established to ZooKeeper from every worker process and from the supervisor. And as far as I know, there are some writes to ZooKeeper every few seconds from each worker process.
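
In case it helps to check the connection counts on your side, here is a rough sketch (not part of Storm) that asks a ZooKeeper server for its client list using the "cons" four-letter command and counts connections per client address. The helper names and the default port 2181 are my assumptions, the parsing assumes client lines of the form " /ip:port[...](...)", and newer ZooKeeper releases may require the command to be allowed via 4lw.commands.whitelist.

```python
import re
import socket
from collections import Counter

# Assumed shape of a client line in the "cons"/"stat" reply:
#   " /10.0.0.5:41234[1](queued=0,recved=12,sent=12)"
CLIENT_LINE = re.compile(r"^\s*/(?P<host>[^\[\s]+):(?P<port>\d+)\[")


def four_letter_word(host, port=2181, cmd="cons", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the socket after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def connections_per_client(reply):
    """Count open connections per client address in a 'cons' or 'stat' reply."""
    counts = Counter()
    for line in reply.splitlines():
        match = CLIENT_LINE.match(line)
        if match:
            counts[match.group("host")] += 1
    return counts
```

For example, connections_per_client(four_letter_word("zk-host")) would give a per-address count, where "zk-host" is a placeholder for one of your ZooKeeper nodes. Comparing the counts before and during the slow prepare phase should show whether workers are reconnecting repeatedly.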
Regards,
Srinath.

On Sat, May 10, 2014 at 5:03 AM, Jon Logan <jmlo...@buffalo.edu> wrote:

> Hi,
>
> I am wondering if there is a concise list of all of the uses of Zookeeper
> throughout Storm. I had thought that the only use was for supervisor
> discovery, node assignments, and the like, but I am running into issues
> with Zookeeper dying when I launch a new topology that is slow to start up,
> and generates moderate load during startup.
>
> This is baffling to me, because I had thought that workers did not
> communicate with Zookeeper at all, and that was all contained to the
> appropriate supervisor.
>
> Behavior is essentially after submission and launching of the job, during
> the prepare method, all zookeeper connections begin to fail/reconnect,
> causing thrashing amongst Storm (reassigning what it thought were dead
> nodes), even among separate topologies. Eventually, it works itself out,
> and deploys successfully. It does seem to actually kill zookeeper in the
> process though.
>
> Any thoughts? Everything works normally with any other topology that I
> have tried to deploy to the same system.