Re: Zookeeper connection errors in Helix Controller

2019-06-01 Thread Lei Xia
Before our new release is out, if you see this as a problem in your prod deployment, one thing you may try is to add a newer zookeeper version as an explicit dependency in your project; then at build time, maven (or another build tool) will pick the new version instead of the one specified in hel

Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread DImuthu Upeksha
Hi Lee, Understood and thanks for the heads up. We are currently in the middle of a production deployment with 0.8.2 and most of the users have already been notified of the schedule. Basically we are happy with the stability and functional correctness of 0.8.2 except for the above-mentioned case where we pus

Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread Hunter Lee
Hey Dimuthu - We are actually in the process of preparing a new release, and this will come with the previously mentioned bug fixes in Task Framework. It also contains various ZK-related fixes - I don't know what your deployment schedule is but it might be worth the wait of another week or so. Hu

Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread DImuthu Upeksha
Now I'm seeing the following error in the controller log. Restarting the controller fixed the issue. From time to time we see this in the controller with zk connection issues. Does this also have something to do with the zk client version? 2019-05-31 13:21:46,669 [Thread-0-SendThread(localhost:2181)] WARN o.apache.zo

Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread DImuthu Upeksha
Hi Lei, We use 0.8.2. We initially had 0.8.4 but it contains an issue with the task retry logic, so we downgraded to 0.8.2. We are planning to go into production with 0.8.2 by next week, so can you please advise a better way to solve this without upgrading to 0.8.4? Thanks Dimuthu On Fri, May 31, 2019

Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread Lei Xia
Which Helix version do you use? This may be caused by this Zookeeper bug (https://issues.apache.org/jira/browse/ZOOKEEPER-706). We have upgraded ZkClient in later Helix versions. Lei On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha wrote: > Hi Folks, > > I'm getting following error in controlle

Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread DImuthu Upeksha
Hi Kishore, Adding -Djute.maxbuffer=49107800 fixed the issue, but now I can see a whole lot of log entries with the following line, and the participant executes a batch of Tasks once in a while, with a delay of around 5 minutes in between. 2019-05-31 12:45:58,804 [GenericHelixController-event_process] WARN
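
Note on the jute.maxbuffer setting discussed in this thread: the sketch below illustrates why a packet of 4194362 bytes could be rejected before the flag was added and accepted afterwards. It assumes the ZooKeeper client reads its limit from the jute.maxbuffer system property and falls back to 4096 * 1024 bytes when the property is unset (the behaviour of 3.4.x clients as far as I know; please check it against the client version you actually ship). The class and method names are hypothetical and are not part of Helix or ZooKeeper.

```java
public class JuteBufferCheck {
    // Assumption: fallback limit used by the client when jute.maxbuffer is unset.
    static final int ASSUMED_DEFAULT_LIMIT = 4096 * 1024;

    // Reads the limit the same way a JVM started with -Djute.maxbuffer=<bytes> would expose it.
    static int effectiveLimit() {
        return Integer.getInteger("jute.maxbuffer", ASSUMED_DEFAULT_LIMIT);
    }

    // A packet is treated as in range when its length is non-negative and below the limit.
    static boolean fits(int packetLen, int limit) {
        return packetLen >= 0 && packetLen < limit;
    }

    public static void main(String[] args) {
        int packetLen = 4194362; // length reported in the controller log in this thread
        int limit = effectiveLimit();
        System.out.println("limit=" + limit + ", packetLen=" + packetLen
                + ", in range=" + fits(packetLen, limit));
    }
}
```

Running it with and without -Djute.maxbuffer=49107800 on the command line shows how the same packet length moves from out of range to in range under these assumptions.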

Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread DImuthu Upeksha
Hi Kishore, Please find the log below; I think the issue is "Packet len4194362 is out of range!". Currently we have around 1 unprocessed workflows. Could that be the reason? 2019-05-31 12:17:15,221 [Thread-0-EventThread] INFO o.a.h.m.zk.zookeeper.ZkClient - zookeeper state changed (S

Re: Zookeeper connection errors in Helix Controller

2019-05-31 Thread kishore g
Can you grep for "zookeeper state" in the controller log? On Fri, May 31, 2019 at 7:52 AM DImuthu Upeksha wrote: > Hi Folks, > > I'm getting following error in controller log and seems like controller is > not moving froward after that point > > 2019-05-31 10:47:37,084 [main] INFO o.a.a.h.i.c.HelixCo

Zookeeper connection errors in Helix Controller

2019-05-31 Thread DImuthu Upeksha
Hi Folks, I'm getting the following error in the controller log, and it seems the controller is not moving forward after that point: 2019-05-31 10:47:37,084 [main] INFO o.a.a.h.i.c.HelixController - Starting helix controller 2019-05-31 10:47:37,089 [main] INFO o.a.a.c.u.ApplicationSettings - Settings loa