Hi Karl,

Could I be seeing the ZooKeeper connection reset messages because the system is running close to its memory limit? I have 12G of RAM and can see it using 11.5G while the job is running.
Is there a way I should allocate memory to the ZooKeeper nodes, and if so, is there any yardstick?

Regards.

On Mon, Sep 15, 2014 at 7:16 PM, Karl Wright <daddy...@gmail.com> wrote:

> Hi Lalit,
>
> Looks like this is the result of a tomcat shutdown, and is a probable race
> condition bug in Zookeeper:
>
> http://mail-archives.apache.org/mod_mbox/tomcat-users/201306.mbox/%3cbay174-w32b2284bedae503e9d22d3a8...@phx.gbl%3E
>
> Karl
>
> On Mon, Sep 15, 2014 at 9:41 AM, lalit jangra <lalit.j.jan...@gmail.com> wrote:
>
>> Hi Karl,
>>
>> Along with this, I could see the errors below in tomcat catalina.out.
>>
>> Sep 15, 2014 1:06:14 PM org.apache.catalina.loader.WebappClassLoader loadClass
>> INFO: Illegal access: this web application instance has been stopped already. Could not load org.apache.zookeeper.server.ZooTrace. The eventual following stack trace is caused by an error thrown for debugging purposes as well as to attempt to terminate the thread which caused the illegal access, and has no functional impact.
>> java.lang.IllegalStateException
>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1612)
>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>>
>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)] ERROR org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2183)
>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1115)
>> Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.server.ZooTrace
>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
>>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
>>     ... 1 more
>>
>> [http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)] ERROR org.apache.zookeeper.ClientCnxn - from http-bio-80-exec-1-SendThread(iwdc2preecma04.iwater.ie:2182)
>>
>> Sep 15, 2014 1:06:14 PM org.apache.coyote.AbstractProtocol destroy
>> INFO: Destroying ProtocolHandler ["http-bio-80"]
>> java.lang.NoClassDefFoundError: org/apache/zookeeper/server/ZooTrace
>>
>> Regards.
>>
>> On Mon, Sep 15, 2014 at 7:05 PM, lalit jangra <lalit.j.jan...@gmail.com> wrote:
>>
>>> Thanks Karl,
>>>
>>> The crawling is very slow and is taking a long time, which is a bit
>>> frustrating, and as I have multiple high-volume jobs running in parallel,
>>> it does not seem to be a good thing.
>>>
>>> I have also raised it on the Zookeeper forums @
>>> http://zookeeper-user.578899.n2.nabble.com/Getting-errors-in-zookeeper-logs-td7580260.html
>>> but am waiting for a reply.
>>>
>>> Regards.
>>>
>>> On Mon, Sep 15, 2014 at 6:51 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>
>>>> Hi Lalit,
>>>>
>>>> When MCF cannot reach zookeeper, MCF crawls will pause until the
>>>> zookeeper connections are reestablished. Then the crawls should resume.
>>>> This should *not* abort your crawls, but it will make them very slow.
>>>>
>>>> I am not a zookeeper expert, so I would post on their message boards to
>>>> see if there is any adjustment that can be made to zookeeper parameters
>>>> that would improve zookeeper behavior when you have a flaky network.
>>>> However, since the obvious solution is to fix your network, they may not
>>>> have a code solution for you.
>>>>
>>>> Thanks,
>>>> Karl
>>>>
>>>> On Mon, Sep 15, 2014 at 9:15 AM, lalit jangra <lalit.j.jan...@gmail.com> wrote:
>>>>
>>>>> Thanks Karl,
>>>>>
>>>>> Ideally, resetting connections should be taken care of by zookeeper itself,
>>>>> as I could see re-establishment of connections later in the logs.
>>>>>
>>>>> Can you suggest any way to overcome this, in addition to resolving the
>>>>> network issue, as my crawls repeatedly stop working? Anything in the
>>>>> config files etc.?
>>>>>
>>>>> Regards.
>>>>>
>>>>> On Mon, Sep 15, 2014 at 6:39 PM, Karl Wright <daddy...@gmail.com> wrote:
>>>>>
>>>>>> Hi Lalit,
>>>>>>
>>>>>> Zookeeper will keep working, but you should understand that you are
>>>>>> dropping connections to your zookeeper members for unknown reasons, which
>>>>>> is causing your crawl to stall when it happens. This argues that perhaps
>>>>>> you have some network flakiness of some kind.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>> On Mon, Sep 15, 2014 at 8:59 AM, lalit jangra <lalit.j.jan...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am running a cluster of two Apache ManifoldCF nodes on two separate
>>>>>>> machines, each of which has 3 zookeeper instances (6 instances in total
>>>>>>> in the cluster). When I start up the ManifoldCF agents, I see the
>>>>>>> warning below during startup.
>>>>>>>
>>>>>>> [http-bio-80-exec-2-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
>>>>>>>
>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>
>>>>>>> Also, I could see the error below in the logs while the agents are running.
>>>>>>>
>>>>>>> [http-bio-80-exec-2] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183 sessionTimeout=4000 watcher=org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection$ZooKeeperWatcher@51d83fd7
>>>>>>>
>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>
>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc2preecma04.iwater.ie/10.231.72.25:2182, initiating session
>>>>>>>
>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server iwdc2preecma04.iwater.ie/10.231.72.25:2182, unexpected error, closing socket connection and attempting reconnect
>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
>>>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:193)
>>>>>>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>>>>>>>     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>>>>>>>     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>>>>>>>
>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server iwdc2preecma04.iwater.ie/10.231.72.25:2183. Will not attempt to authenticate using SASL (unknown error)
>>>>>>>
>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to iwdc2preecma04.iwater.ie/10.231.72.25:2183, initiating session
>>>>>>>
>>>>>>> [http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2183)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server iwdc2preecma04.iwater.ie/10.231.72.25:2183, sessionid = 0x6487851bd330078, negotiated timeout = 4000
>>>>>>>
>>>>>>> Below are configurations for 1. zookeeper nodes & 2. MCF nodes for
>>>>>>> zookeeper.
>>>>>>>
>>>>>>> *zoo.cfg : Same for all six zookeeper nodes.*
>>>>>>>
>>>>>>> # The number of milliseconds of each tick
>>>>>>> tickTime=2000
>>>>>>> dataDir=/app/IW/zookeeper/data/data.1
>>>>>>> dataLogDir=/app/IW/zookeeper/logs/log.1
>>>>>>> clientPort=2181
>>>>>>> server.1=iwdc1preecma03:2888:3888
>>>>>>> server.2=iwdc1preecma03:2889:3889
>>>>>>> server.3=iwdc1preecma03:2890:3890
>>>>>>> server.4=iwdc2preecma04:2891:3891
>>>>>>> server.5=iwdc2preecma04:2892:3892
>>>>>>> server.6=iwdc2preecma04:2893:3893
>>>>>>> # The number of ticks that the initial
>>>>>>> # synchronization phase can take
>>>>>>> initLimit=10
>>>>>>> # The number of ticks that can pass between
>>>>>>> # sending a request and getting an acknowledgement
>>>>>>> syncLimit=5
>>>>>>> # the directory where the snapshot is stored.
>>>>>>> # do not use /tmp for storage, /tmp here is just
>>>>>>> # example sakes.
>>>>>>> #dataDir=/tmp/zookeeper
>>>>>>> # the port at which the clients will connect
>>>>>>> #clientPort=2181
>>>>>>> # the maximum number of client connections.
>>>>>>> # increase this if you need to handle more clients
>>>>>>> #maxClientCnxns=60
>>>>>>> #
>>>>>>> # Be sure to read the maintenance section of the
>>>>>>> # administrator guide before turning on autopurge.
>>>>>>> #
>>>>>>> # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>>>>>>> #
>>>>>>> # The number of snapshots to retain in dataDir
>>>>>>> autopurge.snapRetainCount=3
>>>>>>> # Purge task interval in hours
>>>>>>> # Set to "0" to disable auto purge feature
>>>>>>> autopurge.purgeInterval=1
>>>>>>>
>>>>>>> *ManifoldCF configurations : same for both ManifoldCF nodes.*
>>>>>>>
>>>>>>> <property name="org.apache.manifoldcf.lockmanagerclass" value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
>>>>>>> <property name="org.apache.manifoldcf.zookeeper.connectstring" value="iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183"/>
>>>>>>> <property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="4000"/>
>>>>>>>
>>>>>>> *I want to know whether, given the above warnings/errors, zookeeper will
>>>>>>> stop working, or whether zookeeper will keep working and these are
>>>>>>> non-fatal messages, because ManifoldCF jobs are stuck while I can see
>>>>>>> these errors.*
>>>>>>>
>>>>>>> Please suggest.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Lalit.
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Lalit.
>>>
>>> --
>>> Regards,
>>> Lalit.
>>
>> --
>> Regards,
>> Lalit.

--
Regards,
Lalit.
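
As a side note on the configuration quoted in this thread, the session timeout and quorum arithmetic can be sketched in a few lines of Python. This is only a rough illustration and not code from ManifoldCF or ZooKeeper: the function names are hypothetical, and it assumes ZooKeeper's documented defaults (minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime when neither is set in zoo.cfg) and a simple majority quorum with all six servers voting.

```python
def negotiated_timeout(requested_ms, tick_ms, min_ms=None, max_ms=None):
    """Clamp a client's requested session timeout to the server's bounds.

    Assumption: when minSessionTimeout / maxSessionTimeout are not set,
    the bounds default to 2 * tickTime and 20 * tickTime.
    """
    lo = 2 * tick_ms if min_ms is None else min_ms
    hi = 20 * tick_ms if max_ms is None else max_ms
    return max(lo, min(requested_ms, hi))


def quorum_size(ensemble_size):
    """Smallest strict majority of the voting servers."""
    return ensemble_size // 2 + 1


def survives_host_loss(servers_per_host, lost_host):
    """True if the servers left after losing one host still form a quorum."""
    total = sum(servers_per_host.values())
    remaining = total - servers_per_host[lost_host]
    return remaining >= quorum_size(total)


if __name__ == "__main__":
    # Values taken from the zoo.cfg and ManifoldCF properties quoted above.
    layout = {"iwdc1preecma03": 3, "iwdc2preecma04": 3}
    print("negotiated timeout (ms):", negotiated_timeout(4000, 2000))
    print("quorum size:", quorum_size(sum(layout.values())))
    for host in layout:
        print(host, "can be lost:", survives_host_loss(layout, host))
```

Under those assumptions, with tickTime=2000 a requested sessiontimeout of 4000 is already the smallest value the servers will grant, which is consistent with the "negotiated timeout = 4000" line in the log. With three servers on each of two machines, losing either machine leaves 3 of 6 servers, one short of the majority of 4.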