Hi Ted,

I ran dump and stat after shutting down the ZooKeeper (ZK) clients. The output is below (there no longer appear to be any sessions with ephemerals), but the YourKit Java Profiler (YJP) memory snapshot still shows that the memory allocated by the ZK calls is being held (not reclaimed). Let me know if you are interested in looking at the YJP snapshot.

****dump****
-logbash-3.2$ telnet localhost 2181
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
dump
SessionTracker dump:
Session Sets (0):
ephemeral nodes dump:
Sessions with Ephemerals (0):
Connection closed by foreign host.
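The dump and stat commands above are ZooKeeper "four letter words", and they can also be sent without telnet, for example to capture the output before and after each load-test run. Below is a minimal sketch, assuming the server listens on localhost:2181 as in the transcripts. The class and method names are hypothetical, and the parser only handles the "Key: value" lines seen in the stat output (such as "Node count: 246"). The server writes its reply and then closes the connection (hence "Connection closed by foreign host."), so the sketch simply reads until end of stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: sends a four-letter command and parses "Key: value" lines.
public class FourLetterWords {

    /** Sends a command such as "stat" or "dump" and returns the full reply. */
    public static String send(String host, int port, String cmd) throws IOException {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), 5000);
            sock.setSoTimeout(5000);
            OutputStream out = sock.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // The server closes the connection after replying, so read until EOF.
            InputStream in = sock.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    /** Extracts "Key: value" lines (e.g. "Node count", "Outstanding") from a stat reply. */
    public static Map<String, String> parseStat(String reply) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : reply.split("\n")) {
            int colon = line.indexOf(": ");
            // Lines like "Clients:" or client addresses have no ": " and are skipped.
            if (colon > 0) {
                fields.put(line.substring(0, colon).trim(), line.substring(colon + 2).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String reply = send("localhost", 2181, "stat");
        System.out.println("Node count: " + parseStat(reply).get("Node count"));
    }
}
```

Logging "Node count" and "Outstanding" over time gives a quick check on whether znodes or requests are accumulating, independently of the heap profile.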

****stat****
-logbash-3.2$ telnet localhost 2181
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
stat
Zookeeper version: 3.3.2-1031432, built on 11/05/2010 05:32 GMT
Clients:
 /127.0.0.1:12532[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/70
Received: 62861
Sent: 62858
Outstanding: 0
Zxid: 0xeea798
Mode: standalone
Node count: 246
Connection closed by foreign host.

I was also wondering about the "snapCount" config parameter, whose default value is 10000. Could this data simply be held in memory because snapCount hasn't been reached yet? Is there any need to tune this parameter?

Thanks,
Victor

--- On Wed, 6/1/11, Victor <[email protected]> wrote:

From: Victor <[email protected]>
Subject: Re: Memory leak in zookeeper 3.3.2 and 3.3.3?
To: [email protected]
Cc: [email protected]
Date: Wednesday, June 1, 2011, 4:26 PM

Hi Ted,

Many thanks for your response. I was investigating a production issue and therefore didn't respond to you earlier. Sorry about that.

I tried the dump and stat commands via telnet; below is the output:

*************dump*************
-logbash-3.2$ telnet localhost 2181
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
dump
SessionTracker dump:
Session Sets (3):
0 expire at Wed Jun 01 20:11:44 GMT+00:00 2011:
0 expire at Wed Jun 01 20:11:46 GMT+00:00 2011:
2 expire at Wed Jun 01 20:11:48 GMT+00:00 2011:
	0x13047d8de8a0000
	0x13047d8de8a0001
ephemeral nodes dump:
Sessions with Ephemerals (1):
0x13047d8de8a0001:
Connection closed by foreign host.

*************stat**************
-logbash-3.2$ telnet localhost 2181
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
stat
Zookeeper version: 3.3.2-1031432, built on 11/05/2010 05:32 GMT
Clients:
 /10.151.78.31:47473[1](queued=0,recved=29879,sent=29879)
 /127.0.0.1:41928[0](queued=0,recved=1,sent=0)
 /10.151.74.36:18484[1](queued=0,recved=30067,sent=30067)

Latency min/avg/max: 0/0/61
Received: 59951
Sent: 59950
Outstanding: 0
Zxid: 0xeea794
Mode: standalone
Node count: 246
Connection closed by foreign host.

I forgot to mention yesterday that we also monitor ZK once every minute with the following call:

ZkSessionManager.instance().getZooKeeper().exists("/DependencyCheck", false);

This is just a read, so I don't think it has any real memory impact, but please let me know otherwise.

I will try shutting down the ZK clients and watching the memory with the YourKit profiler, and will post the results afterwards.

Thanks,
Victor

--- On Wed, 6/1/11, Ted Dunning <[email protected]> wrote:

From: Ted Dunning <[email protected]>
Subject: Re: Memory leak in zookeeper 3.3.2 and 3.3.3?
To: [email protected]
Date: Wednesday, June 1, 2011, 2:07 AM

What happens if you stop the client (either an orderly shutdown that closes ZK, or a hard stop with enough time for the ephemerals to go away)? Does GC then reclaim the memory?

What does the dump command show in terms of how many connections and ephemerals there are?

What does ls in the command-line client show for how many znodes there are?

Usually when I see this sort of behavior, it means that I have been accumulating data in ZK in a way that I didn't intend. I have had ZK up for months to years without seeing this behavior.

On Tue, May 31, 2011 at 11:00 PM, Victor <[email protected]> wrote:

> Hi,
> I apologize for the broadcast. I searched the archive before sending
> this email to the mailing list.
>
> We are using the Cages library + ZooKeeper 3.3.3 to synchronize the creation
> of forum user names (which need to be unique).
> This is the only use case in which we write to ZooKeeper.
>
> We use the Cages ZkWriteLock to obtain a write lock; below is the code
> (which is very straightforward):
>
>   org.wyki.zookeeper.cages.ZkWriteLock lock = new
>       org.wyki.zookeeper.cages.ZkWriteLock("/User/ForumUsername/" + forumUsername);
>   try {
>       boolean lockAcquired = lock.acquire(5, TimeUnit.SECONDS);
>       ......
>   } finally {
>       lock.release();
>   }
>
> In our load test (a single ZooKeeper server), even with a 3GB max heap
> size, ZooKeeper runs out of memory after ~3 hours.
> So I decided to profile ZooKeeper with the YourKit Java profiler (9.5) against
> one ZooKeeper server. After every 6 of the above calls to ZooKeeper (6 was an
> arbitrary choice), I saw memory usage increase by ~200K (in YourKit, retained
> size increased by ~220K and shallow size increased
> Even after 1 hour or more, and even when I forced garbage collection (from
> YourKit), the memory added by the 6 calls was not released. I ran the
> 6 calls a few times and observed the same thing. So I suspect there is a
> memory leak (which would explain the OutOfMemoryError in the load test).
> Looking further at the hotspots in YourKit, the memory increase is in
> the form of String, char[], Class and HashMap$Entry (Java classes or types),
> and comes mainly from the following method invocations:
>
> org.apache.zookeeper.server.FinalRequestProcessor.processRequest(Request)
> org.apache.zookeeper.server.PrepRequestProcessor.pRequest(Request)
> org.apache.zookeeper.server.NIOServerCnxn$Factory.run()
>
> But from the code, the leak is not obvious.
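In the quoted ZkWriteLock snippet, release() runs in the finally block even when acquire() returns false after the 5-second timeout. The thread does not say whether Cages tolerates releasing a lock that was never obtained, so the sketch below only shows the conventional guarded pattern: do the work and release only on the path where the lock was acquired. TimedLock is a hypothetical interface mirroring the acquire(long, TimeUnit)/release() calls used above, and LocalTimedLock is an in-process ReentrantLock stand-in for demonstration, not a ZooKeeper-backed lock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class GuardedLockUsage {

    /** Hypothetical interface with the same shape as the calls in the email. */
    public interface TimedLock {
        boolean acquire(long timeout, TimeUnit unit) throws InterruptedException;
        void release();
    }

    /** In-process stand-in for illustration only; NOT a distributed lock. */
    public static class LocalTimedLock implements TimedLock {
        private final ReentrantLock delegate = new ReentrantLock();

        public boolean acquire(long timeout, TimeUnit unit) throws InterruptedException {
            return delegate.tryLock(timeout, unit);
        }

        public void release() {
            delegate.unlock();
        }

        public boolean isHeld() {
            return delegate.isLocked();
        }
    }

    /**
     * Runs the action only if the lock is obtained within the timeout and
     * releases the lock only on that path. Returns null on timeout.
     */
    public static <T> T withLock(TimedLock lock, long timeout, TimeUnit unit, Supplier<T> action)
            throws InterruptedException {
        if (!lock.acquire(timeout, unit)) {
            return null; // timed out: nothing was acquired, so nothing to release
        }
        try {
            return action.get();
        } finally {
            lock.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LocalTimedLock lock = new LocalTimedLock();
        String result = withLock(lock, 5, TimeUnit.SECONDS, () -> "username reserved");
        System.out.println(result + ", still held: " + lock.isHeld());
    }
}
```

With a real ZkWriteLock one would wrap it in an adapter implementing the interface; whether that changes ZooKeeper's memory behavior is not something this sketch can show.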
>
> We use the following JVM startup options/flags for ZooKeeper:
> -server -Xms1536m -Xmx3072m -Xloggc:/var/zookeeper/logs/gc.log
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime
> -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
> -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.port=54321 -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
>
> I read the ZooKeeper documentation thoroughly (especially the
> Administrator's Guide) but couldn't find a way to tune this (to avoid the
> suspected memory leak above).
>
> Is there a memory leak in ZooKeeper 3.3.3 (or 3.3.2)? If there is, how
> could we configure ZooKeeper to avoid or reduce it? Which version is
> stable to use? Did we misconfigure anything?
>
> Please advise or help. Thanks a lot!
>
> Victor
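Since the startup flags above already expose JMX on port 54321 with authentication and SSL disabled, the "force a GC and see whether the memory comes back" check can also be scripted instead of done by hand in the profiler. A minimal sketch, assuming the server runs on localhost with those flags; the class name is hypothetical. It requests a full GC in the target JVM through the platform MemoryMXBean and then reads the used heap, so repeating it before and after a batch of lock calls shows whether retained heap keeps growing.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HeapAfterGc {

    /** Requests a full GC in the JVM behind the bean and returns heap bytes used afterwards. */
    public static long usedHeapAfterGc(MemoryMXBean memory) {
        memory.gc(); // equivalent to calling System.gc() inside the target JVM
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Connects to a JVM started with -Dcom.sun.management.jmxremote.port=<port>. */
    public static long remoteUsedHeapAfterGc(String host, int port) throws IOException {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            return usedHeapAfterGc(memory);
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 54321 matches -Dcom.sun.management.jmxremote.port in the flags above.
        long used = remoteUsedHeapAfterGc("localhost", 54321);
        System.out.printf("Heap used after GC: %,d bytes%n", used);
    }
}
```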
