Depending on finalize() is really not what you want to do, so I think the API change would be preferable.
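The API change could amount to NodeProbe owning its JMXConnector and exposing a close() method for callers to invoke. A minimal sketch of that shape (hypothetical: `CloseableProbe` is illustrative, not the actual NodeProbe code, and the in-process connector server stands in for Cassandra's JMX port):

```java
import java.io.Closeable;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical sketch of the API change: the probe owns its JMXConnector
// and exposes close(), so callers release the connection explicitly
// instead of relying on finalize().
class CloseableProbe implements Closeable {
    private final JMXConnector jmxc;

    CloseableProbe(JMXServiceURL url) throws IOException {
        this.jmxc = JMXConnectorFactory.connect(url, null);
    }

    MBeanServerConnection connection() throws IOException {
        return jmxc.getMBeanServerConnection();
    }

    @Override
    public void close() throws IOException {
        // Stops the client-side Checker thread and lets the server-side
        // "JMX server connection timeout" thread exit.
        jmxc.close();
    }
}

public class ProbeCloseDemo {
    public static void main(String[] args) throws Exception {
        // In-process connector server standing in for Cassandra's JMX port.
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                new JMXServiceURL("service:jmx:rmi://"),
                null,
                ManagementFactory.getPlatformMBeanServer());
        server.start();

        CloseableProbe probe = new CloseableProbe(server.getAddress());
        try {
            System.out.println("mbeans registered: " + probe.connection().getMBeanCount());
        } finally {
            probe.close();
        }
        server.stop();
        System.out.println("probe closed");
    }
}
```

Callers (NodeCmd, a monitoring webapp) would then close the probe when done, e.g. in a try/finally.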
Bye, Norman

2010/10/26 Bill Au <bill.w...@gmail.com>:
> I would be happy to submit a patch, but it is a bit trickier than simply
> calling JMXConnector.close(). NodeProbe's use of the JMXConnector is not
> exposed in its API. The JMX connection is created in NodeProbe's
> constructor. Without changing the API, the only place to call close() would
> be in NodeProbe's finalize(). I am not sure that is the best thing to do.
> I think this warrants a discussion on the developer mailing list. I will
> start a new mail thread there.
>
> Anyway, I am still trying to understand why the JMX server connection
> timeout threads pile up so quickly when I restart a node in a live
> cluster. I took a look at the Cassandra source and see that NodeProbe is
> the only place that creates and uses a JMX connection. And NodeProbe is
> only used by the tools. So it seems that there is another JMX thread leak
> in Cassandra.
>
> Bill
>
> On Fri, Oct 22, 2010 at 4:33 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> Is the fix as simple as calling close() then? Can you submit a patch for
>> that?
>>
>> On Fri, Oct 22, 2010 at 2:49 PM, Bill Au <bill.w...@gmail.com> wrote:
>> > Not with the nodeprobe or nodetool command, because the JVM these two
>> > commands spawn has a very short life span.
>> >
>> > I am using a webapp to monitor my Cassandra cluster. It pretty much
>> > uses the same code as the NodeCmd class. For each incoming request, it
>> > creates a NodeProbe object and uses it to get various status of the
>> > cluster. I can reproduce the Cassandra JVM crash by issuing requests to
>> > this webapp in a bash while loop. I took a deeper look and here is what
>> > I discovered:
>> >
>> > In the webapp, when NodeProbe creates a JMXConnector to connect to the
>> > Cassandra JMX port, a thread
>> > (com.sun.jmx.remote.internal.ClientCommunicatorAdmin$Checker) is
>> > created and run in the webapp's JVM.
>> > Meanwhile, in the Cassandra JVM there is a
>> > com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout thread to
>> > time out the remote JMX connection. However, since NodeProbe does not
>> > call JMXConnector.close(), the JMX client checker thread remains in the
>> > webapp's JVM even after the NodeProbe object has been garbage
>> > collected. So the JMX connection is still considered open, and that
>> > keeps the JMX timeout thread running inside the Cassandra JVM. The
>> > number of JMX client checker threads in my webapp's JVM matches the
>> > number of JMX server timeout threads in my Cassandra JVM. If I stop my
>> > webapp's JVM, all the JMX server timeout threads in my Cassandra JVM
>> > disappear after 2 minutes, the default timeout for a JMX connection.
>> > This is why the problem cannot be reproduced with nodeprobe or
>> > nodetool: even though JMXConnector.close() is not called, the JVM exits
>> > shortly, so the JMX client checker threads do not stay around, and
>> > their corresponding JMX server timeout threads go away after two
>> > minutes. This is not the case with my webapp, since its JVM keeps
>> > running, so all the JMX client checker threads keep running as well.
>> > The threads keep piling up until they crash Cassandra's JVM.
>> >
>> > In my case I think I can change my webapp to use a static NodeProbe
>> > instead of creating a new one for every request. That should get
>> > around the leak.
>> >
>> > However, I have seen the leak occur in another situation. On more than
>> > one occasion when I restarted one node in a live multi-node cluster, I
>> > saw the JMX server timeout threads quickly pile up (numbering in the
>> > thousands) in Cassandra's JVM. It only happened on a live cluster that
>> > is servicing read and write requests. I am guessing the hinted handoff
>> > might have something to do with it.
>> > I am still trying to understand what is happening there.
>> >
>> > Bill
>> >
>> > On Wed, Oct 20, 2010 at 5:16 PM, Jonathan Ellis <jbel...@gmail.com>
>> > wrote:
>> >>
>> >> can you reproduce this by, say, running nodeprobe ring in a bash while
>> >> loop?
>> >>
>> >> On Wed, Oct 20, 2010 at 3:09 PM, Bill Au <bill.w...@gmail.com> wrote:
>> >> > One of my Cassandra servers crashed with the following:
>> >> >
>> >> > ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419
>> >> > CassandraDaemon.java (line 82) Uncaught exception in thread
>> >> > Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main]
>> >> > java.lang.OutOfMemoryError: unable to create new native thread
>> >> >         at java.lang.Thread.start0(Native Method)
>> >> >         at java.lang.Thread.start(Thread.java:597)
>> >> >         at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:533)
>> >> >
>> >> > I took thread dumps of the JVM on all the other Cassandra servers in
>> >> > my cluster. They all have thousands of threads looking like this:
>> >> >
>> >> > "JMX server connection timeout 183373" daemon prio=10
>> >> > tid=0x00002aad230db800 nid=0x5cf6 in Object.wait() [0x00002aad7a316000]
>> >> >    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>> >> >         at java.lang.Object.wait(Native Method)
>> >> >         at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
>> >> >         - locked <0x00002aab056ccee0> (a [I)
>> >> >         at java.lang.Thread.run(Thread.java:619)
>> >> >
>> >> > It seems to me that there is a JMX thread leak in Cassandra.
>> >> > NodeProbe creates a JMXConnector but never calls its close() method.
>> >> > I tried setting jmx.remote.x.server.connection.timeout to 0, hoping
>> >> > that would disable the JMX server connection timeout threads. But
>> >> > that did not make any difference.
>> >> >
>> >> > Has anyone else seen this?
>> >> >
>> >> > Bill
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder of Riptano, the source for professional Cassandra support
>> >> http://riptano.com
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
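The leak Bill describes can be reproduced in isolation: each JMXConnector left unclosed keeps a "JMX server connection timeout" thread alive on the server side, and calling close() lets it exit immediately rather than after the 2-minute default. A sketch against an in-process connector server (the class name and the polling helper are illustrative, not from the thread):

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class JmxLeakDemo {
    // Count live server-side connection-timeout threads by name, the same
    // threads Bill saw piling up in his thread dumps.
    static long timeoutThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith("JMX server connection timeout"))
                .count();
    }

    // Poll until the count reaches the expected value (or give up).
    static long waitFor(long expected, long millis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + millis;
        while (timeoutThreads() != expected && System.currentTimeMillis() < deadline) {
            Thread.sleep(50);
        }
        return timeoutThreads();
    }

    public static void main(String[] args) throws Exception {
        // In-process connector server standing in for Cassandra's JMX port.
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                new JMXServiceURL("service:jmx:rmi://"),
                null,
                ManagementFactory.getPlatformMBeanServer());
        server.start();

        // Open three connections the way a per-request NodeProbe would,
        // without calling close() on any of them.
        List<JMXConnector> open = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            JMXConnector c = JMXConnectorFactory.connect(server.getAddress(), null);
            c.getMBeanServerConnection().getMBeanCount();  // force a real round trip
            open.add(c);
        }
        System.out.println("timeout threads while open: " + waitFor(3, 5000));

        // Closing the connectors lets the server-side timeout threads exit
        // right away instead of lingering until the 2-minute timeout.
        for (JMXConnector c : open) {
            c.close();
        }
        System.out.println("timeout threads after close: " + waitFor(0, 5000));

        server.stop();
    }
}
```

This matches the one-to-one pairing Bill observed between client checker threads and server timeout threads, and shows why a short-lived nodetool JVM never accumulates them.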