One of my Cassandra servers crashed with the following:

ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419 CassandraDaemon.java (line 82) Uncaught exception in thread Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:597)
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:533)
I took thread dumps of the JVMs on all the other Cassandra servers in my cluster. They all have thousands of threads that look like this:

"JMX server connection timeout 183373" daemon prio=10 tid=0x00002aad230db800 nid=0x5cf6 in Object.wait() [0x00002aad7a316000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
        - locked <0x00002aab056ccee0> (a [I)
        at java.lang.Thread.run(Thread.java:619)

It seems to me that there is a JMX thread leak in Cassandra: NodeProbe creates a JMXConnector but never calls its close() method.

I tried setting jmx.remote.x.server.connection.timeout to 0, hoping that would disable the JMX server connection timeout threads, but that did not make any difference.

Has anyone else seen this?

Bill
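For reference, here is a minimal sketch of the client-side pattern the post says is missing: open a JMXConnector, use it, and always call close() so the server can tear down that connection (including, presumably, its per-connection "JMX server connection timeout" thread). This is not Cassandra's NodeProbe code. The class and method names (JmxCloseExample, queryMBeanCount, runDemo) are hypothetical, and the demo starts a throwaway RMI connector server inside the same JVM only so that the example is self-contained. It relies on JMXConnector extending java.io.Closeable, which lets try-with-resources (Java 7+) handle the close.

```java
import java.lang.management.ManagementFactory;
import java.net.ServerSocket;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class JmxCloseExample {

    // Hypothetical helper: connect, read one value, and always close the connector.
    static int queryMBeanCount(JMXServiceURL url) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            return mbsc.getMBeanCount();
        } // connector.close() runs here, even if the call above throws
    }

    // Hypothetical demo: start an in-process RMI connector server, run several
    // short-lived clients, and report how many connections the server still sees.
    static int runDemo(int clients) throws Exception {
        int port;
        try (ServerSocket probe = new ServerSocket(0)) { // pick a free local port
            port = probe.getLocalPort();
        }
        Registry registry = LocateRegistry.createRegistry(port);
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:" + port + "/jmxrmi");
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                url, null, ManagementFactory.getPlatformMBeanServer());
        server.start();
        try {
            for (int i = 0; i < clients; i++) {
                if (queryMBeanCount(url) <= 0) {
                    throw new IllegalStateException("platform MBean server reported no MBeans");
                }
            }
            // Closed clients should no longer be listed as open connections.
            return server.getConnectionIds().length;
        } finally {
            server.stop();
            UnicastRemoteObject.unexportObject(registry, true); // let the JVM exit cleanly
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("connections still open: " + runDemo(20));
    }
}
```

Dropping the try-with-resources (i.e. connecting without ever calling close()) is what would leave each connection registered on the server until its timeout fires, which matches the symptom of thread counts growing with every NodeProbe invocation.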