One of my Cassandra servers crashed with the following error:

ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419 CassandraDaemon.java (line 82) Uncaught exception in thread Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:597)
        at org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:533)


I took JVM thread dumps on all the other Cassandra servers in my
cluster.  They all have thousands of threads that look like this:

"JMX server connection timeout 183373" daemon prio=10 tid=0x00002aad230db800 nid=0x5cf6 in Object.wait() [0x00002aad7a316000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
        - locked <0x00002aab056ccee0> (a [I)
        at java.lang.Thread.run(Thread.java:619)

It seems to me that there is a JMX thread leak in Cassandra.  NodeProbe
creates a JMXConnector but never calls its close() method.  I tried setting
jmx.remote.x.server.connection.timeout to 0, hoping that would disable the
"JMX server connection timeout" threads, but it made no difference.
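
For reference, here is a minimal sketch of the client-side pattern I would
expect: open the JMXConnector, use it, and close it in a finally block.
This is not the NodeProbe code.  The class name JmxCloseSketch and the
helper readThreadCount are made up for illustration, and the in-process
connector server in main() exists only so the example runs on its own.  I
have not confirmed that closing the connector makes the server-side
timeout threads go away on our nodes.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class JmxCloseSketch {

    // Hypothetical helper: connects, reads one attribute, and always
    // closes the connector so the connection is released explicitly.
    static int readThreadCount(JMXServiceURL url) throws Exception {
        JMXConnector connector = JMXConnectorFactory.connect(url, null);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName threading = new ObjectName("java.lang:type=Threading");
            return (Integer) mbs.getAttribute(threading, "ThreadCount");
        } finally {
            connector.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a real node: an RMI connector server in this JVM,
        // exposing the platform MBeanServer on an anonymous port.
        MBeanServer platform = ManagementFactory.getPlatformMBeanServer();
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                new JMXServiceURL("service:jmx:rmi://"), null, platform);
        server.start();
        try {
            // Several short-lived connections, each one closed after use.
            for (int i = 0; i < 5; i++) {
                System.out.println("thread count: " + readThreadCount(server.getAddress()));
            }
        } finally {
            server.stop();
        }
    }
}
```

Comparing thread dumps of the server JVM with and without the close() call
should show whether the missing close() is what leaves the timeout threads
behind.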

Has anyone else seen this?

Bill
