[ https://issues.apache.org/jira/browse/CASSANDRA-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157199#comment-13157199 ]
Jonathan Ellis edited comment on CASSANDRA-3530 at 11/25/11 3:43 PM:
---------------------------------------------------------------------

Because in 0.8 there is only one thread touching any given Message in StorageProxy + MessagingService. Once OTC gets it, MS has already turned it into a byte[] (which OTC then copies to its Socket buffer). To avoid the unnecessary intermediate byte[], for 1.0 we switch to OTC getting the Message objects. So in the multi-DC case you can have a 2nd thread (the OTC one) sending a Message, while the SP thread updates its Header.

      was (Author: jbellis):
    Because in 0.8 there is only one thread touching any given Message in StorageProxy + MessagingService. Once OTC gets it, MS has already turned it into a byte[] (which it then copies to its Socket buffer). To avoid the unnecessary intermediate byte[], for 1.0 we switch to OTC getting the Message objects. So in the multi-DC case you can have a 2nd thread (the OTC one) sending a Message, while the SP thread updates its Header.

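To illustrate the race described above, here is a minimal sketch of one way to make it harmless: treat the header as immutable and hand each destination its own derived copy, so the sending thread never iterates a map that another thread is updating. This is not the code from the attached patches. The class name HeaderSketch, the method withDetail, the "FORWARD" key, and the size calculation are all hypothetical stand-ins chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a message header; NOT Cassandra's actual Header class.
final class HeaderSketch {
    private final String from;
    // Unmodifiable snapshot: never changed after construction,
    // so any thread may iterate it without locking.
    private final Map<String, byte[]> details;

    HeaderSketch(String from, Map<String, byte[]> details) {
        this.from = from;
        this.details = Collections.unmodifiableMap(new HashMap<String, byte[]>(details));
    }

    // Returns a NEW header with one extra detail; the receiver is left untouched.
    HeaderSketch withDetail(String key, byte[] value) {
        Map<String, byte[]> copy = new HashMap<String, byte[]>(details);
        copy.put(key, value.clone()); // defensive copy of the caller's array
        return new HeaderSketch(from, copy);
    }

    int detailCount() {
        return details.size();
    }

    // Illustrative size only (sender bytes + key bytes + value bytes),
    // not the real wire format.
    int serializedSize() {
        int size = from.getBytes(StandardCharsets.UTF_8).length;
        for (Map.Entry<String, byte[]> e : details.entrySet()) {
            size += e.getKey().getBytes(StandardCharsets.UTF_8).length;
            size += e.getValue().length;
        }
        return size;
    }

    public static void main(String[] args) throws InterruptedException {
        final HeaderSketch base =
            new HeaderSketch("10.0.0.1", Collections.<String, byte[]>emptyMap());

        // Plays the role of the connection thread: it only reads the shared header.
        Thread sender = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                base.serializedSize();
            }
        });
        sender.start();

        // Plays the role of the proxy thread: it derives per-destination headers
        // instead of mutating the shared one.
        HeaderSketch last = base;
        for (int i = 0; i < 1_000; i++) {
            last = base.withDetail("FORWARD", new byte[] {(byte) i});
        }
        sender.join();

        System.out.println("base details: " + base.detailCount());
        System.out.println("derived details: " + last.detailCount());
    }
}
```

The trade-off in this sketch is an extra map copy per derived header in exchange for lock-free reads on the sending thread. Synchronizing every access to a shared mutable map would be another option, but it would also have to cover the whole iteration in serializedSize, not just individual put and get calls.
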
> Header class not thread safe, but mutated by multiple threads
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-3530
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3530
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.8, 1.0.3
>            Reporter: Sean Bridges
>            Assignee: Jonathan Ellis
>             Fix For: 0.8.8, 1.0.4
>
>         Attachments: 3530-0.8.txt, 3530-v2.txt, CASSANDRA-3530.patch
>
>
> With Cassandra 1.0.3 we are getting exceptions like,
> Fatal exception in thread Thread[WRITE-/xx.xx.xx.xx,5,main]
> java.util.ConcurrentModificationException
>         at java.util.Hashtable$Enumerator.next(Unknown Source)
>         at org.apache.cassandra.net.Header.serializedSize(Header.java:97)
>         at org.apache.cassandra.net.OutboundTcpConnection.messageLength(OutboundTcpConnection.java:164)
>         at org.apache.cassandra.net.OutboundTcpConnection.write(OutboundTcpConnection.java:154)
>         at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:115)
>         at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:94)
> and,
> ERROR [WRITE-/xx.xx.xx.xx] 2011-11-24 22:08:28,981 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[WRITE-/10.30.12.79,5,main]
> java.lang.NullPointerException
>         at org.apache.cassandra.net.Header.serializedSize(Header.java:101)
>         at org.apache.cassandra.net.OutboundTcpConnection.messageLength(OutboundTcpConnection.java:164)
>         at org.apache.cassandra.net.OutboundTcpConnection.write(OutboundTcpConnection.java:154)
>         at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:115)
>         at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:94)
> It looks like Header is not thread safe, but the same header instance is modified concurrently while being sent to several threads in StorageProxy.sendMessages.
> This bug eventually causes the node to OOM, as it kills the OutboundTcpConnection thread, which means nothing is dequeuing from the queue.