[ 
https://issues.apache.org/jira/browse/CASSANDRA-11974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo reassigned CASSANDRA-11974:
-------------------------------------------

    Assignee: Edward Capriolo

> Failed assert causes OutboundTcpConnection to exit
> --------------------------------------------------
>
>                 Key: CASSANDRA-11974
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11974
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Sean Thornton
>            Assignee: Edward Capriolo
>
> I am seeing the following in a client's cluster:
> {noformat}
> ERROR [MessagingService-Outgoing-/10.0.0.1] 2016-06-06 03:38:19,305  CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Outgoing-/10.0.0.1,5,main]
> java.lang.AssertionError: 635174
>         at org.apache.cassandra.utils.ByteBufferUtil.writeWithShortLength(ByteBufferUtil.java:290) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.composites.AbstractCType$Serializer.serialize(AbstractCType.java:392) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.composites.AbstractCType$Serializer.serialize(AbstractCType.java:381) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.filter.ColumnSlice$Serializer.serialize(ColumnSlice.java:271) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.filter.ColumnSlice$Serializer.serialize(ColumnSlice.java:259) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.filter.SliceQueryFilter$Serializer.serialize(SliceQueryFilter.java:503) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.filter.SliceQueryFilter$Serializer.serialize(SliceQueryFilter.java:490) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.SliceFromReadCommandSerializer.serialize(SliceFromReadCommand.java:168) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.ReadCommandSerializer.serialize(ReadCommand.java:143) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.db.ReadCommandSerializer.serialize(ReadCommand.java:132) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.net.MessageOut.serialize(MessageOut.java:121) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.net.OutboundTcpConnection.writeInternal(OutboundTcpConnection.java:330) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:282) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
>         at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:218) ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> {noformat}
> Evidently they somehow exceeded a 64K limit (for quick and dirty suspects, 
> see https://docs.datastax.com/en/cql/3.1/cql/cql_reference/refLimits.html), 
> but that is beside the point.
> The problem I see when this happens is that 
> {{ByteBufferUtil.writeWithShortLength}} can throw a 
> {{java.lang.AssertionError}}. That is a true {{Error}}, not an 
> {{Exception}}, so it bubbles up and bypasses the {{catch (Exception e)}} 
> clause in the message processing loop in {{OutboundTcpConnection.run()}}, 
> _which causes the thread to exit and that node to stop sending outgoing 
> messages to other nodes_.
> From my perspective, there are two things I would like to see handled 
> differently:
> * In the event of _any_ problem, I would like as much detail as possible 
> logged about the problem Message: partition key, CF data, anything.  Right 
> now it can be very difficult to track this down.
> * The {{java.lang.Error}} possibility needs to be handled somehow.  If it's 
> an assertion error, it seems like we could continue the processing loop.  
> Failing that, even shutting down the JVM would be better than what I get now.
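
The failure mode and the second request above can be illustrated with a small, self-contained sketch. This is not Cassandra's code: {{SenderLoopSketch}}, {{drainCatchingException}} and {{drainCatchingAssertionError}} are hypothetical names, and the {{writeWithShortLength}} here is only a stand-in that assumes the real method rejects lengths above the unsigned-short maximum. It throws {{AssertionError}} explicitly so that it behaves the same whether or not the JVM runs with {{-ea}}.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Cassandra code base.
class SenderLoopSketch {
    static final int MAX_UNSIGNED_SHORT = 0xFFFF;

    // Stand-in for a short-length-prefixed write (assumption: the real method
    // asserts that the length fits in an unsigned short).
    static void writeWithShortLength(ByteBuffer value, DataOutputStream out) throws IOException {
        int length = value.remaining();
        if (length > MAX_UNSIGNED_SHORT)
            throw new AssertionError(length); // explicit throw, independent of -ea
        out.writeShort(length);
        out.write(value.array(), value.arrayOffset() + value.position(), length);
    }

    static void send(ByteBuffer message) throws IOException {
        writeWithShortLength(message, new DataOutputStream(new ByteArrayOutputStream()));
    }

    // Shape of the reported problem: AssertionError is not an Exception,
    // so it escapes this method and would end the calling thread.
    static int drainCatchingException(List<ByteBuffer> queue, List<String> log) {
        int sent = 0;
        for (ByteBuffer message : queue) {
            try {
                send(message);
                sent++;
            } catch (Exception e) {
                log.add("send failed: " + e);
            }
        }
        return sent;
    }

    // One possible alternative: also catch AssertionError, log what is known
    // about the message, drop it, and keep the loop alive. Other Errors
    // (for example OutOfMemoryError) still propagate.
    static int drainCatchingAssertionError(List<ByteBuffer> queue, List<String> log) {
        int sent = 0;
        for (ByteBuffer message : queue) {
            try {
                send(message);
                sent++;
            } catch (AssertionError | Exception e) {
                log.add("dropping message of " + message.remaining() + " bytes: " + e);
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        List<ByteBuffer> queue = new ArrayList<>();
        queue.add(ByteBuffer.allocate(10));
        queue.add(ByteBuffer.allocate(635174)); // oversized, as in the report
        queue.add(ByteBuffer.allocate(20));

        List<String> log = new ArrayList<>();
        int sent = drainCatchingAssertionError(queue, log);
        System.out.println("sent=" + sent + " log=" + log);
    }
}
```

Catching only {{AssertionError}} rather than every {{Throwable}} is a deliberate choice in this sketch: it matches the suggestion that an assertion failure on one message need not stop the loop, while leaving more serious errors to whatever policy the JVM-level handler applies.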



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
