[ https://issues.apache.org/jira/browse/CASSANDRA-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams reopened CASSANDRA-3533:
-----------------------------------------

Something's wrong here, because I'm randomly seeing these in the dtests:

{noformat}
 INFO [main] 2013-04-05 04:53:22,574 ThriftServer.java (line 90) Binding thrift service to /127.0.0.2:9160
 INFO [main] 2013-04-05 04:53:22,622 ThriftServer.java (line 102) Using TFramedTransport with a max frame size of 15728640 bytes.
ERROR [GossipStage:1] 2013-04-05 04:53:23,048 CassandraDaemon.java (line 179) Exception in thread Thread[GossipStage:1,5,main]
java.lang.AssertionError
	at org.apache.cassandra.service.EchoVerbHandler.doVerb(EchoVerbHandler.java:17)
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
{noformat}

> TimeoutException when there is a firewall issue.
> ------------------------------------------------
>
>                 Key: CASSANDRA-3533
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3533
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 2.0
>
>         Attachments: 0001-CASSANDRA-3533.patch, 3533.txt
>
>
> When one node in the cluster is not able to talk to the other DC/RAC due to
> a firewall or network-related issue (StorageProxy calls fail), and the nodes
> are NOT marked down because at least one node in the cluster can talk to the
> other DC/RAC, we get a TimeoutException instead of an UnavailableException.
> The problem with this:
> 1) It is hard to monitor/identify these errors.
> 2) It is hard for the client to differentiate between a bad node and a bad
> query.
> 3) When this issue happens, we have to wait for at least the RPC timeout to
> know that the query won't succeed.
> Possible solution: when deciding whether to mark a node down, we might check
> whether the node is actually alive by trying to communicate with it, so we
> can be sure of its real state.
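
The idea in the description (probe a node directly, then fail fast instead of waiting out the RPC timeout) could be sketched roughly as follows. This is an illustrative Python sketch only, not the attached patch and not Cassandra code: `UnavailableError`, `tcp_probe`, and `check_available` are hypothetical names, and a plain TCP connect stands in for whatever liveness message the real fix uses.

```python
import socket
from typing import Callable, Iterable, List


class UnavailableError(Exception):
    """Hypothetical error: too few replicas are reachable from this node."""


def tcp_probe(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout`.

    A stand-in for a real liveness check; a firewall that drops packets
    makes the connect time out, which is reported as unreachable.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes connection refused and timeouts
        return False


def check_available(
    replicas: Iterable[str],
    required: int,
    probe: Callable[[str], bool],
) -> List[str]:
    """Return the replicas this node can reach, or fail fast.

    Uses this node's own view (`probe`) rather than cluster-wide state,
    so a node cut off by a firewall raises UnavailableError immediately
    instead of waiting for the request to time out.
    """
    live = [r for r in replicas if probe(r)]
    if len(live) < required:
        raise UnavailableError(
            f"only {len(live)} of {required} required replicas reachable"
        )
    return live
```

A caller would pass something like `lambda host: tcp_probe(host, 7000)` as `probe`; the port number here is an assumption for illustration.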