Hinted handoffs aren't delivered if/when HintedHandOffManager ends up in an 
invalid state.
--------------------------------------------------------------------------------------

                 Key: CASSANDRA-3546
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3546
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.0.3
            Reporter: Fredrik L Stigbäck


Running Cassandra 1.0.3.
I've done some testing with 2 nodes (node A, node B), replication factor 2.
I take node A down, write some data to node B, and then bring node A back up.
Sometimes the hints aren't delivered when node A comes back up.

I've done some debugging in org.apache.cassandra.db.HintedHandOffManager, and 
sometimes node B ends up in a strange state in the method 
org.apache.cassandra.db.HintedHandOffManager.deliverHints(final InetAddress 
to), where org.apache.cassandra.db.HintedHandOffManager.queuedDeliveries 
already has node A in its Set, so deliverHints returns immediately and no 
hints are ever delivered to node A.

The only reason for this that I can see is that in 
org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(InetAddress 
endpoint) the hintStore.isEmpty() check returns true, so the method returns 
before entering the try block and the endpoint (node A) is never removed from 
org.apache.cassandra.db.HintedHandOffManager.queuedDeliveries. After that, no 
hints will be delivered to node A until node B is restarted.

Under what conditions will hintStore.isEmpty() return true?
Shouldn't the hintStore.isEmpty() check be inside the try {} finally {} block, 
so that the endpoint is always removed from queuedDeliveries in the finally 
block?

{code}
public void deliverHints(final InetAddress to)
{
    logger_.debug("deliverHints to {}", to);
    if (!queuedDeliveries.add(to))
        return;
    .......
}
{code}

{code}
private void deliverHintsToEndpoint(InetAddress endpoint)
    throws IOException, DigestMismatchException, InvalidRequestException,
           TimeoutException, InterruptedException
{
    ColumnFamilyStore hintStore =
        Table.open(Table.SYSTEM_TABLE).getColumnFamilyStore(HINTS_CF);
    if (hintStore.isEmpty())
        return; // nothing to do, don't confuse users by logging a no-op handoff
    try
    {
        ......
    }
    finally
    {
        queuedDeliveries.remove(endpoint);
    }
}
{code} 
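
To illustrate the change suggested above, here is a minimal, self-contained sketch. It is not the actual Cassandra code: the class HintDeliverySketch, the hintStoreEmpty flag (standing in for hintStore.isEmpty()), the String endpoints (standing in for InetAddress) and the synchronous call from deliverHints are all simplifying assumptions made for illustration. It only shows that putting the emptiness check inside the try block guarantees the endpoint is removed from queuedDeliveries on the early return as well.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for HintedHandOffManager; names and types are assumptions.
class HintDeliverySketch {
    // Endpoints that currently have a delivery queued or running.
    private final Set<String> queuedDeliveries = ConcurrentHashMap.newKeySet();
    private volatile boolean hintStoreEmpty; // stands in for hintStore.isEmpty()
    private int deliveries = 0;              // counts deliveries actually attempted

    HintDeliverySketch(boolean hintStoreEmpty) {
        this.hintStoreEmpty = hintStoreEmpty;
    }

    void setHintStoreEmpty(boolean empty) {
        this.hintStoreEmpty = empty;
    }

    // Returns false if a delivery for this endpoint is already queued.
    boolean deliverHints(String to) {
        if (!queuedDeliveries.add(to))
            return false;
        deliverHintsToEndpoint(to); // run inline here; assumed simplification
        return true;
    }

    private void deliverHintsToEndpoint(String endpoint) {
        try {
            // The check now sits inside try, so the finally block
            // also runs on this early return.
            if (hintStoreEmpty)
                return;
            deliveries++; // placeholder for the real delivery work
        } finally {
            queuedDeliveries.remove(endpoint);
        }
    }

    boolean isQueued(String endpoint) {
        return queuedDeliveries.contains(endpoint);
    }

    int getDeliveries() {
        return deliveries;
    }
}
```

With this arrangement, a call to deliverHints made while the hint store is empty leaves queuedDeliveries clean, so a later call for the same endpoint is not rejected by the add() check.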
