[ https://issues.apache.org/jira/browse/CASSANDRA-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich resolved CASSANDRA-4583.
----------------------------------------

    Resolution: Duplicate

This looks like it was caused by the same problem as CASSANDRA-4129, together with timestamp problems related to nanoTime usage for schema. All of that is fixed in 1.1.4.

> Some nodes forget schema when 1 node fails
> ------------------------------------------
>
>                 Key: CASSANDRA-4583
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4583
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.2
>         Environment: CentOS release 6.3 (Final)
>            Reporter: Edward Sargisson
>         Attachments: cass-4583-2-system.log, cass-4583-5-system.log
>
>
> At present we do not have a complete reproduction for this defect but are
> raising it as requested by Aaron Morton. We will update as we find out
> more. If any additional logging or tests are requested we will do them if we
> can.
> We have experienced 2 failures ascribed to this defect. On the cassandra user
> mailing list Peter Schuller (2012-08-28) describes an additional failure.
> Reproduction steps as currently known:
> 1. Set up a cluster with 6 nodes (call them #1 through #6).
> 2. Have #5 fail completely. One failure was when the node was stopped to
> replace the battery in the hard disk cache. The second failure was when the
> hardware monitoring recorded a problem, CPU usage was increasing without
> explanation and the server console was frozen, so the machine was restarted.
> 3. Bring #5 back.
> Expected behaviour:
> * #5 should rejoin the ring.
> Actual behaviour (based on the incident we saw yesterday):
> * #5 didn't rejoin the ring.
> * We stopped all nodes and started them one by one.
> * Nodes #2, #4, #6 had forgotten most of their column families. They had the
> keyspace but with only one column family instead of the usual 9 or so.
> * We ran nodetool resetlocalschema on #2, #4 and #6.
> * We ran nodetool repair -pr on #2, #4, #5 and #6
> * On #2 nodetool repair appeared to crash in that there were no messages in
> the logs from it for 10min+. Nodetool compactionstats and nodetool netstats
> showed no activity.
> * Restarting nodetool repair -pr fixed the problem and ran to completion.
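
For readers who want the recovery sequence from the report in one place, here is a minimal Python sketch that only builds the nodetool command lines and prints them; it does not run anything. The host names are hypothetical (the report refers to the nodes only as #1 through #6), and the helper names `nodetool` and `recovery_plan` are invented for illustration. The subcommands `resetlocalschema` and `repair -pr` are the ones named in the report.

```python
from typing import Dict, List

# Hypothetical host names: the report only identifies nodes as #1 through #6.
NODES: Dict[int, str] = {n: f"node{n}.example.com" for n in range(1, 7)}


def nodetool(host: str, *args: str) -> List[str]:
    """Build (but do not execute) a nodetool command line for one host."""
    return ["nodetool", "-h", host, *args]


def recovery_plan(forgot_schema: List[int], to_repair: List[int]) -> List[List[str]]:
    """Return the commands in the order the report describes them:
    reset the local schema on nodes that lost column families,
    then run a primary-range repair on the affected nodes."""
    plan = [nodetool(NODES[n], "resetlocalschema") for n in forgot_schema]
    plan += [nodetool(NODES[n], "repair", "-pr") for n in to_repair]
    return plan


if __name__ == "__main__":
    # Node numbers taken from the incident: #2, #4, #6 forgot schema;
    # repair was run on #2, #4, #5 and #6.
    for cmd in recovery_plan([2, 4, 6], [2, 4, 5, 6]):
        print(" ".join(cmd))
```

Printing the plan rather than executing it is deliberate: the report notes that one repair appeared to hang and had to be restarted, so each step is probably best run and checked by hand.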