I don't have time to look into the reasons for that error, but it does not 
sound good. It sounds like there are multiple migration chains out there in 
the cluster, which could come from applying schema changes to different nodes 
at the same time. 

Is this a prod system? If not, I would shut it down, wipe all the Schema and 
Migration SSTables, and then apply the schema again one CF at a time (it will 
take time to read the data). 
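
For example, on each node, something like this — a rough sketch; the paths 
assume the default data_file_directories from cassandra.yaml, and the schema 
file name is just a placeholder:

    # with the node stopped
    rm /var/lib/cassandra/data/system/Schema-g-*
    rm /var/lib/cassandra/data/system/Migrations-g-*
    # restart, then load the schema one CF at a time via the CLI
    bin/cassandra
    bin/cassandra-cli --host localhost --port 9160 --file one_cf_at_a_time.txt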

If it's a prod system, it may need some delicate surgery on the Migrations and 
Schema CFs. 

Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Aug 2011, at 15:41, Dikang Gu wrote:

> And a lot of "Migration not applied" logs.
> 
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,376 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,376 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,379 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,379 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,382 DefinitionsUpdateVerbHandler.java (line 70) Applying AddColumnFamily from /192.168.1.9
> DEBUG [MigrationStage:1] 2011-08-10 11:36:29,382 DefinitionsUpdateVerbHandler.java (line 80) Migration not applied Previous version mismatch. cannot apply.
> 
> -- 
> Dikang Gu
> 0086 - 18611140205
> On Wednesday, August 10, 2011 at 11:35 AM, Dikang Gu wrote:
> 
>> Hi Aaron,
>> 
>> I set the log level to DEBUG and found a lot of forceFlush debug info in 
>> the log:
>> 
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> DEBUG [StreamStage:1] 2011-08-10 11:31:56,345 ColumnFamilyStore.java (line 725) forceFlush requested but everything is clean
>> 
>> What does this mean?
>> 
>> Thanks.
>>  
>> 
>> -- 
>> Dikang Gu
>> 0086 - 18611140205
>> On Wednesday, August 10, 2011 at 6:42 AM, aaron morton wrote:
>> 
>>> um. There has got to be something stopping the migration from completing. 
>>> 
>>> Turn the logging up to DEBUG before starting and look for messages from 
>>> MigrationManager.java
>>> 
>>> Provide all the log messages from Migration.java on the 1.27 node
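>>> 
>>> (The quick way to get DEBUG is to raise the root logger in 
>>> conf/log4j-server.properties — a sketch; the appender names here are from 
>>> the stock 0.8 file, so check yours:)
>>> 
>>>     log4j.rootLogger=DEBUG,stdout,R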
>>> 
>>> Cheers
>>> 
>>> 
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>> 
>>> On 8 Aug 2011, at 15:52, Dikang Gu wrote:
>>> 
>>>> Hi Aaron, 
>>>> 
>>>> I repeated the whole procedure:
>>>> 
>>>> 1. kill the cassandra instance on 1.27.
>>>> 2. rm the data/system/Migrations-g-*
>>>> 3. rm the data/system/Schema-g-*
>>>> 4. bin/cassandra to start the cassandra.
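>>>> 
>>>> In shell terms, roughly (the data path is relative to the install dir in 
>>>> my setup; the kill target is a placeholder for the actual pid):
>>>> 
>>>>     kill <cassandra-pid-on-1.27>
>>>>     rm data/system/Migrations-g-* data/system/Schema-g-*
>>>>     bin/cassandra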
>>>> 
>>>> Now the migration seems to have stopped, and I have not found any errors 
>>>> in the system.log yet.
>>>> 
>>>> The ring looks good:
>>>> [root@yun-phy2 apache-cassandra-0.8.1]# bin/nodetool -h192.168.1.27 -p8090 ring
>>>> Address         DC          Rack        Status State   Load            Owns    Token
>>>>                                                                                127605887595351923798765477786913079296
>>>> 192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1
>>>> 192.168.1.25    datacenter1 rack1       Up     Normal  8.54 GB         34.01%  57856537434773737201679995572503935972
>>>> 192.168.1.27    datacenter1 rack1       Up     Normal  1.78 GB         24.28%  99165710459060760249270263771474737125
>>>> 192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296
>>>> 
>>>> But the schema is still not correct:
>>>> Cluster Information:
>>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>    Schema versions: 
>>>>    75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 
>>>> 192.168.1.25]
>>>>    5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>>>> 
>>>> The 5a54ebd0-bd90-11e0-0000-9510c23fceff is the same as last time…
>>>> 
>>>> And in the log, the last Migration.java log is:
>>>>  INFO [MigrationStage:1] 2011-08-08 11:41:30,293 Migration.java (line 116) Applying migration 5a54ebd0-bd90-11e0-0000-9510c23fceff Add keyspace: SimpleDB_4E38DAA64894A9146100000500000000rep strategy:SimpleStrategy{}durable_writes: true
>>>> 
>>>> Could you explain this? 
>>>> 
>>>> If I change the token given to 1.27 to another one, will it help?
>>>> 
>>>> Thanks.
>>>> 
>>>> -- 
>>>> Dikang Gu
>>>> 0086 - 18611140205
>>>> On Sunday, August 7, 2011 at 4:14 PM, aaron morton wrote:
>>>> 
>>>>> Did you check the logs on 1.27 for errors? 
>>>>> 
>>>>> Could you be seeing this? 
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-2867
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> -----------------
>>>>> Aaron Morton
>>>>> Freelance Cassandra Developer
>>>>> @aaronmorton
>>>>> http://www.thelastpickle.com
>>>>> 
>>>>> On 7 Aug 2011, at 16:24, Dikang Gu wrote:
>>>>> 
>>>>>> I shut down both nodes, deleted the schema* and migration* sstables, 
>>>>>> and restarted them.
>>>>>> 
>>>>>> The current cluster looks like this:
>>>>>> [default@unknown] describe cluster;         
>>>>>> Cluster Information:
>>>>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>>>    Schema versions: 
>>>>>>  75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.28, 192.168.1.9, 
>>>>>> 192.168.1.25]
>>>>>>  5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>>>>>> 
>>>>>> 1.28 looks good, but 1.27 still cannot reach schema agreement...
>>>>>> 
>>>>>> I have tried several times, even deleting all the data on 1.27 and 
>>>>>> rejoining it as a new node, but it is still unhappy.
>>>>>> 
>>>>>> And the ring looks like this: 
>>>>>> 
>>>>>> Address         DC          Rack        Status State   Load            Owns    Token
>>>>>>                                                                                127605887595351923798765477786913079296
>>>>>> 192.168.1.28    datacenter1 rack1       Up     Normal  8.38 GB         25.00%  1
>>>>>> 192.168.1.25    datacenter1 rack1       Up     Normal  8.55 GB         34.01%  57856537434773737201679995572503935972
>>>>>> 192.168.1.27    datacenter1 rack1       Up     Joining 1.81 GB         24.28%  99165710459060760249270263771474737125
>>>>>> 192.168.1.9     datacenter1 rack1       Up     Normal  8.75 GB         16.72%  127605887595351923798765477786913079296
>>>>>> 
>>>>>> 1.27 seems unable to join the cluster; it just hangs there...
>>>>>> 
>>>>>> Any suggestions?
>>>>>> 
>>>>>> Thanks.
>>>>>> 
>>>>>> 
>>>>>> On Sun, Aug 7, 2011 at 10:01 AM, aaron morton <aa...@thelastpickle.com> 
>>>>>> wrote:
>>>>>> After the restart, what was in the logs on the 1.27 machine from the 
>>>>>> Migration.java logger? Some of the messages will start with 
>>>>>> "Applying migration".
>>>>>> 
>>>>>> You should have shut down both of the nodes, then deleted the schema* 
>>>>>> and migration* system sstables, then restarted one of them and watched 
>>>>>> to see if it got to schema agreement. 
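>>>>>> 
>>>>>> Roughly, as a sketch (the path assumes the default 
>>>>>> data_file_directories; adjust to your config):
>>>>>> 
>>>>>>     # on both nodes, with Cassandra stopped
>>>>>>     rm /var/lib/cassandra/data/system/Schema-g-* /var/lib/cassandra/data/system/Migrations-g-*
>>>>>>     # start one node, then watch the schema versions from the CLI
>>>>>>     bin/cassandra
>>>>>>     bin/cassandra-cli --host 192.168.1.27 --port 9160
>>>>>>     [default@unknown] describe cluster;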
>>>>>> 
>>>>>> Cheers
>>>>>>   
>>>>>> -----------------
>>>>>> Aaron Morton
>>>>>> Freelance Cassandra Developer
>>>>>> @aaronmorton
>>>>>> http://www.thelastpickle.com
>>>>>> 
>>>>>> On 6 Aug 2011, at 22:56, Dikang Gu wrote:
>>>>>> 
>>>>>>> I have tried this, but the schema still does not agree in the cluster:
>>>>>>> 
>>>>>>> [default@unknown] describe cluster;
>>>>>>> Cluster Information:
>>>>>>>    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>>>>    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>>>>    Schema versions: 
>>>>>>>         UNREACHABLE: [192.168.1.28]
>>>>>>>         75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 
>>>>>>> 192.168.1.25]
>>>>>>>         5a54ebd0-bd90-11e0-0000-9510c23fceff: [192.168.1.27]
>>>>>>> 
>>>>>>> Any other suggestions to solve this?
>>>>>>> 
>>>>>>> I have some production data saved in the Cassandra cluster, so I 
>>>>>>> cannot afford data loss...
>>>>>>> 
>>>>>>> Thanks.
>>>>>>> 
>>>>>>> On Fri, Aug 5, 2011 at 8:55 PM, Benoit Perroud <ben...@noisette.ch> 
>>>>>>> wrote:
>>>>>>>> Based on http://wiki.apache.org/cassandra/FAQ#schema_disagreement,
>>>>>>>> 75eece10-bf48-11e0-0000-4d205df954a7 owns the majority, so shut down
>>>>>>>> and remove the schema* and migration* sstables from both 192.168.1.28
>>>>>>>> and 192.168.1.27.
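>>>>>>>> 
>>>>>>>> For example (a sketch; adjust the path to your data_file_directories):
>>>>>>>> 
>>>>>>>>     # on 192.168.1.28 and 192.168.1.27, with Cassandra stopped
>>>>>>>>     rm /var/lib/cassandra/data/system/Schema-g-* /var/lib/cassandra/data/system/Migrations-g-*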
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2011/8/5 Dikang Gu <dikan...@gmail.com>:
>>>>>>>> > [default@unknown] describe cluster;
>>>>>>>> > Cluster Information:
>>>>>>>> >    Snitch: org.apache.cassandra.locator.SimpleSnitch
>>>>>>>> >    Partitioner: org.apache.cassandra.dht.RandomPartitioner
>>>>>>>> >    Schema versions:
>>>>>>>> > 743fe590-bf48-11e0-0000-4d205df954a7: [192.168.1.28]
>>>>>>>> > 75eece10-bf48-11e0-0000-4d205df954a7: [192.168.1.9, 192.168.1.25]
>>>>>>>> > 06da9aa0-bda8-11e0-0000-9510c23fceff: [192.168.1.27]
>>>>>>>> >
>>>>>>>> >  three different schema versions in the cluster...
>>>>>>>> > --
>>>>>>>> > Dikang Gu
>>>>>>>> > 0086 - 18611140205
>>>>>>>> >
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Dikang Gu
>>>>>>> 
>>>>>>> 0086 - 18611140205
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Dikang Gu
>>>>>> 
>>>>>> 0086 - 18611140205
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
