Please do not rely on colour in your emails; the best way to get your emails accepted by the Apache mail servers is to use plain text.
> At this moment the errors started: we saw that members and other data were
> gone. At this moment nodetool status returned (in red the 3 new nodes)

What errors?

> I put for each of them seeds = A's IP, and started each at two-minute
> intervals.

When I'm making changes I tend to change a single node first, confirm everything is OK and then do a bulk change.

> Now the cluster seems to work normally, but I can't use the secondary index
> for the moment; the query answers are random

Run nodetool repair -pr on each node, and let it finish before starting the next one. If you are using secondary indexes, use nodetool rebuild_index to rebuild those (a sketch of the commands is at the end of this mail).

Add one new node to the cluster and confirm everything is OK, then add the remaining ones.

I'm not sure what went wrong or why, but that should get you to a stable place. If you have any problems keep an eye on the logs for errors or warnings.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/03/2013, at 10:01 PM, Kais Ahmed <k...@neteck-fr.com> wrote:

> Hi Aaron,
>
> Thanks for the reply, I will try to explain what happened exactly.
>
> I had a 4-node C* cluster [A,B,C,D] (version 1.2.3-1) started with the EC2 AMI
> (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2) with
> this config: --clustername myDSCcluster --totalnodes 4 --version community
>
> Two days after this cluster went into production, I saw that the cluster was
> overloaded, and I wanted to extend it by adding 3 more nodes.
>
> I created a new cluster with 3 C* nodes [D,E,F]
> (https://aws.amazon.com/amis/datastax-auto-clustering-ami-2-2)
>
> and followed the documentation
> (http://www.datastax.com/docs/1.2/install/expand_ami) for adding them to the
> ring.
> I put for each of them seeds = A's IP, and started each at two-minute
> intervals.
>
> At this moment the errors started: we saw that members and other data were
> gone. At this moment nodetool status returned (in red the 3 new nodes):
>
> Datacenter: eu-west
> ===================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load      Tokens  Owns   Host ID                               Rack
> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>
> I saw that the 3 nodes had joined the ring but had no data, so I put the
> website in maintenance and launched a nodetool repair on
> the 3 new nodes. For 5 hours I could see in OpsCenter the data being streamed
> to the new nodes (very nice :))
>
> During this time, I wrote a script to check whether all members are present
> (relative to a copy of the members in MySQL).
>
> After that the data streaming seemed to be finished, but I'm not sure, because
> nodetool compactionstats showed pending tasks while nodetool netstats seemed
> to be OK.
>
> I ran my script to check the data, but members were still missing.
>
> I decided to roll back by running nodetool decommission on nodes D, E, F.
>
> I re-ran my script and everything seemed to be OK, but the secondary index
> shows strange behaviour:
> sometimes the row is returned, sometimes there is no result.
> The user kais can be retrieved using his key with cassandra-cli, but if I use
> cqlsh:
>
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
>
>  login
> ----------------
>  kais
>
> cqlsh:database> SELECT login FROM userdata where login='kais' ;   // empty
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
>
>  login
> ----------------
>  kais
>
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
>
>  login
> ----------------
>  kais
>
> cqlsh:database> SELECT login FROM userdata where login='kais' ;   // empty
> cqlsh:database> SELECT login FROM userdata where login='kais' ;
>
>  login
> ----------------
>  kais
>
> cqlsh:mydatabase> Tracing on;
>
> When tracing is activated I get this error, but not every time:
>
> cqlsh:mydatabase> SELECT * FROM userdata where login='kais' ;
> unsupported operand type(s) for /: 'NoneType' and 'float'
>
>
> NOTE: when the cluster contained 7 nodes, I saw that my table userdata (RF 3)
> on node D was replicated on E and F, which seems strange because these 3
> nodes were not correctly filled.
>
> Now the cluster seems to work normally, but I can't use the secondary index
> for the moment; the query answers are random.
>
> Thanks a lot for any help,
> Kais
>
>
> 2013/3/31 aaron morton <aa...@thelastpickle.com>
>
> First thought is the new nodes were marked as seeds.
> Next thought is check the logs for errors.
>
> You can always run a nodetool repair if you are concerned data is not where
> you think it should be.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/03/2013, at 8:01 PM, Kais Ahmed <k...@neteck-fr.com> wrote:
>
>> Hi all,
>>
>> I followed this tutorial for expanding a 4-node C* cluster (production) and
>> added 3 new nodes.
>>
>> Datacenter: eu-west
>> ===================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address        Load      Tokens  Owns   Host ID                               Rack
>> UN  10.34.142.xxx  10.79 GB  256     15.4%  4e2e26b8-aa38-428c-a8f5-e86c13eb4442  1b
>> UN  10.32.49.xxx   1.48 MB   256     13.7%  e86f67b6-d7cb-4b47-b090-3824a5887145  1b
>> UN  10.33.206.xxx  2.19 MB   256     11.9%  92af17c3-954a-4511-bc90-29a9657623e4  1b
>> UN  10.32.27.xxx   1.95 MB   256     14.9%  862e6b39-b380-40b4-9d61-d83cb8dacf9e  1b
>> UN  10.34.139.xxx  11.67 GB  256     15.5%  0324e394-b65f-46c8-acb4-1e1f87600a2c  1b
>> UN  10.34.147.xxx  11.18 GB  256     13.9%  cfc09822-5446-4565-a5f0-d25c917e2ce8  1b
>> UN  10.33.193.xxx  10.83 GB  256     14.7%  59f440db-cd2d-4041-aab4-fc8e9518c954  1b
>>
>> The data is not being streamed.
>>
>> Can anyone help me? Our web site is down.
>>
>> Thanks a lot,
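
For reference, here is a minimal sketch of the repair/rebuild sequence described above. It assumes the keyspace is called mydatabase and the table userdata has a secondary index on login named userdata_login_idx; the real keyspace and index names in your schema may differ, so substitute your own.

    # Run on one node at a time; wait for each repair to finish before
    # moving on to the next node. -pr repairs only that node's primary
    # ranges, so every node in the ring must be repaired once.
    nodetool -h <node-ip> repair -pr mydatabase

    # If secondary index queries are still inconsistent afterwards,
    # rebuild the index on each node. Keyspace, table and index names
    # below are placeholders; depending on the version, the index
    # argument may need to be the full index name as shown in the schema.
    nodetool -h <node-ip> rebuild_index mydatabase userdata userdata_login_idx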
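
On the earlier "marked as seeds" thought: a node that lists itself in its own seed list will not bootstrap, so it joins the ring without streaming any data, which matches "the data are not streamed". Below is a minimal cassandra.yaml sketch for a node joining an existing cluster; the seed address is a placeholder taken from the output above and should be the address of one of the original, data-bearing nodes (e.g. A).

    # cassandra.yaml on each *new* node (D, E, F in this thread)
    cluster_name: 'myDSCcluster'
    # not in the default yaml; defaults to true, shown here for clarity
    auto_bootstrap: true
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # Point only at an existing node. A new node must NOT list
              # its own address here, or it will skip bootstrap and will
              # not stream any data.
              - seeds: "10.34.142.xxx"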