There are so many pending transfers on the production server. That difference 
between production and development is my concern.

Sent from my iPhone

> On August 15, 2015, at 3:00 AM, Dmitri Zagidulin <dzagidu...@basho.com> wrote:
> 
> Pending 0% just means there are no pending transfers; the cluster state is stable. 
> 
> If you've successfully tested the process on a test cluster, there's no 
> reason why it'd be different in production. 
> 
>> On Friday, August 14, 2015, changmao wang <wang.chang...@gmail.com> wrote:
>> During the last three days, I set up a development Riak cluster with five nodes 
>> and used "s3cmd" to upload 18 GB of test data (maybe 20 thousand files).
>> After that, I had one node leave the cluster, then shut it down and marked it 
>> down. I replaced its IP address and joined it to the cluster again.
>> The whole process was successful. However, I'm not sure whether or not it can 
>> be done in the production environment. 
>> 
>> I followed the docs below to do the above steps:
>> 
>> http://docs.basho.com/riak/latest/ops/running/nodes/renaming/
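>> 
>> For reference, the command sequence I used on the dev cluster was roughly the 
>> following (a sketch of the steps described above; 10.21.236.185 is the node I 
>> renamed and 10.21.236.181 is another member of my dev cluster):
>> 
>>     # stage and commit the leave for the node being renamed
>>     riak-admin cluster leave riak@10.21.236.185
>>     riak-admin cluster plan
>>     riak-admin cluster commit
>>     # stop that node, then mark it down from one of the remaining nodes
>>     riak stop
>>     riak-admin down riak@10.21.236.185
>>     # change -name in /etc/riak/vm.args on the renamed node, start it, and
>>     # join it back to the cluster through an existing member
>>     riak-admin cluster join riak@10.21.236.181
>>     riak-admin cluster plan
>>     riak-admin cluster commit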
>> 
>> After I ran "riak-admin cluster leave riak@'x.x.x.x'", "riak-admin cluster 
>> plan", and "riak-admin cluster commit", I checked the member-status. The main 
>> difference between leaving the cluster in the production and development 
>> environments is as below:
>> 
>> root@cluster-s3-dev-hd1:~# riak-admin member-status
>> ================================= Membership ==================================
>> Status     Ring    Pending    Node
>> -------------------------------------------------------------------------------
>> leaving    18.8%      0.0%    'riak@10.21.236.185'
>> valid      21.9%     25.0%    'riak@10.21.236.181'
>> valid      21.9%     25.0%    'riak@10.21.236.182'
>> valid      18.8%     25.0%    'riak@10.21.236.183'
>> valid      18.8%     25.0%    'riak@10.21.236.184'
>> -------------------------------------------------------------------------------
>> 
>> After several minutes had elapsed, I checked the status again:
>> 
>> 
>> root@cluster-s3-dev-hd1:~# riak-admin member-status
>> ================================= Membership ==================================
>> Status     Ring    Pending    Node
>> -------------------------------------------------------------------------------
>> leaving    12.5%      0.0%    'riak@10.21.236.185'
>> valid      21.9%     25.0%    'riak@10.21.236.181'
>> valid      28.1%     25.0%    'riak@10.21.236.182'
>> valid      18.8%     25.0%    'riak@10.21.236.183'
>> valid      18.8%     25.0%    'riak@10.21.236.184'
>> -------------------------------------------------------------------------------
>> Valid:4 / Leaving:1 / Exiting:0 / Joining:0 / Down:0
>> 
>> After that, I shut down riak with "riak stop" and marked it down on the active 
>> nodes.
>> My question is: what is the meaning of "Pending 0.0%"?
>> 
>> On the production cluster, the status is as below:
>> root@cluster1-hd12:/root/scripts# riak-admin transfers
>> 'riak@10.21.136.94' waiting to handoff 5 partitions
>> 'riak@10.21.136.93' waiting to handoff 5 partitions
>> 'riak@10.21.136.92' waiting to handoff 5 partitions
>> 'riak@10.21.136.91' waiting to handoff 5 partitions
>> 'riak@10.21.136.86' waiting to handoff 5 partitions
>> 'riak@10.21.136.81' waiting to handoff 2 partitions
>> 'riak@10.21.136.76' waiting to handoff 3 partitions
>> 'riak@10.21.136.71' waiting to handoff 5 partitions
>> 'riak@10.21.136.66' waiting to handoff 5 partitions
>> 
>> And there are active transfers. In the development environment, there were no 
>> active transfers after I ran "riak-admin cluster commit".
>> Can I follow the same steps as in the development environment to run this on 
>> the production cluster?
>> 
>> 
>> 
>>> On Wed, Aug 12, 2015 at 10:39 PM, Dmitri Zagidulin <dzagidu...@basho.com> 
>>> wrote:
>>> Responses inline.
>>> 
>>> 
>>>> On Tue, Aug 11, 2015 at 12:53 PM, changmao wang <wang.chang...@gmail.com> 
>>>> wrote:
>>>> 1. About backing up the four new nodes and then using 'riak-admin 
>>>> force-replace': what is the status of the newly added nodes?
>>>> As you know, we want to replace one of the leaving nodes.
>>> 
>>> I don't understand the question. Doing 'riak-admin force-replace' on one of 
>>> the nodes that's leaving should overwrite the leave request and tell it to 
>>> change its node id / ip address. (If that doesn't work, stop the leaving 
>>> node, and do a 'riak-admin reip' command instead).
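>>> 
>>> A minimal sketch of the two options with the staged clustering commands (the 
>>> new node name is just a placeholder):
>>> 
>>>     # option 1: replace the leaving node with its new identity
>>>     # (the new node must already have joined the cluster)
>>>     riak-admin cluster force-replace riak@10.21.136.91 riak@<new-ip>
>>>     riak-admin cluster plan
>>>     riak-admin cluster commit
>>> 
>>>     # option 2: with that node stopped, rewrite its name in the ring directly
>>>     riak-admin reip riak@10.21.136.91 riak@<new-ip>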
>>> 
>>>  
>>>> 2. What is the risk of running 'riak-admin force-remove' on 'riak@10.21.136.91' 
>>>> without a backup?
>>>> As you know, the node (riak@10.21.136.91) is currently a member of the cluster 
>>>> and holds almost 2.5 TB of data, maybe 10 percent of the whole cluster.
>>>  
>>> The only reason I asked about backup is because it sounded like you cleared 
>>> the disk on it. If it currently has the data, then it'll be fine. 
>>> Force-remove just changes the IP address, and doesn't delete the data or 
>>> anything.
>>> 
>>> 
>>>> On Tue, Aug 11, 2015 at 7:32 PM, Dmitri Zagidulin <dzagidu...@basho.com> 
>>>> wrote:
>>>> 1. How can I force the "leaving" nodes to leave without data loss?
>>>> 
>>>> This depends on - did you back up the data directory of the 4 new nodes, 
>>>> before you reformatted them? 
>>>> If you backed them up (and then restored the data directory once you 
>>>> reformatted them), you can try:
>>>> 
>>>> riak-admin force-replace 'riak@10.21.136.91' 'riak@<whatever your new ip 
>>>> address is for that node>'
>>>> (same for the other 3)
>>>> 
>>>> If you did not back up those nodes, the only thing you can do is force 
>>>> them to leave, and then join the new ones. So, for each of the 4:
>>>> 
>>>> riak-admin force-remove 'riak@10.21.136.91' 'riak@10.21.136.66'
>>>> (same for the other 3)
>>>> 
>>>> In either case, after force-replacing or force-removing, you have to join 
>>>> the new nodes to the cluster, before you commit.
>>>> 
>>>> riak-admin join 'riak@new node' 'riak@10.21.136.66'
>>>> (same for the other 3)
>>>> and finally:
>>>> riak-admin cluster plan
>>>> riak-admin cluster commit
>>>> 
>>>> As for the error, the reason you're seeing it, is because the other nodes 
>>>> can't contact the 4 that are supposed to be leaving. (Since you wiped 
>>>> them).
>>>> The amount of time that passed doesn't matter, the cluster will be waiting 
>>>> for those nodes to leave indefinitely, unless you force-remove or 
>>>> force-replace.
>>>> 
>>>> 
>>>> 
>>>>> On Tue, Aug 11, 2015 at 1:32 AM, changmao wang <wang.chang...@gmail.com> 
>>>>> wrote:
>>>>> Hi Dmitri,
>>>>> 
>>>>> For your questions:
>>>>> 3) Re-formatted those four nodes and re-installed Riak. Here is where it 
>>>>> gets tricky though. Several questions for you:
>>>>> - Did you attempt to re-join those 4 reinstalled nodes into the cluster? 
>>>>> What was the output of the cluster join and cluster plan commands?
>>>>> - Did the IP address change, after they were reformatted? If so, you 
>>>>> probably need to use something like 'reip' at this point: 
>>>>> http://docs.basho.com/riak/latest/ops/running/tools/riak-admin/#reip
>>>>> 
>>>>> I did NOT try to re-join those 4 reinstalled nodes into the cluster. As you 
>>>>> know, member-status shows they are "leaving", as below:
>>>>> riak-admin member-status
>>>>> ================================= Membership ==================================
>>>>> Status     Ring    Pending    Node
>>>>> -------------------------------------------------------------------------------
>>>>> leaving    10.9%     10.9%    'riak@10.21.136.91'
>>>>> leaving     9.4%     10.9%    'riak@10.21.136.92'
>>>>> leaving     7.8%     10.9%    'riak@10.21.136.93'
>>>>> leaving     7.8%     10.9%    'riak@10.21.136.94'
>>>>> valid      10.9%     10.9%    'riak@10.21.136.66'
>>>>> valid      10.9%     10.9%    'riak@10.21.136.71'
>>>>> valid      14.1%     10.9%    'riak@10.21.136.76'
>>>>> valid      17.2%     12.5%    'riak@10.21.136.81'
>>>>> valid      10.9%     10.9%    'riak@10.21.136.86'
>>>>> -------------------------------------------------------------------------------
>>>>> Valid:5 / Leaving:4 / Exiting:0 / Joining:0 / Down:0
>>>>> 
>>>>> Two weeks have elapsed, and 'riak-admin member-status' shows the same result. 
>>>>> I don't know at which step the ring hands off.
>>>>> 
>>>>> I did not change the IP addresses of the four newly added nodes. 
>>>>> 
>>>>> My questions:
>>>>> 
>>>>> 1. How can I force the "leaving" nodes to leave without data loss?
>>>>> 2. I have found some errors related to partition handoff in 
>>>>> /etc/riak/log/errors.
>>>>> Details are as below:
>>>>> 
>>>>> 2015-07-30 16:04:33.643 [error] 
>>>>> <0.12872.15>@riak_core_handoff_sender:start_fold:262 ownership_transfer 
>>>>> transfer of riak_kv_vnode from 'riak@10.21.136.76' 
>>>>> 45671926166590716193865151022383844364247891968 to 'riak@10.21.136.93' 
>>>>> 45671926166590716193865151022383844364247891968 failed because of enotconn
>>>>> 2015-07-30 16:04:33.643 [error] 
>>>>> <0.197.0>@riak_core_handoff_manager:handle_info:289 An outbound handoff 
>>>>> of partition riak_kv_vnode 
>>>>> 45671926166590716193865151022383844364247891968 was terminated for 
>>>>> reason: {shutdown,{error,enotconn}}
>>>>> 
>>>>> I searched for it on Google and found related articles; however, there is 
>>>>> no solution.
>>>>> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2014-October/016052.html
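>>>>> 
>>>>> For reference, these are the diagnostics I have been running (a sketch; the 
>>>>> node name and limit value here are only examples):
>>>>> 
>>>>>     # list partitions still waiting to hand off, plus any active transfers
>>>>>     riak-admin transfers
>>>>>     # show ring/ownership problems, e.g. nodes that cannot be reached
>>>>>     riak-admin ring-status
>>>>>     # allow more concurrent handoffs on a given node
>>>>>     riak-admin transfer_limit riak@10.21.136.76 4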
>>>>> 
>>>>> 
>>>>>> On Mon, Aug 10, 2015 at 10:09 PM, Dmitri Zagidulin 
>>>>>> <dzagidu...@basho.com> wrote:
>>>>>> Hi Changmao,
>>>>>> 
>>>>>> The state of the cluster can be determined from running 'riak-admin 
>>>>>> member-status' and 'riak-admin ring-status'.
>>>>>> If I understand the sequence of events, you:
>>>>>> 1) Joined four new nodes to the cluster. (Which crashed due to not 
>>>>>> enough disk space)
>>>>>> 2) Removed them from the cluster via 'riak-admin cluster leave'.  This 
>>>>>> is a "planned remove" command, and expects for the nodes to gradually 
>>>>>> hand off their partitions (to transfer ownership) before actually 
>>>>>> leaving.  So this is probably the main problem - the ring is stuck 
>>>>>> waiting for those nodes to properly hand off.
>>>>>> 
>>>>>> 3) Re-formatted those four nodes and re-installed Riak. Here is where it 
>>>>>> gets tricky though. Several questions for you:
>>>>>> - Did you attempt to re-join those 4 reinstalled nodes into the cluster? 
>>>>>> What was the output of the cluster join and cluster plan commands?
>>>>>> - Did the IP address change, after they were reformatted? If so, you 
>>>>>> probably need to use something like 'reip' at this point: 
>>>>>> http://docs.basho.com/riak/latest/ops/running/tools/riak-admin/#reip
>>>>>> 
>>>>>> The 'failed because of enotconn' error message is happening because the 
>>>>>> cluster is waiting to hand off partitions to .94, but cannot connect to 
>>>>>> it.
>>>>>> 
>>>>>> Anyways, here's what I recommend. If you can lose the data, it's 
>>>>>> probably easier to format and reinstall the whole cluster.
>>>>>> If not, you can 'force-remove' those four nodes, one by one (see 
>>>>>> http://docs.basho.com/riak/latest/ops/running/cluster-admin/#force-remove
>>>>>>  )
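>>>>>> 
>>>>>> A minimal sketch of that path with the staged clustering commands (the node 
>>>>>> names are the four leaving nodes from this thread):
>>>>>> 
>>>>>>     riak-admin cluster force-remove riak@10.21.136.91
>>>>>>     riak-admin cluster force-remove riak@10.21.136.92
>>>>>>     riak-admin cluster force-remove riak@10.21.136.93
>>>>>>     riak-admin cluster force-remove riak@10.21.136.94
>>>>>>     riak-admin cluster plan      # review the proposed ownership changes
>>>>>>     riak-admin cluster commit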
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Thu, Aug 6, 2015 at 11:55 PM, changmao wang 
>>>>>>> <wang.chang...@gmail.com> wrote:
>>>>>>> Dmitri,
>>>>>>> 
>>>>>>> Thanks for your quick reply.
>>>>>>> My questions are as below:
>>>>>>> 1. What is the current status of the whole cluster? Is it doing data 
>>>>>>> balancing?
>>>>>>> 2. There are so many errors in one node's error log. How do I 
>>>>>>> handle them?
>>>>>>> 2015-08-05 01:38:59.717 [error] 
>>>>>>> <0.23000.298>@riak_core_handoff_sender:start_fold:262 
>>>>>>> ownership_transfer transfer of riak_kv_vnode from 'riak@10.21.136.81' 
>>>>>>> 525227150915793236229449236757414210188850757632 to 'riak@10.21.136.94' 
>>>>>>> 525227150915793236229449236757414210188850757632 failed because of 
>>>>>>> enotconn
>>>>>>> 2015-08-05 01:38:59.718 [error] 
>>>>>>> <0.195.0>@riak_core_handoff_manager:handle_info:289 An outbound handoff 
>>>>>>> of partition riak_kv_vnode 
>>>>>>> 525227150915793236229449236757414210188850757632 was terminated for 
>>>>>>> reason: {shutdown,{error,enotconn}}
>>>>>>> 
>>>>>>> During the last 5 days, there have been no changes in the "riak-admin 
>>>>>>> member-status" output.
>>>>>>> 3. How can I accelerate the data balancing? 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Fri, Aug 7, 2015 at 6:41 AM, Dmitri Zagidulin 
>>>>>>>> <dzagidu...@basho.com> wrote:
>>>>>>>> Ok, I think I understand so far. So what's the question?
>>>>>>>> 
>>>>>>>>> On Thursday, August 6, 2015, Changmao.Wang 
>>>>>>>>> <changmao.w...@datayes.com> wrote:
>>>>>>>>> Hi Riak users,
>>>>>>>>> 
>>>>>>>>> Before adding the new nodes, the cluster had only five nodes. The member 
>>>>>>>>> list is as below:
>>>>>>>>> 10.21.136.66, 10.21.136.71, 10.21.136.76, 10.21.136.81, 10.21.136.86.
>>>>>>>>> We did not set up an HTTP proxy for the cluster; only one node of the 
>>>>>>>>> cluster provides the HTTP service, so the CPU load is always high on 
>>>>>>>>> this node.
>>>>>>>>> 
>>>>>>>>> After that, I added four nodes (10.21.136.[91-94]) to the cluster. 
>>>>>>>>> During the ring/data balancing process, each node failed (riak stopped) 
>>>>>>>>> because a disk became 100% full.
>>>>>>>>> I had pointed the "data_root" parameter in '/etc/riak/app.config' at a 
>>>>>>>>> multi-disk path, and each disk is only 580 MB in size. 
>>>>>>>>> As you know, the bitcask storage engine does not support multiple disk 
>>>>>>>>> paths: once one of the disks is 100% full, it cannot switch to the next 
>>>>>>>>> idle disk, so the "riak" service goes down.
>>>>>>>>> 
>>>>>>>>> After that, I removed the four newly added nodes, running 
>>>>>>>>> "riak-admin cluster leave riak@'10.21.136.91'" (and the same for the other 
>>>>>>>>> three) from the active nodes. I then stopped the "riak" service on those 
>>>>>>>>> new nodes and reformatted them with LVM disk management (binding the 6 
>>>>>>>>> disks into one volume group).
>>>>>>>>> I replaced the "data_root" parameter with a single folder and then started 
>>>>>>>>> the "riak" service again. After that, the cluster began the data balancing 
>>>>>>>>> again. 
>>>>>>>>> That's the whole story.
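>>>>>>>>> 
>>>>>>>>> For completeness, the LVM consolidation on each reformatted node looked 
>>>>>>>>> roughly like this (only a sketch; the device names and the mount point 
>>>>>>>>> here are placeholders, not the exact ones we used):
>>>>>>>>> 
>>>>>>>>>     # bind the six data disks into one volume group
>>>>>>>>>     pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
>>>>>>>>>     vgcreate riak_vg /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
>>>>>>>>>     # one logical volume spanning all of them, formatted and mounted at
>>>>>>>>>     # the directory used as the single bitcask data_root
>>>>>>>>>     lvcreate -l 100%FREE -n riak_lv riak_vg
>>>>>>>>>     mkfs.ext4 /dev/riak_vg/riak_lv
>>>>>>>>>     mount /dev/riak_vg/riak_lv /var/lib/riak/bitcask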
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Amao
>>>>>>>>> 
>>>>>>>>> From: "Dmitri Zagidulin" <dzagidu...@basho.com>
>>>>>>>>> To: "Changmao.Wang" <changmao.w...@datayes.com>
>>>>>>>>> Sent: Thursday, August 6, 2015 10:46:59 PM
>>>>>>>>> Subject: Re: why leaving riak cluster so slowly and how to accelerate 
>>>>>>>>> the speed
>>>>>>>>> 
>>>>>>>>> Hi Amao,
>>>>>>>>> 
>>>>>>>>> Can you explain a bit more which steps you've taken, and what the 
>>>>>>>>> problem is?
>>>>>>>>> 
>>>>>>>>> Which nodes have been added, and which nodes are leaving the cluster?
>>>>>>>>> 
>>>>>>>>>> On Tue, Jul 28, 2015 at 11:03 PM, Changmao.Wang 
>>>>>>>>>> <changmao.w...@datayes.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Riak user group,
>>>>>>>>>> 
>>>>>>>>>>  I'm using riak and riak-cs 1.4.2. Last weekend, I added four nodes 
>>>>>>>>>> to a cluster with 5 nodes. However, it failed with one of the disks 100% 
>>>>>>>>>> full.
>>>>>>>>>> As you know, the bitcask storage engine cannot support multiple folders.
>>>>>>>>>> 
>>>>>>>>>> After that, I restarted "riak" and had the nodes leave the cluster with 
>>>>>>>>>> the commands "riak-admin cluster leave" and "riak-admin cluster plan", 
>>>>>>>>>> and then committed.
>>>>>>>>>> However, riak has been doing KV balancing ever since I submitted the 
>>>>>>>>>> leave command. I guess that it is still working through the join process.
>>>>>>>>>> 
>>>>>>>>>> Could you show us how to accelerate the leaving process? I have 
>>>>>>>>>> tuned the "transfer-limit" parameter on the 9 nodes.
>>>>>>>>>> 
>>>>>>>>>> Below is some command output:
>>>>>>>>>> riak-admin member-status
>>>>>>>>>> ================================= Membership ==================================
>>>>>>>>>> Status     Ring    Pending    Node
>>>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>>> leaving     6.3%     10.9%    'riak@10.21.136.91'
>>>>>>>>>> leaving     9.4%     10.9%    'riak@10.21.136.92'
>>>>>>>>>> leaving     6.3%     10.9%    'riak@10.21.136.93'
>>>>>>>>>> leaving     6.3%     10.9%    'riak@10.21.136.94'
>>>>>>>>>> valid      10.9%     10.9%    'riak@10.21.136.66'
>>>>>>>>>> valid      12.5%     10.9%    'riak@10.21.136.71'
>>>>>>>>>> valid      18.8%     10.9%    'riak@10.21.136.76'
>>>>>>>>>> valid      18.8%     12.5%    'riak@10.21.136.81'
>>>>>>>>>> valid      10.9%     10.9%    'riak@10.21.136.86'
>>>>>>>>>> 
>>>>>>>>>>  riak-admin transfer_limit
>>>>>>>>>> =============================== Transfer Limit ================================
>>>>>>>>>> Limit        Node
>>>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>>>   200        'riak@10.21.136.66'
>>>>>>>>>>   200        'riak@10.21.136.71'
>>>>>>>>>>   100        'riak@10.21.136.76'
>>>>>>>>>>   100        'riak@10.21.136.81'
>>>>>>>>>>   200        'riak@10.21.136.86'
>>>>>>>>>>   500        'riak@10.21.136.91'
>>>>>>>>>>   500        'riak@10.21.136.92'
>>>>>>>>>>   500        'riak@10.21.136.93'
>>>>>>>>>>   500        'riak@10.21.136.94'
>>>>>>>>>> 
>>>>>>>>>> Do you need any more details to diagnose the problem?
>>>>>>>>>> 
>>>>>>>>>> Amao
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> Amao Wang
>>>>>>> Best & Regards
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Amao Wang
>>>>> Best & Regards
>>> 
>>> 
>>> 
>>> -- 
>>> Amao Wang
>>> Best & Regards
>> 
>> 
>> 
>> -- 
>> Amao Wang
>> Best & Regards
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
