help with control.sh script

facundo.maldonado Thu, 15 Apr 2021 11:15:44 -0700

I'm testing a 32-node cluster with a partitioned cache with one backup.
When 2 of the nodes crash (not if, when), I run into the lost partitions problem.


Now I ssh to one of the nodes and execute *control.sh --baseline*.
From every node other than the one marked as "coordinator" (?) I get this
output:

--------------------------------------------------------------------------------
Failed to execute baseline command='collect'
Failed to communicate with grid nodes (maximum count of retries reached).
Connection to cluster failed. Failed to communicate with grid nodes (maximum
count of retries reached).

OK, I went to every node and did the same until I found the 'coordinator'.
Once I brought the failed nodes back online, I executed:
*control.sh --cache reset_lost_partitions mycache*

To my surprise, I got:
--------------------------------------------------------------------------------
Connection to cluster failed. Failed to communicate with grid nodes (maximum
count of retries reached).

So I started looking again for the nodes where that command actually works.
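
The node-by-node search described above could be scripted roughly as follows. This is only a minimal sketch under stated assumptions: the host names and the CONTROL_SH path are placeholders, the helper name find_responding_node is made up for illustration, and it assumes that control.sh exits with a non-zero status when the connection fails and that --host can be used to pick the node to connect to.

```shell
#!/usr/bin/env bash
# Sketch only: try "control.sh --baseline" against each node until one answers.
# CONTROL_SH and the host names are placeholders (assumptions), not real values.
CONTROL_SH="${CONTROL_SH:-./control.sh}"

# Prints the first host for which the baseline command exits with status 0.
# Returns 1 if none of the given hosts responds.
find_responding_node() {
  local host
  for host in "$@"; do
    # --host selects the node to connect to; the default port is assumed.
    if "$CONTROL_SH" --host "$host" --baseline >/dev/null 2>&1; then
      echo "$host"
      return 0
    fi
  done
  return 1
}

# Example usage (hypothetical host names):
# node=$(find_responding_node node01 node02 node03) &&
#   "$CONTROL_SH" --host "$node" --cache reset_lost_partitions mycache
```

This only automates the trial and error; it does not explain why most nodes reject the connection in the first place.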

I'm sure I'm doing something wrong. Could someone help me?




