Hi Andrew,

We were having problems with gossip TCP connections being held open, so we
changed our SOP for stopping Cassandra to:

nodetool disablegossip
nodetool drain
service cassandra stop

This seemed to shut down gossip cleanly (the nodetool drain is recommended
as well) and meant that the node rejoined the cluster fine after issuing
"service cassandra start".

*Ben*

On 1 March 2017 at 16:29, Andrew Jorgensen <and...@andrewjorgensen.com>
wrote:

> Hello,
>
> I have a cassandra cluster running on cassandra 3.0.3 and am seeing some
> strange behavior that I cannot explain when restarting cassandra nodes. The
> cluster is currently setup in a single datacenter and consists of 55 nodes.
> I am currently in the process of restarting nodes in the cluster but have
> noticed that after restarting the cassandra process with `service cassandra
> stop; service cassandra start`, when the node comes back and I run `nodetool
> status` there is usually a non-zero number of nodes in the rest of the
> cluster that are marked as DN. If I go to another node in the cluster,
> from its perspective all nodes, including the restarted one, are marked as
> UN. It seems to take ~15 to 20 minutes before the restarted node is updated
> to show all nodes as UN. During those 15 to 20 minutes, writes and reads to
> the cluster appear to be degraded and do not recover unless I stop the
> cassandra process again or wait for all nodes to be marked as UN. The
> cluster also has 3 seed nodes which during this process are up and
> available the whole time.
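>
> (A quick way to compare each node's view of the cluster, for anyone trying
> to reproduce this, is something like the following; the hostnames are
> placeholders:
>
> for h in node01 node02 node03; do ssh $h "nodetool status" | grep DN; done
>
> which prints any peers that a given node currently sees as down.)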
>
> I have also tried running `nodetool gossipinfo` on the restarted node, and
> according to its output all nodes have a status of NORMAL. Has anyone seen this
> before and is there anything I can do to fix/reduce the impact of running a
> restart on a cassandra node?
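>
> (The gossipinfo check above is roughly:
>
> nodetool gossipinfo | grep -E "^/|STATUS"
>
> which lists each endpoint together with its advertised state, e.g.
> STATUS:NORMAL. The grep pattern is just illustrative.)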
>
> Thanks,
> Andrew Jorgensen
> @ajorgensen
>
