I can reproduce the issue.

I drained the Cassandra node, then stopped and started the instance. The
Cassandra instance comes up, but the other nodes show as DN for around 10
minutes.
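
For reference, the restart sequence was roughly the following (a sketch;
the systemd service name and the exact EC2 CLI calls are assumptions, not
the literal commands we ran):

nodetool drain                 # flush memtables and stop accepting new writes
sudo systemctl stop cassandra  # service name assumed
aws ec2 stop-instances --instance-ids <instance-id>
aws ec2 start-instances --instance-ids <instance-id>
sudo systemctl start cassandra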

I don't see any errors in the system.log:

--  Address       Load        Tokens       Owns (effective)  Host ID  Rack
DN  xx.xx.xx.59   420.85 MiB  256          48.2%             id  2
UN  xx.xx.xx.30   432.14 MiB  256          50.0%             id  0
UN  xx.xx.xx.79   447.33 MiB  256          51.1%             id  4
DN  xx.xx.xx.144  452.59 MiB  256          51.6%             id  1
DN  xx.xx.xx.19   431.7 MiB   256          50.1%             id  5
UN  xx.xx.xx.6    421.79 MiB  256          48.9%

When I run nodetool status, 3 nodes are still showing as down, and there are
still no errors in system.log.

After about 10 minutes it shows the other nodes as up as well:


INFO  [HANDSHAKE-/10.72.100.156] 2019-11-05 15:05:09,133
OutboundTcpConnection.java:561 - Handshaking version with /stopandstarted
node
INFO  [RequestResponseStage-7] 2019-11-05 15:16:27,166 Gossiper.java:1019 -
InetAddress /nodewhichitwasshowing down is now UP

What is causing the 10-minute delay before the nodes are marked reachable
again?
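
For what it's worth, while the peers still show DN the gossip-side view can
be inspected with standard nodetool subcommands:

nodetool gossipinfo        # per-endpoint generation/heartbeat as gossip sees it
nodetool describecluster   # schema versions and which nodes are unreachable
nodetool tpstats           # check for backed-up GossipStage tasks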

On Wed, Oct 30, 2019, 8:37 AM Rahul Reddy <rahulreddy1...@gmail.com> wrote:

> Also, an AWS EC2 stop/start brings up a new instance with the same IP, and
> all our file systems are on EBS and mounted fine. Does a new instance
> coming up with the same IP cause any gossip issues?
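>
> One thing to confirm is that the restarted node kept its host ID (it
> should, since the system tables live on the EBS volume); a quick check
> from any live node, with <live-node> as a placeholder:
>
> cqlsh <live-node> -e "SELECT peer, host_id FROM system.peers;"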
>
> On Tue, Oct 29, 2019, 6:16 PM Rahul Reddy <rahulreddy1...@gmail.com>
> wrote:
>
>> Thanks Alex. We have 6 nodes in each DC with RF=3 and CL LOCAL_QUORUM,
>> and we stopped and started only one instance at a time. Though nodetool
>> status says all nodes are UN and system.log says Cassandra started and is
>> listening, the JMX exporter shows the instance stayed down longer. How do
>> we determine what made Cassandra unavailable when the log says it started
>> and is listening?
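>>
>> One way to check actual readiness beyond the startup log line (these are
>> standard nodetool subcommands, run on the restarted node):
>>
>> nodetool info          # check the "Gossip active" and "Native Transport active" lines
>> nodetool statusgossip  # prints running / not running
>> nodetool statusbinary  # whether the node is accepting CQL client connections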
>>
>> On Tue, Oct 29, 2019, 4:44 PM Oleksandr Shulgin <
>> oleksandr.shul...@zalando.de> wrote:
>>
>>> On Tue, Oct 29, 2019 at 9:34 PM Rahul Reddy <rahulreddy1...@gmail.com>
>>> wrote:
>>>
>>>>
>>>> We have our infrastructure on AWS and we use EBS storage, and AWS was
>>>> retiring one of the nodes. Since our storage is persistent, we ran nodetool
>>>> drain, then stopped and started the instance. This caused 500 errors in the
>>>> service. We have LOCAL_QUORUM and RF=3; why does stopping one instance
>>>> cause the application to have issues?
>>>>
>>>
>>> Can you still look up the underlying error from the Cassandra driver in
>>> the application logs?  Was it a request timeout or not enough replicas?
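>>>
>>> (A rough way to tell the two apart, assuming the application logs the
>>> driver exceptions; the log path here is hypothetical:
>>>
>>> grep -c "Unavailable" /var/log/app/application.log
>>> grep -ci "timeout" /var/log/app/application.log
>>>
>>> An Unavailable error means the coordinator already believed too few
>>> replicas were alive; a timeout means the replicas were believed up but
>>> did not answer in time.)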
>>>
>>> For example, if you only had 3 Cassandra nodes, restarting one of them
>>> reduces your cluster capacity by 33% temporarily.
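>>>
>>> (To spell out the arithmetic: LOCAL_QUORUM needs floor(RF/2) + 1 =
>>> floor(3/2) + 1 = 2 replicas in the local DC, so with RF=3 one node down
>>> still leaves 2 of 3. If you saw 500s anyway, timeouts during the restart
>>> window are a more likely cause than missing replicas.)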
>>>
>>> Cheers,
>>> --
>>> Alex
>>>
>>>
