Hi Paul,

From the gossipinfo output, it looks like the node's IP address and
rpc_address are different:
/192.168.*187*.121 vs RPC_ADDRESS:192.168.*185*.121
You can also check for a schema disagreement between nodes, although in the
gossipinfo you posted the schema IDs actually match: both node001 and node002
report SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801, so schema looks
consistent. You can run nodetool describecluster to confirm this as well.
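
For reference, describecluster groups nodes by schema version, so a healthy
cluster shows a single entry. A sketch of the output shape (the cluster name
and snitch below are illustrative placeholders, not taken from your cluster):

    user@node001=> nodetool describecluster
    Cluster Information:
            Name: MyCluster
            Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
            Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
            Schema versions:
                    fd2dcb4b-ca62-30df-b8f2-d3fd774f2801: [192.168.187.121, 192.168.187.122, 192.168.187.123, 192.168.187.124]

A real disagreement would show two or more schema UUIDs, each with its own
list of node addresses.
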
So I suggest changing rpc_address to the node's own IP address, or setting it
to 0.0.0.0, which should resolve the issue. (If you use 0.0.0.0, Cassandra
also requires broadcast_rpc_address to be set to a routable address.)
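
Concretely, that would be one of the following in cassandra.yaml on node001
(a sketch; the address is taken from your gossipinfo output, adjust to your
interfaces):

    # Option 1: bind the client (rpc) interface to the node's own address
    rpc_address: 192.168.187.121

    # Option 2: listen on all interfaces; broadcast_rpc_address is then
    # required so a routable address is advertised to clients
    # rpc_address: 0.0.0.0
    # broadcast_rpc_address: 192.168.187.121

followed by a restart of the node.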

Hope this helps!


On Tue, Nov 26, 2019 at 4:05 AM Inquistive allen <inquial...@gmail.com>
wrote:

> Hello,
>
> Check and compare the following parameters:
>
> 1. The Java version should ideally match across all nodes in the cluster.
> 2. Check that port 7000 (the storage/gossip port) is open between the
> nodes, using telnet or nc (example commands below).
> 3. Look for clues in the system logs about why gossip is failing.
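>
> For example (the hostname is one of yours from later in the thread; the log
> path is the package default):
>
>     # 1. compare Java versions on each node
>     java -version
>
>     # 2. check that the storage port on a peer is reachable
>     nc -vz node001.intra.myorg.org 7000
>
>     # 3. scan the system log for gossip-related messages
>     grep -i gossip /var/log/cassandra/system.log | tail -50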
>
> Do confirm on the above things.
>
> Thanks
>
>
> On Tue, 26 Nov, 2019, 2:50 AM Paul Mena, <pm...@whoi.edu> wrote:
>
>> NTP was restarted on the Cassandra nodes, but unfortunately I’m still
>> getting the same result: the restarted node does not appear to be rejoining
>> the cluster.
>>
>>
>>
>> Here’s another data point: “nodetool gossipinfo”, when run from the
>> restarted node (“node001”), shows a status of “normal”:
>>
>>
>>
>> user@node001=> nodetool gossipinfo
>>
>> /192.168.187.121
>>
>>   generation:1574364410
>>
>>   heartbeat:209150
>>
>>   NET_VERSION:8
>>
>>   RACK:rack1
>>
>>   STATUS:NORMAL,-104847506331695918
>>
>>   RELEASE_VERSION:2.1.9
>>
>>   SEVERITY:0.0
>>
>>   LOAD:5.78684155614E11
>>
>>   HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
>>
>>   SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
>>
>>   DC:datacenter1
>>
>>   RPC_ADDRESS:192.168.185.121
>>
>>
>>
>> When run from one of the other nodes, however, node001’s status is shown
>> as “shutdown”:
>>
>>
>>
>> user@node002=> nodetool gossipinfo
>>
>> /192.168.187.121
>>
>>   generation:1491825076
>>
>>   heartbeat:2147483647
>>
>>   STATUS:shutdown,true
>>
>>   RACK:rack1
>>
>>   NET_VERSION:8
>>
>>   LOAD:5.78679987693E11
>>
>>   RELEASE_VERSION:2.1.9
>>
>>   DC:datacenter1
>>
>>   SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
>>
>>   HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
>>
>>   RPC_ADDRESS:192.168.185.121
>>
>>   SEVERITY:0.0
>>
>>
>>
>>
>>
>> *Paul Mena*
>>
>> Senior Application Administrator
>>
>> WHOI - Information Services
>>
>> 508-289-3539
>>
>>
>>
>> *From:* Paul Mena
>> *Sent:* Monday, November 25, 2019 9:29 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* RE: Cassandra is not showing a node up hours after restart
>>
>>
>>
>> I’ve just discovered that NTP is not running on any of these Cassandra
>> nodes, and that the timestamps are all over the map. Could this be causing
>> my issue?
>>
>>
>>
>> user@remote=> ansible pre-prod-cassandra -a date
>>
>> node001.intra.myorg.org | CHANGED | rc=0 >>
>>
>> Mon Nov 25 13:58:17 UTC 2019
>>
>>
>>
>> node004.intra.myorg.org | CHANGED | rc=0 >>
>>
>> Mon Nov 25 14:07:20 UTC 2019
>>
>>
>>
>> node003.intra.myorg.org | CHANGED | rc=0 >>
>>
>> Mon Nov 25 13:57:06 UTC 2019
>>
>>
>>
>> node001.intra.myorg.org | CHANGED | rc=0 >>
>>
>> Mon Nov 25 14:07:22 UTC 2019
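>>
>> A one-shot way to restart NTP across the same inventory would be the
>> following sketch (assumes the ansible service module; the service may be
>> named ntp or ntpd depending on the distro):
>>
>>     ansible pre-prod-cassandra -b -m service -a "name=ntp state=restarted"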
>>
>>
>>
>> *Paul Mena*
>>
>> Senior Application Administrator
>>
>> WHOI - Information Services
>>
>> 508-289-3539
>>
>>
>>
>> *From:* Inquistive allen <inquial...@gmail.com>
>> *Sent:* Monday, November 25, 2019 2:46 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Cassandra is not showing a node up hours after restart
>>
>>
>>
>> Hello team,
>>
>>
>>
>> Just to add to the discussion, one may run
>>
>> nodetool disablebinary, followed by nodetool disablethrift, followed by
>> nodetool drain.
>>
>> nodetool drain also does the work of nodetool flush, plus it announces to
>> the cluster that the node is down and not accepting traffic.
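>>
>> As a concrete sketch (the service command is the one used elsewhere in
>> this thread):
>>
>>     nodetool disablebinary    # stop accepting native protocol (CQL) clients
>>     nodetool disablethrift    # stop accepting Thrift clients
>>     nodetool drain            # flush memtables; announce shutdown to the cluster
>>     service cassandra stop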
>>
>>
>>
>> Thanks
>>
>>
>>
>>
>>
>> On Mon, 25 Nov, 2019, 12:55 AM Surbhi Gupta, <surbhi.gupt...@gmail.com>
>> wrote:
>>
>> Before shutting Cassandra down, nodetool drain should be executed first. As
>> soon as you run nodetool drain, the other nodes will see this node as down
>> and no new traffic will come to it.
>>
>> I generally give a 10-second gap between nodetool drain and the Cassandra
>> stop.
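>>
>> In other words (a sketch, using the service command mentioned later in the
>> thread):
>>
>>     nodetool drain && sleep 10 && service cassandra stop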
>>
>>
>>
>> On Sun, Nov 24, 2019 at 9:52 AM Paul Mena <pm...@whoi.edu> wrote:
>>
>> Thank you for the replies. I had made no changes to the config before the
>> rolling restart.
>>
>>
>>
>> I can try another restart but was wondering if I should do it
>> differently. I had simply done “service cassandra stop” followed by
>> “service cassandra start”. Since then I’ve seen some suggestions to
>> precede the shutdown with “nodetool disablegossip” and/or “nodetool drain”.
>> Are these commands advisable? Are any other commands recommended either
>> before the shutdown or after the startup?
>>
>>
>>
>> Thanks again!
>>
>>
>>
>> Paul
>> ------------------------------
>>
>> *From:* Naman Gupta <naman.gu...@girnarsoft.com>
>> *Sent:* Sunday, November 24, 2019 11:18:14 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Cassandra is not showing a node up hours after restart
>>
>>
>>
>> Did you change the name of datacenter or any other config changes before
>> the rolling restart?
>>
>>
>>
>> On Sun, Nov 24, 2019 at 8:49 PM Paul Mena <pm...@whoi.edu> wrote:
>>
>> I am in the process of doing a rolling restart on a 4-node cluster
>> running Cassandra 2.1.9. I stopped and started Cassandra on node 1 via
>> "service cassandra stop/start", and noted nothing unusual in either
>> system.log or cassandra.log. Doing a "nodetool status" from node 1 shows
>> all four nodes up:
>>
>>
>>
>> user@node001=> nodetool status
>>
>> Datacenter: datacenter1
>>
>> =======================
>>
>> Status=Up/Down
>>
>> |/ State=Normal/Leaving/Joining/Moving
>>
>> --  Address          Load       Tokens  Owns    Host ID                               Rack
>> UN  192.168.187.121  538.95 GB  256     ?       c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>> UN  192.168.187.122  630.72 GB  256     ?       bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>> UN  192.168.187.123  572.73 GB  256     ?       273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>> UN  192.168.187.124  625.05 GB  256     ?       b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>> But running the same command from any of the other 3 nodes shows node 1
>> still down:
>>
>>
>>
>> user@node002=> nodetool status
>>
>> Datacenter: datacenter1
>>
>> =======================
>>
>> Status=Up/Down
>>
>> |/ State=Normal/Leaving/Joining/Moving
>>
>> --  Address          Load       Tokens  Owns    Host ID                               Rack
>> DN  192.168.187.121  538.94 GB  256     ?       c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>> UN  192.168.187.122  630.72 GB  256     ?       bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>> UN  192.168.187.123  572.73 GB  256     ?       273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>> UN  192.168.187.124  625.04 GB  256     ?       b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>> Is there something I can do to remedy this current situation - so that I
>> can continue with the rolling restart?
>>
>>
>>
>>
