Hi Paul,

From the gossipinfo output, it looks like the node's IP address and rpc_address are different: /192.168.*187*.121 vs RPC_ADDRESS:192.168.*185*.121.

Note that the schema IDs reported by node001 and node002 are actually identical (fd2dcb4b-ca62-30df-b8f2-d3fd774f2801 on both), so the gossipinfo output shows no schema disagreement; you can double-check with "nodetool describecluster".

So I suggest changing the rpc_address to the node's own IP address, or setting it to 0.0.0.0, and that should resolve the issue.
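As a concrete sketch of that suggestion (the path /etc/cassandra/cassandra.yaml is an assumption; it varies by install), the change on node001 would look like:

```yaml
# cassandra.yaml on node001 -- bind the client (RPC) address explicitly
rpc_address: 192.168.187.121

# Or listen on all interfaces; in that case broadcast_rpc_address must be set
# so other nodes learn a routable address:
# rpc_address: 0.0.0.0
# broadcast_rpc_address: 192.168.187.121
```

The node has to be restarted for the change to take effect.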
Hope this helps!

On Tue, Nov 26, 2019 at 4:05 AM Inquistive allen <inquial...@gmail.com> wrote:

> Hello,
>
> Check and compare the following parameters:
>
> 1. The Java version should ideally match across all nodes in the cluster.
> 2. Check whether port 7000 is open between the nodes, using the telnet or nc commands.
> 3. You should see some clues in the system logs as to why gossip is failing.
>
> Do confirm the above.
>
> Thanks
>
> On Tue, 26 Nov, 2019, 2:50 AM Paul Mena, <pm...@whoi.edu> wrote:
>
>> NTP was restarted on the Cassandra nodes, but unfortunately I'm still
>> getting the same result: the restarted node does not appear to be
>> rejoining the cluster.
>>
>> Here's another data point: "nodetool gossipinfo", when run from the
>> restarted node ("node001"), shows a status of "normal":
>>
>> user@node001=> nodetool -u gossipinfo
>> /192.168.187.121
>> generation:1574364410
>> heartbeat:209150
>> NET_VERSION:8
>> RACK:rack1
>> STATUS:NORMAL,-104847506331695918
>> RELEASE_VERSION:2.1.9
>> SEVERITY:0.0
>> LOAD:5.78684155614E11
>> HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
>> SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
>> DC:datacenter1
>> RPC_ADDRESS:192.168.185.121
>>
>> When run from one of the other nodes, however, node001's status is shown
>> as "shutdown":
>>
>> user@node002=> nodetool gossipinfo
>> /192.168.187.121
>> generation:1491825076
>> heartbeat:2147483647
>> STATUS:shutdown,true
>> RACK:rack1
>> NET_VERSION:8
>> LOAD:5.78679987693E11
>> RELEASE_VERSION:2.1.9
>> DC:datacenter1
>> SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
>> HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
>> RPC_ADDRESS:192.168.185.121
>> SEVERITY:0.0
>>
>> *Paul Mena*
>> Senior Application Administrator
>> WHOI - Information Services
>> 508-289-3539
>>
>> *From:* Paul Mena
>> *Sent:* Monday, November 25, 2019 9:29 AM
>> *To:* user@cassandra.apache.org
>> *Subject:*
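[Editor's note: to check item 2 of Allen's list, a minimal sketch — the peer addresses are taken from the nodetool output later in this thread, and the -w timeout flag assumes a GNU/BSD netcat:]

```shell
# Verify the inter-node (gossip/storage) port 7000 is reachable from this node
for peer in 192.168.187.121 192.168.187.122 192.168.187.123 192.168.187.124; do
  nc -zv -w 5 "$peer" 7000
done
```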
RE: Cassandra is not showing a node up hours after restart
>>
>> I've just discovered that NTP is not running on any of these Cassandra
>> nodes, and that the timestamps are all over the map. Could this be
>> causing my issue?
>>
>> user@remote=> ansible pre-prod-cassandra -a date
>> node001.intra.myorg.org | CHANGED | rc=0
>> Mon Nov 25 13:58:17 UTC 2019
>>
>> node004.intra.myorg.org | CHANGED | rc=0
>> Mon Nov 25 14:07:20 UTC 2019
>>
>> node003.intra.myorg.org | CHANGED | rc=0
>> Mon Nov 25 13:57:06 UTC 2019
>>
>> node001.intra.myorg.org | CHANGED | rc=0
>> Mon Nov 25 14:07:22 UTC 2019
>>
>> *Paul Mena*
>> Senior Application Administrator
>> WHOI - Information Services
>> 508-289-3539
>>
>> *From:* Inquistive allen <inquial...@gmail.com>
>> *Sent:* Monday, November 25, 2019 2:46 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Cassandra is not showing a node up hours after restart
>>
>> Hello team,
>>
>> Just to add on to the discussion: one may run nodetool disablebinary,
>> followed by nodetool disablethrift, followed by nodetool drain.
>> Nodetool drain also does the work of nodetool flush, plus declaring to
>> the cluster that the node is down and not accepting traffic.
>>
>> Thanks
>>
>> On Mon, 25 Nov, 2019, 12:55 AM Surbhi Gupta, <surbhi.gupt...@gmail.com>
>> wrote:
>>
>> Before a Cassandra shutdown, nodetool drain should be executed first.
>> As soon as you do nodetool drain, the other nodes will see this node
>> as down and no new traffic will come to it.
>> I generally give a 10-second gap between nodetool drain and the
>> Cassandra stop.
>>
>> On Sun, Nov 24, 2019 at 9:52 AM Paul Mena <pm...@whoi.edu> wrote:
>>
>> Thank you for the replies. I had made no changes to the config before
>> the rolling restart.
>>
>> I can try another restart but was wondering if I should do it
>> differently.
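[Editor's note: given the clock skew shown above, it may be worth confirming every node is actually syncing before retrying — a sketch reusing the same ansible inventory group; ntpstat exits non-zero when a host is unsynchronised:]

```shell
# Confirm time sync and current UTC time on every Cassandra node
ansible pre-prod-cassandra -m shell -a 'ntpstat; date -u'
```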
I had simply done "service cassandra stop" followed by
>> "service cassandra start". Since then I've seen some suggestions to
>> precede the shutdown with "nodetool disablegossip" and/or "nodetool
>> drain". Are these commands advisable? Are any other commands
>> recommended, either before the shutdown or after the startup?
>>
>> Thanks again!
>>
>> Paul
>> ------------------------------
>>
>> *From:* Naman Gupta <naman.gu...@girnarsoft.com>
>> *Sent:* Sunday, November 24, 2019 11:18:14 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: Cassandra is not showing a node up hours after restart
>>
>> Did you change the name of the datacenter, or make any other config
>> changes, before the rolling restart?
>>
>> On Sun, Nov 24, 2019 at 8:49 PM Paul Mena <pm...@whoi.edu> wrote:
>>
>> I am in the process of doing a rolling restart on a 4-node cluster
>> running Cassandra 2.1.9. I stopped and started Cassandra on node 1 via
>> "service cassandra stop/start", and noted nothing unusual in either
>> system.log or cassandra.log. Doing a "nodetool status" from node 1
>> shows all four nodes up:
>>
>> user@node001=> nodetool status
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load       Tokens  Owns  Host ID                               Rack
>> UN  192.168.187.121  538.95 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>> UN  192.168.187.124  625.05 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>> But doing the same command from any of the other 3 nodes shows node 1
>> still down.
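[Editor's note: pulling together the advice in this thread, a commonly used stop/start sequence looks like the sketch below — run on the node being restarted; disablebinary/disablethrift only matter if clients connect to this node directly:]

```shell
nodetool disablebinary    # stop accepting native-protocol (CQL) clients
nodetool disablethrift    # stop accepting Thrift clients
nodetool drain            # flush memtables and announce shutdown via gossip
sleep 10                  # let the drained/down state propagate
sudo service cassandra stop
# ... perform maintenance ...
sudo service cassandra start
```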
>> user@node002=> nodetool status
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load       Tokens  Owns  Host ID                               Rack
>> DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>> UN  192.168.187.124  625.04 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>> Is there something I can do to remedy the current situation, so that I
>> can continue with the rolling restart?
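[Editor's note: while watching the cluster recover, the status column of `nodetool status` can be filtered directly to list the nodes a peer still considers down — a sketch; the sample file below is pasted from node002's output above:]

```shell
# List addresses of nodes reported Down (DN) in a saved `nodetool status` output
cat > /tmp/status.txt <<'EOF'
UN  192.168.187.122  630.72 GB  256  ?  bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
DN  192.168.187.121  538.94 GB  256  ?  c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
EOF
awk '$1 == "DN" { print $2 }' /tmp/status.txt
# -> 192.168.187.121
```

Running this on each node (piping live `nodetool status` output into awk) shows whether the restarted node's UP state has propagated everywhere.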