[ https://issues.apache.org/jira/browse/CASSANDRA-18075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721543#comment-17721543 ]
Ryan Koski commented on CASSANDRA-18075:
----------------------------------------

Aaron, these nodes should not have any firewalls between them. The only thing that has any knowledge of ports is the LB that sits in front of the cluster.

> Upgraded (C* 4.0.4) node stops communicating with older version (3.11.4) nodes during upgrade
> ---------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-18075
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18075
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Encryption
> Reporter: Alaykumar Barochia
> Priority: Normal
> Attachments: In-place-upgrade.zip, cassandra-env.sh_3114,
> cassandra-env.sh_404, cassandra.yaml_10.110.44.207_explicitely_set_port,
> cassandra.yaml_10.110.49.242_explicitely_set_port, cassandra.yaml_3114,
> cassandra.yaml_404, system.log_10.110.44.207,
> system.log_10.110.44.207_after_explicitely_set_port,
> system.log_10.110.49.242_after_explicitely_set_port
>
>
> We are testing an upgrade from Cassandra 3.11.4 to 4.0.4 on our SSL-enabled
> test cluster and are facing an issue.
> Our cluster size is 3x3.
> {noformat}
> Datacenter: abssl_dev_tap_ttc
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.109.6.153   94.27 KiB   16      100.0%            130e59d2-2a9a-4039-a42f-deb20afcf288  rack1
> UN  10.109.45.8    104.43 KiB  16      100.0%            35274a2c-f915-4308-9981-d207a4e2108f  rack1
> UN  10.109.66.149  104.23 KiB  16      100.0%            ea0151bc-fb6c-425d-af42-75c10e52f941  rack1
> Datacenter: abssl_dev_tap_tte
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.110.4.110   104.44 KiB  16      100.0%            fd4a9fa8-f2a9-494c-afb8-7cb8a08c7554  rack1
> UN  10.110.44.220  99.33 KiB   16      100.0%            f1dc35c0-a1c2-45fe-9f65-b1cc3d7f6947  rack1
> UN  10.110.49.242  65.57 KiB   16      100.0%            72bc4ae5-876d-4d0a-91ac-6cf8b531b4dd  rack1
> dbaasprod-ca-abssl-de-393671-v001-yqlvf:~# nodetool describecluster
> Cluster Information:
>     Name: abssl_dev
>     Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>     DynamicEndPointSnitch: enabled
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         f68fbc0c-c9d6-3709-8075-c5a0d74192f2: [10.110.4.110, 10.110.44.220, 10.109.6.153, 10.109.45.8, 10.109.66.149, 10.110.49.242]
> {noformat}
> During the upgrade, we re-run the pipeline, which gives us a new server (with
> a different IP) that has the Cassandra 4.0.4 binary.
> The '/data' disk (which contains data files, commitlogs, etc.) is detached
> from the old server and attached to the new server.
> This process works fine on a non-SSL cluster, but when we perform it on an SSL
> cluster, the new node stops communicating with the rest of the nodes.
> In this example, after the upgrade, node 10.110.4.110 was replaced by a new
> server with the new IP 10.110.44.207.
> *Output from 3.11.4 node:*
> {noformat}
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# hostname -i
> 10.109.6.153
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# java -version
> openjdk version "1.8.0_322"
> OpenJDK Runtime Environment (Temurin)(build 1.8.0_322-b06)
> OpenJDK 64-Bit Server VM (Temurin)(build 25.322-b06, mixed mode)
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# nodetool status
> Datacenter: abssl_dev_tap_ttc
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.109.6.153   135.24 KiB  16      100.0%            130e59d2-2a9a-4039-a42f-deb20afcf288  rack1
> UN  10.109.45.8    135.35 KiB  16      100.0%            35274a2c-f915-4308-9981-d207a4e2108f  rack1
> UN  10.109.66.149  135.25 KiB  16      100.0%            ea0151bc-fb6c-425d-af42-75c10e52f941  rack1
> Datacenter: abssl_dev_tap_tte
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> DN  10.110.4.110   104.44 KiB  16      100.0%            fd4a9fa8-f2a9-494c-afb8-7cb8a08c7554  rack1
> UN  10.110.44.220  104.44 KiB  16      100.0%            f1dc35c0-a1c2-45fe-9f65-b1cc3d7f6947  rack1
> UN  10.110.49.242  65.57 KiB   16      100.0%            72bc4ae5-876d-4d0a-91ac-6cf8b531b4dd  rack1
> dbaasprod-ca-abssl-dc-437097-v001-7mump:~# nodetool describecluster
> Cluster Information:
>     Name: abssl_dev
>     Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>     DynamicEndPointSnitch: enabled
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         f68fbc0c-c9d6-3709-8075-c5a0d74192f2: [10.110.44.220, 10.109.6.153, 10.109.45.8, 10.109.66.149, 10.110.49.242]
>         UNREACHABLE: [10.110.4.110]
> {noformat}
> *Output from 4.0.4 node:*
> {noformat}
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# hostname -i
> 10.110.44.207
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# java -version
> openjdk version "11.0.15" 2022-04-19
> OpenJDK Runtime Environment Temurin-11.0.15+10 (build 11.0.15+10)
> OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (build 11.0.15+10, mixed mode)
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# nodetool status
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> DN  10.109.6.153   ?           16      0.0%              130e59d2-2a9a-4039-a42f-deb20afcf288  r1
> DN  10.109.45.8    ?           16      0.0%              35274a2c-f915-4308-9981-d207a4e2108f  r1
> DN  10.109.66.149  ?           16      0.0%              ea0151bc-fb6c-425d-af42-75c10e52f941  r1
> DN  10.110.44.220  ?           16      0.0%              f1dc35c0-a1c2-45fe-9f65-b1cc3d7f6947  r1
> DN  10.110.49.242  ?           16      0.0%              72bc4ae5-876d-4d0a-91ac-6cf8b531b4dd  r1
> Datacenter: abssl_dev_tap_tte
> =============================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  10.110.44.207  146.27 KiB  16      100.0%            fd4a9fa8-f2a9-494c-afb8-7cb8a08c7554  rack1
> dbaasprod-ca-abssl-de-393671-v003-dxpyv:~# nodetool describecluster
> Cluster Information:
>     Name: abssl_dev
>     Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
>     DynamicEndPointSnitch: disabled
>     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>     Schema versions:
>         1ccaeb62-5816-3599-897f-de59fd56eef2: [10.110.44.207]
>         UNREACHABLE: [10.109.45.8, 10.109.66.149, 10.110.44.220, 10.109.6.153, 10.110.49.242]
> Stats for all nodes:
>     Live: 1
>     Joining: 0
>     Moving: 0
>     Leaving: 0
>     Unreachable: 5
> Data Centers:
>     DC1 #Nodes: 5 #Down: 0
>     abssl_dev_tap_tte #Nodes: 1 #Down: 0
> Database versions:
>     : [10.109.45.8:7000, 10.109.66.149:7000, 10.110.44.220:7000, 10.109.6.153:7000, 10.110.49.242:7000]
>     4.0.4: [10.110.44.207:7000]
> Keyspaces:
>     system_schema -> Replication class: LocalStrategy {}
>     system -> Replication class: LocalStrategy {}
>     system_auth -> Replication class: NetworkTopologyStrategy {abssl_dev_tap_tte=3, abssl_dev_tap_ttc=3}
>     system_distributed -> Replication class: NetworkTopologyStrategy {abssl_dev_tap_tte=3, abssl_dev_tap_ttc=3}
>     system_traces -> Replication class: NetworkTopologyStrategy {abssl_dev_tap_tte=3, abssl_dev_tap_ttc=3}
> {noformat}
> We are getting the error below in the system.log file of the new node
> 10.110.44.207, which runs Cassandra 4.0.4.
> {noformat}
> WARN [Messaging-EventLoop-3-6] 2022-11-28 06:20:49,577 NoSpamLogger.java:95 - /10.110.44.207:7000->/10.109.45.8:7000-URGENT_MESSAGES-[no-channel] dropping message of type GOSSIP_DIGEST_SYN whose timeout expired before reaching the network
> INFO [Messaging-EventLoop-3-6] 2022-11-28 06:21:17,921 NoSpamLogger.java:92 - /10.110.44.207:7000->/10.110.49.242:7000-URGENT_MESSAGES-[no-channel] failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /10.110.49.242:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
>     at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
>     at io.netty.channel.unix.Socket.finishConnect(Socket.java:251)
>     at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:673)
>     at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:650)
>     at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:530)
>     at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470)
>     at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>     at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>     at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {noformat}
> I am attaching the cassandra.yaml and cassandra-env.sh files from both
> versions (3.11.4 and 4.0.4).
> I am also attaching the system.log file from the upgraded node 10.110.44.207.
> This looks like a bug, hence this Jira. Can you please have a look?
> Let me know if you need any more details.
> Thanks,
> Alaykumar Barochia
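
For anyone triaging the "Connection refused" entries quoted above, a quick TCP probe run from the upgraded node may help tell a port that nothing is listening on apart from one that is being filtered. The sketch below is only an illustration and is not part of the report: the peer list is copied from the nodetool output, port 7000 is the one that appears in the log, and 7001 is Cassandra's default ssl_storage_port, assumed here to be unchanged. The helper names `probe` and `check_peers` are hypothetical.

```python
import socket

# Peer addresses copied from the nodetool output in the report.
PEERS = [
    "10.109.6.153",
    "10.109.45.8",
    "10.109.66.149",
    "10.110.44.220",
    "10.110.49.242",
]

# 7000 is the port seen in the log; 7001 is Cassandra's default
# ssl_storage_port (assumption: the cluster has not overridden it).
PORTS = [7000, 7001]


def probe(host, port, timeout=3.0):
    """Try one TCP connect and return 'open', 'refused', 'timeout' or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def check_peers(peers=PEERS, ports=PORTS):
    """Print one result line per peer/port pair."""
    for host in peers:
        for port in ports:
            print(f"{host}:{port} -> {probe(host, port)}")
```

Calling `check_peers()` on 10.110.44.207 prints one line per peer and port. As a rough guide, "refused" usually means the peer answered but nothing is listening on that port, while "timeout" is more typical of traffic being dropped along the way. This only tests plain TCP reachability and says nothing about the TLS handshake itself.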