Thanks all for the help.

On Mon, Sep 6, 2021 at 2:20 PM manish khandelwal <manishkhandelwa...@gmail.com> wrote:
> Totally agree with Jeff and Bowen there. Don't try to achieve something
> faster by cutting corners. Migration to GCP from the physical DC should be
> done on the same versions.
>
> On Mon, Sep 6, 2021 at 2:11 PM Bowen Song <bo...@bso.ng> wrote:
>
>> Hello Ashish,
>>
>> I'm slightly worried about this:
>>
>> *Since I won't be needing the physical DC anymore, instead of upgrading
>> it I will simply discard that DC*
>>
>> This sounds like you are planning to add the GCP DC on 3.x to the
>> existing cluster, upgrade GCP to 4.0, and then decommission the existing
>> DC without upgrading it. If so, you need to think twice. Adding or
>> removing nodes (or DCs) in a cluster with mixed versions is not
>> recommended. I'd highly recommend you upgrade the existing DC before
>> decommissioning it. Of course, you can skip upgradesstables on it, which
>> is often the most time-consuming part.
>>
>> Cheers,
>>
>> Bowen
>>
>> On 06/09/2021 03:29, MyWorld wrote:
>>
>> Hi Jeff,
>>
>> *When you're upgrading or rebuilding you want all copies on the same
>> version with proper sstables. So either add GCP then upgrade to 4.0, or
>> upgrade to 4.0 and then expand to GCP. Don't do them at the same time.*
>>
>> I think I forgot to mention one thing: after completion of step 1, our
>> GCP data center will have been added, with rebuild done on all nodes, so
>> our complete cluster will be on 3.0.9 after step 1. We will change
>> num_tokens from the current 256 to 16 in the GCP data center in this
>> step only.
>>
>> DC1 - 5 nodes (physical) - version 3.0.9 - num_tokens 256
>> DC2 - 5 nodes (GCP) - version 3.0.9 - num_tokens 16
>>
>> The rest of the steps (2-5) are meant for the upgrade, in which I am
>> planning to go DC by DC, running upgradesstables on GCP first.
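For reference, the step-1 rebuild described above can be sketched roughly as the runbook below. This is a minimal outline under assumptions, not a definitive procedure: the source DC name "DC1" comes from this thread, the cassandra.yaml path is a common default, and it assumes each keyspace's replication has already been altered to include the new DC.

```shell
# Sketch of step 1: add the GCP DC on 3.0.9 and stream data into it.
# Run the steps below on each new GCP node.

# Before the node's FIRST start, set the token count in cassandra.yaml
# (num_tokens cannot be changed after the node has joined the ring):
#     num_tokens: 16        # e.g. in /etc/cassandra/cassandra.yaml

# After all five GCP nodes have joined and the keyspaces replicate to
# the new DC, stream the existing data from the old DC ("DC1" here):
nodetool rebuild -- DC1

# Watch streaming progress and confirm the node ends up owning data:
nodetool netstats
nodetool status
```

Running `rebuild` on the new nodes is what actually populates them; without it the new DC serves empty reads until repair or normal write traffic fills it in.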
>> DC1 - 5 nodes (physical) - version 3.0.9 - num_tokens 256
>> DC2 - 5 nodes (GCP) - version 4.0.0 - num_tokens 16
>>
>> Since I won't be needing the physical DC anymore, instead of upgrading
>> it I will simply discard that DC.
>>
>> Regards,
>> Ashish
>>
>> On Mon, Sep 6, 2021, 7:31 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> In-line
>>>
>>> On Sep 3, 2021, at 11:12 AM, MyWorld <timeplus.1...@gmail.com> wrote:
>>>
>>> Hi Jeff,
>>> Thanks for your response.
>>> To answer your question: yes, we have created the dev environment by
>>> restoring from snapshot/CSV files.
>>>
>>> Just one follow-up question. I have a 5-node single-DC production
>>> cluster on version 3.0.9 on physical servers.
>>> We are planning to migrate to GCP along with the upgrade, using the
>>> steps below:
>>> 1. Set up the GCP data center on the same version 3.0.9 and rebuild
>>> the complete data
>>> 2. Install and configure version 4.0 on all 5 nodes in the new GCP
>>> data center
>>> 3. Stop version 3.0.9 and start 4.0 on all 5 GCP nodes, one by one
>>> 4. Run upgradesstables one by one on all 5 GCP nodes
>>> 5. Later, move read/write traffic to GCP and remove the old data
>>> center, which is still on version 3.0.9
>>>
>>> Please guide on a few things:
>>> 1. Is the above-mentioned approach right?
>>>
>>> When you're upgrading or rebuilding you want all copies on the same
>>> version with proper sstables. So either add GCP then upgrade to 4.0,
>>> or upgrade to 4.0 and then expand to GCP. Don't do them at the same
>>> time.
>>>
>>> 2. Or should we upgrade to 4.0 on only one GCP node at a time and run
>>> upgradesstables on just that one node first?
>>>
>>> I usually do upgradesstables after all bounces are done.
>>>
>>> The only exception is perhaps doing upgradesstables with exactly one
>>> copy via backup/restore to make sure 4.0 works with your data files,
>>> which it sounds like you've already done.
>>>
>>> 3. Or should we migrate to GCP first and then think about the 4.0
>>> upgrade later?
>>> 4. Or is there any reason I should upgrade to 3.11.x first?
>>>
>>> Not 3.11, but maybe the latest 3.0 release instead.
>>>
>>> Regards,
>>> Ashish
>>>
>>> On Fri, Sep 3, 2021, 11:11 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> On Fri, Sep 3, 2021 at 10:33 AM MyWorld <timeplus.1...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>> We are doing a POC in a dev environment to upgrade Apache Cassandra
>>>>> 3.0.9 to 4.0.0. We currently have the setup below on Cassandra 3.0.9:
>>>>> DC1 - GCP (India) - 1 node
>>>>> DC2 - GCP (US) - 1 node
>>>>
>>>> 3.0.9 is very old. It's got an older version of the data files and
>>>> some known correctness bugs.
>>>>
>>>>> For the upgrade, we carried out the steps below on the DC2 - GCP (US)
>>>>> node:
>>>>> Step 1. Install Apache Cassandra 4.0.0
>>>>> Step 2. Apply all configuration settings
>>>>> Step 3. Stop Apache Cassandra 3.0.9
>>>>> Step 4. Start Apache Cassandra 4.0.0 and monitor the logs
>>>>> Step 5. Run nodetool upgradesstables and monitor the logs
>>>>>
>>>>> After monitoring the logs, I had the observations below.
>>>>>
>>>>> *1. Initially, during bootstrap at step 4, we received the exceptions
>>>>> below:*
>>>>> a) Exception (java.lang.IllegalArgumentException) encountered during
>>>>> startup: Invalid sstable file manifest.json: the name doesn't look
>>>>> like a supported sstable file name
>>>>> java.lang.IllegalArgumentException: Invalid sstable file
>>>>> manifest.json: the name doesn't look like a supported sstable file
>>>>> name
>>>>> b) ERROR [main] 2021-08-29 06:25:52,120 CassandraDaemon.java:909 -
>>>>> Exception encountered during startup
>>>>> java.lang.IllegalArgumentException: Invalid sstable file schema.cql:
>>>>> the name doesn't look like a supported sstable file name
>>>>>
>>>>> *In order to resolve this, we removed the manifest.json and
>>>>> schema.cql files from each table directory and the issue was
>>>>> resolved.*
>>>>
>>>> Did you restore these from backup/snapshot?
>>>>
>>>>> *2. After resolving the above issue, we received the WARN messages
>>>>> below during bootstrap (step 4):*
>>>>> WARN [main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:305 -
>>>>> Origin of 1 sstables is unknown or doesn't match the local node;
>>>>> commitLogIntervals for them were ignored
>>>>> DEBUG [main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:306 -
>>>>> Ignored commitLogIntervals from the following sstables:
>>>>> [/opt1/cassandra_poc/data/clickstream/glcat_mcat_by_flname-af4e3ac0ace511ebaf9ec13e37d013c2/mc-1-big-Data.db]
>>>>> WARN [main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:305 -
>>>>> Origin of 2 sstables is unknown or doesn't match the local node;
>>>>> commitLogIntervals for them were ignored
>>>>> DEBUG [main] 2021-08-29 06:33:25,738 CommitLogReplayer.java:306 -
>>>>> Ignored commitLogIntervals from the following sstables:
>>>>> [/opt1/cassandra_poc/data/clickstream/gl_city_map
>>>>
>>>> Your data files don't match the commitlog files it expects to see.
>>>> Either you restored these from backup, or it's because 3.0.9 is much
>>>> older than the 3.0.x that is more commonly used.
>>>>
>>>>> *3. While upgrading sstables (step 5), we received the messages
>>>>> below:*
>>>>> WARN [CompactionExecutor:3] 2021-08-29 07:47:32,828
>>>>> DuplicateRowChecker.java:96 - Detected 2 duplicate rows for 29621439
>>>>> during Upgrade sstables.
>>>>> WARN [CompactionExecutor:3] 2021-08-29 07:47:32,831
>>>>> DuplicateRowChecker.java:96 - Detected 4 duplicate rows for 45016570
>>>>> during Upgrade sstables.
>>>>> WARN [CompactionExecutor:3] 2021-08-29 07:47:32,833
>>>>> DuplicateRowChecker.java:96 - Detected 3 duplicate rows for 61260692
>>>>> during Upgrade sstables.
>>>>
>>>> This says you have corrupt data from an old bug, probably related to
>>>> 2.1 -> 3.0 upgrades if this was originally on 2.1. If you read those
>>>> keys, you would find that the data returns 2-4 rows where it should be
>>>> exactly 1.
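The manifest.json/schema.cql cleanup from observation 1 above can be sketched as a small shell helper. This is a hedged example: the data directory path is the one appearing in this thread's logs, and the function name is invented for illustration; review what `find` matches before deleting, and run it only with Cassandra stopped.

```shell
#!/bin/sh
# clean_restore_artifacts: delete manifest.json / schema.cql files that a
# snapshot restore leaves inside table directories. They are not sstable
# components, so Cassandra 4.0 refuses to start while they are present.
clean_restore_artifacts() {
    data_dir="$1"   # e.g. /opt1/cassandra_poc/data (path from this thread)
    find "$data_dir" -type f \
        \( -name 'manifest.json' -o -name 'schema.cql' \) -print -delete
}

# Usage: with Cassandra stopped, before starting 4.0:
#   clean_restore_artifacts /opt1/cassandra_poc/data
```

The `-print` before `-delete` lists each file as it is removed, which gives an audit trail of exactly what was cleaned out of the data directories.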
>>>> >>>> >>>>> 4.* Also, received below messages during upgrade* >>>>> *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,347 >>>>> InitialConnectionHandler.java:77 - OPTIONS received 5/v5 >>>>> *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,349 >>>>> InitialConnectionHandler.java:121 - Response to STARTUP sent, configuring >>>>> pipeline for 5/v5 >>>>> *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,350 >>>>> InitialConnectionHandler.java:153 - Configured pipeline: >>>>> DefaultChannelPipeline{(frameDecoder = >>>>> org.apache.cassandra.net.FrameDecoderCrc), (frameEncoder = >>>>> org.apache.cassandra.net.FrameEncoderCrc), (cqlProcessor = >>>>> org.apache.cassandra.transport.CQLMessageHandler), (exceptionHandler = >>>>> org.apache.cassandra.transport.ExceptionHandlers$PostV5ExceptionHandler)} >>>>> >>>>> >>>> Logs of debug stuff, normal. It's the netty connection pipelines being >>>> setup. >>>> >>>> >>>>> *5. After upgrade, we are regularly getting below messages:* >>>>> *DEBUG* [ScheduledTasks:1] 2021-09-02 00:03:20,910 >>>>> SSLFactory.java:354 - Checking whether certificates have been updated [] >>>>> *DEBUG* [ScheduledTasks:1] 2021-09-02 00:13:20,910 >>>>> SSLFactory.java:354 - Checking whether certificates have been updated [] >>>>> *DEBUG* [ScheduledTasks:1] 2021-09-02 00:23:20,911 >>>>> SSLFactory.java:354 - Checking whether certificates have been updated [] >>>>> >>>>> Normal. It's checking to see if the ssl cert changed, and if it did, >>>> it would reload it. >>>> >>>> >>>>> *Can someone please explain what these above ERROR / WARN / DEBUG >>>>> messages refer to? Is there anything to be concerned about?* >>>>> >>>>> *Also, received 2 READ_REQ dropped messages (may be due to nw >>>>> latency) * >>>>> *INFO* [ScheduledTasks:1] 2021-09-03 11:40:10,009 >>>>> MessagingMetrics.java:206 - READ_REQ messages were dropped in last 5000 >>>>> ms: >>>>> 0 internal and 1 cross node. 
Mean internal dropped latency: 0 ms and Mean >>>>> cross-node dropped latency: 12359 ms >>>>> *INFO* [ScheduledTasks:1] 2021-09-03 13:27:15,291 >>>>> MessagingMetrics.java:206 - READ_REQ messages were dropped in last 5000 >>>>> ms: >>>>> 0 internal and 1 cross node. Mean internal dropped latency: 0 ms and Mean >>>>> cross-node dropped latency: 5960 ms >>>>> >>>>> >>>> 12s and 6s cross-node latency isn't hugely surprising from US to India, >>>> given the geographical distance and likelihood of packet loss across that >>>> distance. Losing 1 read request every few hours seems like it's within >>>> normal expectations. >>>> >>>> >>>> >>>>> Rest of the stats are pretty much normal (tpstats, status, info, >>>>> tablestats, etc) >>>>> >>>>> Regards, >>>>> Ashish >>>>> >>>>>
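The DC-wise upgrade sequence discussed in this thread (bounce every node onto 4.0 first, and only then run upgradesstables, per Jeff's advice) could be sketched roughly as below. The hostnames gcp1..gcp5 and the systemd service name are assumptions for illustration; the install step depends entirely on how the 4.0 binaries are deployed.

```shell
#!/bin/sh
# Sketch only: upgrade one DC to 4.0, then rewrite its sstables.
set -e
NODES="gcp1 gcp2 gcp3 gcp4 gcp5"   # assumed hostnames for the GCP DC

# Phase 1: rolling bounce, one node at a time. Drain flushes memtables and
# stops accepting traffic cleanly. Wait for the node to return to Up/Normal
# ("nodetool status") before moving on to the next one.
for h in $NODES; do
    ssh "$h" 'nodetool drain && sudo systemctl stop cassandra'
    # ... switch this node to the 4.0 binaries and config here ...
    ssh "$h" 'sudo systemctl start cassandra'
done

# Phase 2: only after ALL nodes in the DC are running 4.0, rewrite the
# data files to the new sstable format (the slow, IO-heavy part):
for h in $NODES; do
    ssh "$h" nodetool upgradesstables
done
```

Keeping the two phases strictly separated matches the advice in the thread: never stream, rebuild, or change topology while the cluster is running mixed versions.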