Totally agree with Jeff and Bowen there. Don't try to achieve something
faster by cutting corners. The migration from the physical DC to GCP should
be done with both DCs on the same version.
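
For reference, a rough sketch of the ordering being recommended here (the
nodetool commands are standard; the source DC name and the way the service is
restarted are assumptions that depend on your install):

  nodetool rebuild -- DC1     # on each new GCP node, while both DCs are still on 3.0.x
  # then a rolling binary upgrade of EVERY node (both DCs) to 4.0
  nodetool upgradesstables    # on the GCP nodes, only after all nodes are on 4.0
  nodetool decommission       # on each physical node, to retire the old DC last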

On Mon, Sep 6, 2021 at 2:11 PM Bowen Song <bo...@bso.ng> wrote:

> Hello Ashish,
>
>
> I'm slightly worried about this:
>
> *Since I won't be needing the physical DC anymore, instead of upgrading it I
> will simply discard that DC.*
>
> This sounds like you are planning to add the GCP DC on 3.x to the existing
> cluster, upgrade the GCP DC to 4.0, and then decommission the existing DC
> without upgrading it. If so, you need to think twice. Adding or removing
> nodes (or DCs) in a cluster running mixed versions is not recommended. I'd
> highly recommend you upgrade the existing DC before decommissioning it. Of
> course, you can skip running upgradesstables on it, which is often the most
> time-consuming part.
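>
> As a quick sanity check before any topology change, every endpoint should
> report the same version. Something along these lines (just a sketch):
>
>   nodetool gossipinfo | grep -E 'RELEASE_VERSION|^/'
>   nodetool describecluster    # schema versions should also agree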
>
>
> Cheers,
>
> Bowen
>
> On 06/09/2021 03:29, MyWorld wrote:
>
> Hi Jeff,
>
> *When you’re upgrading or rebuilding you want all copies on the same
> version with proper sstables. So either add GCP then upgrade to 4.0 or
> upgrade to 4.0 and then expand to GCP. Don’t do them at the same time.*
>
> I think I forgot to mention one thing: after completion of step 1, our GCP
> data center will already have been added, with rebuild done on all nodes. So
> the complete cluster would be on 3.0.9 after step 1. We will also change
> num_tokens from the current 256 to 16 in the GCP data center in this step
> only.
>
> DC1 - 5 nodes (physical) - version 3.0.9 - num_tokens 256
> DC2 - 5 nodes (GCP) - version 3.0.9 - num_tokens 16
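>
> For the token change, the relevant cassandra.yaml settings on each new GCP
> node would look roughly like the below (they must be set before the node
> first joins the ring; the allocation option and the keyspace name are only
> examples, to be used only if your 3.0.x build supports it):
>
>   num_tokens: 16
>   # optionally let Cassandra place the 16 tokens evenly for one keyspace
>   allocate_tokens_for_keyspace: clickstream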
>
> The rest of the steps (2-5) are meant for the upgrade, in which I am
> planning to go DC by DC and run upgradesstables on GCP first.
>
> DC1 - 5 nodes (physical) - version 3.0.9 - num_tokens 256
> DC2 - 5 nodes (GCP) - version 4.0.0 - num_tokens 16
>
> Since I won't be needing the physical DC anymore, instead of upgrading it I
> will simply discard that DC.
>
> Regards,
> Ashish
>
> On Mon, Sep 6, 2021, 7:31 AM Jeff Jirsa <jji...@gmail.com> wrote:
>
>> In-line
>>
>> On Sep 3, 2021, at 11:12 AM, MyWorld <timeplus.1...@gmail.com> wrote:
>>
>>
>> Hi Jeff,
>> Thanks for your response.
>> To answer your question: yes, we have created the dev environment by
>> restoring the data from snapshot/CSV files.
>>
>> Just one follow-up question. I have a 5-node single DC in production, on
>> version 3.0.9, on physical servers.
>> We are planning to migrate to GCP along with the upgrade, using the below
>> steps:
>> 1. Set up the GCP data center with the same version 3.0.9 and rebuild the complete data
>> 2. Install and configure version 4.0 in the new GCP data center on all 5 nodes
>> 3. Stop version 3.0.9 and start 4.0 on all 5 GCP nodes, one by one
>> 4. Run upgradesstables one by one on all 5 GCP nodes
>> 5. Later, move read/write traffic to GCP and remove the old data center, which is
>> still on version 3.0.9
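>>
>> Per node, I am thinking of roughly the following for steps 3-4 (the service
>> name is just how we run it; exact commands may differ):
>>
>>   nodetool drain                   # flush memtables and stop accepting traffic
>>   sudo systemctl stop cassandra    # stop the 3.0.9 process
>>   # switch the service over to the 4.0.0 install, keeping the same data directories
>>   sudo systemctl start cassandra   # start 4.0.0 and watch system.log
>>   # step 4, after all 5 GCP nodes are on 4.0.0:
>>   nodetool upgradesstables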
>>
>> Please guide on a few things:
>> 1. Is the above-mentioned approach right?
>>
>>
>> When you’re upgrading or rebuilding you want all copies on the same
>> version with proper sstables. So either add GCP then upgrade to 4.0 or
>> upgrade to 4.0 and then expand to GCP. Don’t do them at the same time.
>>
>>
>> 2. OR should we install 4.0 on only one GCP node at a time and run
>> upgradesstables on just that one node first?
>>
>>
>> I usually do upgradesstables after all bounces are done
>>
>> The only exception is perhaps doing upgradesstables with exactly one copy
>> via backup/restore to make sure 4.0 works with your data files, which it
>> sounds like you’ve already done.
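>>
>> If you want to see how far the rewrite has gotten on a node, the sstable
>> filename prefix is a reasonable proxy (your logs show the 3.0-era files as
>> mc-*, and 4.0 should write nb-*; data path taken from your logs):
>>
>>   find /opt1/cassandra_poc/data -name 'mc-*-Data.db' | wc -l   # still old format
>>   find /opt1/cassandra_poc/data -name 'nb-*-Data.db' | wc -l   # already rewritten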
>>
>> 3. OR should we migrate to GCP first and then think about upgrading to 4.0 later?
>> 4. OR is there any reason I should upgrade to 3.11.x first?
>>
>>
>> Not 3.11, but maybe the latest 3.0 instead.
>>
>>
>>
>> Regards,
>> Ashish
>>
>> On Fri, Sep 3, 2021, 11:11 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>>
>>>
>>> On Fri, Sep 3, 2021 at 10:33 AM MyWorld <timeplus.1...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>> We are doing a POC in a dev environment to upgrade Apache Cassandra 3.0.9
>>>> to 4.0.0. We currently have the below setup on Cassandra 3.0.9:
>>>> DC1 - GCP (India) - 1 node
>>>> DC2 - GCP (US) - 1 node
>>>>
>>>
>>> 3.0.9 is very old. It's got an older version of the data file format and
>>> some known correctness bugs.
>>>
>>>
>>>>
>>>> For the upgrade, we carried out the below steps on the DC2 - GCP (US) node:
>>>> Step 1. Install Apache Cassandra 4.0.0
>>>> Step 2. Do all the configuration settings
>>>> Step 3. Stop Apache Cassandra 3.0.9
>>>> Step 4. Start Apache Cassandra 4.0.0 and monitor logs
>>>> Step 5. Run nodetool upgradesstables and monitor logs
>>>>
>>>> After monitoring the logs, I had the below observations:
>>>> *1. Initially, during startup at step 4, we received the below exceptions:*
>>>> a) Exception (java.lang.IllegalArgumentException) encountered during
>>>> startup: Invalid sstable file manifest.json: the name doesn't look like a
>>>> supported sstable file name
>>>> java.lang.IllegalArgumentException: Invalid sstable file manifest.json:
>>>> the name doesn't look like a supported sstable file name
>>>> b) ERROR [main] 2021-08-29 06:25:52,120 CassandraDaemon.java:909 -
>>>> Exception encountered during startup
>>>> java.lang.IllegalArgumentException: Invalid sstable file schema.cql:
>>>> the name doesn't look like a supported sstable file name
>>>>
>>>>
>>>> *In order to resolve, we removed manifest.json and schema.cql files from
>>>> each table directory and the issue was resolved. *
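>>>>
>>>> For reference, the cleanup can be done with something like the find below
>>>> (path taken from the logs; review the matches before deleting, and leave
>>>> the snapshots/ directories alone since those files legitimately live
>>>> there):
>>>>
>>>>   find /opt1/cassandra_poc/data -not -path '*/snapshots/*' \( -name manifest.json -o -name schema.cql \)
>>>>   # re-run with -delete appended once the list looks right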
>>>>
>>>
>>> Did you restore these from backup/snapshot?
>>>
>>>
>>>>
>>>> *2. After resolving the above issue, we received the below WARN messages
>>>> during startup (step 4).*
>>>> *WARN * [main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:305 -
>>>> Origin of 1 sstables is unknown or doesn't match the local node;
>>>> commitLogIntervals for them were ignored
>>>> *DEBUG *[main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:306 -
>>>> Ignored commitLogIntervals from the following sstables:
>>>> [/opt1/cassandra_poc/data/clickstream/glcat_mcat_by_flname-af4e3ac0ace511ebaf9ec13e37d013c2/mc-1-big-Data.db]
>>>> *WARN  *[main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:305 -
>>>> Origin of 2 sstables is unknown or doesn't match the local node;
>>>> commitLogIntervals for them were ignored
>>>> *DEBUG *[main] 2021-08-29 06:33:25,738 CommitLogReplayer.java:306 -
>>>> Ignored commitLogIntervals from the following sstables:
>>>> [/opt1/cassandra_poc/data/clickstream/gl_city_map
>>>>
>>>>
>>> Your data files don't match the commitlog files it expects to see. Either
>>> you restored these from a backup, or it's because 3.0.9 is much older than
>>> the 3.0.x releases that are more commonly used.
>>>
>>>
>>>> *3. While upgrading sstables (step 5), we received the below messages:*
>>>> *WARN*  [CompactionExecutor:3] 2021-08-29 07:47:32,828
>>>> DuplicateRowChecker.java:96 - Detected 2 duplicate rows for 29621439 during
>>>> Upgrade sstables.
>>>> *WARN*  [CompactionExecutor:3] 2021-08-29 07:47:32,831
>>>> DuplicateRowChecker.java:96 - Detected 4 duplicate rows for 45016570 during
>>>> Upgrade sstables.
>>>> *WARN*  [CompactionExecutor:3] 2021-08-29 07:47:32,833
>>>> DuplicateRowChecker.java:96 - Detected 3 duplicate rows for 61260692 during
>>>> Upgrade sstables.
>>>>
>>>>
>>> This says you have corrupt data from an old bug. Probably related to 2.1
>>> -> 3.0 upgrades, if this was originally on 2.1. If you read those keys, you
>>> would find that the data returns 2-4 rows where it should be exactly 1.
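>>>
>>> You can check one of the keys from the warnings directly; the table and key
>>> column below are placeholders, since the log line doesn't say which table
>>> the compaction was running on:
>>>
>>>   cqlsh -e "SELECT * FROM clickstream.some_table WHERE id = 29621439;"
>>>   # a healthy partition comes back with one row per clustering key,
>>>   # not 2-4 copies of the same row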
>>>
>>>
>>>> *4. Also, received the below messages during the upgrade:*
>>>> *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,347
>>>> InitialConnectionHandler.java:77 - OPTIONS received 5/v5
>>>> *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,349
>>>> InitialConnectionHandler.java:121 - Response to STARTUP sent, configuring
>>>> pipeline for 5/v5
>>>> *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,350
>>>> InitialConnectionHandler.java:153 - Configured pipeline:
>>>> DefaultChannelPipeline{(frameDecoder =
>>>> org.apache.cassandra.net.FrameDecoderCrc), (frameEncoder =
>>>> org.apache.cassandra.net.FrameEncoderCrc), (cqlProcessor =
>>>> org.apache.cassandra.transport.CQLMessageHandler), (exceptionHandler =
>>>> org.apache.cassandra.transport.ExceptionHandlers$PostV5ExceptionHandler)}
>>>>
>>>>
>>> Lots of debug stuff, normal. It's the netty connection pipelines being
>>> set up.
>>>
>>>
>>>> *5. After the upgrade, we are regularly getting the below messages:*
>>>> *DEBUG* [ScheduledTasks:1] 2021-09-02 00:03:20,910 SSLFactory.java:354
>>>> - Checking whether certificates have been updated []
>>>> *DEBUG* [ScheduledTasks:1] 2021-09-02 00:13:20,910 SSLFactory.java:354
>>>> - Checking whether certificates have been updated []
>>>> *DEBUG* [ScheduledTasks:1] 2021-09-02 00:23:20,911 SSLFactory.java:354
>>>> - Checking whether certificates have been updated []
>>>>
>>> Normal. It's checking to see if the ssl cert changed, and if it did, it
>>> would reload it.
>>>
>>>
>>>> *Can someone please explain what these above ERROR / WARN / DEBUG
>>>> messages refer to? Is there anything to be concerned about?*
>>>>
>>>> *Also, received 2 READ_REQ dropped messages (maybe due to network latency):*
>>>> *INFO*  [ScheduledTasks:1] 2021-09-03 11:40:10,009
>>>> MessagingMetrics.java:206 - READ_REQ messages were dropped in last 5000 ms:
>>>> 0 internal and 1 cross node. Mean internal dropped latency: 0 ms and Mean
>>>> cross-node dropped latency: 12359 ms
>>>> *INFO*  [ScheduledTasks:1] 2021-09-03 13:27:15,291
>>>> MessagingMetrics.java:206 - READ_REQ messages were dropped in last 5000 ms:
>>>> 0 internal and 1 cross node. Mean internal dropped latency: 0 ms and Mean
>>>> cross-node dropped latency: 5960 ms
>>>>
>>>>
>>> 12s and 6s cross-node latency isn't hugely surprising from US to India,
>>> given the geographical distance and likelihood of packet loss across that
>>> distance. Losing 1 read request every few hours seems like it's within
>>> normal expectations.
>>>
>>>
>>>
>>>> The rest of the stats are pretty much normal (tpstats, status, info,
>>>> tablestats, etc.)
>>>>
>>>> Regards,
>>>> Ashish
>>>>
>>>>
