Thanks all for the help.

On Mon, Sep 6, 2021 at 2:20 PM manish khandelwal <
manishkhandelwa...@gmail.com> wrote:

> Totally agree with Jeff and Bowen there. Don't try to achieve something
> faster by cutting corners. The migration from the physical DC to GCP should
> be done with both DCs on the same version.
>
> On Mon, Sep 6, 2021 at 2:11 PM Bowen Song <bo...@bso.ng> wrote:
>
>> Hello Ashish,
>>
>>
>> I'm slightly worried about this:
>>
>> *Since I won't be needing the physical DC anymore, instead of upgrading it
>> I will simply discard that DC*
>>
>> This sounds like you are planning to add a GCP 3.x DC to the existing
>> cluster, upgrade GCP to 4.0, and then decommission the existing DC without
>> upgrading it. If so, you need to think twice. Adding or removing nodes (or
>> DCs) in a cluster with mixed versions is not recommended. I'd highly
>> recommend you upgrade the existing DC before decommissioning it. Of course,
>> you can skip running upgradesstables on it, which is often the most
>> time-consuming part.
>>
>>
>> Cheers,
>>
>> Bowen
>>
>> On 06/09/2021 03:29, MyWorld wrote:
>>
>> Hi Jeff,
>>
>> *When you’re upgrading or rebuilding you want all copies on the same
>> version with proper sstables . So either add GCP then upgrade to 4.0 or
>> upgrade to 4.0 and then expand to GCP. Don’t do them at the same time.*
>>
>> I think I forgot to mention one thing: after step 1 completes, our GCP data
>> center will already be added, with rebuild done on all nodes, so the
>> complete cluster will be on 3.0.9 after step 1. We will also change
>> num_tokens from the current 256 to 16 in the GCP data center in this step.
>>
>> DC1 - 5 nodes (physical) - version 3.0.9, num_tokens: 256
>> DC2 - 5 nodes (GCP) - version 3.0.9, num_tokens: 16
>>
>> The remaining steps 2-5 are for the upgrade, which I plan to do DC by DC,
>> running upgradesstables on GCP first.
>>
>> DC1 - 5 nodes (physical) - version 3.0.9, num_tokens: 256
>> DC2 - 5 nodes (GCP) - version 4.0.0, num_tokens: 16
>>
>> Since I won't be needing the physical DC anymore, instead of upgrading it I
>> will simply discard that DC.
>>
>> Regards,
>> Ashish
>>
>> On Mon, Sep 6, 2021, 7:31 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> In-line
>>>
>>> On Sep 3, 2021, at 11:12 AM, MyWorld <timeplus.1...@gmail.com> wrote:
>>>
>>> 
>>> Hi Jeff,
>>> Thanks for your response.
>>> To answer your question: yes, we created the dev environment by restoring
>>> from snapshot/CSV files.
>>>
>>> Just one follow-up question: I have a single 5-node DC in production on
>>> version 3.0.9, on physical servers.
>>> We are planning to migrate to GCP along with the upgrade, using the steps
>>> below.
>>> 1. Set up a GCP data center on the same version, 3.0.9, and rebuild the
>>> complete data
>>> 2. Install and configure version 4.0 on all 5 nodes in the new GCP data
>>> center
>>> 3. Stop version 3.0.9 and start 4.0 on all 5 GCP nodes, one by one
>>> 4. Run upgradesstables on all 5 GCP nodes, one by one
>>> 5. Later, move read/write traffic to GCP and remove the old data center,
>>> which is still on version 3.0.9
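A dry-run sketch of the rolling bounce in steps 3-4 (the host names, SSH access, and the systemd service name `cassandra` are assumptions; the functions only print the commands they would run, so nothing here touches a live cluster):

```shell
# Dry-run sketch of steps 3-4. Host names and the service name are
# assumptions; the functions only print the commands, nothing is executed.
NODES="gcp-node1 gcp-node2 gcp-node3 gcp-node4 gcp-node5"

bounce_cmds() {   # step 3: flush, stop 3.0.9, start 4.0, node by node
  for host in $NODES; do
    echo "ssh $host 'nodetool drain && sudo systemctl stop cassandra'"
    echo "ssh $host 'sudo systemctl start cassandra && nodetool status'"
  done
}

upgrade_cmds() {  # step 4: rewrite sstables, only after every node is bounced
  for host in $NODES; do
    echo "ssh $host 'nodetool upgradesstables'"
  done
}

bounce_cmds
upgrade_cmds
```

Replacing `echo` with actual execution turns this into the real procedure; note Jeff's advice elsewhere in the thread to run upgradesstables only after all bounces are done.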
>>>
>>> Please advise on a few things:
>>> 1. Is the above-mentioned approach right?
>>>
>>>
>>> When you’re upgrading or rebuilding you want all copies on the same
>>> version with proper sstables . So either add GCP then upgrade to 4.0 or
>>> upgrade to 4.0 and then expand to GCP. Don’t do them at the same time.
>>>
>>>
>>> 2. Or should we upgrade to 4.0 on only one GCP node at a time, and run
>>> upgradesstables on just that node first?
>>>
>>>
>>> I usually do upgradesstables after all bounces are done
>>>
>>> The only exception is perhaps doing upgradesstables with exactly one
>>> copy via backup/restore to make sure 4.0 works with your data files, which
>>> it sounds like you’ve already done.
>>>
>>> 3. Or should we migrate to GCP first and think about the 4.0 upgrade later?
>>> 4. Or is there any reason we should upgrade to 3.11.x first?
>>>
>>>
>>> Not 3.11, but maybe the latest 3.0.x instead
>>>
>>>
>>>
>>> Regards,
>>> Ashish
>>>
>>> On Fri, Sep 3, 2021, 11:11 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Sep 3, 2021 at 10:33 AM MyWorld <timeplus.1...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>> We are doing a POC in a dev environment to upgrade Apache Cassandra
>>>>> 3.0.9 to 4.0.0. We currently have the below setup on Cassandra 3.0.9:
>>>>> DC1 - GCP(india) - 1 node
>>>>> DC2 - GCP(US) - 1 node
>>>>>
>>>>
>>>> 3.0.9 is very old. It has an older data file format and some known
>>>> correctness bugs.
>>>>
>>>>
>>>>>
>>>>> For the upgrade, we carried out the below steps on the DC2 - GCP(US)
>>>>> node:
>>>>> Step 1. Install Apache Cassandra 4.0.0
>>>>> Step 2. Apply all configuration settings
>>>>> Step 3. Stop Apache Cassandra 3.0.9
>>>>> Step 4. Start Apache Cassandra 4.0.0 and monitor the logs
>>>>> Step 5. Run nodetool upgradesstables and monitor the logs
>>>>>
>>>>> After monitoring the logs, I had the below observations:
>>>>> *1. Initially, during bootstrap at Step 4, we received the below
>>>>> exceptions:*
>>>>> a) Exception (java.lang.IllegalArgumentException) encountered during
>>>>> startup: Invalid sstable file manifest.json: the name doesn't look like a
>>>>> supported sstable file name
>>>>> java.lang.IllegalArgumentException: Invalid sstable file
>>>>> manifest.json: the name doesn't look like a supported sstable file name
>>>>> b) ERROR [main] 2021-08-29 06:25:52,120 CassandraDaemon.java:909 -
>>>>> Exception encountered during startup
>>>>> java.lang.IllegalArgumentException: Invalid sstable file schema.cql:
>>>>> the name doesn't look like a supported sstable file name
>>>>>
>>>>>
>>>>> *To resolve this, we removed the manifest.json and schema.cql files
>>>>> from each table directory.*
>>>>>
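That cleanup can be scripted. A hedged sketch follows: these files are snapshot metadata that came along with the restore, and the data path used in the usage comment is taken from the log excerpts in this thread, so review the listing before deleting anything.

```shell
# Sketch: list stray snapshot metadata files (manifest.json / schema.cql)
# left in live table directories after a snapshot restore.
list_stray_snapshot_files() {
  # $1: Cassandra data directory, e.g. /opt1/cassandra_poc/data (from the logs)
  find "$1" -maxdepth 3 -type f \
       \( -name manifest.json -o -name schema.cql \) -print
}
# Review the listing first; only then append -delete to the find.
```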
>>>>
>>>> Did you restore these from backup/snapshot?
>>>>
>>>>
>>>>>
>>>>> *2. After resolving the above issue, we received the below WARN messages
>>>>> during bootstrap (Step 4):*
>>>>> *WARN * [main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:305 -
>>>>> Origin of 1 sstables is unknown or doesn't match the local node;
>>>>> commitLogIntervals for them were ignored
>>>>> *DEBUG *[main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:306 -
>>>>> Ignored commitLogIntervals from the following sstables:
>>>>> [/opt1/cassandra_poc/data/clickstream/glcat_mcat_by_flname-af4e3ac0ace511ebaf9ec13e37d013c2/mc-1-big-Data.db]
>>>>> *WARN  *[main] 2021-08-29 06:33:25,737 CommitLogReplayer.java:305 -
>>>>> Origin of 2 sstables is unknown or doesn't match the local node;
>>>>> commitLogIntervals for them were ignored
>>>>> *DEBUG *[main] 2021-08-29 06:33:25,738 CommitLogReplayer.java:306 -
>>>>> Ignored commitLogIntervals from the following sstables:
>>>>> [/opt1/cassandra_poc/data/clickstream/gl_city_map
>>>>>
>>>>>
>>>> Your data files don't match the commitlog files it expects to see. Either
>>>> you restored these from a backup, or it's because 3.0.9 is much older than
>>>> the 3.0.x releases that are more commonly used.
>>>>
>>>>
>>>>> *3. While upgrading sstables (Step 5), we received the below messages:*
>>>>> *WARN*  [CompactionExecutor:3] 2021-08-29 07:47:32,828
>>>>> DuplicateRowChecker.java:96 - Detected 2 duplicate rows for 29621439 
>>>>> during
>>>>> Upgrade sstables.
>>>>> *WARN*  [CompactionExecutor:3] 2021-08-29 07:47:32,831
>>>>> DuplicateRowChecker.java:96 - Detected 4 duplicate rows for 45016570 
>>>>> during
>>>>> Upgrade sstables.
>>>>> *WARN*  [CompactionExecutor:3] 2021-08-29 07:47:32,833
>>>>> DuplicateRowChecker.java:96 - Detected 3 duplicate rows for 61260692 
>>>>> during
>>>>> Upgrade sstables.
>>>>>
>>>>>
>>>> This says you have corrupt data from an old bug, probably related to
>>>> 2.1 -> 3.0 upgrades if this was originally on 2.1. If you read those keys,
>>>> you would find that the data returns 2-4 rows where there should be
>>>> exactly 1.
>>>>
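One way to confirm that a specific key is affected is to read it back and count the rows. Below is a hypothetical helper that only builds the cqlsh command: the table name and key value come from the log excerpt above, but the key column name `glid` is an assumption, so substitute your schema's actual partition key.

```shell
# Builds (but does not run) a cqlsh read-back for a key reported by
# DuplicateRowChecker. "glid" is an assumed partition key column name.
dup_check_cmd() {
  echo "cqlsh -e \"SELECT * FROM clickstream.glcat_mcat_by_flname WHERE glid = $1;\""
}
dup_check_cmd 29621439
```

An affected partition would return 2-4 rows for what should be a single row.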
>>>>
>>>>> *4. Also, we received the below messages during the upgrade:*
>>>>> *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,347
>>>>> InitialConnectionHandler.java:77 - OPTIONS received 5/v5
>>>>> *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,349
>>>>> InitialConnectionHandler.java:121 - Response to STARTUP sent, configuring
>>>>> pipeline for 5/v5
>>>>> *DEBUG* [epollEventLoopGroup-5-8] 2021-09-03 12:27:31,350
>>>>> InitialConnectionHandler.java:153 - Configured pipeline:
>>>>> DefaultChannelPipeline{(frameDecoder =
>>>>> org.apache.cassandra.net.FrameDecoderCrc), (frameEncoder =
>>>>> org.apache.cassandra.net.FrameEncoderCrc), (cqlProcessor =
>>>>> org.apache.cassandra.transport.CQLMessageHandler), (exceptionHandler =
>>>>> org.apache.cassandra.transport.ExceptionHandlers$PostV5ExceptionHandler)}
>>>>>
>>>>>
>>>> Lots of debug stuff, normal. It's the Netty connection pipelines being
>>>> set up.
>>>>
>>>>
>>>>> *5. After the upgrade, we are regularly getting the below messages:*
>>>>> *DEBUG* [ScheduledTasks:1] 2021-09-02 00:03:20,910
>>>>> SSLFactory.java:354 - Checking whether certificates have been updated []
>>>>> *DEBUG* [ScheduledTasks:1] 2021-09-02 00:13:20,910
>>>>> SSLFactory.java:354 - Checking whether certificates have been updated []
>>>>> *DEBUG* [ScheduledTasks:1] 2021-09-02 00:23:20,911
>>>>> SSLFactory.java:354 - Checking whether certificates have been updated []
>>>>>
>>>> Normal. It's checking to see if the SSL cert changed; if it did, it
>>>> would reload it.
>>>>
>>>>
>>>>> *Can someone please explain what these above ERROR / WARN / DEBUG
>>>>> messages refer to? Is there anything to be concerned about?*
>>>>>
>>>>> *Also, we received 2 READ_REQ dropped messages (maybe due to network
>>>>> latency):*
>>>>> *INFO*  [ScheduledTasks:1] 2021-09-03 11:40:10,009
>>>>> MessagingMetrics.java:206 - READ_REQ messages were dropped in last 5000 
>>>>> ms:
>>>>> 0 internal and 1 cross node. Mean internal dropped latency: 0 ms and Mean
>>>>> cross-node dropped latency: 12359 ms
>>>>> *INFO*  [ScheduledTasks:1] 2021-09-03 13:27:15,291
>>>>> MessagingMetrics.java:206 - READ_REQ messages were dropped in last 5000 
>>>>> ms:
>>>>> 0 internal and 1 cross node. Mean internal dropped latency: 0 ms and Mean
>>>>> cross-node dropped latency: 5960 ms
>>>>>
>>>>>
>>>> 12s and 6s cross-node latency isn't hugely surprising from US to India,
>>>> given the geographical distance and likelihood of packet loss across that
>>>> distance. Losing 1 read request every few hours seems like it's within
>>>> normal expectations.
>>>>
>>>>
>>>>
>>>>> The rest of the stats are pretty much normal (tpstats, status, info,
>>>>> tablestats, etc.)
>>>>>
>>>>> Regards,
>>>>> Ashish
>>>>>
>>>>>
