Run async-profiler on the node to determine what it's doing:
https://github.com/jvm-profiling-tools/async-profiler
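A minimal sketch of attaching it during the slow startup (the install path and the PID lookup here are assumptions; adjust them for your environment):

```shell
# Assumes async-profiler is unpacked in /opt/async-profiler and Cassandra is
# the only JVM on the node whose command line matches "CassandraDaemon".
PID=$(pgrep -f CassandraDaemon | head -n1)

# Sample CPU for 60 seconds while the node is stuck initializing the table,
# writing a flame graph you can open in a browser.
/opt/async-profiler/profiler.sh -d 60 -e cpu -f /tmp/cassandra-startup.html "$PID"
```

If the time is going into bloom filter or index rebuilds, those frames should dominate the flame graph.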

On Wed, Apr 17, 2019 at 11:31 AM Carl Mueller
<carl.muel...@smartthings.com.invalid> wrote:
>
> No, we just did the package upgrade 2.1.9 --> 2.2.13
>
> It definitely feels like some indexes are being recalculated or the
> sstables are being fully scanned due to suspected corruption.
>
>
> On Wed, Apr 17, 2019 at 12:32 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>> There was a time when changing some of the parameters (especially bloom 
>> filter FP ratio) would cause the bloom filters to be rebuilt on startup if 
>> the sstables didn't match what was in the schema, leading to a delay like
>> that and similar logs. Any chance you changed the schema on that table since 
>> the last time you restarted it?
>>
>>
>>
>> On Wed, Apr 17, 2019 at 10:30 AM Carl Mueller 
>> <carl.muel...@smartthings.com.invalid> wrote:
>>>
>>> Oh, the table in question is SizeTiered, had about 10 sstables total, it 
>>> was JBOD across two data directories.
>>>
>>> On Wed, Apr 17, 2019 at 12:26 PM Carl Mueller 
>>> <carl.muel...@smartthings.com> wrote:
>>>>
>>>> We are doing a ton of upgrades to get out of 2.1.x. We've done probably 
>>>> 20-30 clusters so far and have not encountered anything like this yet.
>>>>
>>>> After upgrade of a node, the restart takes a long time. like 10 minutes 
>>>> long. Almost all of our other nodes took less than 2 minutes to upgrade
>>>> (aside from sstableupgrades).
>>>>
>>>> The startup stalls on a particular table, it is the largest table at about 
>>>> 300GB, but we have upgraded other clusters with about that much data 
>>>> without this 8-10 minute delay. We have the ability to roll back the node, 
>>>> and the restart as a 2.1.x node is normal with no delays.
>>>>
>>>> Alas, this is a prod cluster, so we are going to try loading the sstables
>>>> into a lower environment and try to replicate the delay. If we can, we
>>>> will turn on debug logging.
>>>>
>>>> This occurred on the first node we tried to upgrade. It is possible it is
>>>> limited to this node only, but we are gun-shy about experimenting with
>>>> upgrades in prod.
>>>>
>>>> We have an automated upgrading program that flushes, snapshots, shuts down
>>>> gossip, and drains before upgrade, suppresses autostart on upgrade, and has
>>>> worked about as flawlessly as one could hope for so far for 2.1->2.2 and
>>>> 2.2->3.11 upgrades.
>>>>
>>>> INFO  [main] 2019-04-16 17:22:17,004 ColumnFamilyStore.java:389 - 
>>>> Initializing zzzz.access_token
>>>> INFO  [main] 2019-04-16 17:22:17,096 ColumnFamilyStore.java:389 - 
>>>> Initializing zzzz.refresh_token
>>>> INFO  [main] 2019-04-16 17:28:52,929 ColumnFamilyStore.java:389 - 
>>>> Initializing zzzz.userid
>>>> INFO  [main] 2019-04-16 17:28:52,930 ColumnFamilyStore.java:389 - 
>>>> Initializing zzzz.access_token_by_auth
>>>>
>>>> You can see the 6:30 delay in the startup log above. All the other 
>>>> keyspace/tables initialize in under a second.
>>>>
>>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org
