Re: Soon After Starting c* Process: CPU 100% for java Process

Fred Habash Thu, 01 Jul 2021 05:42:23 -0700

Great. Thanks.

I examined the debug logs. From the time c* starts until it crashes, a
span of about 30 minutes, the log keeps spitting out the exact same messages.


Google research leads to no clear answers as to ...

- What does 'evicting cold readers' mean?
- Why is it evicting the same sstable throughout, i.e. from start to crash?
- What does it mean for the AbstractQueryPager to say 'Remaining rows to
page: 2147483646'?
- How is all the above related?
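
A note on the third question: 2147483646 is exactly one less than Java's
Integer.MAX_VALUE (2147483647). One plausible reading, which is an assumption
and not something confirmed against the 2.2.8 source, is that the query had no
explicit row limit, so the pager started counting from Integer.MAX_VALUE and
subtracted the single live row it fetched. The arithmetic can be checked with
a short sketch:

```python
# Java's Integer.MAX_VALUE is 2**31 - 1.
INTEGER_MAX_VALUE = 2**31 - 1

# Values taken from the debug log lines in this thread.
fetched_live_rows = 1
remaining_logged = 2147483646

# Assumption (not verified in the Cassandra source): with no explicit limit,
# the pager starts at Integer.MAX_VALUE and subtracts the live rows it fetched.
remaining_expected = INTEGER_MAX_VALUE - fetched_live_rows

print(remaining_expected == remaining_logged)
```

If that reading holds, the number is a sentinel for "no limit" and does not by
itself say how many rows are left to read.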

My Google research revealed a similar inquiry where the response alluded to
a race condition and recommended upgrading to c* 3:

Depending on the actual version, you may be running into a race condition
or NPE that's not allowing the files to close properly. Try upgrading to
the latest version of 3.x.

Another hit referenced a page callback:

*page callback that does not have an executor assigned to it*


This message repeats from start to crash
-----------------------------------------------------------------------
DEBUG [SharedPool-Worker-21] 2021-06-30 12:34:51,542
AbstractQueryPager.java:95 - Fetched 1 live rows
DEBUG [SharedPool-Worker-21] 2021-06-30 12:34:51,542
AbstractQueryPager.java:112 - Got result (1) smaller than page size (5000),
considering pager exhausted
DEBUG [SharedPool-Worker-21] 2021-06-30 12:34:51,543
AbstractQueryPager.java:133 - Remaining rows to page: 2147483646
DEBUG [SharedPool-Worker-8] 2021-06-30 12:34:51,543
AbstractQueryPager.java:95 - Fetched 1 live rows
DEBUG [SharedPool-Worker-8] 2021-06-30 12:34:51,543
AbstractQueryPager.java:112 - Got result (1) smaller than page size (5000),
considering pager exhausted
DEBUG [SharedPool-Worker-8] 2021-06-30 12:34:51,543
AbstractQueryPager.java:133 - Remaining rows to page: 2147483646
DEBUG [SharedPool-Worker-2] 2021-06-30 12:34:51,543
FileCacheService.java:102 - Evicting cold readers for
/data/cassandra/XXXX/YYYYY-cf0c43b028e811e68f2b1b695a8d5b2c/b-3223-big-Data.db

Eventually, the shared pool worker crashes
--------------------------------------------------------------
WARN  [SharedPool-Worker-55] 2021-06-30 19:55:41,677
AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread
Thread[SharedPool-Worker-55,5,main]: {}
java.lang.IllegalArgumentException: Not enough bytes. Offset: 39289.
Length: 20585. Buffer size: 57984
at
org.apache.cassandra.db.composites.AbstractCType.checkRemaining(AbstractCType.java:362)
~[apache-cassandra-2.2.8.jar:2.2.8]
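
To check whether the eviction messages really concern a single sstable from
start to crash, one option is to tally them per data file. The following is a
minimal sketch, not an official tool: the function name count_evictions is
hypothetical, and the sample text is copied from the log excerpts quoted
further down in this thread.

```python
import re
from collections import Counter

# "\s+" also matches a newline, so this handles both a single-line log entry
# and the wrapped form seen in mail excerpts.
EVICTION_RE = re.compile(r"Evicting cold readers for\s+(\S+)")

def count_evictions(log_text):
    """Map each sstable data file path to its number of eviction messages."""
    return Counter(EVICTION_RE.findall(log_text))

# Sample lines copied from the excerpts in this thread.
sample = """DEBUG [SharedPool-Worker-5] 2021-06-30 13:39:01,849
FileCacheService.java:102 - Evicting cold readers for
/data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-3993-big-Data.db
DEBUG [SharedPool-Worker-5] 2021-06-30 13:39:01,849
FileCacheService.java:102 - Evicting cold readers for
/data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5927-big-Data.db
"""

# Print the most frequently evicted files first.
for path, n in count_evictions(sample).most_common():
    print(n, path)
```

Against the full debug log, pass the file's contents to count_evictions. If one
path dominates the tally, that supports the same-sstable observation; a flat
distribution across many files would point elsewhere.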

On Wed, Jun 30, 2021 at 7:53 PM Kane Wilson <k...@raft.so> wrote:

> Looks like it's doing a lot of reads immediately on startup
> (AbstractQueryPager) which is potentially causing a lot of GC (guessing
> that's what caused the StatusLogger).
>
> DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766
> AbstractQueryPager.java:133 - Remaining rows to page: 2147483646
>
> is quite suspicious. You'll want to find out what query is causing a
> massive scan at startup, you probably need to have a look through the start
> of the logs to get a better idea at what's happening at startup.
>
> On Thu, Jul 1, 2021 at 5:14 AM Fred Habash <fmhab...@gmail.com> wrote:
>
>> I have a node in the cluster where, when I start c*, the cpu reaches 100%
>> with the java process on top. Within a few minutes, jvm crash (jvm
>> instability) messages appear in system.log and c* crashes.
>>
>> Once c* is up, cluster average read latency reaches multi-seconds and
>> client apps are unhappy. For now, the only way out is to drain the node and
>> let the cluster latency settle.
>>
>> None of these measures helped ...
>> 1. Rebooting the ec2
>> 2. Replacing the ec2 altogether (new ec2/ new c* install/ etc).
>> 3. Stopping compactions (as a diagnostic measure)
>>
>> Trying to understand why the java process is chewing up so much cpu, i.e.
>> what is actually happening ...
>>
>> I see these error messages in the debug.log. What functional task do
>> these messages relate to e.g. compactions?
>>
>>
>> DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766
>> AbstractQueryPager.java:95 - Fetched 1 live rows
>> DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766
>> AbstractQueryPager.java:112 - Got result (1) smaller than page size (5000),
>> considering pager exhausted
>> INFO  [Service Thread] 2021-06-30 13:39:04,766 StatusLogger.java:56 -
>> MemtablePostFlush                 0         0             29         0
>>             0
>>
>> DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766
>> AbstractQueryPager.java:133 - Remaining rows to page: 2147483646
>> DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766
>> SliceQueryPager.java:92 - Querying next page of slice query; new filter:
>> SliceQueryFilter [reversed=false, slices=[[, ]], count=5000, toGroup = 0]
>> INFO  [Service Thread] 2021-06-30 13:39:04,766 StatusLogger.java:56 -
>> ValidationExecutor                0         0              0         0
>>             0
>> INFO  [Service Thread] 2021-06-30 13:39:04,766 StatusLogger.java:56 -
>> Sampler                           0         0              0         0
>>             0
>> INFO  [Service Thread] 2021-06-30 13:39:04,767 StatusLogger.java:56 -
>> MemtableFlushWriter               0         0              6         0
>>             0
>> INFO  [Service Thread] 2021-06-30 13:39:04,767 StatusLogger.java:56 -
>> InternalResponseStage             0         0              4         0
>>             0
>> DEBUG [SharedPool-Worker-131] 2021-06-30 13:39:05,078
>> StorageProxy.java:1467 - Read timeout; received 1 of 2 responses (only
>> digests)
>> DEBUG [SharedPool-Worker-131] 2021-06-30 13:39:05,079
>> SliceQueryPager.java:92 - Querying next page of slice query; new filter:
>> SliceQueryFilter [reversed=false, slices=[[, ]], count=5000, toGroup = 0]
>> DEBUG [SharedPool-Worker-158] 2021-06-30 13:39:05,079
>> StorageProxy.java:1467 - Read timeout; received 1 of 2 responses (only
>> digests)
>> DEBUG [SharedPool-Worker-158] 2021-06-30 13:39:05,079
>> SliceQueryPager.java:92 - Querying next page of slice query; new filter:
>> SliceQueryFilter [reversed=false, slices=[[, ]], count=5000, toGroup = 0]
>> DEBUG [SharedPool-Worker-90] 2021-06-30 13:39:05,080 StorageProxy.jav
>> ....
>> DEBUG [SharedPool-Worker-26] 2021-06-30 13:39:01,842
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5069-big-Data.db
>> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,847
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5592-big-Data.db
>> DEBUG [SharedPool-Worker-5] 2021-06-30 13:39:01,849
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-3993-big-Data.db
>> DEBUG [SharedPool-Worker-5] 2021-06-30 13:39:01,849
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5927-big-Data.db
>> DEBUG [SharedPool-Worker-5] 2021-06-30 13:39:01,849
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-1276-big-Data.db
>> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5949-big-Data.db
>> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-865-big-Data.db
>> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5741-big-Data.db
>> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-4098-big-Data.db
>> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-1662-big-Data.db
>> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-1339-big-Data.db
>> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-4598-big-Data.db
>> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,855
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-3676-big-Data.db
>> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,855
>> FileCacheService.java:102 - Evicting cold readers for
>> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-2814-big-Data.db
>> DEBUG [SharedPool-Worker-12] 2021-
>>
>> We are using c* 2.2.8
>>
>>
>> ----------------------------------------
>> Thank you
>>
>>
>>
>
> --
> raft.so - Cassandra consulting, support, and managed services
>


-- 

----------------------------------------
Thank you
