Great. Thanks. I examined the debug logs and from the time c* starts till it crashes the 30 minute duration log spitting the exact same message.
Google research leads to no clear answers as to ... - What is 'evicting cold readers' mean? - Why is evicting the same sstable throughout i.e. from start to crash? - What is it mean for the AbstractQueryPager to say 'remaining rows to page: 2147483646'. - How is all the above related? My Google research revealed a similar inquiry where the response eluded to a race condition and recommendation to upgrade to c* 3 Depending on the actual version, you may be running into a race condition or NPE that's not allowing the files to close properly. Try upgrading to the latest version of 3.x. In another hit it referenced a page callback. * page callback that does not have an executor assigned to it* This message repeats from start to crash ----------------------------------------------------------------------- DEBUG [SharedPool-Worker-21] 2021-06-30 12:34:51,542 AbstractQueryPager.java:95 - Fetched 1 live rows DEBUG [SharedPool-Worker-21] 2021-06-30 12:34:51,542 AbstractQueryPager.java:112 - Got result (1) smaller than page size (5000), considering pager exhausted DEBUG [SharedPool-Worker-21] 2021-06-30 12:34:51,543 AbstractQueryPager.java:133 - Remaining rows to page: 2147483646 DEBUG [SharedPool-Worker-8] 2021-06-30 12:34:51,543 AbstractQueryPager.java:95 - Fetched 1 live rows DEBUG [SharedPool-Worker-8] 2021-06-30 12:34:51,543 AbstractQueryPager.java:112 - Got result (1) smaller than page size (5000), considering pager exhausted DEBUG [SharedPool-Worker-8] 2021-06-30 12:34:51,543 AbstractQueryPager.java:133 - Remaining rows to page: 2147483646 DEBUG [SharedPool-Worker-2] 2021-06-30 12:34:51,543 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/XXXX/YYYYY-cf0c43b028e811e68f2b1b695a8d5b2c/ Eventually, the shared pool worker crashes -------------------------------------------------------------- b-3223-big-Data.db WARN [SharedPool-Worker-55] 2021-06-30 19:55:41,677 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-55,5,main]: {} java.lang.IllegalArgumentException: Not enough bytes. Offset: 39289. Length: 20585. Buffer size: 57984 at org.apache.cassandra.db.composites.AbstractCType.checkRemaining(AbstractCType.java:362) ~[apache-cassandra-2.2.8.jar:2.2.8] On Wed, Jun 30, 2021 at 7:53 PM Kane Wilson <k...@raft.so> wrote: > Looks like it's doing a lot of reads immediately on startup > (AbstractQueryPager) which is potentially causing a lot of GC (guessing > that's what caused the StatusLogger). > > DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766 > AbstractQueryPager.java:133 - Remaining rows to page: 2147483646 > > is quite suspicious. You'll want to find out what query is causing a > massive scan at startup, you probably need to have a look through the start > of the logs to get a better idea at what's happening at startup. > > On Thu, Jul 1, 2021 at 5:14 AM Fred Habash <fmhab...@gmail.com> wrote: > >> I have node in cluster when I start c, the cpu reaches 100% with java >> process on top. Within a few minutes, jvm crashes (jvm instability) >> messages in system.log and c* crashes. >> >> Once c* is up, cluster average read latency reaches multi-seconds and >> client apps are unhappy. For now, the only way out is to drain the node and >> let the cluster latency settle. >> >> None of these measures helped ... >> 1. Rebooting the ec2 >> 2. Replacing the ec2 altogether (new ec2/ new c* install/ etc). >> 3. Stopping compactions (as a diagnostic measure) >> Trying to understand why the java process is chewing much cpu i.e. what >> is actually happening ... >> >> I see these error messages in the debug.log. What functional task do >> these messages relate to e.g. compactions? >> >> >> DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766 >> AbstractQueryPager.java:95 - Fetched 1 live rows >> DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766 >> AbstractQueryPager.java:112 - Got result (1) smaller than page size (5000), >> considering pager exhausted >> INFO [Service Thread] 2021-06-30 13:39:04,766 StatusLogger.java:56 - >> MemtablePostFlush 0 0 29 0 >> 0 >> >> DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766 >> AbstractQueryPager.java:133 - Remaining rows to page: 2147483646 >> DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766 >> SliceQueryPager.java:92 - Querying next page of slice query; new filter: >> SliceQueryFilter [reversed=false, slices=[[, ]], count=5000, toGroup = 0] >> INFO [Service Thread] 2021-06-30 13:39:04,766 StatusLogger.java:56 - >> ValidationExecutor 0 0 0 0 >> 0 >> INFO [Service Thread] 2021-06-30 13:39:04,766 StatusLogger.java:56 - >> Sampler 0 0 0 0 >> 0 >> INFO [Service Thread] 2021-06-30 13:39:04,767 StatusLogger.java:56 - >> MemtableFlushWriter 0 0 6 0 >> 0 >> INFO [Service Thread] 2021-06-30 13:39:04,767 StatusLogger.java:56 - >> InternalResponseStage 0 0 4 0 >> 0 >> DEBUG [SharedPool-Worker-131] 2021-06-30 13:39:05,078 >> StorageProxy.java:1467 - Read timeout; received 1 of 2 responses (only >> digests) >> DEBUG [SharedPool-Worker-131] 2021-06-30 13:39:05,079 >> SliceQueryPager.java:92 - Querying next page of slice query; new filter: >> SliceQueryFilter [reversed=false, slices=[[, ]], count=5000, toGroup = 0] >> DEBUG [SharedPool-Worker-158] 2021-06-30 13:39:05,079 >> StorageProxy.java:1467 - Read timeout; received 1 of 2 responses (only >> digests) >> DEBUG [SharedPool-Worker-158] 2021-06-30 13:39:05,079 >> SliceQueryPager.java:92 - Querying next page of slice query; new filter: >> SliceQueryFilter [reversed=false, slices=[[, ]], count=5000, toGroup = 0] >> DEBUG [SharedPool-Worker-90] 2021-06-30 13:39:05,080 StorageProxy.jav >> .... >> EBUG [SharedPool-Worker-26] 2021-06-30 13:39:01,842 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5069-big-Data.db >> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,847 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5592-big-Data.db >> DEBUG [SharedPool-Worker-5] 2021-06-30 13:39:01,849 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-3993-big-Data.db >> DEBUG [SharedPool-Worker-5] 2021-06-30 13:39:01,849 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5927-big-Data.db >> DEBUG [SharedPool-Worker-5] 2021-06-30 13:39:01,849 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-1276-big-Data.db >> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5949-big-Data.db >> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-865-big-Data.db >> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5741-big-Data.db >> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-4098-big-Data.db >> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-1662-big-Data.db >> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-1339-big-Data.db >> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-4598-big-Data.db >> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,855 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-3676-big-Data.db >> DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,855 >> FileCacheService.java:102 - Evicting cold readers for >> /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-2814-big-Data.db >> DEBUG [SharedPool-Worker-12] 2021- >> >> We are using c* 2.2.8 >> >> >> ---------------------------------------- >> Thank you >> >> >> > > -- > raft.so - Cassandra consulting, support, and managed services > -- ---------------------------------------- Thank you