Hi again, Does anyone perhaps have an idea on what could've gone wrong here? Could it be just a calculation error on startup?
Thanks! On Sun, Jan 26, 2020 at 5:57 PM Shalom Sagges <shalomsag...@gmail.com> wrote: > Hi Jeff, > > It is happening on multiple servers and even on different DCs. > The schema contains two keyspaces as follows: > > CREATE KEYSPACE ks1 WITH replication = {'class': > 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = > true; > > CREATE TABLE ks1.tbl1 ( > account_id text, > metric_name text, > day_of_month int, > hour_of_day int, > insert_time timestamp, > metric_value text, > PRIMARY KEY ((account_id, metric_name, day_of_month, hour_of_day), > insert_time) > ) WITH CLUSTERING ORDER BY (insert_time DESC) > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND default_time_to_live = 2764800 > AND gc_grace_seconds = 864000; > > CREATE TABLE ks1.tbl2 ( > account_id text, > lambda_uuid uuid, > metric_name text, > day_of_month int, > insert_time timestamp, > hour_of_day int, > metric_value text, > PRIMARY KEY ((account_id, lambda_uuid, metric_name, day_of_month), > insert_time) > ) WITH CLUSTERING ORDER BY (insert_time DESC) > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND default_time_to_live = 2764800 > AND gc_grace_seconds = 864000; > > > CREATE KEYSPACE ks2 WITH replication = {'class': > 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = > true; > > CREATE TABLE ks2.tbl3 ( > account_id text, > lambda_uuid uuid, > insert_time timestamp, > data text, > error_count int, > info_count int, > request_id text, > warn_count int, > PRIMARY KEY ((account_id, lambda_uuid), insert_time) > ) WITH CLUSTERING ORDER BY (insert_time DESC) > AND default_time_to_live = 2764800 > AND gc_grace_seconds = 864000; > > > I'm afraid I am unable to provide the actual key cache file, but for what > it's worth, here's a data example from the above tables: > > > *ks1.tbl1:* > account_id | metric_name | > day_of_month | hour_of_day | insert_time | metric_value > > ------------+-----------------------------------------------------------+--------------+-------------+---------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > 12345678 | com.selfbuilds.faas.system.invocation-finished-successful | > 1 | 10 | 2020-01-01 10:59:03.588000+0000 | > {"invocationIdentifier":"51176dd4-aa34-4834-bcf5-7e3d5b5494d1","lambdaUUID":"ADD3A5B4-7497-499A-8364-A97108C7400B","event":"","invocationState":"SUCCEEDED","executionTimeInMs":3392} > > > *ks1.tbl2:* > account_id | lambda_uuid | metric_name > | day_of_month | insert_time > | hour_of_day | metric_value > > ------------+--------------------------------------+-------------------------------------------------+--------------+---------------------------------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > 12345678 | ed7f0629-8d5c-47dc-a66d-4a1197aa01bc | > com.selfbuilds.faas.system.invocation-triggered | 9 | 2020-01-09 > 13:03:06.653000+0000 | 13 | > {"invocationIdentifier":"3631e9ea-b230-4c4d-a051-c7b4241ee231","lambdaUUID":"ED7F0629-8D5C-47DC-A66D-4A1197AA01BC","event":"","lambdaName":"1578574875000","lastModifierOfLambda":"FaaSAll"} > > > *ks2.tbl3:* > account_id | lambda_uuid | insert_time > | data > | error_count | info_count | request_id > | warn_count > > ------------+--------------------------------------+---------------------------------+----------------------------------------------------------------------------------------+-------------+------------+--------------------------------------+------------ > 12345678 | 9f4eaa4f-42de-4c6e-9c96-02a4749281f3 | 2020-01-01 > 01:30:22.302000+0000 | > [{"timestamp":1577842222458,"level":"Info","message":"Best Logs in > town","extras":[]}] | 0 | 1 | > 2966683f-cf37-4ea3-9d82-1de46207d51e | 0 > > > > Thanks for your help on this one! > > > > On Thu, Jan 23, 2020 at 5:40 PM Jeff Jirsa <jji...@gmail.com> wrote: > >> 489MB key seems huge. Unlikely it’s really that large. >> >> If it’s happening on multiple servers it’d be interesting to see the >> schema and/or saved cache file if it’s not incredibly sensitive >> >> On Jan 23, 2020, at 3:48 AM, Shalom Sagges <shalomsag...@gmail.com> >> wrote: >> >> >> Hi All, >> >> Cassandra 3.11.4. >> On one of our clusters, during startup, I see two types of "Harmless >> error" notification regarding the keycache: >> >> *Server 1:* >> >> INFO [pool-3-thread-1] 2020-01-23 04:34:46,167 AutoSavingCache.java:263 >> - *Harmless error reading saved cache* >> /path/to/saved_caches/KeyCache-e.db >> *java.lang.NegativeArraySizeException: null* >> at >> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401) >> at >> org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:472) >> at >> org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220) >> at >> org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165) >> at >> org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161) >> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) >> at java.lang.Thread.run(Thread.java:745) >> INFO [pool-3-thread-1] 2020-01-23 04:34:46,168 AutoSavingCache.java:174 >> - Completed loading (188 ms; 10916 keys) KeyCache cache >> >> >> *Server 2:* >> >> INFO [pool-3-thread-1] 2020-01-15 11:18:35,797 AutoSavingCache.java:263 >> - Harmless error reading saved cache /path/to/saved_caches//KeyCache-e.db >> *java.io.IOException: Corrupted key cache. Key length of 489735494 is >> longer than maximum of 65535* >> at >> org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:469) >> at >> org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220) >> at >> org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165) >> at >> org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161) >> at java.util.concurrent.FutureTask.run(FutureTask.java:266) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) >> at java.lang.Thread.run(Thread.java:745) >> INFO [pool-3-thread-1] 2020-01-15 11:18:35,799 AutoSavingCache.java:174 >> - Completed loading (285 ms; 20681 keys) KeyCache cache >> >> The KeyCache size is a lot less than max default (100MB). So does this >> mean that the key itself is the culprit? >> >> I'd really appreciate some guidance on what this message actually means >> in the logs during startup and what problems may arise from it. >> I see that one of the nodes experiences high CPU load and even gets >> kernel panic from time to time, so I want to understand if the above may >> cause such issues. >> >> Thanks a lot! >> >>