Hi again,

Does anyone have an idea of what could have gone wrong here?
Could it be just a calculation error on startup?
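
Looking at the two stack traces below, both die inside KeyCacheSerializer.deserialize right where a length-prefixed key would be read, so a single stale or corrupt 4-byte length field could explain both symptoms (a negative length and an absurd ~489MB one). Here's a minimal sketch of what I mean — hypothetical code, not Cassandra's actual implementation; the 65535 cap just mirrors the log message:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class KeyCacheSketch {
    // Mirrors the "maximum of 65535" from the second log message
    static final int MAX_KEY_LENGTH = 65535;

    // Hypothetical length-prefixed read: if the 4-byte length field is
    // stale or corrupt, we get exactly the two errors from the logs.
    static byte[] readKey(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length > MAX_KEY_LENGTH)
            throw new IOException("Corrupted key cache. Key length of "
                    + length + " is longer than maximum of " + MAX_KEY_LENGTH);
        // A negative length slips past the check above and blows up here:
        byte[] key = new byte[length]; // -> NegativeArraySizeException
        in.readFully(key);
        return key;
    }

    public static void main(String[] args) {
        // Garbage bytes 0x1D30C546 decode to 489735494 -> "Corrupted key cache"
        byte[] huge = {0x1D, 0x30, (byte) 0xC5, 0x46};
        try {
            readKey(new DataInputStream(new ByteArrayInputStream(huge)));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
        // Sign bit set -> readInt() returns a negative number
        byte[] negative = {(byte) 0x80, 0, 0, 0};
        try {
            readKey(new DataInputStream(new ByteArrayInputStream(negative)));
        } catch (IOException | NegativeArraySizeException e) {
            System.out.println(e.getClass().getSimpleName());
        }
    }
}
```

Notably, 0x1D30C546 decodes to exactly 489735494, the value from the second log, which looks more like arbitrary bytes being parsed as a length than a genuinely huge key.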

Thanks!

On Sun, Jan 26, 2020 at 5:57 PM Shalom Sagges <shalomsag...@gmail.com>
wrote:

> Hi Jeff,
>
> It is happening on multiple servers, and even in different DCs.
> The schema contains two keyspaces as follows:
>
> CREATE KEYSPACE ks1 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
>
> CREATE TABLE ks1.tbl1 (
>     account_id text,
>     metric_name text,
>     day_of_month int,
>     hour_of_day int,
>     insert_time timestamp,
>     metric_value text,
>     PRIMARY KEY ((account_id, metric_name, day_of_month, hour_of_day), insert_time)
> ) WITH CLUSTERING ORDER BY (insert_time DESC)
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND default_time_to_live = 2764800
>     AND gc_grace_seconds = 864000;
>
> CREATE TABLE ks1.tbl2 (
>     account_id text,
>     lambda_uuid uuid,
>     metric_name text,
>     day_of_month int,
>     insert_time timestamp,
>     hour_of_day int,
>     metric_value text,
>     PRIMARY KEY ((account_id, lambda_uuid, metric_name, day_of_month), insert_time)
> ) WITH CLUSTERING ORDER BY (insert_time DESC)
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND default_time_to_live = 2764800
>     AND gc_grace_seconds = 864000;
>
>
> CREATE KEYSPACE ks2 WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true;
>
> CREATE TABLE ks2.tbl3 (
>     account_id text,
>     lambda_uuid uuid,
>     insert_time timestamp,
>     data text,
>     error_count int,
>     info_count int,
>     request_id text,
>     warn_count int,
>     PRIMARY KEY ((account_id, lambda_uuid), insert_time)
> ) WITH CLUSTERING ORDER BY (insert_time DESC)
>     AND default_time_to_live = 2764800
>     AND gc_grace_seconds = 864000;
>
>
> I'm afraid I can't share the actual key cache file, but for what it's
> worth, here's a sample row from each of the tables above:
>
>
> *ks1.tbl1* (one row, shown vertically):
>  account_id   | 12345678
>  metric_name  | com.selfbuilds.faas.system.invocation-finished-successful
>  day_of_month | 1
>  hour_of_day  | 10
>  insert_time  | 2020-01-01 10:59:03.588000+0000
>  metric_value | {"invocationIdentifier":"51176dd4-aa34-4834-bcf5-7e3d5b5494d1","lambdaUUID":"ADD3A5B4-7497-499A-8364-A97108C7400B","event":"","invocationState":"SUCCEEDED","executionTimeInMs":3392}
>
>
> *ks1.tbl2* (one row, shown vertically):
>  account_id   | 12345678
>  lambda_uuid  | ed7f0629-8d5c-47dc-a66d-4a1197aa01bc
>  metric_name  | com.selfbuilds.faas.system.invocation-triggered
>  day_of_month | 9
>  insert_time  | 2020-01-09 13:03:06.653000+0000
>  hour_of_day  | 13
>  metric_value | {"invocationIdentifier":"3631e9ea-b230-4c4d-a051-c7b4241ee231","lambdaUUID":"ED7F0629-8D5C-47DC-A66D-4A1197AA01BC","event":"","lambdaName":"1578574875000","lastModifierOfLambda":"FaaSAll"}
>
>
> *ks2.tbl3* (one row, shown vertically):
>  account_id  | 12345678
>  lambda_uuid | 9f4eaa4f-42de-4c6e-9c96-02a4749281f3
>  insert_time | 2020-01-01 01:30:22.302000+0000
>  data        | [{"timestamp":1577842222458,"level":"Info","message":"Best Logs in town","extras":[]}]
>  error_count | 0
>  info_count  | 1
>  request_id  | 2966683f-cf37-4ea3-9d82-1de46207d51e
>  warn_count  | 0
>
>
>
> Thanks for your help on this one!
>
>
>
> On Thu, Jan 23, 2020 at 5:40 PM Jeff Jirsa <jji...@gmail.com> wrote:
>
>> 489MB key seems huge. Unlikely it’s really that large.
>>
>> If it’s happening on multiple servers it’d be interesting to see the
>> schema and/or saved cache file if it’s not incredibly sensitive
>>
>> On Jan 23, 2020, at 3:48 AM, Shalom Sagges <shalomsag...@gmail.com>
>> wrote:
>>
>>
>> Hi All,
>>
>> Cassandra 3.11.4.
>> On one of our clusters, during startup, I see two types of "Harmless
>> error" notifications regarding the key cache:
>>
>> *Server 1:*
>>
>> INFO  [pool-3-thread-1] 2020-01-23 04:34:46,167 AutoSavingCache.java:263 - *Harmless error reading saved cache* /path/to/saved_caches/KeyCache-e.db
>> *java.lang.NegativeArraySizeException: null*
>>         at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
>>         at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:472)
>>         at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
>>         at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
>>         at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:745)
>> INFO  [pool-3-thread-1] 2020-01-23 04:34:46,168 AutoSavingCache.java:174 - Completed loading (188 ms; 10916 keys) KeyCache cache
>>
>>
>> *Server 2:*
>>
>> INFO  [pool-3-thread-1] 2020-01-15 11:18:35,797 AutoSavingCache.java:263 - Harmless error reading saved cache /path/to/saved_caches//KeyCache-e.db
>> *java.io.IOException: Corrupted key cache. Key length of 489735494 is longer than maximum of 65535*
>>         at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:469)
>>         at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:220)
>>         at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:165)
>>         at org.apache.cassandra.cache.AutoSavingCache$3.call(AutoSavingCache.java:161)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:745)
>> INFO  [pool-3-thread-1] 2020-01-15 11:18:35,799 AutoSavingCache.java:174 - Completed loading (285 ms; 20681 keys) KeyCache cache
>>
>> The KeyCache size is well below the default maximum (100MB). So does this
>> mean the key itself is the culprit?
>>
>> I'd really appreciate some guidance on what this message actually means
>> during startup and what problems may arise from it.
>> One of the nodes experiences high CPU load and even hits kernel panics
>> from time to time, so I'd like to understand whether the above could
>> cause such issues.
>>
>> Thanks a lot!
>>
>>
