Ben, you may find this helpful:

https://blog.pythian.com/so-you-have-a-broken-cassandra-sstable-file/


From: Ben Mills <b...@bitbrew.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, October 24, 2019 at 3:31 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Repair Issues

Greetings,

I've inherited a small Cassandra cluster with some repair issues and need advice on 
recommended next steps. Apologies in advance for a long email.

Issue:

Intermittent repair failures on two non-system keyspaces.

- platform_users
- platform_management

Repair Type:

Full, parallel repairs are run on each of the three nodes every five days.
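
For reference, the exact repair invocation isn't reproduced here, but based on the 
options shown in the log output below (parallelism: parallel, incremental: false) it 
is presumably the stock full repair run per keyspace, along the lines of:

    nodetool repair --full platform_users
    nodetool repair --full platform_management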

Repair command output for a typical failure:

[2019-10-18 00:22:09,109] Starting repair command #46, repairing keyspace 
platform_users with repair options (parallelism: parallel, primary range: 
false, incremental: false, job threads: 1, ColumnFamilies: [], dataCenters: [], 
hosts: [], # of ranges: 12)
[2019-10-18 00:22:09,242] Repair session 5282be70-f13d-11e9-9b4e-7f6db768ba9a 
for range [(-1890954128429545684,2847510199483651721], 
(8249813014782655320,-8746483007209345011], 
(4299912178579297893,6811748355903297393], 
(-8746483007209345011,-8628999431140554276], 
(-5865769407232506956,-4746990901966533744], 
(-4470950459111056725,-1890954128429545684], 
(4001531392883953257,4299912178579297893], 
(6811748355903297393,6878104809564599690], 
(6878104809564599690,8249813014782655320], 
(-4746990901966533744,-4470950459111056725], 
(-8628999431140554276,-5865769407232506956], 
(2847510199483651721,4001531392883953257]] failed with error [repair 
#5282be70-f13d-11e9-9b4e-7f6db768ba9a on platform_users/access_tokens_v2, 
[(-1890954128429545684,2847510199483651721], 
(8249813014782655320,-8746483007209345011], 
(4299912178579297893,6811748355903297393], 
(-8746483007209345011,-8628999431140554276], 
(-5865769407232506956,-4746990901966533744], 
(-4470950459111056725,-1890954128429545684], 
(4001531392883953257,4299912178579297893], 
(6811748355903297393,6878104809564599690], 
(6878104809564599690,8249813014782655320], 
(-4746990901966533744,-4470950459111056725], 
(-8628999431140554276,-5865769407232506956], 
(2847510199483651721,4001531392883953257]]] Validation failed in /10.x.x.x 
(progress: 26%)
[2019-10-18 00:22:09,246] Some repair failed
[2019-10-18 00:22:09,248] Repair command #46 finished in 0 seconds

Additional Notes:

Repairs encounter the above failures more often than not: sometimes on one node 
only, occasionally on two; sometimes on just one of the two keyspaces, sometimes on 
both. Apparently the previous repair schedule for this cluster included incremental 
repairs (a script alternated between incremental and full repairs). After reading 
this TLP article:

https://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html

the repair script was replaced with cassandra-reaper (v1.4.0), run with its default 
configs. Reaper ran fine, but it only obscured the ongoing issues (it did not 
resolve them) and complicated debugging, so it was removed. The current repair 
schedule is as described above under Repair Type.

Attempts at Resolution:

(1) nodetool scrub was attempted on the offending keyspaces/tables (see the example 
invocations below) to no effect.

(2) sstablescrub (see below for the invocation it would require) has not been 
attempted due to the current design of the Docker image that runs Cassandra in each 
Kubernetes pod - i.e. there is no way to stop the server to run this utility without 
killing the only PID running in the container.
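
For completeness, the (assumed) invocations for the two utilities above, using one 
of the affected tables named in the repair error as an example:

    # online scrub via nodetool (attempt 1)
    nodetool scrub platform_users access_tokens_v2

    # offline scrub (attempt 2) - would require Cassandra to be stopped on the node first
    sstablescrub platform_users access_tokens_v2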

Related Error:

I'm not sure if this is related, but sometimes, when either:

(a) Running nodetool snapshot, or
(b) Rolling a pod that runs a Cassandra node, which calls nodetool drain prior to 
shutdown,

the following error is thrown:

-- StackTrace --
java.lang.RuntimeException: Last written key 
DecoratedKey(10df3ba1-6eb2-4c8e-bddd-c0c7af586bda, 
10df3ba16eb24c8ebdddc0c7af586bda) >= current key 
DecoratedKey(00000000-0000-0000-0000-000000000000, 
17343121887f480c9ba87c0e32206b74) writing into 
/cassandra_data/data/platform_management/device_by_tenant_v2-e91529202ccf11e7ab96d5693708c583/.device_by_tenant_tags_idx/mb-45-big-Data.db
            at 
org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:114)
            at 
org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:153)
            at 
org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
            at 
org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:441)
            at 
org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:477)
            at 
org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:363)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)

Here are some details on the environment and configs in the event that 
something is relevant.

Environment: Kubernetes
Environment Config: Stateful set of 3 replicas
Storage: Persistent Volumes
Storage Class: SSD
Node OS: Container-Optimized OS
Container OS: Ubuntu 16.04.3 LTS

Version: Cassandra 3.7
Data Centers: 1
Racks: 3 (one per zone)
Nodes: 3
Tokens: 4
Replication Factor: 3
Replication Strategy: NetworkTopologyStrategy (all keyspaces)
Compaction Strategy: STCS (all tables)
Read/Write Requirements: Blend of both
Data Load: <1GB per node
gc_grace_seconds: default (10 days - all tables)

Memory: 4Gi per node
CPU: 3.5 cores per node (3500m)

Java Version: 1.8.0_144

Heap Settings:

-XX:+UnlockExperimentalVMOptions
-XX:+UseCGroupMemoryLimitForHeap
-XX:MaxRAMFraction=2
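
(With the 4Gi container limit, MaxRAMFraction=2 should work out to a max heap of 
roughly 2 GiB.)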

GC Settings: (CMS)

-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=30000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways

Any ideas are much appreciated.
