Hi Reid,

Many thanks - I have seen that article, though I will definitely give it
another read.

Note that nodetool scrub has been tried (no effect), and sstablescrub cannot
currently be run with the Cassandra image in use. A new image that allows
the server to be stopped while keeping the operating environment available
for this utility can certainly be built; I just haven't done so yet.

Also, none of the logs indicate that a corrupt data file (or files) is in
play here. I mention this because the article includes a solution where a
specific data file is manually deleted and repairs are then run to restore
the file from a different node in the cluster. In addition, the way
persistent volumes are mounted onto [Kubernetes] nodes prevents this
solution (manual deletion of an offending data file) from being viable,
because the PV mount on the node's filesystem is detached when the pods are
down. This is a subtlety of running Cassandra in Kubernetes.
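
For what it's worth, the image change mentioned above could be as simple as
a wrapper entrypoint that stays alive after the server process exits. Here
is a minimal sketch in Python, assuming Python is available in the image.
The function name run_supervised, the hold-file convention and the polling
interval are all made up for illustration, and signal forwarding (needed
for clean pod termination) is left out:

```python
import os
import subprocess
import time


def run_supervised(server_cmd, hold_file, poll_seconds=1.0):
    """Run server_cmd as a child process and return its exit code.

    If hold_file exists when the server exits, keep this wrapper (and so
    the container) alive until the file is removed, leaving time to run
    offline tools against the data directory.
    """
    child = subprocess.Popen(server_cmd)
    exit_code = child.wait()  # blocks until the server process stops

    # Hypothetical convention: an operator creates hold_file before
    # stopping the server and deletes it when maintenance is finished.
    while os.path.exists(hold_file):
        time.sleep(poll_seconds)

    # NOTE: a real entrypoint would also forward SIGTERM to the child.
    return exit_code
```

The entrypoint would then call something like
run_supervised(["cassandra", "-f"], "/tmp/maintenance") and exit with the
returned code. With the hold file in place, stopping Cassandra would no
longer end the container, so sstablescrub could be run while the PV is
still mounted.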

On Thu, Oct 24, 2019 at 4:24 PM Reid Pinchback <rpinchb...@tripadvisor.com>
wrote:

> Ben, you may find this helpful:
>
> https://blog.pythian.com/so-you-have-a-broken-cassandra-sstable-file/
>
> *From: *Ben Mills <b...@bitbrew.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Thursday, October 24, 2019 at 3:31 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Repair Issues
>
>
>
> Greetings,
>
> Inherited a small Cassandra cluster with some repair issues and need some
> advice on recommended next steps. Apologies in advance for a long email.
>
> Issue:
>
> Intermittent repair failures on two non-system keyspaces.
>
> - platform_users
> - platform_management
>
> Repair Type:
>
> Full, parallel repairs are run on each of the three nodes every five days.
>
> Repair command output for a typical failure:
>
> [2019-10-18 00:22:09,109] Starting repair command #46, repairing keyspace
> platform_users with repair options (parallelism: parallel, primary range:
> false, incremental: false, job threads: 1, ColumnFamilies: [], dataCenters:
> [], hosts: [], # of ranges: 12)
> [2019-10-18 00:22:09,242] Repair session
> 5282be70-f13d-11e9-9b4e-7f6db768ba9a for range
> [(-1890954128429545684,2847510199483651721],
> (8249813014782655320,-8746483007209345011],
> (4299912178579297893,6811748355903297393],
> (-8746483007209345011,-8628999431140554276],
> (-5865769407232506956,-4746990901966533744],
> (-4470950459111056725,-1890954128429545684],
> (4001531392883953257,4299912178579297893],
> (6811748355903297393,6878104809564599690],
> (6878104809564599690,8249813014782655320],
> (-4746990901966533744,-4470950459111056725],
> (-8628999431140554276,-5865769407232506956],
> (2847510199483651721,4001531392883953257]] failed with error [repair
> #5282be70-f13d-11e9-9b4e-7f6db768ba9a on platform_users/access_tokens_v2,
> [(-1890954128429545684,2847510199483651721],
> (8249813014782655320,-8746483007209345011],
> (4299912178579297893,6811748355903297393],
> (-8746483007209345011,-8628999431140554276],
> (-5865769407232506956,-4746990901966533744],
> (-4470950459111056725,-1890954128429545684],
> (4001531392883953257,4299912178579297893],
> (6811748355903297393,6878104809564599690],
> (6878104809564599690,8249813014782655320],
> (-4746990901966533744,-4470950459111056725],
> (-8628999431140554276,-5865769407232506956],
> (2847510199483651721,4001531392883953257]]] Validation failed in /10.x.x.x
> (progress: 26%)
> [2019-10-18 00:22:09,246] Some repair failed
> [2019-10-18 00:22:09,248] Repair command #46 finished in 0 seconds
>
> Additional Notes:
>
> Repairs encounter the above failures more often than not. Sometimes on one
> node only, though occasionally on two. Sometimes just one of the two
> keyspaces, sometimes both. Apparently the previous repair schedule for
> this cluster included incremental repairs (script alternated between
> incremental and full repairs). After reading this TLP article:
>
>
> https://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html
>
> the repair script was replaced with cassandra-reaper (v1.4.0), which was
> run with its default configs. Reaper ran fine, but it only obscured the
> ongoing issues (it did not resolve them) and complicated the debugging
> process, so it was removed. The current repair schedule is as described
> above under Repair Type.
>
> Attempts at Resolution:
>
> (1) nodetool scrub was attempted on the offending keyspaces/tables to no
> effect.
>
> (2) sstablescrub has not been attempted due to the current design of the
> Docker image that runs Cassandra in each Kubernetes pod - i.e. there is no
> way to stop the server to run this utility without killing the only pid
> running in the container.
>
> Related Error:
>
> Not sure if this is related, though sometimes, when either:
>
> (a) Running nodetool snapshot, or
> (b) Rolling a pod that runs a Cassandra node, which calls nodetool drain
> prior to shutdown,
>
> the following error is thrown:
>
> -- StackTrace --
> java.lang.RuntimeException: Last written key DecoratedKey(10df3ba1-6eb2-4c8e-bddd-c0c7af586bda, 10df3ba16eb24c8ebdddc0c7af586bda) >= current key DecoratedKey(00000000-0000-0000-0000-000000000000, 17343121887f480c9ba87c0e32206b74) writing into /cassandra_data/data/platform_management/device_by_tenant_v2-e91529202ccf11e7ab96d5693708c583/.device_by_tenant_tags_idx/mb-45-big-Data.db
>     at org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:114)
>     at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:153)
>     at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
>     at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:441)
>     at org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:477)
>     at org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:363)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> Here are some details on the environment and configs in the event that
> something is relevant.
>
> Environment: Kubernetes
> Environment Config: Stateful set of 3 replicas
> Storage: Persistent Volumes
> Storage Class: SSD
> Node OS: Container-Optimized OS
> Container OS: Ubuntu 16.04.3 LTS
>
> Version: Cassandra 3.7
> Data Centers: 1
> Racks: 3 (one per zone)
> Nodes: 3
> Tokens: 4
> Replication Factor: 3
> Replication Strategy: NetworkTopologyStrategy (all keyspaces)
> Compaction Strategy: STCS (all tables)
> Read/Write Requirements: Blend of both
> Data Load: <1GB per node
> gc_grace_seconds: default (10 days - all tables)
>
> Memory: 4Gi per node
> CPU: 3.5 per node (3500m)
>
> Java Version: 1.8.0_144
>
> Heap Settings:
>
> -XX:+UnlockExperimentalVMOptions
> -XX:+UseCGroupMemoryLimitForHeap
> -XX:MaxRAMFraction=2
>
> GC Settings: (CMS)
>
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSWaitDuration=30000
> -XX:+CMSParallelInitialMarkEnabled
> -XX:+CMSEdenChunksRecordAlways
>
>
>
> Any ideas are much appreciated.
>
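
A note on the stack trace quoted above: the exception is thrown from
BigTableWriter.beforeAppend, reached from the memtable flush
(writeSortedContents), and the message says the key being written does not
sort after the last key written. The file path suggests the write was to
what looks like a secondary index table. Below is a simplified model of
that ordering check, not Cassandra's actual code. The class name
OrderedWriter is hypothetical, and Python's ordering of uuid.UUID values
only stands in for whatever comparator the table really uses:

```python
import uuid


class OrderedWriter:
    """Toy model of a writer that only accepts keys in increasing order."""

    def __init__(self):
        self.last_token = None
        self.written = []

    def append(self, token, key):
        # Mirrors the shape of the error in the trace: the new token must
        # sort strictly after the last one written.
        if self.last_token is not None and self.last_token >= token:
            raise RuntimeError(
                f"Last written key {self.last_token} >= current key {token}"
            )
        self.written.append((token, key))
        self.last_token = token


writer = OrderedWriter()
writer.append(uuid.UUID("10df3ba1-6eb2-4c8e-bddd-c0c7af586bda"),
              "10df3ba16eb24c8ebdddc0c7af586bda")
try:
    # The all-zero token sorts before the previous one, so it is rejected.
    writer.append(uuid.UUID("00000000-0000-0000-0000-000000000000"),
                  "17343121887f480c9ba87c0e32206b74")
except RuntimeError as err:
    print(err)
```

In this model the second append fails because the all-zero token arrives
after a larger one, which is the same pattern as the two DecoratedKey
values in the trace.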
