Thanks Ghiyasi.

On Sat, Oct 26, 2019 at 9:17 AM Hossein Ghiyasi Mehr <ghiyasim...@gmail.com> wrote:
> If the problem still exists and all nodes are up, reboot them one by one.
> Then try to repair one node. After that, repair the other nodes one by one.
>
> On Fri, Oct 25, 2019 at 12:56 AM Ben Mills <b...@bitbrew.com> wrote:
>
>> Thanks Jon!
>>
>> This is very helpful - allow me to follow up and ask a question.
>>
>> (1) Yes, incremental repairs will never be used (unless they become
>> viable in Cassandra 4.x someday).
>> (2) I hear you on the JVM - will look into that.
>> (3) Been looking at Cassandra version 3.11.x, though I was unaware that
>> 3.7 is considered non-viable for production use.
>>
>> For (4) - Question/Request:
>>
>> Note that with:
>>
>> -XX:MaxRAMFraction=2
>>
>> the actual amount of memory allocated for heap space is effectively 2Gi
>> (i.e. half of the 4Gi allocated on the machine type). We can definitely
>> increase memory (for heap and non-heap), though can you expand a bit on
>> your heap comment to help my understanding (as this is such a small
>> cluster with such a small amount of data at rest)?
>>
>> Thanks again.
>>
>> On Thu, Oct 24, 2019 at 5:11 PM Jon Haddad <j...@jonhaddad.com> wrote:
>>
>>> There are some major warning signs for me with your environment. A 4GB
>>> heap is too low, and Cassandra 3.7 isn't something I would put into
>>> production.
>>>
>>> Your surface area for problems is massive right now. Things I'd do:
>>>
>>> 1. Never use incremental repair. Seems like you've already stopped
>>> doing them, but it's worth mentioning.
>>> 2. Upgrade to the latest JVM; that version's way out of date.
>>> 3. Upgrade to Cassandra 3.11.latest (we're voting on 3.11.5 right now).
>>> 4. Increase memory to 8GB minimum, preferably 12.
>>>
>>> I usually don't like making a bunch of changes without knowing the root
>>> cause of a problem, but in your case there are so many potential
>>> problems that I don't think it's worth digging in, especially since the
>>> problem might be one of the 500 or so bugs that have been fixed since
>>> that release.
>>>
>>> Once you've done those things it'll be easier to narrow down the problem.
>>>
>>> Jon
>>>
>>> On Thu, Oct 24, 2019 at 4:59 PM Ben Mills <b...@bitbrew.com> wrote:
>>>
>>>> Hi Sergio,
>>>>
>>>> No, not at this time.
>>>>
>>>> It was in use with this cluster previously, and while there were no
>>>> reaper-specific issues, it was removed to help simplify investigation
>>>> of the underlying repair issues I've described.
>>>>
>>>> Thanks.
>>>>
>>>> On Thu, Oct 24, 2019 at 4:21 PM Sergio <lapostadiser...@gmail.com> wrote:
>>>>
>>>>> Are you using Cassandra Reaper?
>>>>>
>>>>> On Thu, Oct 24, 2019, 12:31 PM Ben Mills <b...@bitbrew.com> wrote:
>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> I've inherited a small Cassandra cluster with some repair issues and
>>>>>> need some advice on recommended next steps. Apologies in advance for
>>>>>> a long email.
>>>>>>
>>>>>> Issue:
>>>>>>
>>>>>> Intermittent repair failures on two non-system keyspaces:
>>>>>>
>>>>>> - platform_users
>>>>>> - platform_management
>>>>>>
>>>>>> Repair Type:
>>>>>>
>>>>>> Full, parallel repairs are run on each of the three nodes every five
>>>>>> days.
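>>>>>>
>>>>>> (Each scheduled run is presumably the equivalent of the following per
>>>>>> keyspace - in 3.x, "--full" requests a full rather than incremental
>>>>>> repair, and parallel is the default parallelism:
>>>>>>
>>>>>> nodetool repair --full platform_users
>>>>>> nodetool repair --full platform_management
>>>>>>
>>>>>> With the primary range not restricted, and RF=3 on a 3-node cluster,
>>>>>> each node's run covers all 3 nodes x 4 tokens = 12 local ranges,
>>>>>> which lines up with the "# of ranges: 12" in the output below.)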
>>>>>>
>>>>>> Repair command output for a typical failure:
>>>>>>
>>>>>> [2019-10-18 00:22:09,109] Starting repair command #46, repairing
>>>>>> keyspace platform_users with repair options (parallelism: parallel,
>>>>>> primary range: false, incremental: false, job threads: 1,
>>>>>> ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 12)
>>>>>> [2019-10-18 00:22:09,242] Repair session
>>>>>> 5282be70-f13d-11e9-9b4e-7f6db768ba9a for range
>>>>>> [(-1890954128429545684,2847510199483651721],
>>>>>> (8249813014782655320,-8746483007209345011],
>>>>>> (4299912178579297893,6811748355903297393],
>>>>>> (-8746483007209345011,-8628999431140554276],
>>>>>> (-5865769407232506956,-4746990901966533744],
>>>>>> (-4470950459111056725,-1890954128429545684],
>>>>>> (4001531392883953257,4299912178579297893],
>>>>>> (6811748355903297393,6878104809564599690],
>>>>>> (6878104809564599690,8249813014782655320],
>>>>>> (-4746990901966533744,-4470950459111056725],
>>>>>> (-8628999431140554276,-5865769407232506956],
>>>>>> (2847510199483651721,4001531392883953257]] failed with error [repair
>>>>>> #5282be70-f13d-11e9-9b4e-7f6db768ba9a on platform_users/access_tokens_v2,
>>>>>> [(-1890954128429545684,2847510199483651721],
>>>>>> (8249813014782655320,-8746483007209345011],
>>>>>> (4299912178579297893,6811748355903297393],
>>>>>> (-8746483007209345011,-8628999431140554276],
>>>>>> (-5865769407232506956,-4746990901966533744],
>>>>>> (-4470950459111056725,-1890954128429545684],
>>>>>> (4001531392883953257,4299912178579297893],
>>>>>> (6811748355903297393,6878104809564599690],
>>>>>> (6878104809564599690,8249813014782655320],
>>>>>> (-4746990901966533744,-4470950459111056725],
>>>>>> (-8628999431140554276,-5865769407232506956],
>>>>>> (2847510199483651721,4001531392883953257]]] Validation failed in
>>>>>> /10.x.x.x (progress: 26%)
>>>>>> [2019-10-18 00:22:09,246] Some repair failed
>>>>>> [2019-10-18 00:22:09,248] Repair command #46 finished in 0 seconds
>>>>>>
>>>>>> Additional Notes:
>>>>>>
>>>>>> Repairs encounter the above failures more often than not - sometimes
>>>>>> on one node only, occasionally on two; sometimes on just one of the
>>>>>> two keyspaces, sometimes on both. Apparently the previous repair
>>>>>> schedule for this cluster included incremental repairs (a script
>>>>>> alternated between incremental and full repairs). After reading this
>>>>>> TLP article:
>>>>>>
>>>>>> https://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html
>>>>>>
>>>>>> the repair script was replaced with cassandra-reaper (v1.4.0), which
>>>>>> was run with its default configs. Reaper ran fine but merely masked
>>>>>> the ongoing issues (it did not resolve them) and complicated the
>>>>>> debugging process, so it was removed. The current repair schedule is
>>>>>> as described above under Repair Type.
>>>>>>
>>>>>> Attempts at Resolution:
>>>>>>
>>>>>> (1) nodetool scrub was attempted on the offending keyspaces/tables to
>>>>>> no effect.
>>>>>>
>>>>>> (2) sstablescrub has not been attempted due to the current design of
>>>>>> the Docker image that runs Cassandra in each Kubernetes pod - i.e.
>>>>>> there is no way to stop the server to run this utility without
>>>>>> killing the only pid running in the container.
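>>>>>>
>>>>>> (In principle the offline scrub could still be run by temporarily
>>>>>> overriding the container command so the pod starts without Cassandra,
>>>>>> though this has not been tried here. A rough sketch, assuming a
>>>>>> StatefulSet named "cassandra" and using one affected table as an
>>>>>> example:
>>>>>>
>>>>>> kubectl edit statefulset cassandra  # set command: ["sleep", "infinity"]
>>>>>> kubectl delete pod cassandra-0      # pod restarts with Cassandra stopped
>>>>>> kubectl exec -it cassandra-0 -- sstablescrub platform_users access_tokens_v2
>>>>>>
>>>>>> then revert the command override and roll the pod again.)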
>>>>>>
>>>>>> Related Error:
>>>>>>
>>>>>> Not sure if this is related, though sometimes, when either:
>>>>>>
>>>>>> (a) Running nodetool snapshot, or
>>>>>> (b) Rolling a pod that runs a Cassandra node, which calls nodetool
>>>>>> drain prior to shutdown,
>>>>>>
>>>>>> the following error is thrown:
>>>>>>
>>>>>> -- StackTrace --
>>>>>> java.lang.RuntimeException: Last written key
>>>>>> DecoratedKey(10df3ba1-6eb2-4c8e-bddd-c0c7af586bda,
>>>>>> 10df3ba16eb24c8ebdddc0c7af586bda) >= current key
>>>>>> DecoratedKey(00000000-0000-0000-0000-000000000000,
>>>>>> 17343121887f480c9ba87c0e32206b74) writing into
>>>>>> /cassandra_data/data/platform_management/device_by_tenant_v2-e91529202ccf11e7ab96d5693708c583/.device_by_tenant_tags_idx/mb-45-big-Data.db
>>>>>> at org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:114)
>>>>>> at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:153)
>>>>>> at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
>>>>>> at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:441)
>>>>>> at org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:477)
>>>>>> at org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:363)
>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>>
>>>>>> Here are some details on the environment and configs in the event
>>>>>> that something is relevant.
>>>>>>
>>>>>> Environment: Kubernetes
>>>>>> Environment Config: Stateful set of 3 replicas
>>>>>> Storage: Persistent Volumes
>>>>>> Storage Class: SSD
>>>>>> Node OS: Container-Optimized OS
>>>>>> Container OS: Ubuntu 16.04.3 LTS
>>>>>>
>>>>>> Version: Cassandra 3.7
>>>>>> Data Centers: 1
>>>>>> Racks: 3 (one per zone)
>>>>>> Nodes: 3
>>>>>> Tokens: 4
>>>>>> Replication Factor: 3
>>>>>> Replication Strategy: NetworkTopologyStrategy (all keyspaces)
>>>>>> Compaction Strategy: STCS (all tables)
>>>>>> Read/Write Requirements: Blend of both
>>>>>> Data Load: <1GB per node
>>>>>> gc_grace_seconds: default (10 days - all tables)
>>>>>>
>>>>>> Memory: 4Gi per node
>>>>>> CPU: 3.5 cores per node (3500m)
>>>>>>
>>>>>> Java Version: 1.8.0_144
>>>>>>
>>>>>> Heap Settings:
>>>>>>
>>>>>> -XX:+UnlockExperimentalVMOptions
>>>>>> -XX:+UseCGroupMemoryLimitForHeap
>>>>>> -XX:MaxRAMFraction=2
>>>>>>
>>>>>> GC Settings: (CMS)
>>>>>>
>>>>>> -XX:+UseParNewGC
>>>>>> -XX:+UseConcMarkSweepGC
>>>>>> -XX:+CMSParallelRemarkEnabled
>>>>>> -XX:SurvivorRatio=8
>>>>>> -XX:MaxTenuringThreshold=1
>>>>>> -XX:CMSInitiatingOccupancyFraction=75
>>>>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>>>> -XX:CMSWaitDuration=30000
>>>>>> -XX:+CMSParallelInitialMarkEnabled
>>>>>> -XX:+CMSEdenChunksRecordAlways
>>>>>>
>>>>>> Any ideas are much appreciated.

--
Ben Mills
DevOps Engineer
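
A follow-up note on Jon's point (4) and the MaxRAMFraction question above:
with -XX:MaxRAMFraction=2 the JVM sizes the heap to half of the cgroup
memory limit, so a 4Gi pod yields roughly a 2Gi heap. One way to remove
the ambiguity (a sketch only - the 8G figure is illustrative and assumes
the pod memory limit is raised to 12Gi or so, per Jon's recommendation)
is to fix the heap explicitly in jvm.options, or via MAX_HEAP_SIZE in
cassandra-env.sh, instead:

-Xms8G
-Xmx8G

Setting -Xms equal to -Xmx avoids heap resize pauses, and the remainder
of the pod's memory stays available for off-heap structures and the OS
page cache.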