Thanks Ghiyasi.

On Sat, Oct 26, 2019 at 9:17 AM Hossein Ghiyasi Mehr <ghiyasim...@gmail.com> wrote:
> If the problem still exists and all nodes are up, reboot them one by one.
> Then try to repair one node. After that, repair the other nodes one by one.
>
> On Fri, Oct 25, 2019 at 12:56 AM Ben Mills <b...@bitbrew.com> wrote:
>
>> Thanks Jon!
>>
>> This is very helpful - allow me to follow up and ask a question.
>>
>> (1) Yes, incremental repairs will never be used (unless they become
>> viable in Cassandra 4.x someday).
>> (2) I hear you on the JVM - will look into that.
>> (3) Been looking at Cassandra version 3.11.x, though I was unaware that
>> 3.7 is considered non-viable for production use.
>>
>> For (4) - Question/Request:
>>
>> Note that with:
>>
>> -XX:MaxRAMFraction=2
>>
>> the actual amount of memory allocated for heap space is effectively 2Gi
>> (i.e. half of the 4Gi allocated on the machine type). We can definitely
>> increase memory (for heap and non-heap), though can you expand a bit on
>> your heap comment to help my understanding (as this is such a small
>> cluster with such a small amount of data at rest)?
>>
>> Thanks again.
>>
>> On Thu, Oct 24, 2019 at 5:11 PM Jon Haddad <j...@jonhaddad.com> wrote:
>>
>>> There are some major warning signs for me with your environment. A 4GB
>>> heap is too low, and Cassandra 3.7 isn't something I would put into
>>> production.
>>>
>>> Your surface area for problems is massive right now. Things I'd do:
>>>
>>> 1. Never use incremental repair. Seems like you've already stopped
>>> doing them, but it's worth mentioning.
>>> 2. Upgrade to the latest JVM; that version's way out of date.
>>> 3. Upgrade to Cassandra 3.11.latest (we're voting on 3.11.5 right now).
>>> 4. Increase memory to 8GB minimum, preferably 12.
>>>
>>> I usually don't like making a bunch of changes without knowing the root
>>> cause of a problem, but in your case there are so many potential
>>> problems that I don't think it's worth digging in, especially since the
>>> problem might be one of the 500 or so bugs that have been fixed since
>>> that release.
>>>
>>> Once you've done those things it'll be easier to narrow down the problem.
>>>
>>> Jon
>>>
>>> On Thu, Oct 24, 2019 at 4:59 PM Ben Mills <b...@bitbrew.com> wrote:
>>>
>>>> Hi Sergio,
>>>>
>>>> No, not at this time.
>>>>
>>>> It was in use with this cluster previously, and while there were no
>>>> reaper-specific issues, it was removed to help simplify investigation
>>>> of the underlying repair issues I've described.
>>>>
>>>> Thanks.
>>>>
>>>> On Thu, Oct 24, 2019 at 4:21 PM Sergio <lapostadiser...@gmail.com> wrote:
>>>>
>>>>> Are you using Cassandra Reaper?
>>>>>
>>>>> On Thu, Oct 24, 2019, 12:31 PM Ben Mills <b...@bitbrew.com> wrote:
>>>>>
>>>>>> Greetings,
>>>>>>
>>>>>> I've inherited a small Cassandra cluster with some repair issues and
>>>>>> need some advice on recommended next steps. Apologies in advance for
>>>>>> a long email.
>>>>>>
>>>>>> Issue:
>>>>>>
>>>>>> Intermittent repair failures on two non-system keyspaces:
>>>>>>
>>>>>> - platform_users
>>>>>> - platform_management
>>>>>>
>>>>>> Repair Type:
>>>>>>
>>>>>> Full, parallel repairs are run on each of the three nodes every five
>>>>>> days.
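>>>>>>
>>>>>> (Each scheduled run is presumably the equivalent of the following per
>>>>>> keyspace - in 3.x, "--full" requests a full rather than incremental
>>>>>> repair, and parallel is the default parallelism:
>>>>>>
>>>>>> nodetool repair --full platform_users
>>>>>> nodetool repair --full platform_management
>>>>>>
>>>>>> With the primary range not restricted, and RF=3 on a 3-node cluster,
>>>>>> each node's run covers all 3 nodes x 4 tokens = 12 local ranges,
>>>>>> which lines up with the "# of ranges: 12" in the output below.)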
>>>>>>
>>>>>> Repair command output for a typical failure:
>>>>>>
>>>>>> [2019-10-18 00:22:09,109] Starting repair command #46, repairing
>>>>>> keyspace platform_users with repair options (parallelism: parallel,
>>>>>> primary range: false, incremental: false, job threads: 1,
>>>>>> ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 12)
>>>>>> [2019-10-18 00:22:09,242] Repair session
>>>>>> 5282be70-f13d-11e9-9b4e-7f6db768ba9a for range
>>>>>> [(-1890954128429545684,2847510199483651721],
>>>>>> (8249813014782655320,-8746483007209345011],
>>>>>> (4299912178579297893,6811748355903297393],
>>>>>> (-8746483007209345011,-8628999431140554276],
>>>>>> (-5865769407232506956,-4746990901966533744],
>>>>>> (-4470950459111056725,-1890954128429545684],
>>>>>> (4001531392883953257,4299912178579297893],
>>>>>> (6811748355903297393,6878104809564599690],
>>>>>> (6878104809564599690,8249813014782655320],
>>>>>> (-4746990901966533744,-4470950459111056725],
>>>>>> (-8628999431140554276,-5865769407232506956],
>>>>>> (2847510199483651721,4001531392883953257]] failed with error [repair
>>>>>> #5282be70-f13d-11e9-9b4e-7f6db768ba9a on platform_users/access_tokens_v2,
>>>>>> [(-1890954128429545684,2847510199483651721],
>>>>>> (8249813014782655320,-8746483007209345011],
>>>>>> (4299912178579297893,6811748355903297393],
>>>>>> (-8746483007209345011,-8628999431140554276],
>>>>>> (-5865769407232506956,-4746990901966533744],
>>>>>> (-4470950459111056725,-1890954128429545684],
>>>>>> (4001531392883953257,4299912178579297893],
>>>>>> (6811748355903297393,6878104809564599690],
>>>>>> (6878104809564599690,8249813014782655320],
>>>>>> (-4746990901966533744,-4470950459111056725],
>>>>>> (-8628999431140554276,-5865769407232506956],
>>>>>> (2847510199483651721,4001531392883953257]]] Validation failed in
>>>>>> /10.x.x.x (progress: 26%)
>>>>>> [2019-10-18 00:22:09,246] Some repair failed
>>>>>> [2019-10-18 00:22:09,248] Repair command #46 finished in 0 seconds
>>>>>>
>>>>>> Additional Notes:
>>>>>>
>>>>>> Repairs encounter the above failures more often than not - sometimes
>>>>>> on one node only, occasionally on two; sometimes on just one of the
>>>>>> two keyspaces, sometimes on both. Apparently the previous repair
>>>>>> schedule for this cluster included incremental repairs (a script
>>>>>> alternated between incremental and full repairs). After reading this
>>>>>> TLP article:
>>>>>>
>>>>>> https://thelastpickle.com/blog/2017/12/14/should-you-use-incremental-repair.html
>>>>>>
>>>>>> the repair script was replaced with cassandra-reaper (v1.4.0), which
>>>>>> was run with its default configs. Reaper ran fine but merely masked
>>>>>> the ongoing issues (it did not resolve them) and complicated the
>>>>>> debugging process, so it was removed. The current repair schedule is
>>>>>> as described above under Repair Type.
>>>>>>
>>>>>> Attempts at Resolution:
>>>>>>
>>>>>> (1) nodetool scrub was attempted on the offending keyspaces/tables to
>>>>>> no effect.
>>>>>>
>>>>>> (2) sstablescrub has not been attempted due to the current design of
>>>>>> the Docker image that runs Cassandra in each Kubernetes pod - i.e.
>>>>>> there is no way to stop the server to run this utility without
>>>>>> killing the only pid running in the container.
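>>>>>>
>>>>>> (In principle the offline scrub could still be run by temporarily
>>>>>> overriding the container command so the pod starts without Cassandra,
>>>>>> though this has not been tried here. A rough sketch, assuming a
>>>>>> StatefulSet named "cassandra" and using one affected table as an
>>>>>> example:
>>>>>>
>>>>>> kubectl edit statefulset cassandra  # set command: ["sleep", "infinity"]
>>>>>> kubectl delete pod cassandra-0      # pod restarts with Cassandra stopped
>>>>>> kubectl exec -it cassandra-0 -- sstablescrub platform_users access_tokens_v2
>>>>>>
>>>>>> then revert the command override and roll the pod again.)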
>>>>>>
>>>>>> Related Error:
>>>>>>
>>>>>> Not sure if this is related, though sometimes, when either:
>>>>>>
>>>>>> (a) Running nodetool snapshot, or
>>>>>> (b) Rolling a pod that runs a Cassandra node, which calls nodetool
>>>>>> drain prior to shutdown,
>>>>>>
>>>>>> the following error is thrown:
>>>>>>
>>>>>> -- StackTrace --
>>>>>> java.lang.RuntimeException: Last written key
>>>>>> DecoratedKey(10df3ba1-6eb2-4c8e-bddd-c0c7af586bda,
>>>>>> 10df3ba16eb24c8ebdddc0c7af586bda) >= current key
>>>>>> DecoratedKey(00000000-0000-0000-0000-000000000000,
>>>>>> 17343121887f480c9ba87c0e32206b74) writing into
>>>>>> /cassandra_data/data/platform_management/device_by_tenant_v2-e91529202ccf11e7ab96d5693708c583/.device_by_tenant_tags_idx/mb-45-big-Data.db
>>>>>> at org.apache.cassandra.io.sstable.format.big.BigTableWriter.beforeAppend(BigTableWriter.java:114)
>>>>>> at org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:153)
>>>>>> at org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter.append(SimpleSSTableMultiWriter.java:48)
>>>>>> at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:441)
>>>>>> at org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:477)
>>>>>> at org.apache.cassandra.db.Memtable$FlushRunnable.call(Memtable.java:363)
>>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>>
>>>>>> Here are some details on the environment and configs in the event
>>>>>> that something is relevant.
>>>>>>
>>>>>> Environment: Kubernetes
>>>>>> Environment Config: Stateful set of 3 replicas
>>>>>> Storage: Persistent Volumes
>>>>>> Storage Class: SSD
>>>>>> Node OS: Container-Optimized OS
>>>>>> Container OS: Ubuntu 16.04.3 LTS
>>>>>>
>>>>>> Version: Cassandra 3.7
>>>>>> Data Centers: 1
>>>>>> Racks: 3 (one per zone)
>>>>>> Nodes: 3
>>>>>> Tokens: 4
>>>>>> Replication Factor: 3
>>>>>> Replication Strategy: NetworkTopologyStrategy (all keyspaces)
>>>>>> Compaction Strategy: STCS (all tables)
>>>>>> Read/Write Requirements: Blend of both
>>>>>> Data Load: <1GB per node
>>>>>> gc_grace_seconds: default (10 days - all tables)
>>>>>>
>>>>>> Memory: 4Gi per node
>>>>>> CPU: 3.5 cores per node (3500m)
>>>>>>
>>>>>> Java Version: 1.8.0_144
>>>>>>
>>>>>> Heap Settings:
>>>>>>
>>>>>> -XX:+UnlockExperimentalVMOptions
>>>>>> -XX:+UseCGroupMemoryLimitForHeap
>>>>>> -XX:MaxRAMFraction=2
>>>>>>
>>>>>> GC Settings: (CMS)
>>>>>>
>>>>>> -XX:+UseParNewGC
>>>>>> -XX:+UseConcMarkSweepGC
>>>>>> -XX:+CMSParallelRemarkEnabled
>>>>>> -XX:SurvivorRatio=8
>>>>>> -XX:MaxTenuringThreshold=1
>>>>>> -XX:CMSInitiatingOccupancyFraction=75
>>>>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>>>> -XX:CMSWaitDuration=30000
>>>>>> -XX:+CMSParallelInitialMarkEnabled
>>>>>> -XX:+CMSEdenChunksRecordAlways
>>>>>>
>>>>>> Any ideas are much appreciated.

--
Ben Mills
DevOps Engineer
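
A follow-up note on Jon's point (4) and the MaxRAMFraction question above:
with -XX:MaxRAMFraction=2 the JVM sizes the heap to half of the cgroup
memory limit, so a 4Gi pod yields roughly a 2Gi heap. One way to remove
the ambiguity (a sketch only - the 8G figure is illustrative and assumes
the pod memory limit is raised to 12Gi or so, per Jon's recommendation)
is to fix the heap explicitly in jvm.options, or via MAX_HEAP_SIZE in
cassandra-env.sh, instead:

-Xms8G
-Xmx8G

Setting -Xms equal to -Xmx avoids heap resize pauses, and the remainder
of the pod's memory stays available for off-heap structures and the OS
page cache.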