Yeah, that particular table is badly designed. I intend to fix it when
the roadmap allows :) What is the recommended maximum partition size?

Thanks for all the information.
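As a reference point for that question: partition size percentiles can
be measured per table, and a commonly cited rule of thumb for Cassandra
2.x is to keep partitions below roughly 100 MB. A minimal sketch, with
placeholder keyspace and table names:

    # Print latency and partition size percentiles for one table; the
    # "Partition Size" column exposes oversized partitions.
    nodetool cfhistograms my_keyspace my_table

    # Report the largest partition ever compacted for that table, in bytes.
    nodetool cfstats my_keyspace.my_table | grep "Compacted partition maximum bytes"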
On Thu, Oct 27, 2016, at 08:14 PM, Alexander Dejanovski wrote:
> 3.3GB is already too high, and it surely isn't helping compactions
> perform well. I know changing a data model is no easy thing to do,
> but you should try to do something here.
> Anticompaction is a special type of compaction, and if an sstable is
> being anticompacted, any attempt to run a validation compaction on
> it will fail, telling you that an sstable cannot be part of 2 repair
> sessions at the same time. Incremental repair must therefore be run
> one node at a time, waiting for anticompactions to end before moving
> from one node to the next.
> Be mindful of running incremental repair on a regular basis once
> you've started, as you'll have two separate pools of sstables
> (repaired and unrepaired) that won't get compacted together, which
> could be a problem if you want tombstones to be purged efficiently.
> Cheers,
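A minimal sketch of the node-at-a-time sequencing described above,
assuming SSH access to each node and placeholder hostnames; on 2.1 the
incremental variant is invoked with `-par -inc`:

    #!/usr/bin/env bash
    # Repair one node at a time; wait for its anticompactions to finish
    # before moving on, so validation compactions on the next repair
    # session don't collide with a running anticompaction.
    for host in node1 node2 node3; do
        ssh "$host" nodetool repair -par -inc my_keyspace
        # Anticompaction shows up in compactionstats after repair returns.
        while ssh "$host" nodetool compactionstats | grep -qi anticompaction; do
            sleep 60
        done
    done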
> On Thu, Oct 27, 2016 at 5:57 PM, Vincent Rischmann
> <m...@vrischmann.me> wrote:
>> Ok, I think we'll give incremental repairs a try on a limited
>> number of CFs first, and if that goes well we'll progressively
>> switch more CFs to incremental.
>>
>> I'm not sure I understand the problem with anticompaction and
>> validation running concurrently. As far as I can tell, right now
>> when a CF is repaired (either via Reaper or via nodetool) there may
>> be compactions running at the same time. In fact, it happens very
>> often. Is that a problem?
>>
>> As for big partitions, the biggest one we have is around 3.3GB.
>> The next biggest are around 500MB and less.
>>
>> On Thu, Oct 27, 2016, at 05:37 PM, Alexander Dejanovski wrote:
>>> Oh right, that's what they advise :)
>>> I'd say that you should skip the full repair phase in the
>>> migration procedure, as that will obviously fail, and just mark
>>> all sstables as repaired (skip steps 1, 2 and 6).
>>> Anyway, you can't do better, so take a leap of faith there.
>>>
>>> Intensity is already very low, and 10000 segments is a whole lot
>>> for 9 nodes; you should not need that many.
>>>
>>> You can definitely pick which CFs you'll run incremental repair
>>> on, and still run full repair on the rest.
>>> If you pick our Reaper fork, watch out for the schema changes that
>>> add the incremental repair fields. I do not advise running
>>> incremental repair without it; otherwise you might have issues
>>> with anticompaction and validation compactions running
>>> concurrently from time to time.
>>>
>>> One last thing: can you check whether you have particularly big
>>> partitions in the CFs that fail to get repaired? You can run
>>> nodetool cfhistograms to check that.
>>>
>>> Cheers,
>>>
>>> On Thu, Oct 27, 2016 at 5:24 PM, Vincent Rischmann
>>> <m...@vrischmann.me> wrote:
>>>> Thanks for the response.
>>>>
>>>> We do break up repairs between tables, and we also tried our best
>>>> to have no overlap between repair runs. Each repair has 10000
>>>> segments (a purely arbitrary number that seemed to help at the
>>>> time). Some runs have an intensity of 0.4, some as low as 0.05.
>>>>
>>>> Still, sometimes one particular app (which does a lot of
>>>> read/modify/write batches at quorum) gets slowed down to the
>>>> point where we have to stop the repair run.
>>>>
>>>> More annoyingly, for the last 2 to 3 weeks, as I said, runs stop
>>>> progressing after some time. Every time I restart Reaper it
>>>> repairs correctly again, up until it gets stuck. I have no idea
>>>> why that happens now, but it means I have to babysit Reaper, and
>>>> it's becoming annoying.
>>>>
>>>> Thanks for the suggestion about incremental repairs. It would
>>>> probably be a good thing, but it's a little challenging to set
>>>> up, I think. Right now a full repair of all keyspaces (via
>>>> nodetool repair) would take a lot of time, probably 5 days or
>>>> more; we were never able to run one to completion. I'm not sure
>>>> it's a good idea to disable autocompaction for that long.
>>>>
>>>> But maybe I'm wrong. Is it possible to use incremental repairs on
>>>> some column families only?
>>>>
>>>> On Thu, Oct 27, 2016, at 05:02 PM, Alexander Dejanovski wrote:
>>>>> Hi Vincent,
>>>>>
>>>>> most people handle repair with:
>>>>> - pain (running nodetool commands by hand)
>>>>> - cassandra_range_repair:
>>>>>   https://github.com/BrianGallew/cassandra_range_repair
>>>>> - Spotify Reaper
>>>>> - and the OpsCenter repair service for DSE users
>>>>>
>>>>> Reaper is a good option, I think, and you should stick with it.
>>>>> If it cannot do the job here, then no other tool will.
>>>>>
>>>>> You have several options from here:
>>>>> * Try to break up your repairs table by table and see which ones
>>>>>   actually get stuck
>>>>> * Check your logs for any repair/streaming errors
>>>>> * Avoid repairing everything:
>>>>>   * you may have expendable tables
>>>>>   * you may have TTL-only tables with no deletes, accessed at
>>>>>     QUORUM CL only
>>>>> * Try to relieve repair pressure in Reaper by lowering the
>>>>>   repair intensity (on the tables that get stuck)
>>>>> * Try adding steps to your repair process by setting a higher
>>>>>   segment count in Reaper (on the tables that get stuck)
>>>>> * And lastly, you can turn to incremental repair. As you're
>>>>>   familiar with Reaper already, you might want to take a look at
>>>>>   our Reaper fork that handles incremental repair:
>>>>>   https://github.com/thelastpickle/cassandra-reaper
>>>>>   If you go down that road, make sure you first mark all
>>>>>   sstables as repaired before you run your first incremental
>>>>>   repair, otherwise you'll end up in anticompaction hell (a bad,
>>>>>   bad place):
>>>>>   https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsRepairNodesMigration.html
>>>>>   Even if people say that's not necessary anymore, it'll save
>>>>>   you from a very bad first experience with incremental repair.
>>>>>   Furthermore, make sure you run repair daily after your first
>>>>>   incremental repair run, in order to keep each run small.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> On Thu, Oct 27, 2016 at 4:27 PM, Vincent Rischmann
>>>>> <m...@vrischmann.me> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> we have two Cassandra 2.1.15 clusters at work and are having
>>>>>> some trouble with repairs.
>>>>>>
>>>>>> Each cluster has 9 nodes, and the amount of data is not
>>>>>> gigantic, but some column families have 300+ GB of data.
>>>>>> We tried to use `nodetool repair` for these tables, but when we
>>>>>> tested it, it put too much load on the whole cluster and
>>>>>> impacted our production apps.
>>>>>>
>>>>>> Next we saw https://github.com/spotify/cassandra-reaper , tried
>>>>>> it, and had some success until recently. For the last 2 to 3
>>>>>> weeks it has never completed a repair run, somehow deadlocking
>>>>>> itself.
>>>>>>
>>>>>> I know DSE includes a repair service, but I'm wondering: how do
>>>>>> other Cassandra users manage repairs?
>>>>>>
>>>>>> Vincent.
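For the migration step discussed above (marking all sstables as
repaired before the first incremental run), a rough sketch using the
sstablerepairedset tool that ships with Cassandra; the data path is a
placeholder, and the node must be stopped while the tool runs:

    # With the node stopped, mark every sstable in the keyspace as repaired.
    find /var/lib/cassandra/data/my_keyspace -name "*-Data.db" \
        | xargs sstablerepairedset --really-set --is-repaired

    # Spot-check one sstable afterwards: "Repaired at" should be non-zero.
    sstablemetadata $(find /var/lib/cassandra/data/my_keyspace -name "*-Data.db" | head -1) \
        | grep "Repaired at"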
>>>>> --
>>>>> -----------------
>>>>> Alexander Dejanovski
>>>>> France
>>>>> @alexanderdeja
>>>>>
>>>>> Consultant
>>>>> Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>
>>> --
>>> -----------------
>>> Alexander Dejanovski
>>> France
>>> @alexanderdeja
>>>
>>> Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>
> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com