Hi Alexander,

There is a compatibility issue raised with spotify/cassandra-reaper for Cassandra version 3.x. Is the fork thelastpickle/cassandra-reaper compatible with 3.6?
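For reference, this is roughly the sequence we plan to script on our side, based on your advice quoted below (a sketch only; "my_keyspace" is a placeholder and the grep patterns may need adjusting for our version):

    # 1. On every node, look for running/pending repair or validation work
    nodetool tpstats | grep -Ei 'repair|antientropy|validation'

    # 2. If any are found, roll restart the cluster, then repair one node at a time
    nodetool repair my_keyspace

    # 3. Before moving to the next node, confirm anticompactions have finished
    nodetool compactionstats
    nodetool tpstats | grep -Ei 'repair|antientropy|validation'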
There are also some suggestions mentioned by *brstgt* which we can try on our side.

On Thu, Sep 29, 2016 at 5:42 PM, Atul Saroha <atul.sar...@snapdeal.com> wrote:

> Thanks Alexander.
>
> Will look into all these.
>
> On Thu, Sep 29, 2016 at 4:39 PM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Atul,
>>
>> since you're using 3.6, by default you're running incremental repair,
>> which doesn't like concurrency very much.
>> Validation errors do not occur on a partition or partition range basis,
>> but when you try to run both anticompaction and validation compaction on
>> the same SSTable.
>>
>> As advised to Robert yesterday, if you want to keep running incremental
>> repair, I'd suggest the following:
>>
>>    - Run nodetool tpstats on all nodes in search of running/pending
>>    repair sessions.
>>    - If you find some, and to be sure you avoid conflicts, roll restart
>>    your cluster (all nodes).
>>    - Then run "nodetool repair" on one node.
>>    - When repair has finished on this node (track messages in the log
>>    and nodetool tpstats), check whether other nodes are running
>>    anticompactions.
>>    - If so, wait until they are over.
>>    - If not, move on to the next node.
>>
>> You should be able to run concurrent incremental repairs on different
>> tables if you wish to speed up the complete repair of the cluster, but do
>> not try to repair the same table/full keyspace from two nodes at the same
>> time.
>>
>> If you do not want to keep using incremental repair and want to fall back
>> to classic full repair, I think the only way in 3.6 to avoid
>> anticompaction is to use subrange repair (Paulo mentioned that in 3.x full
>> repair also triggers anticompaction).
>>
>> You have two options here: cassandra_range_repair (
>> https://github.com/BrianGallew/cassandra_range_repair) and Spotify
>> Reaper (https://github.com/spotify/cassandra-reaper).
>>
>> cassandra_range_repair might scream about subrange + incremental not
>> being compatible (not sure here), but you can modify the repair_range()
>> method by adding a --full switch to the command line used to run repair.
>>
>> We have a fork of Reaper that handles both full subrange repair and
>> incremental repair here: https://github.com/thelastpickle/cassandra-reaper
>> It comes with a tweaked version of the UI made by Stephan Podkowinski
>> (https://github.com/spodkowinski/cassandra-reaper-ui), which eases
>> scheduling, running and tracking repairs, and adds fields to run
>> incremental repair (accessible via ...:8080/webui/ in your browser).
>>
>> Cheers,
>>
>>
>> On Thu, Sep 29, 2016 at 12:33 PM Atul Saroha <atul.sar...@snapdeal.com>
>> wrote:
>>
>>> Hi,
>>>
>>> We are not sure whether this issue is linked to that node or not. Our
>>> application does frequent deletes and inserts.
>>>
>>> Maybe our approach to nodetool repair is not correct. Yes, we generally
>>> fire repair on all boxes at the same time. Till now, it was manual with
>>> the default configuration (command: "nodetool repair").
>>> Yes, we saw a validation error, but it was linked to a repair of the
>>> same partition range already running on another box: validation failed
>>> for some IP because a repair was already running for the same SSTable.
>>> Just a few days back, we had 2 DCs with 3 nodes each, and the
>>> replication factor was also 3, which means all data was on every node.
>>>
>>> On Thu, Sep 29, 2016 at 2:49 PM, Alexander Dejanovski <
>>> a...@thelastpickle.com> wrote:
>>>
>>>> Hi Atul,
>>>>
>>>> could you be more specific about how you are running repair? What's
>>>> the precise command line for that, does it run on several nodes at the
>>>> same time, etc.?
>>>> What is your gc_grace_seconds?
>>>> Do you see errors in your logs that would be linked to repairs
>>>> (validation failure or failure to create a merkle tree)?
>>>>
>>>> You mention a single node that went down, but say the whole cluster
>>>> seems to have zombie data.
>>>> What is the connection you see between the node that went down and the
>>>> fact that deleted data comes back to life?
>>>> What is your strategy for regular maintenance repair (schedule, command
>>>> line or tool, etc.)?
>>>>
>>>> Thanks,
>>>>
>>>> On Thu, Sep 29, 2016 at 10:40 AM Atul Saroha <atul.sar...@snapdeal.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have seen some weird behaviour in Cassandra 3.6.
>>>>> Once, our node went down for more than 10 hrs. After that, we ran
>>>>> nodetool repair multiple times, but tombstones are not getting synced
>>>>> properly across the cluster. Day after day, on expiry of every grace
>>>>> period, deleted records start surfacing again in Cassandra.
>>>>>
>>>>> It seems nodetool repair is not syncing tombstones across the cluster.
>>>>> FYI, we have 3 data centres now.
>>>>>
>>>>> We would appreciate help on how to verify and debug this issue.
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Atul Saroha
>>>>>
>>>>> *Lead Software Engineer | CAMS*
>>>>>
>>>>> M: +91 8447784271
>>>>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>>>>> Udyog Vihar Phase IV, Gurgaon, Haryana, India
>>>>>
>>>> --
>>>> -----------------
>>>> Alexander Dejanovski
>>>> France
>>>> @alexanderdeja
>>>>
>>>> Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Atul Saroha
>>>
>>> *Lead Software Engineer | CAMS*
>>>
>>> M: +91 8447784271
>>> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
>>> Udyog Vihar Phase IV, Gurgaon, Haryana, India
>>>
>> --
>> -----------------
>> Alexander Dejanovski
>> France
>> @alexanderdeja
>>
>> Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>
> --
> Regards,
> Atul Saroha
>
> *Lead Software Engineer | CAMS*
>
> M: +91 8447784271
> Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
> Udyog Vihar Phase IV, Gurgaon, Haryana, India
>

--
Regards,
Atul Saroha

*Lead Software Engineer | CAMS*

M: +91 8447784271
Plot #362, ASF Center - Tower A, 1st Floor, Sec-18,
Udyog Vihar Phase IV, Gurgaon, Haryana, India
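P.S. If we end up falling back to classic full repair via subranges, my understanding of the command shape is roughly the following (a sketch only; the token values and keyspace/table names are placeholders, and the exact flags should be double-checked against "nodetool help repair" on 3.6):

    # full (non-incremental) repair of a single token subrange on one node
    nodetool repair -full -st <start_token> -et <end_token> my_keyspace my_table

    # cassandra_range_repair automates stepping through the subranges; per the
    # thread above, its repair_range() method may need a --full switch added to
    # the nodetool command it builds so the repair is not incremental.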