Thanks !

For subrange repairs I have seen two approaches. For our specific requirement, 
we want to do repairs on a small set of keyspaces.


1. Call Thrift describe_local_ring(keyspace), parse the result to get the 
token ranges for a given node, split those ranges for the given keyspace + 
table using describe_splits_ex, and call nodetool repair on each subrange.

   a. https://github.com/pauloricardomg/cassandra-list-subranges does it 
this way.

2. Get the node's tokens using nodetool info -T, split the ranges between 
them, and call nodetool repair on each subrange.

   a. https://github.com/BrianGallew/cassandra_range_repair does it this way.
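
To make the common part of both approaches concrete (split a token range, then repair each piece), here is a rough Python sketch. It is not taken from either project above. It assumes Murmur3Partitioner (tokens from -2^63 to 2^63-1), and the function names, keyspace and table names are all made up for illustration. It only builds nodetool repair -st/-et command lines and does not run them.

```python
# Sketch only: assumes Murmur3Partitioner token bounds. All names here
# (unwrap, split_range, repair_command, "my_keyspace") are hypothetical.
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def unwrap(start, end):
    """Turn a (start, end] range that wraps around the ring into
    non-wrapping pieces. start == end is treated as the full ring."""
    if start < end:
        return [(start, end)]
    pieces = []
    if start < MAX_TOKEN:
        pieces.append((start, MAX_TOKEN))
    if MIN_TOKEN < end:
        pieces.append((MIN_TOKEN, end))
    return pieces


def split_range(start, end, steps):
    """Split (start, end] into up to `steps` contiguous subranges per
    non-wrapping piece, using integer arithmetic only."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    subranges = []
    for lo, hi in unwrap(start, end):
        width = hi - lo
        n = min(steps, width)  # never produce an empty subrange
        bounds = [lo + (width * i) // n for i in range(n + 1)]
        subranges.extend(zip(bounds[:-1], bounds[1:]))
    return subranges


def repair_command(keyspace, start, end, table=None):
    """Build the argument list for one subrange repair."""
    cmd = ["nodetool", "repair", "-st", str(start), "-et", str(end), keyspace]
    if table:
        cmd.append(table)
    return cmd


if __name__ == "__main__":
    # Hypothetical range owned by a node, split into 4 subranges.
    for st, et in split_range(0, 1000, 4):
        print(" ".join(repair_command("my_keyspace", st, et, "my_table")))
```

The two approaches differ mainly in where the split points come from: this sketch splits evenly by token value, whereas describe_splits_ex is given a keyspace and table, which is why the first approach is keyspace aware.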

Could the experts please help me understand the differences between these two 
approaches and which one is better / more efficient? The first one is keyspace 
aware, so I prefer it: it lets us target repairs at specific keyspaces more 
precisely. I am leaning toward it at the moment.

Thanks !

From: Paulo Motta [mailto:pauloricard...@gmail.com]
Sent: Wednesday, September 28, 2016 5:16 AM
To: user@cassandra.apache.org
Subject: Re: Repairs at scale in Cassandra 2.1.13

There were a few streaming bugs fixed between 2.1.13 and 2.1.15 (see 
CHANGES.txt for more details), so I'd recommend upgrading to 2.1.15 to avoid 
hitting them.

2016-09-28 9:08 GMT-03:00 Alain RODRIGUEZ <arodr...@gmail.com>:
Hi Anubhav,

I’m considering doing subrange repairs 
(https://github.com/BrianGallew/cassandra_range_repair/blob/master/src/range_repair.py)

I used this script a lot, and quite successfully.

Another working option that people are using is:

https://github.com/spotify/cassandra-reaper

Alexander, a coworker, integrated an existing UI and made it compatible with 
incremental repairs:

Incremental repairs on Reaper: 
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-that-works
UI integration with incremental repairs on Reaper: 
https://github.com/adejanovski/cassandra-reaper/tree/inc-repair-support-with-ui

as I’ve heard from folks that incremental repairs simply don’t work even in 3.x 
(Yeah, that’s a strong statement but I heard that from multiple folks at the 
Summit).

Alexander also gave a talk about repairs at the Summit (including incremental 
repairs), and someone from Netflix gave a good one as well, which did not cover 
incremental repairs but included some benchmarks and tips for running repairs. 
You might want to check one of those (or both):

https://www.youtube.com/playlist?list=PLm-EPIkBI3YoiA-02vufoEj4CgYvIQgIk

I believe DataStax has not released them yet; they probably will sometime 
soon.

Repair is something all the companies with large setups are struggling with. I 
mean, Spotify built Reaper, and Netflix gave a talk about repairs presenting 
the range_repair.py script and much more. But I know there is some work going 
on to improve things.

Meanwhile, given the load per node (600 GB is big but not huge) and the 
number of nodes (400 is quite a lot), I would say the hardest part for you 
will be the scheduling: avoiding harm to the cluster while making sure every 
node gets repaired. I believe Reaper might be a better match in your case, as 
it handles that quite well from what I have heard, though I am not really 
sure.

C*heers,
-----------------------
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-26 23:51 GMT+02:00 Anubhav Kale <anubhav.k...@microsoft.com>:
Hello,

We run Cassandra 2.1.13 (don’t have plans to upgrade yet). What is the best way 
to run repairs at scale (400 nodes, each holding ~600GB) that actually works ?

I’m considering doing subrange repairs 
(https://github.com/BrianGallew/cassandra_range_repair/blob/master/src/range_repair.py) 
as I’ve heard from folks that incremental repairs simply don’t work even in 
3.x (Yeah, that’s a strong statement but I heard that from multiple folks at 
the Summit).

Any guidance would be greatly appreciated !

Thanks,
Anubhav

