[ 
https://issues.apache.org/jira/browse/CASSANDRA-13841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16152455#comment-16152455
 ] 

Kurt Greaves commented on CASSANDRA-13841:
------------------------------------------

Patch for trunk is linked below. The patch for 3.11 is more or less the same,
and I don't see any reason why we couldn't apply it there too. More
importantly, I'd like to hear people's thoughts on backporting this to earlier
versions, as it's a pretty effective way to get consistent rebuilds and thus
"avoids" the need for a repair post-rebuild (which may not always be
possible). I've already backported it to 3.0, 2.2, and 2.1 and haven't
encountered any issues in testing. Regardless, we will probably backport it
ourselves to assist in some not-so-nice migrations that come up, but I figure
if we find it useful, someone else might as well.
[trunk|https://github.com/apache/cassandra/compare/trunk...kgreav:rebuild-sources]

I haven't really touched the existing tests for this as I wanted to get 
opinions first, but this shouldn't break existing tests, and new tests should 
be relatively straightforward to write.

> Allow specific sources during rebuild
> -------------------------------------
>
>                 Key: CASSANDRA-13841
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13841
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Kurt Greaves
>            Assignee: Kurt Greaves
>            Priority: Minor
>
> CASSANDRA-10406 introduced the ability to rebuild specific ranges, and 
> CASSANDRA-9875 extended that to allow specifying a set of hosts to stream 
> from. It's not incredibly clear why you would only want to stream a subset of 
> ranges, but a possible use case for this functionality is to rebuild a node 
> from targeted replicas. 
> When doing a DC migration with racks == RF, you can ensure you rebuild from
> each copy of a replica in the source datacenter by specifying all the hosts
> from a single source rack as the sources for a rebuild, so that exactly one
> copy is streamed. Repeating this for each rack in the new datacenter ensures
> you have each copy of the replica from the source DC, thus maintaining
> consistency through rebuilds.
> For example, with the following topology for DC A and B with an RF of A:3 and 
> B:3
> ||DC A node||DC A rack||DC B node||DC B rack||
> |A1|rack1|B1|rack1|
> |A2|rack2|B2|rack2|
> |A3|rack3|B3|rack3|
> The following set of actions will result in B having exactly one copy of
> every replica in A, and B will be _at least_ as consistent as A.
> {code}
> Rebuild B1 from only A1
> Rebuild B2 from only A2
> Rebuild B3 from only A3
> {code}
> Unfortunately using this functionality is non-trivial at the moment, as you
> can only specify sources together WITH the node's set of tokens to rebuild.
> To perform the above with vnodes or a large cluster, you have to specify
> every token range in the -ts arg, which quickly gets unwieldy or outright
> impossible.
> A solution to this is to simply filter on sources first, before processing 
> ranges.
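
The "filter on sources first" idea in the description can be sketched as
follows. This is a minimal illustration in Python, not the code from the
patch: the function name, the (start, end) tuples used for token ranges, and
the example token values are all hypothetical, and the real logic lives in
Cassandra's Java streaming code.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a token range (start, end]; not Cassandra's type.
TokenRange = Tuple[int, int]


def filter_sources_first(
    range_replicas: Dict[TokenRange, List[str]],
    allowed_sources: Set[str],
) -> Dict[TokenRange, List[str]]:
    """Restrict each range's candidate replicas to the allowed sources.

    The caller only names the hosts to stream from; the ranges come from
    the node's own ownership, so no token list has to be spelled out.
    """
    plan: Dict[TokenRange, List[str]] = {}
    for token_range, replicas in range_replicas.items():
        # Drop every replica that is not one of the requested sources.
        candidates = [host for host in replicas if host in allowed_sources]
        if not candidates:
            # Assumption: failing loudly is preferable to silently
            # skipping a range that no allowed source can serve.
            raise ValueError(f"no allowed source holds range {token_range}")
        plan[token_range] = candidates
    return plan


# Topology from the issue: with RF 3 and three nodes in DC A, every node
# replicates every range. The token values are made up for illustration.
RANGE_REPLICAS = {
    (0, 100): ["A1", "A2", "A3"],
    (100, 200): ["A2", "A3", "A1"],
    (200, 0): ["A3", "A1", "A2"],
}

# "Rebuild B1 from only A1": every range ends up being streamed from A1.
b1_plan = filter_sources_first(RANGE_REPLICAS, {"A1"})
```

Because the source filter runs before any range handling, the same call works
unchanged whether the node owns three ranges or several hundred vnode ranges.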



