Hi Jeff,

Thanks for the detailed answer.

We are using Cassandra 3.11.2. I believe it would fall in the new version
category and, IIUC, should be able to handle the issue with anti-entropy
repair by adjusting the Merkle tree depth?
Cross-DC read repair chance is disabled in our setup, so we should be good
there.
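
For the older-version workaround mentioned below (subrange repair over lots of
smaller ranges, so each Merkle tree covers less data and stays shallow), a
sketch like this can generate the ranges. The keyspace name and the choice of
256 slices are illustrative assumptions, not recommendations:

```python
# Sketch: split the full Murmur3 token ring into many small subranges and
# emit one "nodetool repair -st/-et" invocation per slice. Keyspace name
# and slice count are assumptions for illustration only.

MIN_TOKEN = -(2 ** 63)       # Murmur3Partitioner minimum token
MAX_TOKEN = 2 ** 63 - 1      # Murmur3Partitioner maximum token

def subranges(num_slices):
    """Yield contiguous (start, end) token pairs covering the whole ring."""
    span = (MAX_TOKEN - MIN_TOKEN) // num_slices
    start = MIN_TOKEN
    for i in range(num_slices):
        # Last slice absorbs any rounding remainder so the ring is covered.
        end = MAX_TOKEN if i == num_slices - 1 else start + span
        yield (start, end)
        start = end

commands = [
    f"nodetool repair -st {st} -et {et} my_keyspace"
    for st, et in subranges(256)
]
print(commands[0])
```

Running these one at a time (or a few in parallel) keeps each repair session's
Merkle trees small at the cost of more sessions overall.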

> Again, in 12x3, if one replica goes down beyond the hint window, when it
> comes up it's getting 35 copies of data, which is going to overwhelm it
> when it streams and compacts.

Thanks for pointing this out. IIUC this would also increase the time the
node needs to catch up when it comes back.
Will it also affect the time taken for a replaced node to report UN? AFAIK,
during node replacement a particular token range is streamed from only one
of the replicas, so the time taken to replace a node should remain the
same?
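
To make the numbers in this thread concrete, here is a back-of-the-envelope
sketch of how replica counts, global quorum size, and the repair fan-in from
the 12x3 scenario scale. The DC counts are just examples:

```python
# Back-of-the-envelope replica math for NetworkTopologyStrategy with the
# same RF in every DC. DC counts below are examples, not recommendations.

def total_copies(num_dcs, rf_per_dc=3):
    """Total replicas of each row across the whole cluster."""
    return num_dcs * rf_per_dc

def global_quorum(num_dcs, rf_per_dc=3):
    """Replicas a QUORUM (not LOCAL_QUORUM) request must reach."""
    return total_copies(num_dcs, rf_per_dc) // 2 + 1

def repair_fan_in(num_dcs, rf_per_dc=3):
    """Other copies that may stream to one replica that fell behind --
    the '35 copies of data' in the 12x3 example."""
    return total_copies(num_dcs, rf_per_dc) - 1

for dcs in (2, 3, 6, 12):
    print(dcs, total_copies(dcs), global_quorum(dcs), repair_fan_in(dcs))
```

At 12 DCs x RF 3 this gives 36 total copies, a global quorum of 19 replicas,
and up to 35 peers streaming mismatched data to a node that was down past the
hint window.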

On Wed, Jul 14, 2021 at 8:00 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Hi,
>
> So, there's two things where you'll see the impact of "lots of datacenters"
>
> On the query side, global quorum queries (and queries with cross-dc
> probabilistic read repair) may touch more DCs and be slower, and
> read-repairs during those queries get more expensive. Your geography
> matters a ton for latency, and your write consistency and network quality
> matters a ton for read repairs. During the read, the coordinator will track
> which replicas are mismatching, and build mutations to make them in sync -
> that buildup will accumulate more data if you're very out of sync.
>
> The other thing you should expect is different behavior during repairs.
> The anti-entropy repairs do pair-wise Merkle trees. If you imagine 6, 8, 12
> datacenters of 3 copies each, you've got 18, 24, 36 copies of data, each of
> those holds a Merkle tree. The repair coordinator will have a lot more data
> in memory; adjusting the tree depth in newer versions, or using the offheap
> option in 4.0, starts removing the GC pressure on the coordinator in those
> types of topologies. In older versions, using subrange repair and lots of
> smaller ranges will avoid very deep trees and keep memory tolerable. ALSO,
> when you do have a mismatch, you're going to stream a LOT of data. Again,
> in 12x3, if one replica goes down beyond the hint window, when it comes up
> it's getting 35 copies of data, which is going to overwhelm it when it
> streams and compacts. CASSANDRA-3200 helps this in 4.0, and incremental
> repair helps this if you're running incremental repair (again, probably
> after CASSANDRA-9143 in 4.0), but the naive approach can lead to really bad
> surprises.
>
>
>
> On Wed, Jul 14, 2021 at 7:17 AM Shaurya Gupta <shaurya.n...@gmail.com>
> wrote:
>
>> Hi, Multiple DCs are required to maintain lower latencies for requests
>> across the globe. I agree that it's a lot of redundant copies of data.
>>
>> On Wed, Jul 14, 2021, 7:00 PM Jim Shaw <jxys...@gmail.com> wrote:
>>
>>> Shaurya:
>>> What's the purpose of partitioning across so many data centers?
>>> With RF=3 within a data center, you have 3 copies of data.
>>> If you have 3 DCs, that means 9 copies of data.
>>> Think about the space and network bandwidth wasted on that many copies.
>>> BTW, ours is just 2 DCs for regional DR.
>>>
>>> Thanks,
>>> Jim
>>>
>>> On Wed, Jul 14, 2021 at 2:27 AM Shaurya Gupta <shaurya.n...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> Does someone have any suggestions on the maximum number of data centers
>>>> which NetworkTopologyStrategy can have for a keyspace? Not only
>>>> technically, but considering performance as well.
>>>> In each data center the RF is 3.
>>>>
>>>> Thanks!
>>>> --
>>>> Shaurya Gupta
>>>>
>>>>
>>>>

-- 
Shaurya Gupta
