Hello,

We have run into an interesting issue. Maybe some of you have encountered it
or have an idea where to look.

We are working towards adding new DCs to our cluster. Here is the topology we
started with:
DC1 - 18 nodes
DC2 - 18 nodes
DC3 - 18 nodes
DC4 - 18 nodes
DC5 - 18 nodes

Recently we introduced a new DC6 (60 nodes) into our cluster. Joining and
rebuilding DC6 went smoothly, and clients are using it without issue. This is
how the cluster looked after DC6 joined:
DC1 - 18 nodes
DC2 - 18 nodes
DC3 - 18 nodes
DC4 - 18 nodes
DC5 - 18 nodes
DC6 - 60 nodes

Next we wanted to add another DC, DC7 (also 60 nodes), for a total of 210
nodes in the cluster. Joining the new nodes went smoothly, but once we changed
the replication of the user-defined keyspaces to include DC7, no clients were
able to connect to Cassandra, regardless of which DC they addressed. They throw
the exception provided at the end of this email.

Cassandra version: 3.11.4
C# driver version: 3.12.0 (also tested with 3.14.0). We use the DC round robin
policy and update ring metadata for connecting clients.
Number of vnodes per node: 256

The stack trace starts with the exception 'The source argument contains
duplicate keys.' Do you know what kind of data is in this dictionary, and what
data could be duplicated here?
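
To rule out one possibility on our side, a check like the following could look
for tokens claimed by more than one node. It is only a sketch: we do not know
which dictionary the driver is building, so keying on tokens is a guess, and the
function name and sample rows are made up for illustration. The input is assumed
to be (address, data_center, tokens) values read from system.local and
system.peers.

```python
from collections import defaultdict


def find_duplicate_tokens(rows):
    """Return {token: [hosts]} for every token claimed by more than one host.

    rows: iterable of (host, datacenter, tokens) tuples, e.g. collected from
    system.local and system.peers (assumed input shape, not driver output).
    """
    owners = defaultdict(set)
    for host, _datacenter, tokens in rows:
        for token in tokens:
            owners[token].add(host)
    # Keep only tokens that have more than one distinct owner.
    return {token: sorted(hosts) for token, hosts in owners.items() if len(hosts) > 1}


# Hypothetical sample data; real rows would have 256 tokens per node.
sample_rows = [
    ("10.0.0.1", "DC6", ["-100", "200"]),
    ("10.0.0.2", "DC7", ["200", "300"]),
]

duplicates = find_duplicate_tokens(sample_rows)
print(duplicates)
```

An empty result would suggest the duplicated keys are something other than
tokens, for example host or DC identifiers.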

Clients are unable to connect until the moment we remove DC7 from replication. 
Once replication is adjusted to exclude DC7, clients can connect normally.

Exception (logged at 2020/04/29 10:19:27.51410636):

Cassandra.NoHostAvailableException: All hosts tried for query failed (tried <<IPaddress>>:9042: ArgumentException 'The source argument contains duplicate keys.')
   at Cassandra.Connections.ControlConnection.<Connect>d__39.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Connections.ControlConnection.<InitAsync>d__36.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Tasks.TaskHelper.<WaitToCompleteAsync>d__10.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Cluster.<Cassandra-SessionManagement-IInternalCluster-OnInitializeAsync>d__50.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.ClusterLifecycleManager.<InitializeAsync>d__3.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Cluster.<Cassandra-SessionManagement-IInternalCluster-ConnectAsync>d__47`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Cassandra.Cluster.<ConnectAsync>d__46.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Cassandra.Tasks.TaskHelper.WaitToComplete(Task task, Int32 timeout)
   at Cassandra.Cluster.Connect()

We would really appreciate your input. Many thanks in advance.

Gediminas
