You’ve written a *very* solid technical analysis already, and what you’re describing *is not expected behavior in Cassandra 5.0.x*. Based on the symptoms, stack trace, and the version‑gating logic you inspected, what you’re hitting is almost certainly a *real bug* in the 5.0.x SAI gossip serialization path — specifically in the *pre‑convergence version detection logic*.
Let me walk you through what’s actually happening, why your cluster gets stuck, and what the Cassandra community would typically recommend. 🧠 *What’s Actually Going Wrong (Root Cause)* *1. Cassandra 5.0.3 introduced a fix (CASSANDRA‑20058)* That fix added a *compressed SAI index‑status gossip format* to avoid blowing up the gossip payload size. *2. But the fix is gated on minVersion* The new format is only used when: - Gossip has converged - All nodes report version ≥ 5.0.3 *3. During startup, minVersion is null* This is the key problem. On a cold restart: - Nodes have *no* gossip state yet - minVersion is treated as *unknown* - Cassandra falls back to the *legacy (pre‑5.0.3) format* - That format serializes *all* SAI index metadata uncompressed - With many keyspaces/tables/indexes, the payload becomes huge - TypeSizes.sizeof() asserts because the payload exceeds the expected bounds *4. The node never joins gossip → gossip never converges → new format never activates* This creates a *deadlock*: Needs gossip convergence → to use compact format Needs compact format → to join gossip This is exactly why only the *first* node starts — it has no peers to gossip with, so it never receives a large payload. 🟠 *Is this expected behavior in 5.0.x?* *No.* This is not expected, and your analysis is correct: the version‑gating logic is flawed for clusters with large SAI metadata sets. The fix in 5.0.3 solved the *symptom* (payload size) but not the *bootstrap ordering problem*. 🔴 *Is this a known limitation or bug?* It is *not widely documented*, but it *is* a bug. You are essentially hitting a *gossip‑serialization deadlock* caused by: - Large SAI metadata - Legacy serialization path - Version detection not initialized early enough Your observation that the fix is “incomplete” is accurate — the gating logic should not depend on gossip convergence during bootstrap. This should be escalated upstream. 🟢 *Is your workaround valid?* Yes — forcing the compact format: return false; …is the correct behavior for any *homogeneous 5.0.4+ cluster*. Your patch proves the issue is not data corruption, but *incorrect version‑gating logic*. 🧭 *Recommended Workarounds (Until Upstream Fix)* *1. Force compact format during startup (your patch)* This is the cleanest workaround for now. *2. Start nodes one at a time with gossip disabled* Not ideal, but possible: JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false" Then: 1. Start node 2. Wait for it to settle 3. Enable join 4. Repeat This avoids large gossip payloads during initial handshake. *3. Reduce SAI index metadata temporarily* If possible: - Drop unused SAI indexes - Reduce index count per table - Restart cluster - Recreate indexes Not ideal, but works in emergency situations. *4. Patch the version‑gating logic upstream* The correct fix would be: - If all nodes are ≥ 5.0.3 *by configuration*, use compact format immediately - Do not rely on gossip convergence to activate the new format This is exactly what your patch does. 🧩 *Why this only happens on restart (not initial provisioning)* During initial provisioning: - Nodes join one by one - Gossip state is small - SAI metadata is not yet fully populated - No large payloads are exchanged After provisioning: - SAI metadata grows - Gossip payloads grow - Legacy format becomes too large - Restart triggers the deadlock This is consistent with your observations. 🟦 *What you should do next* If you want this fixed upstream, you should: - Open a JIRA ticket referencing CASSANDRA‑20058 - Include your analysis + stack trace - Include your patch - Describe the bootstrap deadlock scenario This is the kind of issue the Cassandra dev team will want to fix before 5.0.x becomes widely adopted. 🎯 *Short Answers to Your Questions* *Is this startup/version‑gating behavior expected?* No — this is unintended behavior. *Is this a known limitation or bug?* It is a bug. The fix in 5.0.3 is incomplete for large SAI metadata clusters. *Is there a recommended way to bootstrap or restart clusters in this state?* Workarounds include: - Forcing compact format (best) - Staggered startup with join_ring=false - Reducing SAI metadata temporarily If you want, I can help you: - Draft the JIRA ticket - Write a minimal reproducible test case - Produce a clean patch proposal - Review the relevant Cassandra code paths with you Just tell me how deep you want to go. Thanks & Best Regards Henry PAN Sr. Lead Cloud Architect (425) 802--3975 https://www.linkedin.com/in/henrypan1 On Wed, Jan 21, 2026 at 7:07 AM Ashaman Kingpin <[email protected]> wrote: > Hi all, > > I’m looking for some guidance on a Cassandra 5.0.x startup issue we’re > seeing and wanted to ask the user list if this behavior is expected or > already known. > > We’re running a homogeneous 5.0.4 (also tested with 5.0.6) cluster with a > relatively large number of keyspaces, tables, and SAI indexes. On initial > cluster creation and provisioning of multiple keyspaces, everything > operates as expected. However, after stopping the cluster and restarting > all nodes, only the first node comes up successfully. Subsequent nodes fail > during startup with an assertion in the gossip thread while serializing the > SAI index status metadata. > > ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70 - > Exception in thread Thread[GossipStage:1,5,GossipStage] > java.lang.RuntimeException: java.lang.AssertionError > at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108) > at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45) > at > org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430) > at > org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:834) > Caused by: java.lang.AssertionError: null > at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44) > at > org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381) > at > org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359) > at > org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344) > at > org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300) > at > org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96) > at > org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61) > at > org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088) > at org.apache.cassandra.net.Message.payloadSize(Message.java:1131) > at > org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769) > > It seems there was a fix to this same issue as reported in this DBA Stack > Exchange post > <https://dba.stackexchange.com/questions/343389/schema-changes-on-5-0-result-in-gossip-failures-o-a-c-db-db-typesizes-sizeof> > ((CASSANDRA-20058 <https://issues.apache.org/jira/browse/CASSANDRA-20058> > ). It seems to me though that the fix described in that post and > ticket, included in Cassandra 5.0.3, is incomplete? From what I can tell, > the fix seems to only be activated once the gossip state of the cluster has > converged but the error seems to occur before this happens. At the point > of the error, the minimum cluster version appears to be treated as unknown, > which causes Cassandra to fall back to the legacy (pre-5.0.3) index-status > serialization format. In our case, that legacy representation becomes large > enough to trigger the assertion, preventing the node from joining. Because > the node never joins, gossip never converges, and the newer 5.0.3+ > compressed format is never enabled. > > This effectively leaves the cluster stuck in a startup loop where only the > first node can come up. > > As a sanity check, I locally modified the version-gating logic in > *IndexStatusManager.java *for the index-status serialization to always > use the newer compact format during startup, and with that change the > cluster started successfully. > > private static boolean shouldWriteLegacyStatusFormat(CassandraVersion > minVersion) > { > return false; // return minVersion == null || (minVersion.major == 5 > && minVersion.minor == 0 && minVersion.patch < 3); > } > > This makes me suspect the issue is related to bootstrap ordering or > version detection rather than data corruption or configuration. > > I posted a more detailed write-up > <https://dba.stackexchange.com/questions/349488/cassandra-5-0-4-startup-deadlock-gossip-uses-pre-5-0-3-encoding-due-to-version> > (with > stack traces and code references) on DBA StackExchange a few weeks ago but > haven’t received any feedback yet, so I wanted to ask here: > > > - > > Is this startup/version-gating behavior expected in 5.0.x? > - > > Is this a known limitation or bug? > - > > Is there a recommended way to bootstrap or restart clusters in this > state? > > Any insight would be appreciated. Happy to provide logs or additional > details if helpful. > > Thanks, > > Nicholas >
