You’ve written a *very* solid technical analysis already, and what you’re
describing *is not expected behavior in Cassandra 5.0.x*. Based on the
symptoms, stack trace, and the version‑gating logic you inspected, what
you’re hitting is almost certainly a *real bug* in the 5.0.x SAI gossip
serialization path — specifically in the *pre‑convergence version detection
logic*.

Let me walk you through what’s actually happening, why your cluster gets
stuck, and what the Cassandra community would typically recommend.

🧠 *What’s Actually Going Wrong (Root Cause)*

*1. Cassandra 5.0.3 introduced a fix (CASSANDRA‑20058)*

That fix added a *compressed SAI index‑status gossip format* to avoid
blowing up the gossip payload size.

*2. But the fix is gated on minVersion*

The new format is only used when (see the sketch after this list):

   - Gossip has converged
   - All nodes report version ≥ 5.0.3
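
For reference, the gate reads roughly like the method you quote from
IndexStatusManager.java further down in this thread (paraphrased sketch, not
a verbatim copy of the 5.0.x source):

private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
{
    // null (unknown) or any peer reporting a version below 5.0.3 forces the
    // legacy, uncompressed index-status format
    return minVersion == null
           || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
}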

*3. During startup, minVersion is null*

This is the key problem.

On a cold restart:

   - Nodes have *no* gossip state yet
   - minVersion is treated as *unknown*
   - Cassandra falls back to the *legacy (pre‑5.0.3) format*
   - That format serializes *all* SAI index metadata uncompressed
   - With many keyspaces/tables/indexes, the payload becomes huge
   - TypeSizes.sizeof() asserts because the serialized value exceeds the
   bounds it expects (illustrated after this list)
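
To make the size problem concrete, here is a small standalone illustration
(not Cassandra code; the 0xFFFF bound and the status-string shape are
assumptions made for the demo) of how a value that concatenates status
entries for many indexes overruns a 64 KiB-style string limit:

import java.nio.charset.StandardCharsets;

public final class OversizedGossipValueDemo
{
    // Assumed bound for the demo, mirroring the classic 64 KiB limit on
    // length-prefixed UTF strings; not taken from the Cassandra source.
    private static final int GOSSIP_STRING_LIMIT = 0xFFFF;

    public static void main(String[] args)
    {
        // Fake index-status entries; the real payload shape differs, but the
        // growth with index count is the point.
        StringBuilder indexStatus = new StringBuilder();
        for (int i = 0; i < 2000; i++)
            indexStatus.append("{\"ks").append(i).append(".tbl.idx\":\"BUILD_SUCCEEDED\"}");

        int encodedLength = indexStatus.toString().getBytes(StandardCharsets.UTF_8).length;

        // Run with -ea: this assertion fails once the encoded value exceeds
        // the assumed limit, which is the shape of the failure seen in the
        // gossip thread when the legacy format is selected.
        assert encodedLength <= GOSSIP_STRING_LIMIT : encodedLength;
        System.out.println("encoded length: " + encodedLength);
    }
}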

*4. The node never joins gossip → gossip never converges → new format never
activates*

This creates a *deadlock*:

   - Needs gossip convergence → to use the compact format
   - Needs the compact format → to join gossip

This is exactly why only the *first* node starts — it has no peers to
gossip with, so it never receives a large payload.

🟠 *Is this expected behavior in 5.0.x?*

*No.*
This is not expected, and your analysis is correct: the version‑gating
logic is flawed for clusters with large SAI metadata sets.

The fix in 5.0.3 solved the *symptom* (payload size) but not the *bootstrap
ordering problem*.

🔴 *Is this a known limitation or bug?*

It is *not widely documented*, but it *is* a bug.

You are essentially hitting a *gossip‑serialization deadlock* caused by:

   - Large SAI metadata
   - Legacy serialization path
   - Version detection not initialized early enough

Your observation that the fix is “incomplete” is accurate — the gating
logic should not depend on gossip convergence during bootstrap.

This should be escalated upstream.

🟢 *Is your workaround valid?*

Yes — forcing the compact format:

return false;

…is the correct behavior for any *homogeneous 5.0.4+ cluster*.

Your patch proves the issue is not data corruption, but *incorrect
version‑gating logic*.

🧭 *Recommended Workarounds (Until Upstream Fix)*

*1. Force compact format during startup (your patch)*

This is the cleanest workaround for now.
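
If you want something slightly less blunt than hard-coding return false, one
option is to gate the override behind a JVM system property. Sketch only:
the property name below is invented for illustration, not an existing
Cassandra flag.

private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
{
    // Hypothetical escape hatch (NOT an existing Cassandra option): only skip
    // the legacy format when the operator explicitly asserts the whole
    // cluster is already on 5.0.3 or newer.
    if (Boolean.getBoolean("cassandra.sai.assume_post_5_0_3_cluster"))
        return false;

    // Otherwise keep the original gating: unknown or pre-5.0.3 minVersion
    // falls back to the legacy, uncompressed format.
    return minVersion == null
           || (minVersion.major == 5 && minVersion.minor == 0 && minVersion.patch < 3);
}

That way a plain restart without the -D flag behaves exactly like stock
5.0.4, which keeps the change easy to back out.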

*2. Start nodes one at a time with gossip disabled*

Not ideal, but possible:

JVM_OPTS="$JVM_OPTS -Dcassandra.join_ring=false"

Then:

   1. Start node
   2. Wait for it to settle
   3. Enable join (nodetool join)
   4. Repeat

This avoids large gossip payloads during initial handshake.

*3. Reduce SAI index metadata temporarily*

If possible:

   - Drop unused SAI indexes
   - Reduce index count per table
   - Restart cluster
   - Recreate indexes

Not ideal, but works in emergency situations.

*4. Patch the version‑gating logic upstream*

The correct fix would be:

   - If all nodes are ≥ 5.0.3 *by configuration*, use compact format
   immediately
   - Do not rely on gossip convergence to activate the new format

This is exactly what your patch does.
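
One possible shape for that fix, as a sketch only (not actual 5.0.x code,
and a real patch would need to treat mixed-version upgrades more carefully):
when minVersion is still unknown, fall back to the local node's own release
version instead of assuming the oldest possible peer.

private static boolean shouldWriteLegacyStatusFormat(CassandraVersion minVersion)
{
    // Sketch: "unknown" should not automatically mean "legacy". Before gossip
    // has converged, use this node's own release version as the floor.
    CassandraVersion effective = minVersion != null
                               ? minVersion
                               : new CassandraVersion(FBUtilities.getReleaseVersionString());

    return effective.major == 5 && effective.minor == 0 && effective.patch < 3;
}

The obvious caveat is a rolling upgrade from a pre-5.0.3 release: an
already-upgraded node restarting before convergence would pick the compact
format while older peers still expect the legacy one, so an upstream fix
would probably also need an explicit operator override or a stronger signal
than gossip state.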

🧩 *Why this only happens on restart (not initial provisioning)*

During initial provisioning:

   - Nodes join one by one
   - Gossip state is small
   - SAI metadata is not yet fully populated
   - No large payloads are exchanged

After provisioning:

   - SAI metadata grows
   - Gossip payloads grow
   - Legacy format becomes too large
   - Restart triggers the deadlock

This is consistent with your observations.

🟦 *What you should do next*

If you want this fixed upstream, you should:

   - Open a JIRA ticket referencing CASSANDRA‑20058
   - Include your analysis + stack trace
   - Include your patch
   - Describe the bootstrap deadlock scenario

This is the kind of issue the Cassandra dev team will want to fix before
5.0.x becomes widely adopted.

🎯 *Short Answers to Your Questions*

*Is this startup/version‑gating behavior expected?*

No — this is unintended behavior.

*Is this a known limitation or bug?*

It is a bug. The fix in 5.0.3 is incomplete for large SAI metadata clusters.

*Is there a recommended way to bootstrap or restart clusters in this state?*

Workarounds include:

   - Forcing compact format (best)
   - Staggered startup with join_ring=false
   - Reducing SAI metadata temporarily

If you want, I can help you:

   - Draft the JIRA ticket
   - Write a minimal reproducible test case
   - Produce a clean patch proposal
   - Review the relevant Cassandra code paths with you

Just tell me how deep you want to go.

Thanks & Best Regards

Henry PAN
Sr. Lead Cloud Architect
(425) 802-3975
https://www.linkedin.com/in/henrypan1



On Wed, Jan 21, 2026 at 7:07 AM Ashaman Kingpin <[email protected]>
wrote:

> Hi all,
>
> I’m looking for some guidance on a Cassandra 5.0.x startup issue we’re
> seeing and wanted to ask the user list if this behavior is expected or
> already known.
>
> We’re running a homogeneous 5.0.4 (also tested with 5.0.6) cluster with a
> relatively large number of keyspaces, tables, and SAI indexes. On initial
> cluster creation and provisioning of multiple keyspaces, everything
> operates as expected. However, after stopping the cluster and restarting
> all nodes, only the first node comes up successfully. Subsequent nodes fail
> during startup with an assertion in the gossip thread while serializing the
> SAI index status metadata.
>
> ERROR [GossipStage:1] 2025-12-22 17:20:10,365 JVMStabilityInspector.java:70 - 
> Exception in thread Thread[GossipStage:1,5,GossipStage]
> java.lang.RuntimeException: java.lang.AssertionError
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:108)
>         at org.apache.cassandra.net.InboundSink.accept(InboundSink.java:45)
>         at 
> org.apache.cassandra.net.InboundMessageHandler$ProcessMessage.run(InboundMessageHandler.java:430)
>         at 
> org.apache.cassandra.concurrent.ExecutionFailure$1.run(ExecutionFailure.java:133)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>         at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.AssertionError: null
>         at org.apache.cassandra.db.TypeSizes.sizeof(TypeSizes.java:44)
>         at 
> org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:381)
>         at 
> org.apache.cassandra.gms.VersionedValue$VersionedValueSerializer.serializedSize(VersionedValue.java:359)
>         at 
> org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:344)
>         at 
> org.apache.cassandra.gms.EndpointStateSerializer.serializedSize(EndpointState.java:300)
>         at 
> org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:96)
>         at 
> org.apache.cassandra.gms.GossipDigestAckSerializer.serializedSize(GossipDigestAck.java:61)
>         at 
> org.apache.cassandra.net.Message$Serializer.payloadSize(Message.java:1088)
>         at org.apache.cassandra.net.Message.payloadSize(Message.java:1131)
>         at 
> org.apache.cassandra.net.Message$Serializer.serializedSize(Message.java:769)
>
> It seems there was a fix to this same issue as reported in this DBA Stack
> Exchange post
> <https://dba.stackexchange.com/questions/343389/schema-changes-on-5-0-result-in-gossip-failures-o-a-c-db-db-typesizes-sizeof>
>  (CASSANDRA-20058 <https://issues.apache.org/jira/browse/CASSANDRA-20058>
> ). It seems to me though that the fix described in that post and
> ticket, included in Cassandra 5.0.3, is incomplete?  From what I can tell,
> the fix seems to only be activated once the gossip state of the cluster has
> converged but the error seems to occur before this happens.  At the point
> of the error, the minimum cluster version appears to be treated as unknown,
> which causes Cassandra to fall back to the legacy (pre-5.0.3) index-status
> serialization format. In our case, that legacy representation becomes large
> enough to trigger the assertion, preventing the node from joining. Because
> the node never joins, gossip never converges, and the newer 5.0.3+
> compressed format is never enabled.
>
> This effectively leaves the cluster stuck in a startup loop where only the
> first node can come up.
>
> As a sanity check, I locally modified the version-gating logic in
> *IndexStatusManager.java *for the index-status serialization to always
> use the newer compact format during startup, and with that change the
> cluster started successfully.
>
> private static boolean shouldWriteLegacyStatusFormat(CassandraVersion 
> minVersion)
>     {
>         return false; // return minVersion == null || (minVersion.major == 5 
> && minVersion.minor == 0 && minVersion.patch < 3);
>     }
>
> This makes me suspect the issue is related to bootstrap ordering or
> version detection rather than data corruption or configuration.
>
> I posted a more detailed write-up
> <https://dba.stackexchange.com/questions/349488/cassandra-5-0-4-startup-deadlock-gossip-uses-pre-5-0-3-encoding-due-to-version>
>  (with
> stack traces and code references) on DBA StackExchange a few weeks ago but
> haven’t received any feedback yet, so I wanted to ask here:
>
>
>    -
>
>    Is this startup/version-gating behavior expected in 5.0.x?
>    -
>
>    Is this a known limitation or bug?
>    -
>
>    Is there a recommended way to bootstrap or restart clusters in this
>    state?
>
> Any insight would be appreciated. Happy to provide logs or additional
> details if helpful.
>
> Thanks,
>
> Nicholas
>
