Not aware of any that survive a node restart, though in the past there were
races around starting an expansion while one node was partitioned/down (and
missing the initial gossip / UP). A heap dump could have told us a bit more
conclusively, but it's hard to guess for now.



On Mon, Oct 23, 2023 at 3:22 PM Jaydeep Chovatia <chovatia.jayd...@gmail.com>
wrote:

> The issue persisted on a few nodes despite no changes to the topology. Even
> restarting the nodes did not help; the issue was resolved only after we
> evacuated those nodes.
>
> Can you think of a possible situation in which this could happen?
>
> Jaydeep
>
> On Sat, Oct 21, 2023 at 10:25 AM Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> Thanks, Jeff!
>> I will keep this thread updated on our findings.
>>
>> Jaydeep
>>
>> On Sat, Oct 21, 2023 at 9:37 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> That code path was added to protect against invalid gossip states
>>>
>>> For this message to be logged, the coordinator receiving the query must
>>> identify a set of replicas holding the data to serve the read, and one of
>>> the selected replicas must disagree that it is a replica based on its own
>>> view of the token ring.
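>>>
>>> As a rough illustration of that disagreement (hypothetical tokens and
>>> node names, SimpleStrategy-style placement, ignoring vnodes, racks and
>>> datacenters - this is not Cassandra's actual code):
>>>
>>>     from bisect import bisect_right
>>>
>>>     def replica_nodes(ring, token, rf=3):
>>>         # ring: sorted (token, node) pairs as ONE node sees the ring.
>>>         # The rf distinct nodes found walking clockwise from the key's
>>>         # token are its replicas.
>>>         tokens = [t for t, _ in ring]
>>>         i = bisect_right(tokens, token) % len(ring)
>>>         owners = []
>>>         while len(owners) < rf:
>>>             node = ring[i % len(ring)][1]
>>>             if node not in owners:
>>>                 owners.append(node)
>>>             i += 1
>>>         return owners
>>>
>>>     # The two nodes disagree about where node D sits on the ring:
>>>     coordinator_view = [(0, "A"), (100, "B"), (200, "C"), (300, "D")]
>>>     replica_view = [(0, "A"), (100, "B"), (200, "C"), (250, "D")]
>>>
>>>     key_token = 260
>>>     print(replica_nodes(coordinator_view, key_token))  # ['D', 'A', 'B']
>>>     print("D" in replica_nodes(replica_view, key_token))  # False, so D
>>>     # would log "a range that is not owned by the current replica"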
>>>
>>> This probably means that at least one node in your cluster has an invalid
>>> view of the ring - if you run “nodetool ring” on every host and compare
>>> the outputs, you'll probably notice one or more are wrong.
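>>>
>>> A quick way to compare those views, as a sketch (the hostnames and
>>> passwordless ssh are assumptions, and the column layout of "nodetool
>>> ring" output can vary by version):
>>>
>>>     import subprocess
>>>
>>>     hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # your node addresses
>>>
>>>     def ring_view(host):
>>>         # Run "nodetool ring" over ssh and keep only (address, token)
>>>         # pairs, so per-node columns like Load/Owns don't cause noise.
>>>         out = subprocess.run(["ssh", host, "nodetool", "ring"],
>>>                              capture_output=True, text=True).stdout
>>>         pairs = set()
>>>         for line in out.splitlines():
>>>             parts = line.split()
>>>             if len(parts) >= 7 and parts[2] in ("Up", "Down"):
>>>                 pairs.add((parts[0], parts[-1]))
>>>         return pairs
>>>
>>>     views = {h: ring_view(h) for h in hosts}
>>>     for h in hosts[1:]:
>>>         if views[h] != views[hosts[0]]:
>>>             print(h, "disagrees with", hosts[0], "about token ownership")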
>>>
>>> It's also possible for this to happen for a few seconds while adding /
>>> moving / removing hosts.
>>>
>>> If you weren't changing the topology of the cluster, it's likely that
>>> bouncing the cluster will fix it.
>>>
>>> (I'm unsure of the defaults and not able to look them up, but Cassandra
>>> can either log the read or log and drop it - you probably want the
>>> log-and-drop behavior, which is the right solution: it doesn't
>>> accidentally return a missing / empty result set as a valid query result,
>>> and instead forces the read to go to other replicas or time out.)
>>>
>>>
>>>
>>>
>>>
>>> On Oct 20, 2023, at 10:57 PM, Jaydeep Chovatia <
>>> chovatia.jayd...@gmail.com> wrote:
>>>
>>>
>>> Hi,
>>>
>>> I am using Cassandra 4.0.6 in production and am receiving the following
>>> error. It indicates that the Cassandra nodes have a mismatch in token
>>> ownership.
>>>
>>> Has anyone seen this issue before?
>>>
>>> Received a read request from /XX.XX.XXX.XXX:YYYYY for a range that is not 
>>> owned by the current replica Read(keyspace.table columns=*/[c1] rowFilter= 
>>> limits=LIMIT 100 key=7BE78B90-AD66-406B-AA05-6A062F72F542:0 
>>> filter=slice(slices=ALL, reversed=false), nowInSec=1697751757).
>>>
>>> Jaydeep
>>>
>>>
