Sounds good. Thanks a lot for all your help!

Jaydeep

On Mon, Oct 23, 2023 at 3:30 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Not aware of any that survive node restart, though in the past, there were
> races around starting an expansion while one node was partitioned/down (and
> missing the initial gossip / UP). A heap dump could have told us a bit more
> conclusively, but it's hard to guess for now.
>
> On Mon, Oct 23, 2023 at 3:22 PM Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> The issue persisted on a few nodes despite no changes to the
>> topology. Even restarting the nodes did not help. The issue was resolved
>> only after we evacuated those nodes.
>>
>> Can you think of a possible situation under which this could happen?
>>
>> Jaydeep
>>
>> On Sat, Oct 21, 2023 at 10:25 AM Jaydeep Chovatia <
>> chovatia.jayd...@gmail.com> wrote:
>>
>>> Thanks, Jeff!
>>> I will keep this thread updated on our findings.
>>>
>>> Jaydeep
>>>
>>> On Sat, Oct 21, 2023 at 9:37 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> That code path was added to protect against invalid gossip states
>>>>
>>>> For this log message to be issued, the coordinator receiving the query
>>>> must identify a set of replicas holding the data to serve the read, and
>>>> one of the selected replicas must disagree that it’s a replica based on
>>>> its view of the token ring
>>>>
>>>> This probably means that at least one node in your cluster has an
>>>> invalid view of the ring - if you issue a “nodetool ring” from every host
>>>> and compare them, you’ll probably notice one or more is wrong
>>>>
>>>> It’s also possible this happens for a few seconds during adding /
>>>> moving / removing hosts
>>>>
>>>> If you weren’t changing the topology of the cluster, it’s likely the
>>>> case that bouncing the cluster fixes it
>>>>
>>>> (I’m unsure of the defaults and not able to look it up, but Cassandra
>>>> can log, or log and drop the read - you probably want to log and drop the
>>>> read, which is the right solution so it doesn’t accidentally return a
>>>> missing / empty result set as a valid query result; instead it’ll force
>>>> it to read from other replicas or time out)
>>>>
>>>> On Oct 20, 2023, at 10:57 PM, Jaydeep Chovatia <
>>>> chovatia.jayd...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am using Cassandra 4.0.6 in production, and receiving the following
>>>> error. This indicates that Cassandra nodes have a mismatch in token
>>>> ownership.
>>>>
>>>> Has anyone seen this issue before?
>>>>
>>>> Received a read request from /XX.XX.XXX.XXX:YYYYY for a range that is not
>>>> owned by the current replica Read(keyspace.table columns=*/[c1] rowFilter=
>>>> limits=LIMIT 100 key=7BE78B90-AD66-406B-AA05-6A062F72F542:0
>>>> filter=slice(slices=ALL, reversed=false), nowInSec=1697751757).
>>>>
>>>> Jaydeep
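
For anyone who finds this thread later: the "run nodetool ring on every host and compare" suggestion could be scripted roughly as below. This is only a sketch under assumptions. It expects you to have already captured the text of `nodetool ring` from each host, and it assumes each data row starts with the node address and ends with the token, which may not hold for every Cassandra version. The function names `parse_ring` and `hosts_with_divergent_view` are made up for this example, and treating the most common view as the baseline is a heuristic, not a guarantee that the majority is right.

```python
import ipaddress
from collections import Counter


def parse_ring(text):
    """Return {token: address} from text shaped like `nodetool ring` output.

    Assumption: a data row starts with an IP address and ends with an
    integer token. Anything else (headers, separators, warnings) is skipped.
    """
    owners = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            address = str(ipaddress.ip_address(fields[0]))
            token = int(fields[-1])
        except ValueError:
            continue  # not a data row
        owners[token] = address
    return owners


def hosts_with_divergent_view(outputs):
    """outputs maps a host name to the `nodetool ring` text captured on it.

    Returns the sorted host names whose token-to-owner mapping differs from
    the most common mapping across all hosts.
    """
    views = {
        host: frozenset(parse_ring(text).items())
        for host, text in outputs.items()
    }
    if not views:
        return []
    majority_view, _ = Counter(views.values()).most_common(1)[0]
    return sorted(host for host, view in views.items() if view != majority_view)
```

A host reported by `hosts_with_divergent_view` is only a candidate for closer inspection; it does not say which token ranges are wrong or why.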