Hi Alain,

We are adding 12 tables in a weekly job and dropping the history tables. Our job checks for schema mismatch by running "SELECT peer, schema_version, tokens FROM peers" before it adds/drops each table. nodetool describecluster looks OK, only one schema version:

Cluster Information:
        Name: [removed]
        Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                30cb4963-109c-3077-8bdd-df9bfb313568: [10...output truncated]
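For reference, the pre-DDL check amounts to something like the sketch below (a minimal sketch assuming the DataStax Python driver; the contact point and the sample DDL are placeholders, not the actual job):

# Pre-DDL schema-agreement check: only run CREATE/DROP when the local node
# and every peer report the same schema version.
# Assumes the DataStax driver: pip install cassandra-driver
from cassandra.cluster import Cluster

def schema_is_agreed(session) -> bool:
    local = session.execute("SELECT schema_version FROM system.local").one()
    peers = session.execute("SELECT peer, schema_version FROM system.peers")
    versions = {local.schema_version} | {row.schema_version for row in peers}
    return len(versions) == 1  # exactly one schema version cluster-wide

cluster = Cluster(["10.0.0.1"])  # hypothetical contact point
session = cluster.connect()
if schema_is_agreed(session):
    # Safe to apply this week's DDL (table name is a placeholder).
    session.execute("CREATE TABLE IF NOT EXISTS sessions_rawdata.sessions_v2_2019_05_13 "
                    "(id uuid PRIMARY KEY)")
else:
    raise RuntimeError("schema disagreement detected; skipping DDL this cycle")

If I recall correctly, the driver can also wait for agreement itself via cluster.control_connection.wait_for_schema_agreement(), which may be a simpler guard after each DDL statement.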
We recently shut down the cluster (for a maintenance job) and started it again, but the job runs daily. I have tried to correlate the job times with the table corruption timestamps but did not find any relation; still, this direction may be relevant. (For a per-node view of the schema versions, see the sketch at the end of this thread.)

Thanks,
Roy

On Fri, May 10, 2019 at 3:13 PM Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hello Roy,
>
> The name of the table makes me think that you might be doing automated
> changes to the schema. I just dug into this topic for someone else, and
> schema changes are far less consistent than standard Cassandra operations
> (see https://issues.apache.org/jira/browse/CASSANDRA-10699).
>
>> sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-I
>
> Idea 1: Some of these queries might have failed for multiple reasons on a
> node (down for too long, race conditions, ...), leaving the cluster in an
> unstable state where there is a schema disagreement. In that case, you
> could have trouble when adding a new node; I have seen it happen. Could
> you check/share with us the output of 'nodetool describecluster'?
>
> Also, did you recently try a rolling restart? This often helps
> synchronise local schemas and 'could' fix the issue. Another option is
> 'nodetool resetlocalschema' on the node(s) out of sync.
>
> Idea 2: If you have identified broken secondary indexes, maybe give
> 'nodetool rebuild_index <keyspace> <table> <indexes...>' a try on all
> nodes before adding the next node?
> https://cassandra.apache.org/doc/latest/tools/nodetool/rebuild_index.html
>
> Hope this helps,
> C*heers,
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> Le jeu. 9 mai 2019 à 17:29, Jason Wee <peich...@gmail.com> a écrit :
>
>> Maybe print the value out into the logfile; that should give some clue
>> as to where the problem might be?
>>
>> On Tue, May 7, 2019 at 4:58 PM Paul Chandler <p...@redshots.com> wrote:
>> >
>> > Roy, we spent a long time trying to fix it but didn't find a solution.
>> > It was a test cluster, so we ended up rebuilding it rather than
>> > spending any more time trying to fix the corruption. We had worked out
>> > what caused it, so we were happy it wasn't going to occur in
>> > production. Sorry that is not much help, but I am not even sure it is
>> > the same issue you have.
>> >
>> > Paul
>> >
>> > On 7 May 2019, at 07:14, Roy Burstein <burstein....@gmail.com> wrote:
>> >
>> > I can say that it happens now as well; currently no node has been
>> > added/removed.
>> > The corrupted sstables are usually the index files, and on some
>> > machines the sstable does not even exist on the filesystem.
>> > On one machine I was able to dump the sstable to a dump file without
>> > any issue. Any idea how to tackle this issue?
>> >
>> > On Tue, May 7, 2019 at 12:32 AM Paul Chandler <p...@redshots.com>
>> > wrote:
>> >>
>> >> Roy,
>> >>
>> >> I have seen this exception before when a column had been dropped and
>> >> then re-added with the same name but a different type. In particular,
>> >> we dropped a column and re-created it as static, then got this
>> >> exception from the old sstables created prior to the DDL change.
>> >>
>> >> Not sure if this applies in your case.
>> >>
>> >> Thanks
>> >>
>> >> Paul
>> >>
>> >> On 6 May 2019, at 21:52, Nitan Kainth <nitankai...@gmail.com> wrote:
>> >>
>> >> Can the disk have bad sectors? fsck or something similar can help.
>> >>
>> >> Long shot: repair or some other operation conflicting. I would leave
>> >> that to others.
>> >>
>> >> On Mon, May 6, 2019 at 3:50 PM Roy Burstein <burstein....@gmail.com>
>> >> wrote:
>> >>>
>> >>> It happens on the same column families, and they have the same DDL
>> >>> (as already posted). I did not check it after cleanup.
>> >>>
>> >>> On Mon, May 6, 2019, 23:43 Nitan Kainth <nitankai...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> This is strange, I never saw this. Does it happen to the same
>> >>>> column family?
>> >>>>
>> >>>> Does it happen after cleanup?
>> >>>>
>> >>>> On Mon, May 6, 2019 at 3:41 PM Roy Burstein <burstein....@gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Yes.
>> >>>>>
>> >>>>> On Mon, May 6, 2019, 23:23 Nitan Kainth <nitankai...@gmail.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Roy,
>> >>>>>>
>> >>>>>> You mean all nodes show corruption when you add a node to the
>> >>>>>> cluster?
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Nitan
>> >>>>>> Cell: 510 449 9629
>> >>>>>>
>> >>>>>> On May 6, 2019, at 2:48 PM, Roy Burstein <burstein....@gmail.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>> It happened on all the servers in the cluster every time I added
>> >>>>>> a node.
>> >>>>>> This is a new cluster; nothing was upgraded here. We have a
>> >>>>>> similar cluster running on C* 2.1.15 with no issues.
>> >>>>>> We are aware of the scrub utility; the issue just reproduces
>> >>>>>> every time we add a node to the cluster.
>> >>>>>>
>> >>>>>> We have many tables there
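To make Alain's Idea 1 concrete: a rough way to see which node disagrees, before reaching for 'nodetool resetlocalschema', is to ask each node for its own view of the schema. A minimal sketch, assuming the DataStax Python driver; the host addresses are placeholders:

# Ask each node for its own schema_version by pinning the connection to
# that node with a whitelist load-balancing policy. Hosts are hypothetical.
# Assumes the DataStax driver: pip install cassandra-driver
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import WhiteListRoundRobinPolicy

HOSTS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical node addresses

def node_schema_version(host):
    # One short-lived cluster object per node, restricted to that node only.
    profile = ExecutionProfile(
        load_balancing_policy=WhiteListRoundRobinPolicy([host]))
    cluster = Cluster([host],
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    try:
        session = cluster.connect()
        row = session.execute("SELECT schema_version FROM system.local").one()
        return row.schema_version
    finally:
        cluster.shutdown()

versions = {host: node_schema_version(host) for host in HOSTS}
for host, version in versions.items():
    print(host, version)
if len(set(versions.values())) > 1:
    print("schema disagreement: minority nodes are resetlocalschema candidates")

Nodes reporting a version different from the majority would be the candidates for 'nodetool resetlocalschema' or a rolling restart, per Alain's note.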