Hi Alain,

We are adding 12 tables in a weekly job and dropping the history tables. Our job checks for schema mismatch by running "SELECT peer, schema_version, tokens FROM peers" before it adds/drops each table. nodetool describecluster looks OK, only one schema version:

Cluster Information:
        Name: [removed]
        Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
                30cb4963-109c-3077-8bdd-df9bfb313568: [10...output truncated]
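For reference, the pre-DDL check amounts to something like the sketch below (a minimal sketch assuming the DataStax Python driver; the contact point and the sample DDL are placeholders, not the actual job):

# Pre-DDL schema-agreement check: only run CREATE/DROP when the local node
# and every peer report the same schema version.
# Assumes the DataStax driver: pip install cassandra-driver
from cassandra.cluster import Cluster

def schema_is_agreed(session) -> bool:
    local = session.execute("SELECT schema_version FROM system.local").one()
    peers = session.execute("SELECT peer, schema_version FROM system.peers")
    versions = {local.schema_version} | {row.schema_version for row in peers}
    return len(versions) == 1  # exactly one schema version cluster-wide

cluster = Cluster(["10.0.0.1"])  # hypothetical contact point
session = cluster.connect()
if schema_is_agreed(session):
    # Safe to apply this week's DDL (table name is a placeholder).
    session.execute("CREATE TABLE IF NOT EXISTS sessions_rawdata.sessions_v2_2019_05_13 "
                    "(id uuid PRIMARY KEY)")
else:
    raise RuntimeError("schema disagreement detected; skipping DDL this cycle")

If I recall correctly, the driver can also wait for agreement itself via cluster.control_connection.wait_for_schema_agreement(), which may be a simpler guard after each DDL statement.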
We recently shut down the cluster (for a maintenance job) and started it again, but the job runs daily. I have tried to correlate the job times with the table corruption timestamps but did not find any relation; still, this direction may be relevant. (For a per-node view of the schema versions, see the sketch at the end of this thread.)

Thanks,
Roy

On Fri, May 10, 2019 at 3:13 PM Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hello Roy,
>
> The name of the table makes me think that you might be doing automated
> changes to the schema. I just dug into this topic for someone else, and
> schema changes are far less consistent than standard Cassandra operations
> (see https://issues.apache.org/jira/browse/CASSANDRA-10699).
>
>> sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-I
>
> Idea 1: Some of these queries might have failed for multiple reasons on a
> node (down for too long, race conditions, ...), leaving the cluster in an
> unstable state where there is a schema disagreement. In that case, you
> could have trouble when adding a new node; I have seen it happen. Could
> you check/share with us the output of 'nodetool describecluster'?
>
> Also, did you recently try a rolling restart? This often helps
> synchronise local schemas and 'could' fix the issue. Another option is
> 'nodetool resetlocalschema' on the node(s) out of sync.
>
> Idea 2: If you have identified broken secondary indexes, maybe give
> 'nodetool rebuild_index <keyspace> <table> <indexes...>' a try on all
> nodes before adding the next node?
> https://cassandra.apache.org/doc/latest/tools/nodetool/rebuild_index.html
>
> Hope this helps,
> C*heers,
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
>
> Le jeu. 9 mai 2019 à 17:29, Jason Wee <peich...@gmail.com> a écrit :
>
>> Maybe print the value out into the logfile; that should give some clue
>> as to where the problem might be?
>>
>> On Tue, May 7, 2019 at 4:58 PM Paul Chandler <p...@redshots.com> wrote:
>> >
>> > Roy, we spent a long time trying to fix it but didn't find a solution.
>> > It was a test cluster, so we ended up rebuilding it rather than
>> > spending any more time trying to fix the corruption. We had worked out
>> > what caused it, so we were happy it wasn't going to occur in
>> > production. Sorry that is not much help, but I am not even sure it is
>> > the same issue you have.
>> >
>> > Paul
>> >
>> > On 7 May 2019, at 07:14, Roy Burstein <burstein....@gmail.com> wrote:
>> >
>> > I can say that it happens now as well; currently no node has been
>> > added/removed.
>> > The corrupted sstables are usually the index files, and on some
>> > machines the sstable does not even exist on the filesystem.
>> > On one machine I was able to dump the sstable to a dump file without
>> > any issue. Any idea how to tackle this issue?
>> >
>> > On Tue, May 7, 2019 at 12:32 AM Paul Chandler <p...@redshots.com>
>> > wrote:
>> >>
>> >> Roy,
>> >>
>> >> I have seen this exception before when a column had been dropped and
>> >> then re-added with the same name but a different type. In particular,
>> >> we dropped a column and re-created it as static, then got this
>> >> exception from the old sstables created prior to the DDL change.
>> >>
>> >> Not sure if this applies in your case.
>> >>
>> >> Thanks
>> >>
>> >> Paul
>> >>
>> >> On 6 May 2019, at 21:52, Nitan Kainth <nitankai...@gmail.com> wrote:
>> >>
>> >> Can the disk have bad sectors? fsck or something similar can help.
>> >>
>> >> Long shot: repair or some other operation conflicting. I would leave
>> >> that to others.
>> >>
>> >> On Mon, May 6, 2019 at 3:50 PM Roy Burstein <burstein....@gmail.com>
>> >> wrote:
>> >>>
>> >>> It happens on the same column families, and they have the same DDL
>> >>> (as already posted). I did not check it after cleanup.
>> >>>
>> >>> On Mon, May 6, 2019, 23:43 Nitan Kainth <nitankai...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> This is strange, I never saw this. Does it happen to the same
>> >>>> column family?
>> >>>>
>> >>>> Does it happen after cleanup?
>> >>>>
>> >>>> On Mon, May 6, 2019 at 3:41 PM Roy Burstein <burstein....@gmail.com>
>> >>>> wrote:
>> >>>>>
>> >>>>> Yes.
>> >>>>>
>> >>>>> On Mon, May 6, 2019, 23:23 Nitan Kainth <nitankai...@gmail.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Roy,
>> >>>>>>
>> >>>>>> You mean all nodes show corruption when you add a node to the
>> >>>>>> cluster?
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Nitan
>> >>>>>> Cell: 510 449 9629
>> >>>>>>
>> >>>>>> On May 6, 2019, at 2:48 PM, Roy Burstein <burstein....@gmail.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>> It happened on all the servers in the cluster every time I added
>> >>>>>> a node.
>> >>>>>> This is a new cluster; nothing was upgraded here. We have a
>> >>>>>> similar cluster running on C* 2.1.15 with no issues.
>> >>>>>> We are aware of the scrub utility; the issue just reproduces
>> >>>>>> every time we add a node to the cluster.
>> >>>>>>
>> >>>>>> We have many tables there
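To make Alain's Idea 1 concrete: a rough way to see which node disagrees, before reaching for 'nodetool resetlocalschema', is to ask each node for its own view of the schema. A minimal sketch, assuming the DataStax Python driver; the host addresses are placeholders:

# Ask each node for its own schema_version by pinning the connection to
# that node with a whitelist load-balancing policy. Hosts are hypothetical.
# Assumes the DataStax driver: pip install cassandra-driver
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import WhiteListRoundRobinPolicy

HOSTS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical node addresses

def node_schema_version(host):
    # One short-lived cluster object per node, restricted to that node only.
    profile = ExecutionProfile(
        load_balancing_policy=WhiteListRoundRobinPolicy([host]))
    cluster = Cluster([host],
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    try:
        session = cluster.connect()
        row = session.execute("SELECT schema_version FROM system.local").one()
        return row.schema_version
    finally:
        cluster.shutdown()

versions = {host: node_schema_version(host) for host in HOSTS}
for host, version in versions.items():
    print(host, version)
if len(set(versions.values())) > 1:
    print("schema disagreement: minority nodes are resetlocalschema candidates")

Nodes reporting a version different from the majority would be the candidates for 'nodetool resetlocalschema' or a rolling restart, per Alain's note.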