Re: Datafile Corruption

2019-08-08 Thread Philip Ó Condúin
Hi Jon, Good question. I'm not sure if we're using NVMe; I don't see /dev/nvme, but we could still be using it. We're using *Cisco UCS C220 M4 SFF*, so I'm just going to check the spec. Our kernel is the following; we're using Red Hat, so I'm told we can't upgrade the version until the next major releas

Re: Datafile Corruption

2019-08-08 Thread Jon Haddad
Any chance you're using NVMe with an older Linux kernel? I've seen a *lot* of filesystem errors from using older CentOS versions. You'll want to be using a kernel version > 4.15. On Thu, Aug 8, 2019 at 9:31 AM Philip Ó Condúin wrote: > *@Jeff *- If it was hardware that would explain it all, but do you t
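The kernel-version check suggested above could be scripted on each node roughly as follows. This is a minimal sketch and not something posted in the thread: the helper names `parse_kernel_version` and `is_newer_than` are hypothetical, and the 4.15 threshold is taken directly from the message. It compares only the major and minor numbers of the release string.

```python
import platform
import re
from typing import Tuple


def parse_kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '3.10.0-957.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_newer_than(release: str, threshold: Tuple[int, int] = (4, 15)) -> bool:
    """Return True if the kernel's (major, minor) is strictly greater than the threshold."""
    return parse_kernel_version(release) > threshold


if __name__ == "__main__":
    # platform.release() reports the running kernel, the same string as `uname -r`.
    release = platform.release()
    verdict = "newer than 4.15" if is_newer_than(release) else "4.15 or older"
    print(f"{release}: {verdict}")
```

Distribution kernels often carry backported fixes, so the version number alone is only a first hint.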

Re: Datafile Corruption

2019-08-08 Thread Philip Ó Condúin
*@Jeff *- If it was hardware, that would explain it all, but do you think it's possible for every server in the cluster to have a hardware issue? The data is sensitive and the customer would lose their mind if I sent it off-site, which is a pity because I could really do with the help. The corruption

RE: Datafile Corruption

2019-08-08 Thread ZAIDI, ASAD A
Did you check that packets are NOT being dropped on the network interfaces the Cassandra instances are using (ifconfig -a)? Internode compression is set for all endpoints – maybe the network is playing a role here? Is this corruption limited to certain keyspaces/tables or DCs, or is it widespread – the l
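As an alternative to reading `ifconfig -a` output by eye, the dropped-packet counters can be pulled from `/proc/net/dev` on Linux. The sketch below is an illustration only and was not part of the thread; the function name `parse_drop_counters` is hypothetical. It relies on the standard `/proc/net/dev` column order, where the fourth receive field and the fourth transmit field are the drop counters.

```python
import os
from typing import Dict, Tuple


def parse_drop_counters(proc_net_dev: str) -> Dict[str, Tuple[int, int]]:
    """Map interface name -> (rx_dropped, tx_dropped) from /proc/net/dev text."""
    drops: Dict[str, Tuple[int, int]] = {}
    for line in proc_net_dev.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 12:
            continue  # not a full counter row
        # Receive columns occupy indexes 0-7, transmit columns 8-15;
        # "drop" is the fourth column in each group.
        drops[name.strip()] = (int(fields[3]), int(fields[11]))
    return drops


if __name__ == "__main__":
    path = "/proc/net/dev"  # Linux only
    if os.path.exists(path):
        with open(path) as handle:
            for iface, (rx, tx) in parse_drop_counters(handle.read()).items():
                print(f"{iface}: rx_dropped={rx} tx_dropped={tx}")
```

The counters are cumulative since boot, so comparing two readings taken some minutes apart says more than a single snapshot.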

Re: Datafile Corruption

2019-08-08 Thread Jeff Jirsa
The corrupt block exception from the compressor in 2.1/2.2 is something I don’t recall ever being attributed to anything other than bad hardware, so that seems by far the most likely option. The corruption that the compressor is catching says the checksum written immediately after the compress

Re: Rebuilding a node without clients hitting it

2019-08-08 Thread Cyril Scetbon
Thanks Jeff, that’s the type of parameter I was looking for but I missed it when I first read it. We’ll ensure that dynamic snitch is enabled. — Cyril Scetbon > On Aug 5, 2019, at 11:23 PM, Jeff Jirsa wrote: > > You can make THAT less likely with some snitch trickery (setting the badness > for

Re: Datafile Corruption

2019-08-08 Thread Philip Ó Condúin
Hi All, Thank you so much for the replies. Currently, I have the following list that can potentially cause some sort of corruption in a Cassandra cluster. - Sudden Power cut - *We have had no power cuts in the datacenters* - Network Issues - *no network issues from what I can tell* -

Re: Repairs/compactions on tables with solr indexes

2019-08-08 Thread Dinesh Joshi
Hi Ayub, DSE is a DataStax product and this is the Apache Cassandra mailing list. Could you reach out to DataStax? Dinesh > On Aug 7, 2019, at 11:17 PM, Ayub M wrote: > > Hello, we are using DSE Search workload with Search and Cass running on same > nodes/jvm. > > 1. When repairs are run,